Convonet System Architecture Diagram
Multi-LLM · LiveKit WebRTC · Domain Agents · Agent Monitor
Complete System Architecture
Color-Coded Components
User Components
Browser-based interfaces
Server Components
Flask/WebSocket services
AI/ML Components
LangGraph, Claude/Gemini/OpenAI, Deepgram
Tools/APIs
MCP tools, PostgreSQL, Google APIs
Transfer System
Twilio, FusionPBX
Agent Dashboard
JsSIP client
Agent Monitor
Tool calls, voice timing
Monitoring
Sentry, Logs
Storage
Redis, PostgreSQL
System Components
User Browser (Voice Assistant UI)
Browser-based LiveKit WebRTC voice interface where users interact with the AI assistant. Captures audio via LiveKit and plays back streaming TTS responses.
- LiveKit WebRTC audio capture and playback
- Real-time Socket.IO / LiveKit connection
- PIN authentication interface
- Domain agent selection (Productivity, Mortgage, Healthcare)
Flask/WebSocket Server
WebSocket Server (Socket.IO)
Manages real-time bidirectional communication between the browser and server. Handles WebSocket connections, message routing, and event broadcasting.
PIN Authentication Module
Validates user PIN credentials against PostgreSQL database. Creates authenticated sessions stored in Redis.
LiveKit WebRTC Voice Server
Processes audio streams via LiveKit, manages streaming STT/TTS pipeline, domain agents, and coordinates with LangGraph for AI responses. Located in convonet/webrtc_voice_server_socketio.py.
Redis (Audio Buffer & Session Management)
In-memory data store for session management and audio buffering. Stores authenticated sessions, audio chunks, and conversation state.
- Audio buffer management for WebRTC streams
- Session storage and authentication tokens
- Real-time Pub/Sub notifications
- Transfer flags and state management
Speech Processing
Deepgram STT (Speech-to-Text)
Streaming speech-to-text. Processes LiveKit/WebRTC audio and returns transcribed text for LangGraph.
- Streaming real-time transcription
- WebM format support
- High accuracy (95%+)
- Low latency (200-500ms)
TTS (Text-to-Speech)
Streaming TTS via Deepgram, ElevenLabs, or Cartesia. Converts AI text into natural speech for browser playback.
- Deepgram Aura-2
- ElevenLabs (optional)
- Cartesia (optional)
AI Orchestration
LangGraph (Assistant Graph)
Orchestrates the AI conversation flow. Manages state transitions, tool execution decisions, and coordinates between LLM calls and external tools.
- State machine for conversation management
- Tool condition evaluation
- Transfer intent detection
Multi-LLM (Claude, Gemini, OpenAI)
Generates intelligent responses based on user input and conversation context. Supports Claude, Gemini, and OpenAI models. Domain agents: Productivity, Mortgage, Healthcare.
Agent Monitor
Real-time observability for voice interactions. Tracks tool calls, voice timing (buffer capture, STT, agent start, first sentence, first audio), and elapsed time per tool. Located in convonet/agent_monitor.py.
- Tool call tracking with elapsed time
- Voice timing table (Step, Event, Elapsed, Delta)
- Dashboard at /agent-monitor/
Tools & External APIs
Tool Calling (MCP Tools)
Model Context Protocol (MCP) tools for external operations. 38 total tools including database operations, calendar integration, and call transfer.
- 36 database tools (todos, teams, users)
- 2 transfer tools (call transfer to agent)
PostgreSQL Database
Multi-tenant relational database storing todos, teams, users, and conversation history.
Google APIs (Calendar/OAuth)
Calendar integration with OAuth2 authentication for scheduling and event management.
FusionPBX Metadata Lookup
Retrieves extension information and call routing metadata from FusionPBX.
Transfer System
Twilio API (Programmable Voice)
Handles call transfer bridging between WebRTC user and FusionPBX agent. Provides SIP trunking and audio bridging capabilities.
- SIP INVITE to FusionPBX
- Audio bridge between user and agent
- Transfer status callbacks
FusionPBX (Google Cloud)
Private Branch Exchange (PBX) system running on Google Cloud VM. Routes calls to extension 2001 (agent dashboard).
- SIP server for call routing
- Extension 2001 for agent calls
- WSS support on port 7443
Agent Dashboard
Call-Center Agent Dashboard (JsSIP Client)
Browser-based softphone using JsSIP v3.10.1. Registers with FusionPBX over WebSocket Secure (WSS) and handles incoming calls from transferred users.
- JsSIP WebRTC client
- SIP registration with FusionPBX
- Incoming call handling
- Call controls (answer, hold, transfer, hangup)
User Info Popup (Call Controls)
Displays caller information retrieved from PostgreSQL. Shows user details, call context, and provides call control interface.
Monitoring
Sentry (Error Monitoring)
Production-grade error tracking and performance monitoring. Tracks all operations across the system with real-time alerts.
- Error tracking and alerting
- Performance monitoring (agent processing time)
- User context and session tracking
- Timeout and thread reset tracking
Application Logs
Comprehensive logging for debugging and audit trails. Logs all system operations, API calls, and state transitions.
Data Flow Summary
LiveKit WebRTC Flow (Normal Conversation)
User Browser → LiveKit WebRTC → PIN Auth (PostgreSQL) → Streaming STT (Deepgram) → LangGraph → Multi-LLM → Domain Agents → Tool Calling → Streaming TTS (Deepgram/ElevenLabs/Cartesia) → User Browser
Transfer Flow
User Request → LangGraph (Detect Intent) → Twilio API → FusionPBX (Extension 2001) → Agent Dashboard (JsSIP) → User Info Popup → Live Conversation
Monitoring
All Operations → Sentry Monitoring → Real-time Alerts & Performance Tracking
Architecture Overview
The Convonet System Architecture represents an enterprise-grade voice AI platform with multi-LLM
(Claude, Gemini, OpenAI), LiveKit WebRTC, and domain-specific agents (Productivity, Mortgage, Healthcare).
The system starts with user authentication via PIN validation against PostgreSQL, then processes
real-time audio through LiveKit WebRTC to streaming Deepgram STT for transcription.
LangGraph orchestrates the AI conversation flow, coordinating with the selected LLM for intelligent
responses and MCP tools for external operations. The system includes 38 tools for database operations,
calendar integration, and call transfer. Agent Monitor provides real-time observability for tool calls
and voice timing (buffer capture, STT, agent start, first sentence, first audio).
When a user requests transfer to a human agent, LangGraph detects the intent and initiates a Twilio
API call to bridge the user with a FusionPBX extension. The agent dashboard (JsSIP client) receives
the call, displays user information, and enables live conversation.
TTS supports Deepgram, ElevenLabs, and Cartesia. Sentry provides error monitoring and performance
tracking, while Redis manages session state and audio buffering. All operations use timeout handling
(8s/10s/12s) for Twilio compatibility.