Convonet System Architecture Diagram

Multi-LLM · LiveKit WebRTC · Domain Agents · Agent Monitor

← Back to Tech Spec

Complete System Architecture

Convonet System Architecture Flow Diagram

Color-Coded Components

User Components

Browser-based interfaces

Server Components

Flask/WebSocket services

AI/ML Components

LangGraph, Claude/Gemini/OpenAI, Deepgram

Tools/APIs

MCP tools, PostgreSQL, Google APIs

Transfer System

Twilio, FusionPBX

Agent Dashboard

JsSIP client

Agent Monitor

Tool calls, voice timing

Monitoring

Sentry, Logs

Storage

Redis, PostgreSQL

System Components

User Browser (Voice Assistant UI)

Browser-based LiveKit WebRTC voice interface where users interact with the AI assistant. Captures audio via LiveKit and plays back streaming TTS responses.

LiveKit WebRTC audio capture and playback
Real-time Socket.IO / LiveKit connection
PIN authentication interface
Domain agent selection (Productivity, Mortgage, Healthcare)

Flask/WebSocket Server

WebSocket Server (Socket.IO)

Manages real-time bidirectional communication between the browser and server. Handles WebSocket connections, message routing, and event broadcasting.

PIN Authentication Module

Validates user PIN credentials against PostgreSQL database. Creates authenticated sessions stored in Redis.

LiveKit WebRTC Voice Server

Processes audio streams via LiveKit, manages streaming STT/TTS pipeline, domain agents, and coordinates with LangGraph for AI responses. Located in convonet/webrtc_voice_server_socketio.py.

Redis (Audio Buffer & Session Management)

In-memory data store for session management and audio buffering. Stores authenticated sessions, audio chunks, and conversation state.

Audio buffer management for WebRTC streams
Session storage and authentication tokens
Real-time Pub/Sub notifications
Transfer flags and state management

Speech Processing

Deepgram STT (Speech-to-Text)

Streaming speech-to-text. Processes LiveKit/WebRTC audio and returns transcribed text for LangGraph.

Streaming real-time transcription
WebM format support
High accuracy (95%+)
Low latency (200-500ms)

TTS (Text-to-Speech)

Streaming TTS via Deepgram, ElevenLabs, or Cartesia. Converts AI text into natural speech for browser playback.

Deepgram Aura-2
ElevenLabs (optional)
Cartesia (optional)

AI Orchestration

LangGraph (Assistant Graph)

Orchestrates the AI conversation flow. Manages state transitions, tool execution decisions, and coordinates between LLM calls and external tools.

State machine for conversation management
Tool condition evaluation
Transfer intent detection

Multi-LLM (Claude, Gemini, OpenAI)

Generates intelligent responses based on user input and conversation context. Supports Claude, Gemini, and OpenAI models. Domain agents: Productivity, Mortgage, Healthcare.

Agent Monitor

Real-time observability for voice interactions. Tracks tool calls, voice timing (buffer capture, STT, agent start, first sentence, first audio), and elapsed time per tool. Located in convonet/agent_monitor.py.

Tool call tracking with elapsed time
Voice timing table (Step, Event, Elapsed, Delta)
Dashboard at /agent-monitor/

Tools & External APIs

Tool Calling (MCP Tools)

Model Context Protocol (MCP) tools for external operations. 38 total tools including database operations, calendar integration, and call transfer.

36 database tools (todos, teams, users)
2 transfer tools (call transfer to agent)

PostgreSQL Database

Multi-tenant relational database storing todos, teams, users, and conversation history.

Google APIs (Calendar/OAuth)

Calendar integration with OAuth2 authentication for scheduling and event management.

FusionPBX Metadata Lookup

Retrieves extension information and call routing metadata from FusionPBX.

Transfer System

Twilio API (Programmable Voice)

Handles call transfer bridging between WebRTC user and FusionPBX agent. Provides SIP trunking and audio bridging capabilities.

SIP INVITE to FusionPBX
Audio bridge between user and agent
Transfer status callbacks

FusionPBX (Google Cloud)

Private Branch Exchange (PBX) system running on Google Cloud VM. Routes calls to extension 2001 (agent dashboard).

SIP server for call routing
Extension 2001 for agent calls
WSS support on port 7443

Agent Dashboard

Call-Center Agent Dashboard (JsSIP Client)

Browser-based softphone using JsSIP v3.10.1. Registers with FusionPBX over WebSocket Secure (WSS) and handles incoming calls from transferred users.

JsSIP WebRTC client
SIP registration with FusionPBX
Incoming call handling
Call controls (answer, hold, transfer, hangup)

User Info Popup (Call Controls)

Displays caller information retrieved from PostgreSQL. Shows user details, call context, and provides call control interface.

Monitoring

Sentry (Error Monitoring)

Production-grade error tracking and performance monitoring. Tracks all operations across the system with real-time alerts.

Error tracking and alerting
Performance monitoring (agent processing time)
User context and session tracking
Timeout and thread reset tracking

Application Logs

Comprehensive logging for debugging and audit trails. Logs all system operations, API calls, and state transitions.

Data Flow Summary

LiveKit WebRTC Flow (Normal Conversation)

User Browser → LiveKit WebRTC → PIN Auth (PostgreSQL) → Streaming STT (Deepgram) → LangGraph → Multi-LLM → Domain Agents → Tool Calling → Streaming TTS (Deepgram/ElevenLabs/Cartesia) → User Browser

Transfer Flow

User Request → LangGraph (Detect Intent) → Twilio API → FusionPBX (Extension 2001) → Agent Dashboard (JsSIP) → User Info Popup → Live Conversation

Monitoring

All Operations → Sentry Monitoring → Real-time Alerts & Performance Tracking

Architecture Overview

The Convonet System Architecture represents an enterprise-grade voice AI platform with multi-LLM (Claude, Gemini, OpenAI), LiveKit WebRTC, and domain-specific agents (Productivity, Mortgage, Healthcare). The system starts with user authentication via PIN validation against PostgreSQL, then processes real-time audio through LiveKit WebRTC to streaming Deepgram STT for transcription.

LangGraph orchestrates the AI conversation flow, coordinating with the selected LLM for intelligent responses and MCP tools for external operations. The system includes 38 tools for database operations, calendar integration, and call transfer. Agent Monitor provides real-time observability for tool calls and voice timing (buffer capture, STT, agent start, first sentence, first audio).

When a user requests transfer to a human agent, LangGraph detects the intent and initiates a Twilio API call to bridge the user with a FusionPBX extension. The agent dashboard (JsSIP client) receives the call, displays user information, and enables live conversation.

TTS supports Deepgram, ElevenLabs, and Cartesia. Sentry provides error monitoring and performance tracking, while Redis manages session state and audio buffering. All operations use timeout handling (8s/10s/12s) for Twilio compatibility.