Convonet System Architecture Diagram

Multi-LLM · LiveKit WebRTC · Domain Agents · Agent Monitor

Complete System Architecture

[Figure: Convonet System Architecture Flow Diagram]

Color-Coded Components

  • User Components: browser-based interfaces
  • Server Components: Flask/WebSocket services
  • AI/ML Components: LangGraph, Claude/Gemini/OpenAI, Deepgram
  • Tools/APIs: MCP tools, PostgreSQL, Google APIs
  • Transfer System: Twilio, FusionPBX
  • Agent Dashboard: JsSIP client
  • Agent Monitor: tool calls, voice timing
  • Monitoring: Sentry, logs
  • Storage: Redis, PostgreSQL

System Components

User Browser (Voice Assistant UI)

Browser-based LiveKit WebRTC voice interface where users interact with the AI assistant. Captures audio via LiveKit and plays back streaming TTS responses.

  • LiveKit WebRTC audio capture and playback
  • Real-time Socket.IO / LiveKit connection
  • PIN authentication interface
  • Domain agent selection (Productivity, Mortgage, Healthcare)

Flask/WebSocket Server

WebSocket Server (Socket.IO)

Manages real-time bidirectional communication between the browser and server. Handles WebSocket connections, message routing, and event broadcasting.

PIN Authentication Module

Validates user PIN credentials against the PostgreSQL database and creates authenticated sessions stored in Redis.
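
A minimal sketch of PIN validation followed by session creation, assuming each user has an ID plus a PIN stored as a salted hash. In-memory dicts stand in for PostgreSQL and Redis, and every name here (`register_user`, `authenticate`, `USERS`, `SESSIONS`, the one-hour TTL) is hypothetical rather than taken from the Convonet code.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical in-memory stand-ins for the PostgreSQL users table and Redis sessions.
USERS = {}                  # user_id -> {"salt": bytes, "pin_hash": bytes}
SESSIONS = {}               # token -> {"user_id": str, "expires_at": float}
SESSION_TTL_SECONDS = 3600  # assumed session lifetime


def hash_pin(pin: str, salt: bytes) -> bytes:
    # A slow, salted hash slows offline guessing if the table leaks.
    # Short PINs remain weak, so attempts should also be rate-limited.
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)


def register_user(user_id: str, pin: str) -> None:
    salt = secrets.token_bytes(16)
    USERS[user_id] = {"salt": salt, "pin_hash": hash_pin(pin, salt)}


def authenticate(user_id: str, pin: str) -> Optional[str]:
    """Return a new session token if the PIN matches, otherwise None."""
    record = USERS.get(user_id)
    if record is None:
        return None
    candidate = hash_pin(pin, record["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(candidate, record["pin_hash"]):
        return None
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = {"user_id": user_id, "expires_at": time.time() + SESSION_TTL_SECONDS}
    return token
```

With Redis, the session write could be a single `SET session:<token> <user_id> EX 3600` (key name hypothetical), so expired sessions disappear without a cleanup job.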

LiveKit WebRTC Voice Server

Processes audio streams via LiveKit, manages the streaming STT/TTS pipeline and the domain agents, and coordinates with LangGraph for AI responses. Located in convonet/webrtc_voice_server_socketio.py.

Redis (Audio Buffer & Session Management)

In-memory data store for session management and audio buffering. Stores authenticated sessions, audio chunks, and conversation state.

  • Audio buffer management for WebRTC streams
  • Session storage and authentication tokens
  • Real-time Pub/Sub notifications
  • Transfer flags and state management
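
The audio-buffer role can be sketched as a bounded per-session list. This is an in-memory stand-in with an assumed size limit; the `AudioBuffer` class and chunk cap are hypothetical, and the Redis commands named in the comments show one plausible mapping, not the server's actual keys.

```python
from collections import defaultdict, deque

# Assumed cap: e.g. 500 x 20 ms frames = 10 s of audio per session.
MAX_CHUNKS_PER_SESSION = 500


class AudioBuffer:
    """In-memory stand-in for a Redis list per session.

    In Redis, append() could map to RPUSH audio:<session_id> <chunk> followed by
    LTRIM audio:<session_id> -N -1, and drain() to LRANGE + DEL.
    """

    def __init__(self, max_chunks: int = MAX_CHUNKS_PER_SESSION) -> None:
        # deque(maxlen=...) silently drops the oldest chunk once full.
        self._chunks = defaultdict(lambda: deque(maxlen=max_chunks))

    def append(self, session_id: str, chunk: bytes) -> None:
        self._chunks[session_id].append(chunk)

    def drain(self, session_id: str) -> bytes:
        """Return all buffered audio for a session and clear it."""
        chunks = self._chunks.pop(session_id, deque())
        return b"".join(chunks)

    def size(self, session_id: str) -> int:
        # .get() does not trigger the defaultdict factory, so unknown IDs stay absent.
        return len(self._chunks.get(session_id, ()))
```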

Speech Processing

Deepgram STT (Speech-to-Text)

Streaming speech-to-text. Processes LiveKit/WebRTC audio and returns transcribed text for LangGraph.

  • Streaming real-time transcription
  • WebM format support
  • High accuracy (95%+)
  • Low latency (200-500ms)
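
A minimal sketch of how the streaming connection might be configured against Deepgram's live endpoint (`wss://api.deepgram.com/v1/listen`). The model name `nova-2`, the chosen query parameters, and the helper names are assumptions; the actual server may configure Deepgram through its SDK instead.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(model: str = "nova-2", containerized: bool = True,
                     sample_rate: int = 16000) -> str:
    """Build the WebSocket URL for a live transcription session (hypothetical helper)."""
    params = {
        "model": model,             # assumed model; the document does not name one
        "interim_results": "true",  # partial transcripts while the user is speaking
        "smart_format": "true",
    }
    if not containerized:
        # Raw PCM needs explicit encoding and sample rate; WebM/Opus carries
        # this in its container header, so Deepgram can detect it.
        params["encoding"] = "linear16"
        params["sample_rate"] = str(sample_rate)
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def auth_headers(api_key: str) -> dict:
    # Deepgram authenticates with an "Authorization: Token <key>" header.
    return {"Authorization": f"Token {api_key}"}
```

WebM audio from the browser can use the default (containerized) URL, while raw PCM frames would need `containerized=False` with the frames' real sample rate.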

TTS (Text-to-Speech)

Streaming TTS via Deepgram, ElevenLabs, or Cartesia. Converts AI text into natural speech for browser playback.

  • Deepgram Aura-2
  • ElevenLabs (optional)
  • Cartesia (optional)
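
Streaming TTS can start speaking before the LLM finishes. One common approach, assumed here rather than taken from the Convonet source, is to cut the LLM token stream into sentences and send each sentence to the TTS provider as soon as it completes. The function name and the regex-based sentence rule are hypothetical.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    pending = ""
    for token in tokens:
        pending += token
        parts = _SENTENCE_END.split(pending)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        pending = parts[-1]
    if pending.strip():
        yield pending.strip()  # flush the remainder when the LLM stream ends
```

The rule is deliberately simple: abbreviations such as "Dr. Smith" split early, which costs a slight pause in audio but never loses text.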

AI Orchestration

LangGraph (Assistant Graph)

Orchestrates the AI conversation flow. Manages state transitions, tool execution decisions, and coordinates between LLM calls and external tools.

  • State machine for conversation management
  • Tool condition evaluation
  • Transfer intent detection
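
The routing decision after each assistant turn can be sketched in plain Python. This mirrors the kind of conditional edge LangGraph uses (its prebuilt `tools_condition` routes to a tools node when the last AI message has tool calls), but the `route_after_assistant` function, the node names, and the keyword-based transfer check are hypothetical; a production graph would more likely let the LLM call a transfer tool.

```python
from dataclasses import dataclass, field

# Hypothetical trigger phrases for a transfer request.
TRANSFER_PHRASES = ("human agent", "real person", "transfer me", "speak to someone")


@dataclass
class AssistantMessage:
    content: str
    tool_calls: list = field(default_factory=list)


def wants_transfer(user_text: str) -> bool:
    text = user_text.lower()
    return any(phrase in text for phrase in TRANSFER_PHRASES)


def route_after_assistant(user_text: str, reply: AssistantMessage) -> str:
    """Pick the next graph node, in the style of a LangGraph conditional edge."""
    if wants_transfer(user_text):
        return "transfer"
    if reply.tool_calls:  # the model asked for one or more tools
        return "tools"
    return "end"
```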

Multi-LLM (Claude, Gemini, OpenAI)

Generates intelligent responses based on user input and conversation context. Supports Claude, Gemini, and OpenAI models. Domain agents: Productivity, Mortgage, Healthcare.
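
Selecting a provider and a domain agent per session can be sketched as a small lookup. The model IDs and system prompts below are placeholders, not the values Convonet uses, and `select_agent` and `AgentConfig` are hypothetical names.

```python
from dataclasses import dataclass

# Placeholder model IDs; real deployments would pin specific model versions.
PROVIDERS = {
    "claude": "claude-model-id",
    "gemini": "gemini-model-id",
    "openai": "openai-model-id",
}
# Placeholder prompts for the three domain agents named in the document.
DOMAIN_PROMPTS = {
    domain: f"You are the {domain} domain assistant."
    for domain in ("productivity", "mortgage", "healthcare")
}


@dataclass(frozen=True)
class AgentConfig:
    provider: str
    model: str
    system_prompt: str


def select_agent(provider: str, domain: str) -> AgentConfig:
    """Resolve a provider/domain pair chosen in the UI into an agent config."""
    provider, domain = provider.lower(), domain.lower()
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    if domain not in DOMAIN_PROMPTS:
        raise ValueError(f"unknown domain agent: {domain}")
    return AgentConfig(provider, PROVIDERS[provider], DOMAIN_PROMPTS[domain])
```

Keeping the provider and the domain independent means any of the three LLMs can back any domain agent.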

Agent Monitor

Real-time observability for voice interactions. Tracks tool calls, voice timing (buffer capture, STT, agent start, first sentence, first audio), and elapsed time per tool. Located in convonet/agent_monitor.py.

  • Tool call tracking with elapsed time
  • Voice timing table (Step, Event, Elapsed, Delta)
  • Dashboard at /agent-monitor/
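
The voice timing table can be sketched as a recorder that stores named events and derives the Elapsed (since turn start) and Delta (since previous event) columns. This is an illustration, not the code in convonet/agent_monitor.py; the `VoiceTimer` class and event names are hypothetical.

```python
import time
from typing import Callable, List, Tuple


class VoiceTimer:
    """Record named voice-pipeline events and render Step/Event/Elapsed/Delta rows."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._start = clock()
        self._events: List[Tuple[str, float]] = []

    def mark(self, event: str) -> None:
        self._events.append((event, self._clock()))

    def rows(self) -> List[Tuple[int, str, int, int]]:
        """Return (step, event, elapsed_ms, delta_ms) for each recorded event."""
        rows = []
        previous = self._start
        for step, (event, t) in enumerate(self._events, start=1):
            elapsed_ms = round((t - self._start) * 1000)
            delta_ms = round((t - previous) * 1000)
            rows.append((step, event, elapsed_ms, delta_ms))
            previous = t
        return rows
```

Injecting the clock keeps the class testable; the default `time.perf_counter` is monotonic, so system clock adjustments cannot produce negative deltas.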

Tools & External APIs

Tool Calling (MCP Tools)

Model Context Protocol (MCP) tools for external operations: 38 tools in total, covering database operations, calendar integration, and call transfer.

  • 36 database tools (todos, teams, users)
  • 2 transfer tools (call transfer to agent)
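
A transfer tool could be described in the shape MCP servers return from `tools/list`: a `name`, a `description`, and an `inputSchema` written as JSON Schema. The tool name, its arguments, and the pre-check helper below are hypothetical.

```python
# Hypothetical transfer tool definition in MCP tools/list shape.
TRANSFER_TOOL = {
    "name": "transfer_to_agent",
    "description": "Transfer the current caller to a human agent.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "extension": {"type": "string", "description": "Target PBX extension"},
            "reason": {"type": "string", "description": "Why the caller wants a human"},
        },
        "required": ["extension"],
    },
}


def missing_required_args(tool: dict, arguments: dict) -> list:
    """Cheap pre-check before dispatch; a real server would run full JSON Schema validation."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]
```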

PostgreSQL Database

Multi-tenant relational database storing todos, teams, users, and conversation history.
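
In a multi-tenant schema, every query carries the tenant key. The sketch below uses `sqlite3` as a runnable stand-in for PostgreSQL; the `todos` table layout and `list_todos` helper are hypothetical. With a PostgreSQL driver such as psycopg, the placeholder would be `%s` instead of `?`.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL; the table layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE todos (id INTEGER PRIMARY KEY, team_id TEXT NOT NULL, title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO todos (team_id, title) VALUES (?, ?)",
    [("team-a", "Call lender"), ("team-a", "Send report"), ("team-b", "Book room")],
)


def list_todos(team_id: str) -> list:
    """Filter every query by tenant and bind values as parameters, never string-format them."""
    rows = conn.execute(
        "SELECT title FROM todos WHERE team_id = ? ORDER BY id", (team_id,)
    ).fetchall()
    return [title for (title,) in rows]
```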

Google APIs (Calendar/OAuth)

Calendar integration with OAuth2 authentication for scheduling and event management.
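
The first step of the OAuth2 authorization-code flow is sending the user to Google's consent page. This sketch builds that URL; the client ID, redirect URI, and helper name are hypothetical, and the full `calendar` scope is an assumption (a narrower scope may suffice).

```python
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
CALENDAR_SCOPE = "https://www.googleapis.com/auth/calendar"


def build_consent_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Return the URL that starts Google's authorization-code flow (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",   # authorization-code flow
        "scope": CALENDAR_SCOPE,
        "access_type": "offline",  # request a refresh token for background calendar access
        "prompt": "consent",
        "state": state,            # CSRF protection; verify it on the callback
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"
```

After consent, Google redirects to `redirect_uri` with a `code` that the server exchanges for access and refresh tokens.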

FusionPBX Metadata Lookup

Retrieves extension information and call routing metadata from FusionPBX.

Transfer System

Twilio API (Programmable Voice)

Handles call transfer bridging between the WebRTC user and the FusionPBX agent. Provides SIP trunking and audio bridging capabilities.

  • SIP INVITE to FusionPBX
  • Audio bridge between user and agent
  • Transfer status callbacks
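
Bridging a Twilio call to a PBX extension is typically expressed in TwiML as `<Dial><Sip>`, which sends the SIP INVITE; the `<Sip>` noun also accepts `statusCallback` and `statusCallbackEvent` for transfer status. The PBX host and callback URL below are placeholders, and `transfer_twiml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder values: the real PBX host and callback URL are not given in this document.
PBX_HOST = "pbx.example.com"
STATUS_CALLBACK_URL = "https://example.com/twilio/transfer-status"


def transfer_twiml(extension: str = "2001") -> str:
    """Build TwiML that bridges the current Twilio call to a FusionPBX extension over SIP."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    sip = ET.SubElement(
        dial,
        "Sip",
        statusCallbackEvent="initiated ringing answered completed",
        statusCallback=STATUS_CALLBACK_URL,
    )
    sip.text = f"sip:{extension}@{PBX_HOST}"
    return ET.tostring(response, encoding="unicode")
```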

FusionPBX (Google Cloud)

Private Branch Exchange (PBX) system running on a Google Cloud VM. Routes calls to extension 2001 (agent dashboard).

  • SIP server for call routing
  • Extension 2001 for agent calls
  • WSS support on port 7443

Agent Dashboard

Call-Center Agent Dashboard (JsSIP Client)

Browser-based softphone using JsSIP v3.10.1. Registers with FusionPBX over WebSocket Secure (WSS) and handles incoming calls from transferred users.

  • JsSIP WebRTC client
  • SIP registration with FusionPBX
  • Incoming call handling
  • Call controls (answer, hold, transfer, hangup)

User Info Popup (Call Controls)

Displays caller information retrieved from PostgreSQL. Shows user details, call context, and provides call control interface.

Monitoring

Sentry (Error Monitoring)

Production-grade error tracking and performance monitoring. Tracks all operations across the system with real-time alerts.

  • Error tracking and alerting
  • Performance monitoring (agent processing time)
  • User context and session tracking
  • Timeout and thread reset tracking

Application Logs

Comprehensive logging for debugging and audit trails. Logs all system operations, API calls, and state transitions.

Data Flow Summary

LiveKit WebRTC Flow (Normal Conversation)

User Browser → LiveKit WebRTC → PIN Auth (PostgreSQL) → Streaming STT (Deepgram) → LangGraph → Multi-LLM → Domain Agents → Tool Calling → Streaming TTS (Deepgram/ElevenLabs/Cartesia) → User Browser

Transfer Flow

User Request → LangGraph (Detect Intent) → Twilio API → FusionPBX (Extension 2001) → Agent Dashboard (JsSIP) → User Info Popup → Live Conversation

Monitoring

All Operations → Sentry Monitoring → Real-time Alerts & Performance Tracking

Architecture Overview

Convonet is an enterprise-grade voice AI platform that combines multiple LLMs (Claude, Gemini, OpenAI), LiveKit WebRTC, and domain-specific agents (Productivity, Mortgage, Healthcare). Each session starts with user authentication, which validates the PIN against PostgreSQL; real-time audio then flows through LiveKit WebRTC to Deepgram streaming STT for transcription.

LangGraph orchestrates the AI conversation flow, coordinating with the selected LLM for intelligent responses and MCP tools for external operations. The system includes 38 tools for database operations, calendar integration, and call transfer. Agent Monitor provides real-time observability for tool calls and voice timing (buffer capture, STT, agent start, first sentence, first audio).

When a user requests transfer to a human agent, LangGraph detects the intent and initiates a Twilio API call to bridge the user with a FusionPBX extension. The agent dashboard (JsSIP client) receives the call, displays user information, and enables live conversation.

TTS supports Deepgram, ElevenLabs, and Cartesia. Sentry provides error monitoring and performance tracking, while Redis manages session state and audio buffering. All operations use timeout handling (8s/10s/12s) for Twilio compatibility.
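
The document does not say which operations the 8 s, 10 s, and 12 s limits apply to. A minimal sketch of one tier, assuming the goal is to answer within Twilio's webhook response window, wraps the agent call in `asyncio.wait_for` and falls back to a canned reply; the fallback text and helper names are hypothetical.

```python
import asyncio

FALLBACK_REPLY = "Sorry, that is taking longer than expected. Please try again."


async def run_with_timeout(coro, timeout_s: float) -> str:
    """Return the agent reply, or a fallback if it misses the deadline."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the slow task, so the caller can still respond in time.
        return FALLBACK_REPLY


async def fake_agent(delay_s: float) -> str:
    # Hypothetical stand-in for a LangGraph run.
    await asyncio.sleep(delay_s)
    return "Your todo has been added."
```

Layered limits would nest the same pattern, with inner calls (a single tool or LLM request) given shorter deadlines than the whole turn.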