Complete System Architecture
Color-Coded Components
User Components
Browser-based interfaces
Server Components
Flask/WebSocket services
AI/ML Components
LangGraph, Claude, Deepgram
Tools/APIs
MCP tools, PostgreSQL, Google APIs
Transfer System
Twilio, FusionPBX
Agent Dashboard
JsSIP client
Monitoring
Sentry, Logs
Storage
Redis, PostgreSQL
System Components
User Browser (Voice Assistant UI)
Browser-based WebRTC voice interface where users interact with the AI assistant. Captures audio from the user's microphone and plays back AI responses.
- WebRTC audio capture and playback
- Real-time Socket.IO connection
- PIN authentication interface
Flask/WebSocket Server
WebSocket Server (Socket.IO)
Manages real-time bidirectional communication between the browser and server. Handles WebSocket connections, message routing, and event broadcasting.
PIN Authentication Module
Validates user PIN credentials against PostgreSQL database. Creates authenticated sessions stored in Redis.
WebRTC Voice Server
Processes audio streams from the browser, manages transcription pipeline, and coordinates with LangGraph for AI responses. Located in convonet/webrtc_voice_server.py.
Redis (Audio Buffer & Session Management)
In-memory data store for session management and audio buffering. Stores authenticated sessions, audio chunks, and conversation state.
- Audio buffer management for WebRTC streams
- Session storage and authentication tokens
- Real-time Pub/Sub notifications
- Transfer flags and state management
Speech Processing
Deepgram STT (Speech-to-Text)
Converts audio streams to text with high accuracy. Processes WebRTC audio buffers from Redis and returns transcribed text for LangGraph processing.
- Real-time audio transcription
- WebM format support
- High accuracy (95%+)
- Low latency (200-500ms)
Deepgram TTS (Text-to-Speech)
Converts AI-generated text responses into natural-sounding speech audio using Deepgram Aura-2. Returns audio streams for browser playback.
AI Orchestration
LangGraph (Assistant Graph)
Orchestrates the AI conversation flow. Manages state transitions, tool execution decisions, and coordinates between LLM calls and external tools.
- State machine for conversation management
- Tool condition evaluation
- Transfer intent detection
Claude LLM (Claude 3.5 Sonnet)
Generates intelligent responses based on user input and conversation context. Decides when tool execution is needed and produces final responses.
Tools & External APIs
Tool Calling (MCP Tools)
Model Context Protocol (MCP) tools for external operations. 38 total tools including database operations, calendar integration, and call transfer.
- 36 database tools (todos, teams, users)
- 2 transfer tools (call transfer to agent)
PostgreSQL Database
Multi-tenant relational database storing todos, teams, users, and conversation history.
Google APIs (Calendar/OAuth)
Calendar integration with OAuth2 authentication for scheduling and event management.
FusionPBX Metadata Lookup
Retrieves extension information and call routing metadata from FusionPBX.
Transfer System
Twilio API (Programmable Voice)
Handles call transfer bridging between WebRTC user and FusionPBX agent. Provides SIP trunking and audio bridging capabilities.
- SIP INVITE to FusionPBX
- Audio bridge between user and agent
- Transfer status callbacks
FusionPBX (Google Cloud)
Private Branch Exchange (PBX) system running on Google Cloud VM. Routes calls to extension 2001 (agent dashboard).
- SIP server for call routing
- Extension 2001 for agent calls
- WSS support on port 7443
Agent Dashboard
Call-Center Agent Dashboard (JsSIP Client)
Browser-based softphone using JsSIP v3.10.1. Registers with FusionPBX over WebSocket Secure (WSS) and handles incoming calls from transferred users.
- JsSIP WebRTC client
- SIP registration with FusionPBX
- Incoming call handling
- Call controls (answer, hold, transfer, hangup)
User Info Popup (Call Controls)
Displays caller information retrieved from PostgreSQL. Shows user details, call context, and provides call control interface.
Monitoring
Sentry (Error Monitoring)
Production-grade error tracking and performance monitoring. Tracks all operations across the system with real-time alerts.
- Error tracking and alerting
- Performance monitoring (agent processing time)
- User context and session tracking
- Timeout and thread reset tracking
Application Logs
Comprehensive logging for debugging and audit trails. Logs all system operations, API calls, and state transitions.
Data Flow Summary
WebRTC Flow (Normal Conversation)
User Browser → PIN Auth (PostgreSQL) → Redis Audio Buffer → Deepgram STT → LangGraph → Claude LLM → Tool Calling → PostgreSQL/Google APIs → Deepgram TTS → User Browser
Transfer Flow
User Request → LangGraph (Detect Intent) → Twilio API → FusionPBX (Extension 2001) → Agent Dashboard (JsSIP) → User Info Popup → Live Conversation
Monitoring
All Operations → Sentry Monitoring → Real-time Alerts & Performance Tracking
Architecture Overview
The Convonet System Architecture represents an enterprise-grade voice AI platform with comprehensive
integration of multiple technologies. The system starts with user authentication via PIN validation
against PostgreSQL, then processes real-time audio through WebRTC to Deepgram STT for transcription.
LangGraph orchestrates the AI conversation flow, coordinating with Claude LLM for intelligent responses
and MCP tools for external operations. The system includes 38 tools for database operations, calendar
integration, and call transfer capabilities.
When a user requests transfer to a human agent, LangGraph detects the intent and initiates a Twilio
API call to bridge the WebRTC user with a FusionPBX extension. The agent dashboard (JsSIP client)
receives the call, displays user information, and enables live conversation.
Throughout the system, Sentry provides comprehensive error monitoring and performance tracking, while
Redis manages session state and audio buffering. All operations are optimized with timeout handling
(8s/10s/12s) to ensure Twilio compatibility and reliable performance.