Convonet System Architecture Diagram

Complete System Flow Overview

[Figure: Convonet System Architecture Flow Diagram]

Color-Coded Components

  • User Components: browser-based interfaces
  • Server Components: Flask/WebSocket services
  • AI/ML Components: LangGraph, Claude, Deepgram
  • Tools/APIs: MCP tools, PostgreSQL, Google APIs
  • Transfer System: Twilio, FusionPBX
  • Agent Dashboard: JsSIP client
  • Monitoring: Sentry, logs
  • Storage: Redis, PostgreSQL

System Components

User Browser (Voice Assistant UI)

Browser-based WebRTC voice interface where users interact with the AI assistant. Captures audio from the user's microphone and plays back AI responses.

  • WebRTC audio capture and playback
  • Real-time Socket.IO connection
  • PIN authentication interface

Flask/WebSocket Server

WebSocket Server (Socket.IO)

Manages real-time bidirectional communication between the browser and server. Handles WebSocket connections, message routing, and event broadcasting.

PIN Authentication Module

Validates user PIN credentials against the PostgreSQL database and creates authenticated sessions, which are stored in Redis.
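
A minimal sketch of the PIN check and session creation, assuming each user is identified by a user ID plus PIN (the lookup key is not specified above). The dictionaries USERS and SESSIONS are hypothetical stand-ins for the PostgreSQL users table and the Redis session store, and the one-hour TTL is an assumed value.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

# Hypothetical stand-ins: USERS plays the PostgreSQL users table and
# SESSIONS plays Redis (token -> user, with an expiry like a Redis TTL).
USERS: Dict[str, Tuple[bytes, bytes]] = {}   # user_id -> (salt, pin_hash)
SESSIONS: Dict[str, Tuple[str, float]] = {}  # token -> (user_id, expires_at)
SESSION_TTL_S = 3600  # assumed session lifetime in seconds


def hash_pin(pin: str, salt: bytes) -> bytes:
    # Salted PBKDF2 so PINs are never stored in plain text.
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)


def register_user(user_id: str, pin: str) -> None:
    salt = secrets.token_bytes(16)
    USERS[user_id] = (salt, hash_pin(pin, salt))


def authenticate(user_id: str, pin: str) -> Optional[str]:
    """Return a new session token if the PIN matches, else None."""
    record = USERS.get(user_id)
    if record is None:
        return None
    salt, expected = record
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(hash_pin(pin, salt), expected):
        return None
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = (user_id, time.time() + SESSION_TTL_S)
    return token


def session_user(token: str) -> Optional[str]:
    """Resolve a token to its user; expired entries count as missing."""
    entry = SESSIONS.get(token)
    if entry is None or entry[1] < time.time():
        SESSIONS.pop(token, None)
        return None
    return entry[0]
```

Because a short PIN has few possible values, a real deployment would also rate-limit failed attempts per user or client.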

WebRTC Voice Server

Processes audio streams from the browser, manages the transcription pipeline, and coordinates with LangGraph for AI responses. Implemented in convonet/webrtc_voice_server.py.

Redis (Audio Buffer & Session Management)

In-memory data store for session management and audio buffering. Stores authenticated sessions, audio chunks, and conversation state.

  • Audio buffer management for WebRTC streams
  • Session storage and authentication tokens
  • Real-time Pub/Sub notifications
  • Transfer flags and state management
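
The audio-buffering role can be sketched as a per-session list of chunks that is drained when a complete utterance is ready for transcription. This in-memory class is a hypothetical stand-in for Redis; the key scheme "audio:<session_id>" and the 5 MB cap are assumptions, not documented values.

```python
from collections import defaultdict
from typing import DefaultDict, List


class AudioBuffer:
    """In-memory stand-in for a per-session Redis list of audio chunks.

    With Redis, append() would map to RPUSH on a key such as
    "audio:<session_id>" (hypothetical key scheme) and drain() to
    LRANGE + DEL executed in one MULTI/EXEC transaction.
    """

    def __init__(self, max_bytes: int = 5 * 1024 * 1024):
        self.max_bytes = max_bytes  # assumed cap to bound memory per session
        self._chunks: DefaultDict[str, List[bytes]] = defaultdict(list)
        self._sizes: DefaultDict[str, int] = defaultdict(int)

    def append(self, session_id: str, chunk: bytes) -> None:
        """Add one WebRTC audio chunk, refusing to grow past max_bytes."""
        if self._sizes[session_id] + len(chunk) > self.max_bytes:
            raise ValueError("audio buffer full for session " + session_id)
        self._chunks[session_id].append(chunk)
        self._sizes[session_id] += len(chunk)

    def drain(self, session_id: str) -> bytes:
        """Return all buffered audio for the session and clear it."""
        data = b"".join(self._chunks.pop(session_id, []))
        self._sizes.pop(session_id, None)
        return data
```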

Speech Processing

Deepgram STT (Speech-to-Text)

Converts speech audio to text. Transcribes the WebRTC audio buffered in Redis and returns the text to LangGraph for processing.

  • Real-time audio transcription
  • WebM format support
  • High accuracy (95%+)
  • Low latency (200-500ms)
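
For a drained buffer, one way to call Deepgram is its pre-recorded REST endpoint, POST /v1/listen with a "Token" authorization header; live streaming would use the WebSocket variant of the same endpoint instead. The sketch below only builds the request. The model name nova-2 and the smart_format option are assumptions, since the source does not say which model or options are used.

```python
import urllib.parse
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_stt_request(audio: bytes, api_key: str,
                      model: str = "nova-2") -> urllib.request.Request:
    """Build (but do not send) a Deepgram pre-recorded transcription request."""
    query = urllib.parse.urlencode({"model": model, "smart_format": "true"})
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/webm",  # browser MediaRecorder output
        },
    )


def extract_transcript(response: dict) -> str:
    """Pull the top transcript out of a Deepgram /v1/listen JSON response."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]
```

Sending the request with urllib.request.urlopen and passing the decoded JSON to extract_transcript yields the text handed to LangGraph.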

Deepgram TTS (Text-to-Speech)

Converts AI-generated text responses into natural-sounding speech audio using Deepgram Aura-2. Returns audio streams for browser playback.
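
The reverse direction can be sketched against Deepgram's /v1/speak endpoint, which takes a JSON body with a "text" field and returns audio bytes. The voice aura-2-thalia-en is only an example Aura-2 model; the voice Convonet actually uses is not stated.

```python
import json
import urllib.parse
import urllib.request

DEEPGRAM_SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_tts_request(text: str, api_key: str,
                      model: str = "aura-2-thalia-en") -> urllib.request.Request:
    """Build (but do not send) a Deepgram text-to-speech request."""
    query = urllib.parse.urlencode({"model": model})  # assumed example voice
    return urllib.request.Request(
        f"{DEEPGRAM_SPEAK_URL}?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )
```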

AI Orchestration

LangGraph (Assistant Graph)

Orchestrates the AI conversation flow. Manages state transitions, tool execution decisions, and coordinates between LLM calls and external tools.

  • State machine for conversation management
  • Tool condition evaluation
  • Transfer intent detection
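
The routing decision after each Claude turn can be illustrated in plain Python. This is not LangGraph's API; it mirrors the idea behind LangGraph's prebuilt tools_condition (tool calls go to a tool node, otherwise the turn ends) and adds a branch for transfer intent. The message dictionaries and the tool name transfer_to_agent are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical tool name; the source does not list the actual transfer tools.
TRANSFER_TOOLS = {"transfer_to_agent"}


def route_after_llm(messages: List[Dict[str, Any]]) -> str:
    """Pick the next graph node after a Claude turn.

    Same idea as LangGraph's prebuilt tools_condition (tool calls -> "tools",
    otherwise finish), with an extra branch for transfer-tool calls.
    """
    last = messages[-1] if messages else {}
    calls = last.get("tool_calls") or []
    if not calls:
        return "end"       # plain answer: hand the text to TTS
    if any(call["name"] in TRANSFER_TOOLS for call in calls):
        return "transfer"  # start the Twilio transfer path
    return "tools"         # run MCP tools, then loop back to Claude
```

Treating transfer as a tool call keeps intent detection inside the LLM's normal tool-selection step rather than in separate keyword matching.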

Claude LLM (Claude 3.5 Sonnet)

Generates responses from the user's input and the conversation context. Decides when a tool call is needed and produces the final response.

Tools & External APIs

Tool Calling (MCP Tools)

Model Context Protocol (MCP) tools for external operations. There are 38 tools in total, covering database operations, calendar integration, and call transfer.

  • 36 database tools (todos, teams, users)
  • 2 transfer tools (call transfer to agent)
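
A sketch of how one transfer tool might be described and dispatched. The definition follows the shape MCP uses when listing tools (name, description, and a JSON Schema under inputSchema), and the result follows MCP's content-list format with an isError flag. The tool name, its arguments, and the placeholder handler are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical transfer tool described in the shape MCP uses for tools/list:
# a name, a description, and a JSON Schema under "inputSchema".
TRANSFER_TOOL = {
    "name": "transfer_to_agent",
    "description": "Transfer the current caller to a human agent.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "extension": {"type": "string", "description": "Target PBX extension"},
            "reason": {"type": "string"},
        },
        "required": ["extension"],
    },
}


def handle_transfer(extension: str, reason: str = "") -> Dict[str, Any]:
    # Placeholder: a real handler would start the Twilio transfer here.
    return {"status": "transfer_initiated", "extension": extension}


# name -> (tool definition, handler)
REGISTRY: Dict[str, Tuple[dict, Callable[..., Any]]] = {
    TRANSFER_TOOL["name"]: (TRANSFER_TOOL, handle_transfer),
}


def _text_result(text: str, is_error: bool) -> Dict[str, Any]:
    # MCP tool results carry a list of content items plus an isError flag.
    return {"content": [{"type": "text", "text": text}], "isError": is_error}


def call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a tools/call request; only required arguments are checked."""
    if name not in REGISTRY:
        return _text_result(f"Unknown tool: {name}", True)
    definition, handler = REGISTRY[name]
    required = definition["inputSchema"].get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        return _text_result(f"Missing arguments: {', '.join(missing)}", True)
    return _text_result(json.dumps(handler(**arguments)), False)
```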

PostgreSQL Database

Multi-tenant relational database storing todos, teams, users, and conversation history.

Google APIs (Calendar/OAuth)

Calendar integration with OAuth2 authentication for scheduling and event management.
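
For scheduling, the Google Calendar API v3 creates events with a POST to /calendars/{calendarId}/events, authorized by an OAuth2 bearer token. The sketch below only builds the URL and JSON body; the summary, times, and time zone in any call are illustrative. When timeZone is supplied, dateTime may omit the UTC offset.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Tuple
from urllib.parse import quote

CALENDAR_API = "https://www.googleapis.com/calendar/v3"


def build_event_insert(calendar_id: str, summary: str, start: datetime,
                       duration: timedelta, time_zone: str) -> Tuple[str, Dict[str, Any]]:
    """Return the events.insert URL and body; send with 'Authorization: Bearer <token>'."""
    url = f"{CALENDAR_API}/calendars/{quote(calendar_id, safe='')}/events"
    body = {
        "summary": summary,
        "start": {"dateTime": start.isoformat(), "timeZone": time_zone},
        "end": {"dateTime": (start + duration).isoformat(), "timeZone": time_zone},
    }
    return url, body
```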

FusionPBX Metadata Lookup

Retrieves extension information and call routing metadata from FusionPBX.

Transfer System

Twilio API (Programmable Voice)

Bridges the call during a transfer, connecting the WebRTC user to the FusionPBX agent. Provides SIP trunking and audio bridging.

  • SIP INVITE to FusionPBX
  • Audio bridge between user and agent
  • Transfer status callbacks
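
Assuming the bridge is set up by answering Twilio with TwiML that dials the PBX over SIP, the response could look like the one built below. Dial, Sip, and Dial's action and timeout attributes are standard TwiML; Twilio requests the action URL when the dialed leg ends and reports the outcome in DialCallStatus, which is one way to receive transfer status. The PBX hostname and callback URL are placeholders.

```python
import xml.etree.ElementTree as ET


def build_transfer_twiml(extension: str, pbx_host: str, status_url: str,
                         timeout_s: int = 30) -> str:
    """Return TwiML that dials a PBX extension over SIP."""
    response = ET.Element("Response")
    # <Dial action=...>: Twilio requests this URL when the dialed leg ends,
    # reporting the outcome in DialCallStatus.
    dial = ET.SubElement(response, "Dial", action=status_url, timeout=str(timeout_s))
    sip = ET.SubElement(dial, "Sip")
    sip.text = f"sip:{extension}@{pbx_host}"
    return ET.tostring(response, encoding="unicode")
```

For example, build_transfer_twiml("2001", "pbx.example.com", "https://example.com/transfer-status") targets extension 2001 on a hypothetical PBX host.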

FusionPBX (Google Cloud)

Private Branch Exchange (PBX) system running on a Google Cloud VM. Routes transferred calls to extension 2001 (the agent dashboard).

  • SIP server for call routing
  • Extension 2001 for agent calls
  • WSS support on port 7443

Agent Dashboard

Call-Center Agent Dashboard (JsSIP Client)

Browser-based softphone using JsSIP v3.10.1. Registers with FusionPBX over WebSocket Secure (WSS) and handles incoming calls from transferred users.

  • JsSIP WebRTC client
  • SIP registration with FusionPBX
  • Incoming call handling
  • Call controls (answer, hold, transfer, hangup)
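
The dashboard needs a WSS address and SIP credentials to register. A hypothetical server-side helper could serve them as JSON over an authenticated HTTPS endpoint; in the browser, ws_url would be passed to JsSIP.WebSocketInterface and uri, password, and register to the JsSIP.UA configuration. The helper name and the ws_url key are assumptions; port 7443 matches the FusionPBX WSS port above.

```python
import json


def dashboard_sip_config(pbx_host: str, extension: str, password: str,
                         wss_port: int = 7443) -> str:
    """JSON the dashboard page could fetch to configure JsSIP (hypothetical)."""
    config = {
        "ws_url": f"wss://{pbx_host}:{wss_port}",  # for JsSIP.WebSocketInterface
        "uri": f"sip:{extension}@{pbx_host}",       # JsSIP.UA 'uri'
        "password": password,                       # JsSIP.UA 'password'
        "register": True,                           # register on start
    }
    return json.dumps(config)
```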

User Info Popup (Call Controls)

Displays caller information retrieved from PostgreSQL. Shows user details, call context, and provides call control interface.
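
A sketch of the lookup behind the popup, using sqlite3 as a runnable stand-in for PostgreSQL (with a PostgreSQL driver such as psycopg the placeholders would be %s instead of ?). The users and todos schema is hypothetical and only illustrates combining user details with call context.

```python
import sqlite3
from typing import Any, Dict, Optional


def create_demo_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema for illustration only.
    conn.executescript(
        "CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT, email TEXT);"
        "CREATE TABLE todos(id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, done INTEGER);"
    )


def caller_popup_info(conn: sqlite3.Connection, user_id: int,
                      transfer_reason: str = "") -> Optional[Dict[str, Any]]:
    """Collect what the agent sees when the transferred call rings."""
    user = conn.execute(
        "SELECT name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if user is None:
        return None
    (open_todos,) = conn.execute(
        "SELECT COUNT(*) FROM todos WHERE user_id = ? AND done = 0", (user_id,)
    ).fetchone()
    return {
        "name": user[0],
        "email": user[1],
        "open_todos": open_todos,
        "transfer_reason": transfer_reason,
    }
```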

Monitoring

Sentry (Error Monitoring)

Production-grade error tracking and performance monitoring. Tracks all operations across the system with real-time alerts.

  • Error tracking and alerting
  • Performance monitoring (agent processing time)
  • User context and session tracking
  • Timeout and thread reset tracking

Application Logs

Comprehensive logging for debugging and audit trails. Logs all system operations, API calls, and state transitions.
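
To make logs traceable per conversation, each record can carry the session ID. The sketch uses the standard library's logging.LoggerAdapter; the logger name "convonet" and the message prefix format are assumptions.

```python
import logging


class SessionAdapter(logging.LoggerAdapter):
    """Prefix every message with the session ID so one conversation can be traced."""

    def process(self, msg, kwargs):
        return f"[session={self.extra['session_id']}] {msg}", kwargs


def get_session_logger(session_id: str) -> logging.LoggerAdapter:
    # "convonet" is a hypothetical logger name.
    return SessionAdapter(logging.getLogger("convonet"), {"session_id": session_id})
```

Calling get_session_logger(sid).info("STT complete") then emits "[session=<sid>] STT complete" through whatever handlers are configured.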

Data Flow Summary

WebRTC Flow (Normal Conversation)

User Browser → PIN Auth (PostgreSQL) → Redis Audio Buffer → Deepgram STT → LangGraph → Claude LLM → Tool Calling → PostgreSQL/Google APIs → Deepgram TTS → User Browser

Transfer Flow

User Request → LangGraph (Detect Intent) → Twilio API → FusionPBX (Extension 2001) → Agent Dashboard (JsSIP) → User Info Popup → Live Conversation
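
The "transfer flags" kept in Redis can be modeled as a small state machine so a session cannot, for example, jump from requested straight to connected. The state names and allowed transitions below are hypothetical, chosen to follow the flow above, and a plain dict stands in for Redis.

```python
from enum import Enum
from typing import Dict


class TransferState(Enum):
    NONE = "none"
    REQUESTED = "requested"  # LangGraph detected transfer intent
    DIALING = "dialing"      # Twilio is calling extension 2001
    CONNECTED = "connected"  # agent answered in the dashboard
    FAILED = "failed"
    COMPLETED = "completed"


ALLOWED = {
    TransferState.NONE: {TransferState.REQUESTED},
    TransferState.REQUESTED: {TransferState.DIALING, TransferState.FAILED},
    TransferState.DIALING: {TransferState.CONNECTED, TransferState.FAILED},
    TransferState.CONNECTED: {TransferState.COMPLETED},
    TransferState.FAILED: set(),
    TransferState.COMPLETED: set(),
}


def can_advance(current: TransferState, new: TransferState) -> bool:
    return new in ALLOWED[current]


def advance(flags: Dict[str, TransferState], session_id: str, new: TransferState) -> None:
    """Move a session's transfer flag forward; flags stands in for Redis."""
    current = flags.get(session_id, TransferState.NONE)
    if not can_advance(current, new):
        raise ValueError(f"illegal transfer transition {current.value} -> {new.value}")
    flags[session_id] = new
```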

Monitoring

All Operations → Sentry Monitoring → Real-time Alerts & Performance Tracking

Architecture Overview

Convonet is a voice AI platform that combines WebRTC audio, Deepgram speech services, a LangGraph/Claude agent, and Twilio/FusionPBX telephony. A session starts with PIN validation against PostgreSQL; real-time audio captured through WebRTC is then buffered in Redis and transcribed by Deepgram STT.

LangGraph orchestrates the conversation flow, coordinating the Claude LLM, which generates responses, with MCP tools for external operations. The system exposes 38 tools covering database operations, calendar integration, and call transfer.

When a user requests transfer to a human agent, LangGraph detects the intent and initiates a Twilio API call to bridge the WebRTC user with a FusionPBX extension. The agent dashboard (JsSIP client) receives the call, displays user information, and enables live conversation.

Throughout the system, Sentry provides error monitoring and performance tracking, while Redis manages session state and audio buffering. Timeouts (8s/10s/12s) bound long-running operations so that responses stay compatible with Twilio and performance remains predictable.
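
The timeout handling can be sketched with the standard library's concurrent.futures: run the agent call in a worker thread and return a fallback if it does not finish in time. How the three documented values (8s/10s/12s) are assigned to stages is not stated here, so the sketch takes a single timeout parameter; the fallback value is a hypothetical placeholder.

```python
import concurrent.futures
from typing import Callable, TypeVar

T = TypeVar("T")

_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def run_with_timeout(fn: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Return fn() if it finishes within timeout_s seconds, else fallback."""
    future = _EXECUTOR.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Python threads cannot be killed: the worker keeps running in the
        # background, which is why timed-out threads are worth tracking.
        future.cancel()
        return fallback
```

For example, run_with_timeout(lambda: agent_reply(text), 8.0, "Sorry, that took too long.") with a hypothetical agent_reply function returns the apology if the agent has not answered within 8 seconds.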