Convonet Voice AI Productivity System

Multi-LLM (Claude, Gemini, OpenAI) + LiveKit WebRTC + Domain Agents (Productivity, Mortgage, Healthcare) + MCP Tools + Team Collaboration + Call Transfer + Agent Monitor + Sentry

FastAPI Flask LangGraph MCP (38 Tools) LangChain Claude · Gemini · OpenAI LiveKit WebRTC Google Calendar Twilio Voice Agent Monitor Team Collaboration FusionPBX Deepgram STT/TTS ElevenLabs TTS Cartesia TTS Sentry Redis Composio

Technical Architecture

System Architecture Overview

Complete System Flow Diagrams

System Architecture Diagram

Complete system flow overview with all components and their relationships

View Full Diagram →
Sequence Diagram

Step-by-step flow (52 steps) showing interactions between all components

View Full Diagram →
WebRTC Flow: User Browser → FastAPI Gateway (Direct/Legacy) → LiveKit WebRTC → PIN Auth (PostgreSQL) → Streaming STT (Deepgram) → LangGraph → Multi-LLM (Claude/Gemini/OpenAI) → Domain Agents → Tool Calling → Streaming TTS (Deepgram/ElevenLabs/Cartesia) → User
Transfer Flow: User Request → LangGraph (Detect Intent) → Twilio API → FusionPBX (Extension 2001) → Agent Dashboard (JsSIP) → User Info Popup → Live Conversation

Evolution to FastAPI

To ensure enterprise-grade scalability and native asynchronous support for real-time voice, the system has evolved from a pure Flask monolith to a FastAPI Hybrid Gateway.

  • ASGI Master Gateway: FastAPI serves as the primary entry point (asgi_main.py), handling high-concurrency connections and native WebSockets.
  • Flask Sub-system: The legacy Flask application remains fully functional, mounted as a sub-app via WSGIMiddleware.
  • Legacy Compatibility: Socket.IO connections are stabilized using a forced polling transport fallback, ensuring reliable session state through the WSGI bridge.
  • Uvicorn Deployment: The system now runs on the Uvicorn ASGI server, providing superior performance over traditional WSGI workers.

Request Routing Flow

/fastapi/* Native FastAPI Routes
/webrtc/* Legacy Flask (WSGI)
/socket.io/ Socket.IO (via WSGIMiddleware)
/* (All Root) Legacy Flask Monolith
Verified Stable in Production
Monitoring: All Operations → Sentry Monitoring → Real-time Alerts & Performance Tracking

Architecture Flow: Enterprise voice AI system with multi-LLM (Claude, Gemini, OpenAI), LiveKit WebRTC, domain-specific agents (Productivity, Mortgage, Healthcare), PIN authentication (PostgreSQL), team collaboration, Agent Monitor for real-time tool/voice timing, intelligent call transfer to FusionPBX via Twilio, streaming STT/TTS (Deepgram, ElevenLabs, Cartesia), and Sentry error monitoring with automatic thread reset recovery.

View System Architecture Diagram | View Sequence Diagram

Overview

The Convonet Voice AI Productivity System is an enterprise-grade platform that combines multi-LLM AI (Claude, Gemini, OpenAI), LiveKit WebRTC voice, domain-specific agents (Productivity, Mortgage, Healthcare), team collaboration, and intelligent call center integration. Built for hackathon demonstrations, it showcases advanced features including PIN authentication, role-based access control, Agent Monitor for real-time tool/voice timing, call transfer to FusionPBX, and Sentry monitoring.

The system enables teams to manage todos, assign tasks, and collaborate in real-time through web dashboards, voice commands, and seamless transfer to human agents. Voice runs on LiveKit WebRTC with streaming STT (Deepgram) and TTS (Deepgram, ElevenLabs, Cartesia). All data syncs with Google Calendar via OAuth2, with intelligent caching for sub-second voice responses.

Core Technologies

  • • LangGraph for agent orchestration with tools_condition
  • • Multi-LLM: Claude, Gemini, OpenAI
  • • Model Context Protocol (MCP) - 38 tools (36 DB + 2 transfer)
  • • LangChain for LLM integration & tool binding
  • • FastAPI + ASGI Master Gateway
  • • Flask + WSGIMiddleware (Legacy Sub-system)
  • • LiveKit WebRTC for real-time voice
  • • Deepgram STT; TTS: Deepgram, ElevenLabs, Cartesia
  • • SQLAlchemy ORM with PostgreSQL (multi-tenant)
  • • PIN authentication (4-6 digit voice PIN)
  • • Agent Monitor: real-time tool/voice timing
  • • Twilio Programmable Voice with Media Streams
  • • Google Calendar OAuth2 integration (optimized)
  • • FusionPBX SIP integration for call transfer
  • • Sentry.io error monitoring & performance tracking
  • • JsSIP WebRTC for browser-based softphone
  • • Render.com deployment with Uvicorn (ASGI)

Key Features

  • • Domain-specific agents: Productivity, Mortgage, Healthcare
  • • LiveKit WebRTC voice with streaming STT/TTS
  • • Agent Monitor: real-time tool calls & voice timing
  • • Multi-tenant team collaboration with role hierarchy
  • • PIN-based voice authentication (4-6 digits)
  • • Intelligent call transfer: AI → FusionPBX Extension 2001
  • • Multi-TTS: Deepgram, ElevenLabs, Cartesia
  • • Sentry error monitoring & performance tracking
  • • Optimized timeouts (8s/10s/12s) for Twilio compatibility
  • • 38 MCP tools (todos, calendar, teams, transfer)
  • • WebRTC call center with JsSIP softphone
  • • Google Calendar OAuth2 sync (background)
  • • FusionPBX integration on Google Cloud VM
  • • Production deployment on Render with auto-scaling

Recent Updates & Improvements (February 2026)

FastAPI Hybrid Gateway

  • ✓ ASGI Master Gateway (asgi_main.py)
  • ✓ Flask mounted via WSGIMiddleware
  • ✓ Native FastAPI Voice Gateway
  • ✓ Uvicorn production deployment
  • ✓ Optimized Socket.IO polling fallback
Files: asgi_main.py, fastapi_voice_gateway.py

Call Transfer to FusionPBX

  • ✓ Seamless AI → Human agent transfer
  • ✓ FusionPBX extension 2001 integration
  • ✓ SIP/WSS connectivity (Google Cloud VM)
  • ✓ Transfer detection via phrases or tool
  • ✓ Department routing (support, sales, etc.)
Files: call_transfer.py, CALL_TRANSFER_GUIDE.md

LiveKit WebRTC + Agent Monitor

  • ✓ LiveKit real-time voice (WebRTC)
  • ✓ Streaming STT (Deepgram) & TTS (Deepgram/ElevenLabs/Cartesia)
  • ✓ Domain agents: Productivity, Mortgage, Healthcare
  • ✓ Agent Monitor: tool calls & voice timing
  • ✓ Processing music during agent response
Files: webrtc_voice_server_socketio.py, agent_monitor.py, deepgram/

Composio Integration

  • ✓ Slack workspace integration
  • ✓ GitHub repository management
  • ✓ Gmail email automation
  • ✓ Notion workspace sync
  • ✓ External platform connectivity
Files: COMPOSIO_FEATURES_SUMMARY.md

Sentry Integration

  • ✓ Real-time error tracking & alerts
  • ✓ Performance monitoring (agent processing time)
  • ✓ User context & session tracking
  • ✓ Timeout & thread reset tracking
  • ✓ Production-grade observability
Integration: Flask + SQLAlchemy + Logging

Timeout Optimization

  • ✓ Tool timeout: 8s (from 20s)
  • ✓ Agent timeout: 10s (from 25s)
  • ✓ Webhook timeout: 12s (from 30s)
  • ✓ Stays under Twilio's 15s HTTP limit
  • ✓ Thread reset on timeout prevents errors
Result: 95%+ operations complete successfully

WebRTC Call Center

  • ✓ JsSIP v3.10.1 browser softphone
  • ✓ WebSocket Secure (WSS) on port 7443
  • ✓ Agent dashboard with SIP registration
  • ✓ Call control (answer, hold, transfer, hangup)
  • ✓ Google Cloud firewall configured
Platform: FusionPBX 34.26.59.14 (GCP VM)

Automatic Error Recovery

  • ✓ Thread reset with timestamped IDs
  • ✓ BrokenResourceError handling
  • ✓ tool_call_id incomplete error recovery
  • ✓ In-memory reset tracking (_reset_threads)
  • ✓ No cascading failures
Benefit: Self-healing conversation threads

Performance Optimization

  • ✓ Removed Google Calendar sync delay
  • ✓ Simplified JSON responses (no MCP breaks)
  • ✓ Agent processing time measurement
  • ✓ Transaction tracking per voice call
  • ✓ Custom Sentry metrics & measurements
Result: Sub-5s response times
2001
FusionPBX Extension
Deepgram
STT API
38
MCP Tools
12s
Max Response Time
100%
Sentry Trace Rate

LiveKit WebRTC Voice Integration Architecture

The LiveKit WebRTC Voice Interface provides browser-based voice interaction with real-time audio streaming via LiveKit, domain-specific agents (Productivity, Mortgage, Healthcare), streaming STT (Deepgram) and TTS (Deepgram, ElevenLabs, Cartesia), and seamless integration with the LangGraph AI agent. Users interact with the assistant directly through their web browser.

LiveKit Voice Assistant Architecture

LiveKit WebRTC Processing Flow
🌐
Browser
WebRTC
🚀
FastAPI Gateway
ASGI/WSGI
📡
Socket.IO
Long Polling
💾
Redis
Buffer
🎧
Deepgram
STT API
🧠
LangGraph
AI Agent

Flow: User Browser → LiveKit WebRTC → PIN Auth (PostgreSQL) → Streaming STT (Deepgram) → LangGraph → Multi-LLM (Claude/Gemini/OpenAI) → Domain Agents → Tool Calling → Streaming TTS (Deepgram/ElevenLabs/Cartesia) → User Browser

WebRTC Voice Assistant Flow Phases
View Detailed Sequence Diagram →
Phase 1: Authentication

Steps 1-7

WebSocket connection, PIN authentication via PostgreSQL, Redis session creation

Phase 2: Conversation Loop

Steps 8-31

LiveKit audio → Streaming STT (Deepgram) → LangGraph → Multi-LLM → Domain agents → Tool execution → Streaming TTS (Deepgram/ElevenLabs/Cartesia) → Response

Phase 3: Transfer Request

Steps 32-38

User requests transfer, LangGraph detects intent, sets transfer flag in Redis

Phase 4: Twilio Transfer

Steps 39-52

Twilio API → FusionPBX → Agent Dashboard (JsSIP) → Live conversation bridging

LiveKit WebRTC Architecture: Browser-based voice assistant with LiveKit WebRTC, PostgreSQL PIN authentication, streaming STT (Deepgram) and TTS (Deepgram, ElevenLabs, Cartesia), LangGraph AI orchestration with multi-LLM (Claude, Gemini, OpenAI), domain-specific agents (Productivity, Mortgage, Healthcare), Agent Monitor for real-time tool/voice timing, MCP tool integration, and seamless transfer to FusionPBX via Twilio.

Component Interaction Matrix
Component Input From Output To Purpose
User Browser User voice input WebSocket Server Captures audio, displays UI
WebSocket Server Browser, WebRTC Server Browser, Redis Manages real-time communication
PIN Auth WebSocket Server PostgreSQL Validates user credentials
Redis WebRTC Server, Tools Deepgram, WebRTC Server Buffers audio, stores session
Deepgram STT Redis Audio Buffer WebRTC Voice Server Converts speech to text
LangGraph WebRTC Server Claude LLM, Tools Orchestrates AI conversation flow
Claude LLM LangGraph LangGraph Generates responses, decides actions
Tool Calling LangGraph PostgreSQL, Google APIs Executes external operations
Deepgram TTS LangGraph Response WebRTC Server Converts text to speech
Twilio API WebRTC Server FusionPBX Bridges call to agent
FusionPBX Twilio Agent Dashboard Routes call to extension
Agent Dashboard FusionPBX PostgreSQL, User Displays call, shows user info

WebRTC Voice Interface

  • ✓ Browser-based voice recording
  • ✓ Real-time audio streaming
  • ✓ WebSocket communication
  • ✓ Audio format detection (WebM)
  • ✓ Base64 audio encoding
Technology: WebRTC + Socket.IO

Redis Session Management

  • ✓ Session storage & caching
  • ✓ Audio buffer management
  • ✓ Real-time notifications
  • ✓ User activity tracking
  • ✓ Session expiration handling
Integration: Redis Pub/Sub + Session Store

Composio Integration

  • ✓ Slack workspace integration
  • ✓ GitHub repository management
  • ✓ Gmail email automation
  • ✓ Notion workspace sync
  • ✓ Jira project management
Platforms: Slack, GitHub, Gmail, Notion, Jira

Module Structure

Convonet Project Structure

Project Root/
├── asgi_main.py                    # FastAPI ASGI Master Gateway (Entry Point)
├── app.py                          # Main Flask application (Mounted as sub-app)
├── app_memory_optimized.py         # Memory-optimized Flask application
├── passenger_wsgi.py               # WSGI entry point for legacy support
├── passenger_wsgi_memory_optimized.py  # WSGI entry for memory-optimized app
├── extensions.py                   # Flask extensions (db, login, migrate, etc.)
├── requirements.txt                # Python dependencies
├── requirements-render.txt         # Production dependencies (No Eventlet)
├── Procfile                        # Render.com process file (runs Uvicorn)
├── render-hybrid.yaml              # Render.com hybrid deployment config
├── render-build.sh                 # Build script for Render.com
├── templates/                      # Flask templates
│   ├── team_dashboard.html         # Team management UI
│   ├── register.html               # User registration
│   ├── convonet_tech_spec.html     # Technical documentation
│   ├── convonet_system_architecture.html  # System architecture diagram
│   ├── convonet_sequence_diagram.html     # Sequence diagram page
│   └── agent_monitor_dashboard.html       # Agent Monitor (tool/voice timing)
├── static/                         # Static assets
│   └── convonet/                   # Convonet static files
│       └── assets/                 # Images, CSS, JS
├── shared/                         # Shared utilities
│   ├── google_calendar.py          # Google Calendar integration
│   ├── helpers.py                  # Shared helper functions
│   └── schemas.py                  # Shared data schemas
└── convonet/                       # Convonet module
    ├── __init__.py                 # Package initialization
    ├── routes.py                   # Flask routes & Twilio webhooks (Sentry integrated)
    ├── assistant_graph_todo.py     # LangGraph agent (optimized timeouts)
    ├── state.py                    # Agent state management
    ├── asgi_main.py                # FastAPI ASGI application
    ├── fastapi_voice_gateway.py    # FastAPI-native Voice Assistant Gateway
    ├── webrtc_voice_server_socketio.py  # LiveKit WebRTC voice server
    ├── agent_monitor.py            # Agent Monitor (tool/voice timing)
    ├── deepgram/                   # Deepgram STT/TTS (streaming)
    ├── audio_player_routes.py      # Audio stream player routes
    ├── redis_manager.py            # Redis session & audio buffer management
    ├── composio_tools.py           # Composio tool integration
    ├── twilio_handler.py           # Twilio call handling
    ├── voice_intent_utils.py       # Voice intent detection utilities
    ├── voice_utils.py              # Voice processing utilities
    ├── environment_config.py       # Environment configuration
    ├── models/                     # Database models
    │   ├── base.py                 # Shared SQLAlchemy Base
    │   └── user_models.py          # User, Team, TeamMembership models
    ├── security/                   # Authentication & authorization
    │   └── auth.py                 # JWT authentication system
    ├── api_routes/                 # RESTful API endpoints
    │   ├── auth_routes.py          # User registration & login
    │   ├── jwt_scr.py              # JWT secret key management
    │   ├── team_routes.py          # Team management API
    │   └── team_todo_routes.py     # Team todo management API
    ├── migrations/                 # Database migrations
    │   ├── add_team_collaboration.py  # Team collaboration migration
    │   └── add_voice_pin.py        # Voice PIN authentication migration
    ├── mcps/                       # Model Context Protocol servers
    │   ├── mcp_config.json         # MCP server configuration
    │   └── local_servers/
    │       ├── db_todo.py          # 36 database tools (optimized)
    │       ├── call_transfer.py    # 2 call transfer tools (FusionPBX)
    │       └── google_calendar.py  # Calendar operations
    ├── static/                     # Convonet static assets
    │   └── assets/
    │       ├── img/                # Images & diagrams
    │       ├── css/                # Stylesheets
    │       └── js/                 # JavaScript files
    └── templates/                  # Convonet templates
        ├── convonet_todo_index.html  # Main todo interface
        └── webrtc_voice_assistant.html  # LiveKit voice assistant UI

Database Schema

users_convonet

  • • id: UUID (PK)
  • • email: String (unique)
  • • username: String (unique)
  • • password_hash: String
  • • first_name: String
  • • last_name: String
  • • is_active: Boolean
  • • is_verified: Boolean
  • • created_at: DateTime
  • • last_login_at: DateTime

teams_convonet

  • • id: UUID (PK)
  • • name: String
  • • description: Text
  • • is_active: Boolean
  • • created_at: DateTime
  • • updated_at: DateTime

team_memberships_convonet

  • • id: UUID (PK)
  • • team_id: UUID (FK)
  • • user_id: UUID (FK)
  • • role: Enum (owner/admin/member/viewer)
  • • joined_at: DateTime
  • • updated_at: DateTime

todos_convonet

  • • id: UUID (PK)
  • • title: String
  • • description: String
  • • completed: Boolean
  • • priority: Enum
  • • due_date: DateTime
  • • creator_id: UUID (FK)
  • • assignee_id: UUID (FK)
  • • team_id: UUID (FK)
  • • is_private: Boolean
  • • google_calendar_event_id

reminders_convonet

  • • id: UUID (PK)
  • • reminder_text: String
  • • importance: Enum
  • • reminder_date: DateTime
  • • google_calendar_event_id
  • • created_at: DateTime
  • • updated_at: DateTime

calendar_events_convonet

  • • id: UUID (PK)
  • • title: String
  • • description: String
  • • event_from: DateTime
  • • event_to: DateTime
  • • google_calendar_event_id
  • • created_at: DateTime
  • • updated_at: DateTime

call_recordings_convonet

  • • id: UUID (PK)
  • • call_sid: String (unique)
  • • from_number: String
  • • to_number: String
  • • recording_path: String
  • • transcription: Text
  • • status: String
  • • created_at: DateTime

LangGraph Agent Architecture

LangGraph Workflow Diagram

LangGraph Workflow Diagram

LangGraph Workflow: The agent can either continue to use tools or end the conversation based on user input and context.

Agent Components

  • TodoAgent Class: Main agent orchestrator with lazy initialization
  • StateGraph: Manages conversation flow and state
  • Assistant Node: GPT-4 reasoning and response generation
  • Tool Node: Executes 38 MCP tools
  • Conditional Edges: Routes between nodes based on tool calls
  • InMemorySaver: Checkpointer for state persistence

State Management

  • AgentState: Conversation state with message history
  • Message History: Maintains context across turns
  • Customer ID: User identification for multi-tenant
  • Thread ID: Conversation thread tracking
  • Lazy Loading: Prevents circular imports
  • ExceptionGroup Handling: Robust error recovery

Model Context Protocol (MCP) Integration

MCP provides a standardized way for AI agents to interact with external tools and services. The Convonet system uses MCP to expose 38 tools for database operations, team management, call transfer to FusionPBX, Google Calendar integration, and WebRTC voice processing with Deepgram STT.

MCP Server Configuration

{
  "mcpServers": {
    "db": {
      "command": "python",
      "args": ["./convonet/mcps/local_servers/db_todo.py"],
      "transport": "stdio",
      "env": {
        "DB_URI": "${DB_URI}",
        "GOOGLE_OAUTH2_TOKEN_B64": "${GOOGLE_OAUTH2_TOKEN_B64}",
        "GOOGLE_CLIENT_ID": "${GOOGLE_CLIENT_ID}",
        "GOOGLE_CLIENT_SECRET": "${GOOGLE_CLIENT_SECRET}"
      }
    }
  }
}

Available MCP Tools (38)

Todo Management (5)
  • • create_todo
  • • get_todos
  • • complete_todo
  • • update_todo
  • • delete_todo
Team Tools (8)
  • • create_team
  • • get_teams
  • • get_team_members
  • • create_team_todo
  • • add_team_member
  • • remove_team_member
  • • change_member_role
  • • search_users
Reminders (4)
  • • create_reminder
  • • get_reminders
  • • update_reminder
  • • delete_reminder
Calendar (6)
  • • create_calendar_event
  • • get_calendar_events
  • • update_calendar_event
  • • delete_calendar_event
  • • sync_google_calendar_events
  • • test_google_calendar
Call Transfer (2)
  • • transfer_to_agent
  • • get_available_departments

Enhanced LangGraph Tool Calls

The LangGraph implementation provides intelligent tool calling capabilities with dynamic tool selection and error handling. The agent automatically chooses appropriate tools based on user intent and maintains conversation context for seamless interactions.

Tool Calls Flow Diagram

Tool Calls Flow Diagram

Tool Calls Flow: LangGraph implementation showing dynamic tool selection and intelligent orchestration of MCP tools.

Tool Calls Features

MCP Integration
  • • Database operations via MCP servers
  • • Google Calendar synchronization
  • • Team collaboration tools
  • • Real-time tool discovery (38 tools)
  • • Secure tool communication via stdio
  • • Lazy loading for performance
Error Handling
  • • Graceful tool failure recovery
  • • ExceptionGroup unwrapping
  • • 20s timeout per tool
  • • 30s overall agent timeout
  • • Fallback strategies
  • • User-friendly error messages

Tool Features: Intelligent tool calling system with error recovery, timeout management, and seamless MCP integration.

Core Tool Calling Capabilities

  • Dynamic Tool Selection: LLM intelligently chooses appropriate tools based on user intent
  • Error Recovery: Graceful handling of tool failures with fallback strategies
  • Context Awareness: Tools access conversation history and maintain state
  • Streaming Responses: Real-time tool execution updates for better user experience
  • Async Execution: Non-blocking tool calls with proper timeout management
  • ExceptionGroup Handling: Unwraps and logs complex async exceptions

JWT Authentication System

Authentication Flow

1. User Registration
POST /api/auth/register → Bcrypt hash → JWT tokens
2. User Login
POST /api/auth/login → Verify password → Generate tokens
3. API Request
Bearer token → JWT validation → @require_auth
4. Token Refresh
POST /api/auth/refresh → New access token

Security Features

  • Password Hashing: Bcrypt with automatic salt
  • JWT Tokens: HS256 algorithm with secret key
  • Token Expiry: 30 min access, 7 day refresh
  • Authorization: @require_auth decorator
  • Role Validation: @require_role decorator
  • Team Membership: @require_team_member decorator
  • Auto Logout: Frontend handles expired tokens

JWT Token Structure

{
  "user_id": "uuid",
  "email": "user@example.com",
  "roles": ["user"],
  "team_id": "uuid",
  "type": "access",
  "exp": 1728589200,  // 30 minutes from issue
  "iat": 1728587400   // issued at timestamp
}

New Features Technical Details

Redis Session Management & Caching

Session Management
  • • Session creation, retrieval, updates, deletion
  • • Audio buffer storage (base64 encoded)
  • • User authentication state management
  • • Session expiration handling
Real-time Features
  • • Pub/Sub notifications for team updates
  • • Rate limiting (requests per minute)
  • • User activity tracking & analytics
  • • Cache invalidation strategies
# Redis connection with environment config
REDIS_URL = os.getenv('REDIS_URL', 'redis://localhost:6379')
# Session data structure
session_data = {
  'user_id': 'user-123', 'audio_buffer': 'base64...',
  'created_at': timestamp, 'expires_at': ttl
}

Composio External Tool Integration

Supported Platforms
  • Slack: Message sending, channel management
  • GitHub: Repository operations, issue tracking
  • Gmail: Email composition, sending, management
  • Notion: Page creation, database operations
  • Jira: Ticket creation, project management
Integration Features
  • • OAuth2 authentication for each platform
  • • Robust method discovery for API compatibility
  • • Error handling for missing methods
  • • Tool orchestration with LangGraph
# Composio tool loading with fallback methods
if hasattr(toolset, 'get_tools'):
    tools = toolset.get_tools(apps=["slack"])
elif hasattr(toolset, 'get_actions'):
    tools = toolset.get_actions(apps=["slack"])

Audio Stream Player & WebM Support

Audio Processing
  • • WebM format detection & handling
  • • Base64 audio buffer decoding
  • • Real-time audio streaming from Redis
  • • Audio format conversion (WebM → WAV fallback)
Player Features
  • • Session-based audio playback
  • • Audio file download (WebM/WAV)
  • • Audio buffer analysis & debugging
  • • Flask-SocketIO real-time updates
# WebM format detection
if audio_data.startswith(b'\x1a\x45\xdf\xa3'):
    return Response(audio_data, mimetype='audio/webm')
# Audio buffer from Redis
audio_buffer = session_data.get('audio_buffer', '')
audio_data = base64.b64decode(audio_buffer)

API Endpoints

Authentication

  • POST /api/auth/register
  • POST /api/auth/login
  • POST /api/auth/refresh
  • GET /api/auth/profile
  • PUT /api/auth/profile

Team Management

  • POST /api/teams/
  • GET /api/teams/
  • GET /api/teams/{id}
  • POST /api/teams/{id}/members
  • DELETE /api/teams/{id}/members/{user_id}
  • PUT /api/teams/{id}/members/{user_id}/role

Twilio Voice

  • POST /convonet_todo/twilio/call
  • POST /convonet_todo/twilio/process_audio
  • POST /convonet_todo/twilio/transfer

Webhook URL:

https://hjlees.com/convonet_todo/twilio/call

LiveKit WebRTC Voice

  • GET /webrtc/voice-assistant
  • GET /fastapi/voice-assistant (FastAPI Native)
  • WebSocket /voice
  • WebSocket /fastapi/voice-assistant/ws
  • GET /audio-player/

Features: LiveKit WebRTC, streaming STT (Deepgram), TTS (Deepgram/ElevenLabs/Cartesia), domain agents, Agent Monitor

Agent Monitor

  • GET /agent-monitor/

Real-time tool calls, voice timing, elapsed time per tool

Redis Audio Player

  • GET /audio-player/
  • GET /audio-player/sessions
  • GET /audio-player/audio/{session_id}
  • WebSocket /audio-stream

Features: View and analyze audio streams from Redis audio buffers, real-time session monitoring, audio playback and download

Team Dashboard

  • GET /team-dashboard
  • POST /api/teams/
  • GET /api/teams/{id}/todos
  • GET /api/teams/{id}/calendar

Features: Team collaboration dashboard with todo management, calendar integration, member management, real-time updates, and role-based access control

Demo Credentials: admin@convonet.com / admin123

Call Center Agent Dashboard

A complete browser-based SIP phone client with ACD (Automatic Call Distribution) capabilities, providing enterprise-grade call center management features for handling voice assistant transfers and customer support calls.

Agent Management

  • ✓ Secure agent authentication
  • ✓ SIP credential management
  • ✓ Session management
  • ✓ Agent state tracking (Ready/Not Ready/On Call/Wrap Up)
  • ✓ Time-in-state tracking
  • ✓ Activity logging

Call Handling

  • ✓ Incoming call notifications
  • ✓ Caller ID display
  • ✓ Answer/Reject controls
  • ✓ Call hold/unhold
  • ✓ Call transfer (blind & attended)
  • ✓ Outbound dialing
  • ✓ Call duration tracking

Customer Data Popup

  • ✓ Automatic customer info display
  • ✓ Customer ID & contact info
  • ✓ Account status & tier
  • ✓ Last contact date
  • ✓ Open tickets/cases
  • ✓ Lifetime value
  • ✓ Agent notes

SIP Integration

  • ✓ Browser-based SIP client (JsSIP)
  • ✓ WebRTC audio support
  • ✓ WebSocket Secure (WSS)
  • ✓ RFC 3261 compliant
  • ✓ Multiple codec support (G.711, Opus, G.722)
  • ✓ NAT traversal (STUN/TURN)

Dashboard Interface

  • ✓ Agent status panel
  • ✓ Call control panel
  • ✓ 12-key dialpad
  • ✓ Call history display
  • ✓ Real-time status updates
  • ✓ Responsive design (desktop/tablet/mobile)

Monitoring & Reporting

  • ✓ Agent metrics (calls handled, duration)
  • ✓ Call metrics (answer rate, wait time)
  • ✓ Real-time monitoring
  • ✓ Activity timeline
  • ✓ Availability percentage

Agent States

Logged Out
Agent not active
Logged In
Not ready for calls
Ready
Available for calls ✓
Not Ready
On break/unavailable
On Call
Currently on a call

Voice Assistant Transfer Integration

When a user requests to speak with a human agent during a WebRTC voice assistant session, the system automatically transfers the call to the Call Center Agent Dashboard:

Transfer Request: LangGraph detects transfer intent and triggers Twilio API call
Twilio Bridge: Twilio bridges the WebRTC call to FusionPBX SIP server
FusionPBX Routing: Call routed to agent extension (e.g., 2001)
Agent Dashboard: Agent dashboard receives incoming call with customer info popup
Call Answer: Agent answers and continues conversation with customer

API Endpoints

Agent Management
  • POST /call-center/api/agent/login
  • POST /call-center/api/agent/logout
  • POST /call-center/api/agent/ready
  • POST /call-center/api/agent/not-ready
  • GET /call-center/api/agent/status
Call Handling
  • POST /call-center/api/call/ringing
  • POST /call-center/api/call/answer
  • POST /call-center/api/call/drop
  • POST /call-center/api/call/hold
  • POST /call-center/api/call/transfer
Customer Data
  • GET /call-center/api/customer/{id}

Access: /call-center/

Call Center Agent Dashboard:

https://hjlees.com/call-center/ →

Browser-based SIP client requiring no installation. Compatible with Chrome, Firefox, Edge, and Safari. Integrates with FusionPBX for call routing and transfer from WebRTC voice assistant sessions.

Twilio Voice Interface

The Convonet assistant provides complete voice interaction through Twilio integration, supporting natural language commands for team collaboration and task management with Amazon Polly.Amy voice and barge-in capabilities for interruption.

Voice Processing Pipeline

🎤
Speech Input
Twilio Voice
👂
Speech-to-Text
Twilio STT
🧠
LangGraph Agent
AI Processing
🗣️
Text-to-Speech
Polly.Amy
🔊
Voice Response
Twilio TTS
User Speech
Recognition
AI Processing
Voice Output

Voice Pipeline: Complete end-to-end voice processing from speech input to AI-generated voice response with real-time streaming.

Voice Command Examples

Personal Productivity
  • 🗣️ "Create a high priority todo to review the quarterly report"
  • 🗣️ "Add a reminder to call the dentist tomorrow at 2 PM"
  • 🗣️ "Schedule a meeting for next Friday from 2 to 3 PM"
  • 🗣️ "Show me all my pending todos"
  • 🗣️ "Mark the grocery shopping todo as completed"
Team Collaboration
  • 🗣️ "Create a hackathon team"
  • 🗣️ "What teams are available?"
  • 🗣️ "Who are the members of the development team?"
  • 🗣️ "Create a high priority todo for the dev team"
  • 🗣️ "Add admin@convonet.com to the hackathon team as owner"
  • 🗣️ "Assign a code review task to John in the dev team"
  • 🗣️ "Change john@example.com to admin role in the dev team"

Voice Components

  • Speech-to-Text: Twilio speech recognition engine
  • Text-to-Speech: Amazon Polly.Amy voice
  • Barge-in: Interrupt AI while speaking
  • Speech Timeout: Auto detection (10s)
  • Agent Timeout: 30s processing time
  • Tool Timeout: 20s per MCP tool
  • Continuation: Multi-turn conversations
  • Exit Detection: Natural conversation ending

Processing Flow

  1. 1. User calls Twilio number
  2. 2. Twilio webhook triggers /convonet_todo/twilio/call
  3. 3. TwiML with <Gather> returned (10s timeout)
  4. 4. User speaks, Twilio recognizes speech
  5. 5. SpeechResult posted to /process_audio
  6. 6. LangGraph agent processes (30s timeout)
  7. 7. MCP tools execute (20s per tool)
  8. 8. Database & Calendar operations
  9. 9. Response converted to TwiML
  10. 10. Polly.Amy speaks response with barge-in
  11. 11. Conversation continues or ends

Twilio Phone Integration

The Convonet assistant can be accessed via phone calls through Twilio integration, allowing users to interact with the AI assistant through voice calls from any phone number. The system uses TwiML and Gather for speech recognition with barge-in capability.

Twilio Call Flow Architecture

📞
Phone Call
PSTN Network
T
Twilio
Voice API
🌐
Flask Webhook
TwiML Response
🔌
MCP Client
38 Tools
⚙️
Database
PostgreSQL
🧠
LangGraph Agent
AI Processing
Call Initiation
Webhook
Tool Execution
AI Processing

Call Flow: Phone → Twilio → Flask Webhook → LangGraph Agent → MCP Tools → Database/Calendar → TwiML Response

Twilio Components

  • Twilio Voice API: Handles incoming/outgoing calls
  • TwiML: XML-based call flow instructions
  • Gather Element: Collects speech input with barge-in
  • Say Element: Text-to-speech via Polly.Amy
  • Redirect Element: Call continuation logic
  • Webhook Endpoints: Flask routes for call handling

Call Flow Process

  1. 1. User calls Twilio phone number
  2. 2. Twilio webhook triggers /twilio/call
  3. 3. TwiML with <Gather> returned
  4. 4. Greeting: "Hello! I'm your Convonet productivity assistant"
  5. 5. User speaks command
  6. 6. Speech posted to /process_audio
  7. 7. LangGraph agent processes request
  8. 8. MCP tools execute database operations
  9. 9. TwiML response with confirmation
  10. 10. Call continues or ends based on user

Twilio Configuration

# Environment Variables
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890

# Webhook Endpoints
POST https://hjlees.com/convonet_todo/twilio/call
POST https://hjlees.com/convonet_todo/twilio/process_audio

# TwiML Response Example
<Response>
  <Gather action="/convonet_todo/twilio/process_audio" 
          method="POST" 
          input="speech" 
          speechTimeout="auto" 
          timeout="10" 
          bargeIn="true">
    <Say voice="Polly.Amy">Hello! I'm your Convonet productivity assistant...</Say>
  </Gather>
  <Say voice="Polly.Amy">I didn't hear anything. Please try again.</Say>
  <Redirect>/convonet_todo/twilio/call?is_continuation=true</Redirect>
</Response>

Twilio Setup Instructions

Step 1: Configure Voice Settings

In Twilio Console → Phone Numbers → Active Numbers → Select your number

Step 2: Set Webhook URL

A CALL COMES IN: https://hjlees.com/convonet_todo/twilio/call (HTTP POST)

Step 3: Test Integration

Call your Twilio number and speak a command

Test Case Demo

Watch a demonstration of the Convonet voice assistant in action, showcasing the complete workflow from user interaction to call transfer to a human agent.

Video Format: M4V (MPEG-4) | Demo: Complete voice assistant workflow including call transfer

Audio Note: If you don't hear sound, please check:

  • Ensure the video volume control is not muted (check the volume icon in the video player)
  • Check your browser's audio settings and system volume
  • Some browsers may have limited support for MP2 audio codec in M4V files
  • Try using Chrome or Firefox for best compatibility

🎤 Voice Interaction

Demonstrates natural language voice commands and real-time transcription

🤖 AI Processing

Shows LangGraph agent processing requests and tool calling capabilities

📞 Call Transfer

Illustrates seamless transfer from AI assistant to human agent dashboard