Convonet Voice AI Productivity System

Multi-LLM (Claude, Gemini, OpenAI) + LiveKit WebRTC + Domain Agents (Productivity, Mortgage, Healthcare) + MCP Tools + Team Collaboration + Call Transfer + Agent Monitor + Sentry

View on GitHub or Latest Release

FastAPI Flask LangGraph MCP (38 Tools) LangChain Claude · Gemini · OpenAI LiveKit WebRTC Google Calendar Twilio Voice Agent Monitor Team Collaboration FusionPBX Deepgram STT/TTS ElevenLabs TTS Cartesia TTS Sentry Redis Composio

Technical Architecture

System Architecture Overview

Complete System Flow Diagrams

System Architecture Diagram

Complete system flow overview with all components and their relationships

View Full Diagram →

Sequence Diagram

Step-by-step flow (52 steps) showing interactions between all components

View Full Diagram →

WebRTC Flow: User Browser → FastAPI Gateway (Direct/Legacy) → LiveKit WebRTC → PIN Auth (PostgreSQL) → Streaming STT (Deepgram) → LangGraph → Multi-LLM (Claude/Gemini/OpenAI) → Domain Agents → Tool Calling → Streaming TTS (Deepgram/ElevenLabs/Cartesia) → User

Transfer Flow: User Request → LangGraph (Detect Intent) → Twilio API → FusionPBX (Extension 2001) → Agent Dashboard (JsSIP) → User Info Popup → Live Conversation

Evolution to FastAPI

To ensure enterprise-grade scalability and native asynchronous support for real-time voice, the system has evolved from a pure Flask monolith to a FastAPI Hybrid Gateway.

• ASGI Master Gateway: FastAPI serves as the primary entry point (asgi_main.py), handling high-concurrency connections and native WebSockets.
• Flask Sub-system: The legacy Flask application remains fully functional, mounted as a sub-app via WSGIMiddleware.
• Legacy Compatibility: Socket.IO connections are stabilized using a forced polling transport fallback, ensuring reliable session state through the WSGI bridge.
• Uvicorn Deployment: The system now runs on the Uvicorn ASGI server, providing superior performance over traditional WSGI workers.

Request Routing Flow

/fastapi/* → Native FastAPI Routes

/webrtc/* → Legacy Flask (WSGI)

/socket.io/ → Socket.IO (via WSGIMiddleware)

/* (All Root) → Legacy Flask Monolith

Verified Stable in Production

Monitoring: All Operations → Sentry Monitoring → Real-time Alerts & Performance Tracking

Architecture Flow: Enterprise voice AI system with multi-LLM (Claude, Gemini, OpenAI), LiveKit WebRTC, domain-specific agents (Productivity, Mortgage, Healthcare), PIN authentication (PostgreSQL), team collaboration, Agent Monitor for real-time tool/voice timing, intelligent call transfer to FusionPBX via Twilio, streaming STT/TTS (Deepgram, ElevenLabs, Cartesia), and Sentry error monitoring with automatic thread reset recovery.

View System Architecture Diagram | View Sequence Diagram

Overview

The Convonet Voice AI Productivity System is an enterprise-grade platform that combines multi-LLM AI (Claude, Gemini, OpenAI), LiveKit WebRTC voice, domain-specific agents (Productivity, Mortgage, Healthcare), team collaboration, and intelligent call center integration. Built for hackathon demonstrations, it showcases advanced features including PIN authentication, role-based access control, Agent Monitor for real-time tool/voice timing, call transfer to FusionPBX, and Sentry monitoring.

The system enables teams to manage todos, assign tasks, and collaborate in real-time through web dashboards, voice commands, and seamless transfer to human agents. Voice runs on LiveKit WebRTC with streaming STT (Deepgram) and TTS (Deepgram, ElevenLabs, Cartesia). All data syncs with Google Calendar via OAuth2, with intelligent caching for sub-second voice responses.

Core Technologies

• LangGraph for agent orchestration with tools_condition
• Multi-LLM: Claude, Gemini, OpenAI
• Model Context Protocol (MCP) - 38 tools (36 DB + 2 transfer)
• LangChain for LLM integration & tool binding
• FastAPI + ASGI Master Gateway
• Flask + WSGIMiddleware (Legacy Sub-system)
• LiveKit WebRTC for real-time voice
• Deepgram STT; TTS: Deepgram, ElevenLabs, Cartesia
• SQLAlchemy ORM with PostgreSQL (multi-tenant)
• PIN authentication (4-6 digit voice PIN)
• Agent Monitor: real-time tool/voice timing
• Twilio Programmable Voice with Media Streams
• Google Calendar OAuth2 integration (optimized)
• FusionPBX SIP integration for call transfer
• Sentry.io error monitoring & performance tracking
• JsSIP WebRTC for browser-based softphone
• Render.com deployment with Uvicorn (ASGI)

Key Features

• Domain-specific agents: Productivity, Mortgage, Healthcare
• LiveKit WebRTC voice with streaming STT/TTS
• Agent Monitor: real-time tool calls & voice timing
• Multi-tenant team collaboration with role hierarchy
• PIN-based voice authentication (4-6 digits)
• Intelligent call transfer: AI → FusionPBX Extension 2001
• Multi-TTS: Deepgram, ElevenLabs, Cartesia
• Sentry error monitoring & performance tracking
• Optimized timeouts (8s/10s/12s) for Twilio compatibility
• 38 MCP tools (todos, calendar, teams, transfer)
• WebRTC call center with JsSIP softphone
• Google Calendar OAuth2 sync (background)
• FusionPBX integration on Google Cloud VM
• Production deployment on Render with auto-scaling

Recent Updates & Improvements (February 2026)

FastAPI Hybrid Gateway

✓ ASGI Master Gateway (asgi_main.py)
✓ Flask mounted via WSGIMiddleware
✓ Native FastAPI Voice Gateway
✓ Uvicorn production deployment
✓ Optimized Socket.IO polling fallback

Files: asgi_main.py, fastapi_voice_gateway.py

Call Transfer to FusionPBX

✓ Seamless AI → Human agent transfer
✓ FusionPBX extension 2001 integration
✓ SIP/WSS connectivity (Google Cloud VM)
✓ Transfer detection via phrases or tool
✓ Department routing (support, sales, etc.)

Files: call_transfer.py, CALL_TRANSFER_GUIDE.md

LiveKit WebRTC + Agent Monitor

✓ LiveKit real-time voice (WebRTC)
✓ Streaming STT (Deepgram) & TTS (Deepgram/ElevenLabs/Cartesia)
✓ Domain agents: Productivity, Mortgage, Healthcare
✓ Agent Monitor: tool calls & voice timing
✓ Processing music during agent response

Files: webrtc_voice_server_socketio.py, agent_monitor.py, deepgram/

Composio Integration

✓ Slack workspace integration
✓ GitHub repository management
✓ Gmail email automation
✓ Notion workspace sync
✓ External platform connectivity

Files: COMPOSIO_FEATURES_SUMMARY.md

Sentry Integration

✓ Real-time error tracking & alerts
✓ Performance monitoring (agent processing time)
✓ User context & session tracking
✓ Timeout & thread reset tracking
✓ Production-grade observability

Integration: Flask + SQLAlchemy + Logging

Timeout Optimization

✓ Tool timeout: 8s (from 20s)
✓ Agent timeout: 10s (from 25s)
✓ Webhook timeout: 12s (from 30s)
✓ Stays under Twilio's 15s HTTP limit
✓ Thread reset on timeout prevents errors

Result: 95%+ operations complete successfully

WebRTC Call Center

✓ JsSIP v3.10.1 browser softphone
✓ WebSocket Secure (WSS) on port 7443
✓ Agent dashboard with SIP registration
✓ Call control (answer, hold, transfer, hangup)
✓ Google Cloud firewall configured

Platform: FusionPBX 34.26.59.14 (GCP VM)

Automatic Error Recovery

✓ Thread reset with timestamped IDs
✓ BrokenResourceError handling
✓ tool_call_id incomplete error recovery
✓ In-memory reset tracking (_reset_threads)
✓ No cascading failures

Benefit: Self-healing conversation threads

Performance Optimization

✓ Removed Google Calendar sync delay
✓ Simplified JSON responses (no MCP breaks)
✓ Agent processing time measurement
✓ Transaction tracking per voice call
✓ Custom Sentry metrics & measurements

Result: Sub-5s response times

2001

FusionPBX Extension

Deepgram

STT API

MCP Tools

12s

Max Response Time

100%

Sentry Trace Rate

LiveKit WebRTC Voice Integration Architecture

The LiveKit WebRTC Voice Interface provides browser-based voice interaction with real-time audio streaming via LiveKit, domain-specific agents (Productivity, Mortgage, Healthcare), streaming STT (Deepgram) and TTS (Deepgram, ElevenLabs, Cartesia), and seamless integration with the LangGraph AI agent. Users interact with the assistant directly through their web browser.

Try LiveKit Voice Assistant →

/webrtc/voice-assistant

LiveKit Voice Assistant Architecture

LiveKit WebRTC Processing Flow

🌐

Browser

WebRTC

🚀

FastAPI Gateway

ASGI/WSGI

📡

Socket.IO

Long Polling

💾

Redis

Buffer

🎧

Deepgram

STT API

🧠

LangGraph

AI Agent

Flow: User Browser → LiveKit WebRTC → PIN Auth (PostgreSQL) → Streaming STT (Deepgram) → LangGraph → Multi-LLM (Claude/Gemini/OpenAI) → Domain Agents → Tool Calling → Streaming TTS (Deepgram/ElevenLabs/Cartesia) → User Browser

WebRTC Voice Assistant Flow Phases

View Detailed Sequence Diagram →

Phase 1: Authentication

Steps 1-7

WebSocket connection, PIN authentication via PostgreSQL, Redis session creation

Phase 2: Conversation Loop

Steps 8-31

LiveKit audio → Streaming STT (Deepgram) → LangGraph → Multi-LLM → Domain agents → Tool execution → Streaming TTS (Deepgram/ElevenLabs/Cartesia) → Response

Phase 3: Transfer Request

Steps 32-38

User requests transfer, LangGraph detects intent, sets transfer flag in Redis

Phase 4: Twilio Transfer

Steps 39-52

Twilio API → FusionPBX → Agent Dashboard (JsSIP) → Live conversation bridging

LiveKit WebRTC Architecture: Browser-based voice assistant with LiveKit WebRTC, PostgreSQL PIN authentication, streaming STT (Deepgram) and TTS (Deepgram, ElevenLabs, Cartesia), LangGraph AI orchestration with multi-LLM (Claude, Gemini, OpenAI), domain-specific agents (Productivity, Mortgage, Healthcare), Agent Monitor for real-time tool/voice timing, MCP tool integration, and seamless transfer to FusionPBX via Twilio.

Component Interaction Matrix

Component	Input From	Output To	Purpose
User Browser	User voice input	WebSocket Server	Captures audio, displays UI
WebSocket Server	Browser, WebRTC Server	Browser, Redis	Manages real-time communication
PIN Auth	WebSocket Server	PostgreSQL	Validates user credentials
Redis	WebRTC Server, Tools	Deepgram, WebRTC Server	Buffers audio, stores session
Deepgram STT	Redis Audio Buffer	WebRTC Voice Server	Converts speech to text
LangGraph	WebRTC Server	Claude LLM, Tools	Orchestrates AI conversation flow
Claude LLM	LangGraph	LangGraph	Generates responses, decides actions
Tool Calling	LangGraph	PostgreSQL, Google APIs	Executes external operations
Deepgram TTS	LangGraph Response	WebRTC Server	Converts text to speech
Twilio API	WebRTC Server	FusionPBX	Bridges call to agent
FusionPBX	Twilio	Agent Dashboard	Routes call to extension
Agent Dashboard	FusionPBX	PostgreSQL, User	Displays call, shows user info

WebRTC Voice Interface

✓ Browser-based voice recording
✓ Real-time audio streaming
✓ WebSocket communication
✓ Audio format detection (WebM)
✓ Base64 audio encoding

Technology: WebRTC + Socket.IO

Redis Session Management

✓ Session storage & caching
✓ Audio buffer management
✓ Real-time notifications
✓ User activity tracking
✓ Session expiration handling

Integration: Redis Pub/Sub + Session Store

Composio Integration

✓ Slack workspace integration
✓ GitHub repository management
✓ Gmail email automation
✓ Notion workspace sync
✓ Jira project management

Platforms: Slack, GitHub, Gmail, Notion, Jira

Module Structure

Convonet Project Structure

Project Root/
├── asgi_main.py                    # FastAPI ASGI Master Gateway (Entry Point)
├── app.py                          # Main Flask application (Mounted as sub-app)
├── app_memory_optimized.py         # Memory-optimized Flask application
├── passenger_wsgi.py               # WSGI entry point for legacy support
├── passenger_wsgi_memory_optimized.py  # WSGI entry for memory-optimized app
├── extensions.py                   # Flask extensions (db, login, migrate, etc.)
├── requirements.txt                # Python dependencies
├── requirements-render.txt         # Production dependencies (No Eventlet)
├── Procfile                        # Render.com process file (runs Uvicorn)
├── render-hybrid.yaml              # Render.com hybrid deployment config
├── render-build.sh                 # Build script for Render.com
├── templates/                      # Flask templates
│   ├── team_dashboard.html         # Team management UI
│   ├── register.html               # User registration
│   ├── convonet_tech_spec.html     # Technical documentation
│   ├── convonet_system_architecture.html  # System architecture diagram
│   ├── convonet_sequence_diagram.html     # Sequence diagram page
│   └── agent_monitor_dashboard.html       # Agent Monitor (tool/voice timing)
├── static/                         # Static assets
│   └── convonet/                   # Convonet static files
│       └── assets/                 # Images, CSS, JS
├── shared/                         # Shared utilities
│   ├── google_calendar.py          # Google Calendar integration
│   ├── helpers.py                  # Shared helper functions
│   └── schemas.py                  # Shared data schemas
└── convonet/                       # Convonet module
    ├── __init__.py                 # Package initialization
    ├── routes.py                   # Flask routes & Twilio webhooks (Sentry integrated)
    ├── assistant_graph_todo.py     # LangGraph agent (optimized timeouts)
    ├── state.py                    # Agent state management
    ├── asgi_main.py                # FastAPI ASGI application
    ├── fastapi_voice_gateway.py    # FastAPI-native Voice Assistant Gateway
    ├── webrtc_voice_server_socketio.py  # LiveKit WebRTC voice server
    ├── agent_monitor.py            # Agent Monitor (tool/voice timing)
    ├── deepgram/                   # Deepgram STT/TTS (streaming)
    ├── audio_player_routes.py      # Audio stream player routes
    ├── redis_manager.py            # Redis session & audio buffer management
    ├── composio_tools.py           # Composio tool integration
    ├── twilio_handler.py           # Twilio call handling
    ├── voice_intent_utils.py       # Voice intent detection utilities
    ├── voice_utils.py              # Voice processing utilities
    ├── environment_config.py       # Environment configuration
    ├── models/                     # Database models
    │   ├── base.py                 # Shared SQLAlchemy Base
    │   └── user_models.py          # User, Team, TeamMembership models
    ├── security/                   # Authentication & authorization
    │   └── auth.py                 # JWT authentication system
    ├── api_routes/                 # RESTful API endpoints
    │   ├── auth_routes.py          # User registration & login
    │   ├── jwt_scr.py              # JWT secret key management
    │   ├── team_routes.py          # Team management API
    │   └── team_todo_routes.py     # Team todo management API
    ├── migrations/                 # Database migrations
    │   ├── add_team_collaboration.py  # Team collaboration migration
    │   └── add_voice_pin.py        # Voice PIN authentication migration
    ├── mcps/                       # Model Context Protocol servers
    │   ├── mcp_config.json         # MCP server configuration
    │   └── local_servers/
    │       ├── db_todo.py          # 36 database tools (optimized)
    │       ├── call_transfer.py    # 2 call transfer tools (FusionPBX)
    │       └── google_calendar.py  # Calendar operations
    ├── static/                     # Convonet static assets
    │   └── assets/
    │       ├── img/                # Images & diagrams
    │       ├── css/                # Stylesheets
    │       └── js/                 # JavaScript files
    └── templates/                  # Convonet templates
        ├── convonet_todo_index.html  # Main todo interface
        └── webrtc_voice_assistant.html  # LiveKit voice assistant UI

Database Schema

users_convonet

• id: UUID (PK)
• email: String (unique)
• username: String (unique)
• password_hash: String
• first_name: String
• last_name: String
• is_active: Boolean
• is_verified: Boolean
• created_at: DateTime
• last_login_at: DateTime

teams_convonet

• id: UUID (PK)
• name: String
• description: Text
• is_active: Boolean
• created_at: DateTime
• updated_at: DateTime

team_memberships_convonet

• id: UUID (PK)
• team_id: UUID (FK)
• user_id: UUID (FK)
• role: Enum (owner/admin/member/viewer)
• joined_at: DateTime
• updated_at: DateTime

todos_convonet

• id: UUID (PK)
• title: String
• description: String
• completed: Boolean
• priority: Enum
• due_date: DateTime
• creator_id: UUID (FK)
• assignee_id: UUID (FK)
• team_id: UUID (FK)
• is_private: Boolean
• google_calendar_event_id

reminders_convonet

• id: UUID (PK)
• reminder_text: String
• importance: Enum
• reminder_date: DateTime
• google_calendar_event_id
• created_at: DateTime
• updated_at: DateTime

calendar_events_convonet

• id: UUID (PK)
• title: String
• description: String
• event_from: DateTime
• event_to: DateTime
• google_calendar_event_id
• created_at: DateTime
• updated_at: DateTime

call_recordings_convonet

• id: UUID (PK)
• call_sid: String (unique)
• from_number: String
• to_number: String
• recording_path: String
• transcription: Text
• status: String
• created_at: DateTime

LangGraph Agent Architecture

LangGraph Workflow Diagram

LangGraph Workflow: The agent can either continue to use tools or end the conversation based on user input and context.

Agent Components

• TodoAgent Class: Main agent orchestrator with lazy initialization
• StateGraph: Manages conversation flow and state
• Assistant Node: GPT-4 reasoning and response generation
• Tool Node: Executes 38 MCP tools
• Conditional Edges: Routes between nodes based on tool calls
• InMemorySaver: Checkpointer for state persistence

State Management

• AgentState: Conversation state with message history
• Message History: Maintains context across turns
• Customer ID: User identification for multi-tenant
• Thread ID: Conversation thread tracking
• Lazy Loading: Prevents circular imports
• ExceptionGroup Handling: Robust error recovery

Model Context Protocol (MCP) Integration

MCP provides a standardized way for AI agents to interact with external tools and services. The Convonet system uses MCP to expose 38 tools for database operations, team management, call transfer to FusionPBX, Google Calendar integration, and WebRTC voice processing with Deepgram STT.

MCP Server Configuration

{
  "mcpServers": {
    "db": {
      "command": "python",
      "args": ["./convonet/mcps/local_servers/db_todo.py"],
      "transport": "stdio",
      "env": {
        "DB_URI": "${DB_URI}",
        "GOOGLE_OAUTH2_TOKEN_B64": "${GOOGLE_OAUTH2_TOKEN_B64}",
        "GOOGLE_CLIENT_ID": "${GOOGLE_CLIENT_ID}",
        "GOOGLE_CLIENT_SECRET": "${GOOGLE_CLIENT_SECRET}"
      }
    }
  }
}

Available MCP Tools (38)

Todo Management (5)

• create_todo
• get_todos
• complete_todo
• update_todo
• delete_todo

Team Tools (8)

• create_team
• get_teams
• get_team_members
• create_team_todo
• add_team_member
• remove_team_member
• change_member_role
• search_users

Reminders (4)

• create_reminder
• get_reminders
• update_reminder
• delete_reminder

Calendar (6)

• create_calendar_event
• get_calendar_events
• update_calendar_event
• delete_calendar_event
• sync_google_calendar_events
• test_google_calendar

Call Transfer (2)

• transfer_to_agent
• get_available_departments

Enhanced LangGraph Tool Calls

The LangGraph implementation provides intelligent tool calling capabilities with dynamic tool selection and error handling. The agent automatically chooses appropriate tools based on user intent and maintains conversation context for seamless interactions.

Tool Calls Flow Diagram

Tool Calls Flow: LangGraph implementation showing dynamic tool selection and intelligent orchestration of MCP tools.

Tool Calls Features

MCP Integration

• Database operations via MCP servers
• Google Calendar synchronization
• Team collaboration tools
• Real-time tool discovery (38 tools)
• Secure tool communication via stdio
• Lazy loading for performance

Error Handling

• Graceful tool failure recovery
• ExceptionGroup unwrapping
• 20s timeout per tool
• 30s overall agent timeout
• Fallback strategies
• User-friendly error messages

Tool Features: Intelligent tool calling system with error recovery, timeout management, and seamless MCP integration.

Core Tool Calling Capabilities

• Dynamic Tool Selection: LLM intelligently chooses appropriate tools based on user intent
• Error Recovery: Graceful handling of tool failures with fallback strategies
• Context Awareness: Tools access conversation history and maintain state
• Streaming Responses: Real-time tool execution updates for better user experience
• Async Execution: Non-blocking tool calls with proper timeout management
• ExceptionGroup Handling: Unwraps and logs complex async exceptions

JWT Authentication System

Authentication Flow

1. User Registration

POST /api/auth/register → Bcrypt hash → JWT tokens

2. User Login

POST /api/auth/login → Verify password → Generate tokens

3. API Request

Bearer token → JWT validation → @require_auth

4. Token Refresh

POST /api/auth/refresh → New access token

Security Features

• Password Hashing: Bcrypt with automatic salt
• JWT Tokens: HS256 algorithm with secret key
• Token Expiry: 30 min access, 7 day refresh
• Authorization: @require_auth decorator
• Role Validation: @require_role decorator
• Team Membership: @require_team_member decorator
• Auto Logout: Frontend handles expired tokens

JWT Token Structure

{
  "user_id": "uuid",
  "email": "user@example.com",
  "roles": ["user"],
  "team_id": "uuid",
  "type": "access",
  "exp": 1728589200,  // 30 minutes from issue
  "iat": 1728587400   // issued at timestamp
}

New Features Technical Details

Redis Session Management & Caching

Session Management

• Session creation, retrieval, updates, deletion
• Audio buffer storage (base64 encoded)
• User authentication state management
• Session expiration handling

Real-time Features

• Pub/Sub notifications for team updates
• Rate limiting (requests per minute)
• User activity tracking & analytics
• Cache invalidation strategies


                                # Redis connection with environment config

                                REDIS_URL = os.getenv('REDIS_URL', 'redis://localhost:6379')

                                # Session data structure

                                session_data = {

                                  'user_id': 'user-123', 'audio_buffer': 'base64...',

                                  'created_at': timestamp, 'expires_at': ttl

                                }

Composio External Tool Integration

Supported Platforms

• Slack: Message sending, channel management
• GitHub: Repository operations, issue tracking
• Gmail: Email composition, sending, management
• Notion: Page creation, database operations
• Jira: Ticket creation, project management

Integration Features

• OAuth2 authentication for each platform
• Robust method discovery for API compatibility
• Error handling for missing methods
• Tool orchestration with LangGraph


                                # Composio tool loading with fallback methods

                                if hasattr(toolset, 'get_tools'):

                                    tools = toolset.get_tools(apps=["slack"])

                                elif hasattr(toolset, 'get_actions'):

                                    tools = toolset.get_actions(apps=["slack"])

Audio Stream Player & WebM Support

Audio Processing

• WebM format detection & handling
• Base64 audio buffer decoding
• Real-time audio streaming from Redis
• Audio format conversion (WebM → WAV fallback)

Player Features

• Session-based audio playback
• Audio file download (WebM/WAV)
• Audio buffer analysis & debugging
• Flask-SocketIO real-time updates


                                # WebM format detection

                                if audio_data.startswith(b'\x1a\x45\xdf\xa3'):

                                    return Response(audio_data, mimetype='audio/webm')

                                # Audio buffer from Redis

                                audio_buffer = session_data.get('audio_buffer', '')

                                audio_data = base64.b64decode(audio_buffer)

API Endpoints

Authentication

POST /api/auth/register
POST /api/auth/login
POST /api/auth/refresh
GET /api/auth/profile
PUT /api/auth/profile

Team Management

POST /api/teams/
GET /api/teams/
GET /api/teams/{id}
POST /api/teams/{id}/members
DELETE /api/teams/{id}/members/{user_id}
PUT /api/teams/{id}/members/{user_id}/role

Twilio Voice

POST /convonet_todo/twilio/call
POST /convonet_todo/twilio/process_audio
POST /convonet_todo/twilio/transfer

Webhook URL:

https://hjlees.com/convonet_todo/twilio/call

LiveKit WebRTC Voice

GET /webrtc/voice-assistant
GET /fastapi/voice-assistant (FastAPI Native)
WebSocket /voice
WebSocket /fastapi/voice-assistant/ws
GET /audio-player/

Features: LiveKit WebRTC, streaming STT (Deepgram), TTS (Deepgram/ElevenLabs/Cartesia), domain agents, Agent Monitor

Agent Monitor

GET /agent-monitor/

Real-time tool calls, voice timing, elapsed time per tool

Redis Audio Player

GET /audio-player/
GET /audio-player/sessions
GET /audio-player/audio/{session_id}
WebSocket /audio-stream

Audio Player Dashboard:

https://hjlees.com/audio-player/ →

Features: View and analyze audio streams from Redis audio buffers, real-time session monitoring, audio playback and download

Team Dashboard

GET /team-dashboard
POST /api/teams/
GET /api/teams/{id}/todos
GET /api/teams/{id}/calendar

Team Dashboard:

https://hjlees.com/team-dashboard →

Features: Team collaboration dashboard with todo management, calendar integration, member management, real-time updates, and role-based access control

Demo Credentials: admin@convonet.com / admin123

Call Center Agent Dashboard

A complete browser-based SIP phone client with ACD (Automatic Call Distribution) capabilities, providing enterprise-grade call center management features for handling voice assistant transfers and customer support calls.

Agent Management

✓ Secure agent authentication
✓ SIP credential management
✓ Session management
✓ Agent state tracking (Ready/Not Ready/On Call/Wrap Up)
✓ Time-in-state tracking
✓ Activity logging

Call Handling

✓ Incoming call notifications
✓ Caller ID display
✓ Answer/Reject controls
✓ Call hold/unhold
✓ Call transfer (blind & attended)
✓ Outbound dialing
✓ Call duration tracking

Customer Data Popup

✓ Automatic customer info display
✓ Customer ID & contact info
✓ Account status & tier
✓ Last contact date
✓ Open tickets/cases
✓ Lifetime value
✓ Agent notes

SIP Integration

✓ Browser-based SIP client (JsSIP)
✓ WebRTC audio support
✓ WebSocket Secure (WSS)
✓ RFC 3261 compliant
✓ Multiple codec support (G.711, Opus, G.722)
✓ NAT traversal (STUN/TURN)

Dashboard Interface

✓ Agent status panel
✓ Call control panel
✓ 12-key dialpad
✓ Call history display
✓ Real-time status updates
✓ Responsive design (desktop/tablet/mobile)

Monitoring & Reporting

✓ Agent metrics (calls handled, duration)
✓ Call metrics (answer rate, wait time)
✓ Real-time monitoring
✓ Activity timeline
✓ Availability percentage

Agent States

Logged Out

Agent not active

Logged In

Not ready for calls

Ready

Available for calls ✓

Not Ready

On break/unavailable

On Call

Currently on a call

Voice Assistant Transfer Integration

When a user requests to speak with a human agent during a WebRTC voice assistant session, the system automatically transfers the call to the Call Center Agent Dashboard:

✓ Transfer Request: LangGraph detects transfer intent and triggers Twilio API call

✓ Twilio Bridge: Twilio bridges the WebRTC call to FusionPBX SIP server

✓ FusionPBX Routing: Call routed to agent extension (e.g., 2001)

✓ Agent Dashboard: Agent dashboard receives incoming call with customer info popup

✓ Call Answer: Agent answers and continues conversation with customer

API Endpoints

Agent Management

POST /call-center/api/agent/login
POST /call-center/api/agent/logout
POST /call-center/api/agent/ready
POST /call-center/api/agent/not-ready
GET /call-center/api/agent/status

Call Handling

POST /call-center/api/call/ringing
POST /call-center/api/call/answer
POST /call-center/api/call/drop
POST /call-center/api/call/hold
POST /call-center/api/call/transfer

Customer Data

GET /call-center/api/customer/{id}

Access: /call-center/

Call Center Agent Dashboard:

https://hjlees.com/call-center/ →

Browser-based SIP client requiring no installation. Compatible with Chrome, Firefox, Edge, and Safari. Integrates with FusionPBX for call routing and transfer from WebRTC voice assistant sessions.

Twilio Voice Interface

The Convonet assistant provides complete voice interaction through Twilio integration, supporting natural language commands for team collaboration and task management with Amazon Polly.Amy voice and barge-in capabilities for interruption.

Voice Processing Pipeline

🎤

Speech Input

Twilio Voice

👂

Speech-to-Text

Twilio STT

🧠

LangGraph Agent

AI Processing

🗣️

Text-to-Speech

Polly.Amy

🔊

Voice Response

Twilio TTS

User Speech

Recognition

AI Processing

Voice Output

Voice Pipeline: Complete end-to-end voice processing from speech input to AI-generated voice response with real-time streaming.

Voice Command Examples

Personal Productivity

🗣️ "Create a high priority todo to review the quarterly report"
🗣️ "Add a reminder to call the dentist tomorrow at 2 PM"
🗣️ "Schedule a meeting for next Friday from 2 to 3 PM"
🗣️ "Show me all my pending todos"
🗣️ "Mark the grocery shopping todo as completed"

Team Collaboration

🗣️ "Create a hackathon team"
🗣️ "What teams are available?"
🗣️ "Who are the members of the development team?"
🗣️ "Create a high priority todo for the dev team"
🗣️ "Add admin@convonet.com to the hackathon team as owner"
🗣️ "Assign a code review task to John in the dev team"
🗣️ "Change john@example.com to admin role in the dev team"

Voice Components

• Speech-to-Text: Twilio speech recognition engine
• Text-to-Speech: Amazon Polly.Amy voice
• Barge-in: Interrupt AI while speaking
• Speech Timeout: Auto detection (10s)
• Agent Timeout: 30s processing time
• Tool Timeout: 20s per MCP tool
• Continuation: Multi-turn conversations
• Exit Detection: Natural conversation ending

Processing Flow

1. User calls Twilio number
2. Twilio webhook triggers /convonet_todo/twilio/call
3. TwiML with <Gather> returned (10s timeout)
4. User speaks, Twilio recognizes speech
5. SpeechResult posted to /process_audio
6. LangGraph agent processes (30s timeout)
7. MCP tools execute (20s per tool)
8. Database & Calendar operations
9. Response converted to TwiML
10. Polly.Amy speaks response with barge-in
11. Conversation continues or ends

Twilio Phone Integration

The Convonet assistant can be accessed via phone calls through Twilio integration, allowing users to interact with the AI assistant through voice calls from any phone number. The system uses TwiML and Gather for speech recognition with barge-in capability.

Twilio Call Flow Architecture

📞

Phone Call

PSTN Network

Twilio

Voice API

🌐

Flask Webhook

TwiML Response

🔌

MCP Client

38 Tools

⚙️

Database

PostgreSQL

🧠

LangGraph Agent

AI Processing

Call Initiation

Webhook

Tool Execution

AI Processing

Call Flow: Phone → Twilio → Flask Webhook → LangGraph Agent → MCP Tools → Database/Calendar → TwiML Response

Twilio Components

• Twilio Voice API: Handles incoming/outgoing calls
• TwiML: XML-based call flow instructions
• Gather Element: Collects speech input with barge-in
• Say Element: Text-to-speech via Polly.Amy
• Redirect Element: Call continuation logic
• Webhook Endpoints: Flask routes for call handling

Call Flow Process

1. User calls Twilio phone number
2. Twilio webhook triggers /twilio/call
3. TwiML with <Gather> returned
4. Greeting: "Hello! I'm your Convonet productivity assistant"
5. User speaks command
6. Speech posted to /process_audio
7. LangGraph agent processes request
8. MCP tools execute database operations
9. TwiML response with confirmation
10. Call continues or ends based on user

Twilio Configuration

# Environment Variables
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890

# Webhook Endpoints
POST https://hjlees.com/convonet_todo/twilio/call
POST https://hjlees.com/convonet_todo/twilio/process_audio

# TwiML Response Example
<Response>
  <Gather action="/convonet_todo/twilio/process_audio" 
          method="POST" 
          input="speech" 
          speechTimeout="auto" 
          timeout="10" 
          bargeIn="true">
    <Say voice="Polly.Amy">Hello! I'm your Convonet productivity assistant...</Say>
  </Gather>
  <Say voice="Polly.Amy">I didn't hear anything. Please try again.</Say>
  <Redirect>/convonet_todo/twilio/call?is_continuation=true</Redirect>
</Response>

Twilio Setup Instructions

Step 1: Configure Voice Settings

In Twilio Console → Phone Numbers → Active Numbers → Select your number

Step 2: Set Webhook URL

A CALL COMES IN: https://hjlees.com/convonet_todo/twilio/call (HTTP POST)

Step 3: Test Integration

Call your Twilio number and speak a command

Test Case Demo

Watch a demonstration of the Convonet voice assistant in action, showcasing the complete workflow from user interaction to call transfer to a human agent.

Video Format: M4V (MPEG-4) | Demo: Complete voice assistant workflow including call transfer

Audio Note: If you don't hear sound, please check:

Ensure the video volume control is not muted (check the volume icon in the video player)
Check your browser's audio settings and system volume
Some browsers may have limited support for MP2 audio codec in M4V files
Try using Chrome or Firefox for best compatibility

🎤 Voice Interaction

Demonstrates natural language voice commands and real-time transcription

🤖 AI Processing

Shows LangGraph agent processing requests and tool calling capabilities

📞 Call Transfer

Illustrates seamless transfer from AI assistant to human agent dashboard

Back to Home