Convonet Voice AI Productivity System
LangGraph + Twilio + MCP + Team Management + Call Transfer + Sentry Monitoring + Redis + Composio + Audio Stream Player + WebRTC Voice Integration
Technical Architecture
System Architecture Overview
Complete System Flow Diagrams
System Architecture Diagram
Complete system flow overview with all components and their relationships
Sequence Diagram
Step-by-step flow (52 steps) showing interactions between all components
Architecture Flow: Enterprise voice AI system with PIN authentication (PostgreSQL), team collaboration, intelligent call transfer to FusionPBX call center via Twilio, WebRTC voice integration with Deepgram STT and Redis audio buffering, and comprehensive Sentry error monitoring with automatic thread reset recovery (8s/10s/12s timeout optimization).
Overview
The Convonet Voice AI Productivity System is an enterprise-grade platform that combines
LangGraph AI agents, team collaboration, voice interaction, and intelligent call center integration.
Built for hackathon demonstrations, it showcases advanced features including PIN authentication,
role-based access control, call transfer to FusionPBX, and production-grade error monitoring with Sentry.
The system enables teams to manage todos, assign tasks to members, and collaborate in real-time
through web dashboards, voice commands, and seamless transfer to human agents. It features optimized
timeout handling (8s/10s/12s), automatic thread reset recovery, and comprehensive error tracking.
All data is synchronized with Google Calendar using OAuth2 authentication, with intelligent caching
to ensure sub-second voice responses.
Core Technologies
- • LangGraph for agent orchestration with tools_condition
- • Model Context Protocol (MCP) - 38 tools (36 DB + 2 transfer)
- • LangChain for LLM integration & tool binding
- • Flask + Flask-SocketIO for web API & WebSocket
- • SQLAlchemy ORM with PostgreSQL (multi-tenant)
- • PIN authentication (4-6 digit voice PIN)
- • Anthropic Claude API (Claude 3.5 Sonnet)
- • Deepgram APIs (STT & TTS)
- • Twilio Programmable Voice with Media Streams
- • Google Calendar OAuth2 integration (optimized)
- • FusionPBX SIP integration for call transfer
- • Deepgram Speech-to-Text API for WebRTC
- • WebRTC browser-based voice recording
- • Sentry.io error monitoring & performance tracking
- • JsSIP WebRTC for browser-based softphone
- • Render.com auto-deployment with gunicorn+eventlet
Key Features
- • Multi-tenant team collaboration with role hierarchy
- • PIN-based voice authentication (4-6 digits)
- • Intelligent call transfer: AI → FusionPBX Extension 2001
- • WebRTC voice with Deepgram STT transcription
- • Redis audio buffer management for WebRTC
- • Sentry error monitoring & performance tracking
- • Optimized timeouts (8s/10s/12s) for Twilio compatibility
- • Automatic thread reset on timeout/error
- • 38 MCP tools (todos, calendar, teams, transfer)
- • WebRTC call center with JsSIP softphone
- • Real-time voice AI with barge-in capability
- • Google Calendar OAuth2 sync (background)
- • FusionPBX integration on Google Cloud VM
- • Production deployment on Render with auto-scaling
Recent Updates & Improvements (October 2025)
Call Transfer to FusionPBX
- ✓ Seamless AI → Human agent transfer
- ✓ FusionPBX extension 2001 integration
- ✓ SIP/WSS connectivity (Google Cloud VM)
- ✓ Transfer detection via phrases or tool
- ✓ Department routing (support, sales, etc.)
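The phrase-based transfer detection and department routing listed above could look roughly like the sketch below. The trigger phrases, the department list, and the default department are assumptions made for illustration; the actual values used by the system are not given in this document.

```python
import re

# Hypothetical trigger phrases and departments (not taken from the real system).
TRANSFER_PHRASES = ("transfer me", "speak to a human", "talk to an agent", "human agent")
DEPARTMENTS = ("support", "sales")
DEFAULT_DEPARTMENT = "support"


def detect_transfer(utterance: str):
    """Return the target department if the caller asks for a human, else None."""
    # Normalize case and whitespace so phrase matching is predictable.
    text = re.sub(r"\s+", " ", utterance.lower()).strip()
    if not any(phrase in text for phrase in TRANSFER_PHRASES):
        return None
    # Route to the first department mentioned as a whole word.
    for department in DEPARTMENTS:
        if re.search(rf"\b{department}\b", text):
            return department
    return DEFAULT_DEPARTMENT
```

Plain substring matching is cheap enough to run before the agent is invoked; the tool-based path (`transfer_to_agent`) would cover requests that do not match a fixed phrase.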
Deepgram WebRTC STT
- ✓ Real-time speech-to-text transcription
- ✓ WebRTC audio stream processing
- ✓ WebM format detection & handling
- ✓ High accuracy transcription (95%+)
- ✓ Low latency (200-500ms)
Composio Integration
- ✓ Slack workspace integration
- ✓ GitHub repository management
- ✓ Gmail email automation
- ✓ Notion workspace sync
- ✓ External platform connectivity
Sentry Integration
- ✓ Real-time error tracking & alerts
- ✓ Performance monitoring (agent processing time)
- ✓ User context & session tracking
- ✓ Timeout & thread reset tracking
- ✓ Production-grade observability
Timeout Optimization
- ✓ Tool timeout: 8s (from 20s)
- ✓ Agent timeout: 10s (from 25s)
- ✓ Webhook timeout: 12s (from 30s)
- ✓ Stays under Twilio's 15s HTTP limit
- ✓ Thread reset on timeout prevents errors
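As a rough illustration of how the three timeout tiers nest, the sketch below wraps each tool call in the tool budget and the whole agent run in the agent budget, using `asyncio.wait_for`. The function names and the error strings are hypothetical; only the 8s/10s/12s values and the 15s Twilio limit come from the list above.

```python
import asyncio

TOOL_TIMEOUT_S = 8.0      # per-tool budget
AGENT_TIMEOUT_S = 10.0    # whole agent run
WEBHOOK_TIMEOUT_S = 12.0  # outermost budget, below Twilio's 15s HTTP limit


async def run_tool(tool, tool_timeout=TOOL_TIMEOUT_S):
    """Run one tool coroutine function; return an error string if it is too slow."""
    try:
        return await asyncio.wait_for(tool(), timeout=tool_timeout)
    except asyncio.TimeoutError:
        return "tool timed out"


async def run_agent(tools, tool_timeout=TOOL_TIMEOUT_S, agent_timeout=AGENT_TIMEOUT_S):
    """Run tools one after another, bounded by the overall agent budget."""
    async def _run_all():
        return [await run_tool(tool, tool_timeout) for tool in tools]

    try:
        return await asyncio.wait_for(_run_all(), timeout=agent_timeout)
    except asyncio.TimeoutError:
        # In the real system this is where a thread reset would be triggered.
        return ["agent timed out"]
```

Because each inner budget is smaller than the one around it, a slow tool degrades into an error message before the outer layers give up.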
WebRTC Call Center
- ✓ JsSIP v3.10.1 browser softphone
- ✓ WebSocket Secure (WSS) on port 7443
- ✓ Agent dashboard with SIP registration
- ✓ Call control (answer, hold, transfer, hangup)
- ✓ Google Cloud firewall configured
Automatic Error Recovery
- ✓ Thread reset with timestamped IDs
- ✓ BrokenResourceError handling
- ✓ tool_call_id incomplete error recovery
- ✓ In-memory reset tracking (_reset_threads)
- ✓ No cascading failures
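A minimal sketch of the thread reset idea follows, assuming `_reset_threads` is a dictionary that maps an original thread ID to its replacement. The document only names `_reset_threads`; its type, the ID format, and the helper functions here are illustrative guesses.

```python
import time

# Assumed shape: original thread ID -> replacement thread ID (in-memory only).
_reset_threads = {}


def reset_thread(thread_id: str, now=None) -> str:
    """Create a fresh, timestamped thread ID and remember the mapping."""
    stamp = int(time.time() if now is None else now)
    new_id = f"{thread_id}_reset_{stamp}"
    _reset_threads[thread_id] = new_id
    return new_id


def active_thread(thread_id: str) -> str:
    """Resolve a thread ID to its latest replacement, if it was reset."""
    return _reset_threads.get(thread_id, thread_id)
```

Starting a new thread drops the corrupted message history (for example an assistant message with an unanswered `tool_call_id`), so the next turn begins from a clean state. Being in-memory, the mapping is lost on process restart.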
Performance Optimization
- ✓ Removed Google Calendar sync delay
- ✓ Simplified JSON responses (no MCP breaks)
- ✓ Agent processing time measurement
- ✓ Transaction tracking per voice call
- ✓ Custom Sentry metrics & measurements
WebRTC Voice Integration Architecture
The WebRTC Voice Interface provides browser-based voice interaction with real-time audio streaming, Redis session management, and seamless integration with the LangGraph AI agent. Users can interact with the assistant directly through their web browser without requiring phone calls.
WebRTC Voice Assistant Architecture
WebRTC Speech-to-Deepgram Processing Flow
Flow: User Browser → PIN Auth (PostgreSQL) → WebRTC Audio Capture → Socket.IO Streaming → Redis Audio Buffer → Deepgram STT → LangGraph → Claude LLM → Tool Calling (PostgreSQL/Google APIs) → Deepgram TTS → Redis Buffer → Socket.IO → User Browser
WebRTC Voice Assistant Flow Phases
Phase 1: Authentication
Steps 1-7
WebSocket connection, PIN authentication via PostgreSQL, Redis session creation
Phase 2: Conversation Loop
Steps 8-31
Audio capture → Redis buffer → Deepgram STT → LangGraph → Claude LLM → Tool execution → Deepgram TTS → Response
Phase 3: Transfer Request
Steps 32-38
User requests transfer, LangGraph detects intent, sets transfer flag in Redis
Phase 4: Twilio Transfer
Steps 39-52
Twilio API → FusionPBX → Agent Dashboard (JsSIP) → Live conversation bridging
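For logging or tracing, the step ranges above can be turned into a small lookup. This is only a convenience sketch; the `PHASES` table and `phase_for_step` are hypothetical names, and the ranges are copied from the four phases listed here.

```python
# Step ranges per phase, taken from the phase list (range end is exclusive).
PHASES = [
    (range(1, 8), "Authentication"),
    (range(8, 32), "Conversation Loop"),
    (range(32, 39), "Transfer Request"),
    (range(39, 53), "Twilio Transfer"),
]


def phase_for_step(step: int) -> str:
    """Return the phase name for a sequence-diagram step number (1-52)."""
    for steps, name in PHASES:
        if step in steps:
            return name
    raise ValueError(f"step {step} is outside the 52-step sequence")
```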
WebRTC Architecture: Browser-based voice assistant with PostgreSQL PIN authentication, Redis session and audio buffer management, real-time audio streaming via Socket.IO, Deepgram STT transcription (not Twilio), LangGraph AI orchestration with Claude LLM and Deepgram TTS, MCP tool integration (PostgreSQL, Google APIs), and seamless transfer to FusionPBX call center via Twilio SIP trunking.
Component Interaction Matrix
| Component | Input From | Output To | Purpose |
|---|---|---|---|
| User Browser | User voice input | WebSocket Server | Captures audio, displays UI |
| WebSocket Server | Browser, WebRTC Server | Browser, Redis | Manages real-time communication |
| PIN Auth | WebSocket Server | PostgreSQL | Validates user credentials |
| Redis | WebRTC Server, Tools | Deepgram, WebRTC Server | Buffers audio, stores session |
| Deepgram STT | Redis Audio Buffer | WebRTC Voice Server | Converts speech to text |
| LangGraph | WebRTC Server | Claude LLM, Tools | Orchestrates AI conversation flow |
| Claude LLM | LangGraph | LangGraph | Generates responses, decides actions |
| Tool Calling | LangGraph | PostgreSQL, Google APIs | Executes external operations |
| Deepgram TTS | LangGraph Response | WebRTC Server | Converts text to speech |
| Twilio API | WebRTC Server | FusionPBX | Bridges call to agent |
| FusionPBX | Twilio | Agent Dashboard | Routes call to extension |
| Agent Dashboard | FusionPBX | PostgreSQL, User | Displays call, shows user info |
WebRTC Voice Interface
- ✓ Browser-based voice recording
- ✓ Real-time audio streaming
- ✓ WebSocket communication
- ✓ Audio format detection (WebM)
- ✓ Base64 audio encoding
Redis Session Management
- ✓ Session storage & caching
- ✓ Audio buffer management
- ✓ Real-time notifications
- ✓ User activity tracking
- ✓ Session expiration handling
Composio Integration
- ✓ Slack workspace integration
- ✓ GitHub repository management
- ✓ Gmail email automation
- ✓ Notion workspace sync
- ✓ Jira project management
Module Structure
Convonet Project Structure
Project Root/
├── app.py # Main Flask application
├── app_memory_optimized.py # Memory-optimized Flask application
├── passenger_wsgi.py # WSGI entry point for main app
├── passenger_wsgi_memory_optimized.py # WSGI entry for memory-optimized app
├── extensions.py # Flask extensions (db, login, migrate, etc.)
├── requirements.txt # Python dependencies
├── requirements_memory_optimized.txt # Memory-optimized dependencies
├── Procfile # Render.com process file
├── render.yaml # Render.com deployment config
├── render_memory_optimized.yaml # Memory-optimized deployment config
├── render-build.sh # Build script for Render.com
├── deepgram_service.py # Deepgram STT service (root level)
├── deepgram_webrtc_integration.py # WebRTC Deepgram integration (root level)
├── templates/ # Flask templates
│   ├── team_dashboard.html # Team management UI
│   ├── register.html # User registration
│   ├── convonet_tech_spec.html # Technical documentation
│   ├── convonet_system_architecture.html # System architecture diagram
│   └── convonet_sequence_diagram.html # Sequence diagram page
├── static/ # Static assets
│   └── convonet/ # Convonet static files
│       └── assets/ # Images, CSS, JS
├── shared/ # Shared utilities
│   ├── google_calendar.py # Google Calendar integration
│   ├── helpers.py # Shared helper functions
│   └── schemas.py # Shared data schemas
└── convonet/ # Convonet module
    ├── __init__.py # Package initialization
    ├── routes.py # Flask routes & Twilio webhooks (Sentry integrated)
    ├── assistant_graph_todo.py # LangGraph agent (optimized timeouts)
    ├── state.py # Agent state management
    ├── webrtc_voice_server.py # WebRTC voice assistant server
    ├── audio_player_routes.py # Audio stream player routes
    ├── redis_manager.py # Redis session & audio buffer management
    ├── composio_tools.py # Composio tool integration
    ├── twilio_handler.py # Twilio call handling
    ├── voice_intent_utils.py # Voice intent detection utilities
    ├── voice_utils.py # Voice processing utilities
    ├── environment_config.py # Environment configuration
    ├── models/ # Database models
    │   ├── base.py # Shared SQLAlchemy Base
    │   └── user_models.py # User, Team, TeamMembership models
    ├── security/ # Authentication & authorization
    │   └── auth.py # JWT authentication system
    ├── api_routes/ # RESTful API endpoints
    │   ├── auth_routes.py # User registration & login
    │   ├── jwt_scr.py # JWT secret key management
    │   ├── team_routes.py # Team management API
    │   └── team_todo_routes.py # Team todo management API
    ├── migrations/ # Database migrations
    │   ├── add_team_collaboration.py # Team collaboration migration
    │   └── add_voice_pin.py # Voice PIN authentication migration
    ├── mcps/ # Model Context Protocol servers
    │   ├── mcp_config.json # MCP server configuration
    │   └── local_servers/
    │       ├── db_todo.py # 36 database tools (optimized)
    │       ├── call_transfer.py # 2 call transfer tools (FusionPBX)
    │       └── google_calendar.py # Calendar operations
    ├── static/ # Convonet static assets
    │   └── assets/
    │       ├── img/ # Images & diagrams
    │       ├── css/ # Stylesheets
    │       └── js/ # JavaScript files
    └── templates/ # Convonet templates
        ├── convonet_todo_index.html # Main todo interface
        └── webrtc_voice_assistant.html # WebRTC voice assistant UI
Database Schema
users_convonet
- • id: UUID (PK)
- • email: String (unique)
- • username: String (unique)
- • password_hash: String
- • first_name: String
- • last_name: String
- • is_active: Boolean
- • is_verified: Boolean
- • created_at: DateTime
- • last_login_at: DateTime
teams_convonet
- • id: UUID (PK)
- • name: String
- • description: Text
- • is_active: Boolean
- • created_at: DateTime
- • updated_at: DateTime
team_memberships_convonet
- • id: UUID (PK)
- • team_id: UUID (FK)
- • user_id: UUID (FK)
- • role: Enum (owner/admin/member/viewer)
- • joined_at: DateTime
- • updated_at: DateTime
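The `role` enum implies a hierarchy that permission checks can compare against. The sketch below assumes the ordering viewer < member < admin < owner, which is a natural reading of the enum but is not stated explicitly in this document; `Role` and `has_role` are illustrative names.

```python
from enum import IntEnum


class Role(IntEnum):
    """Team roles, ordered from least to most privileged (assumed ordering)."""
    VIEWER = 1
    MEMBER = 2
    ADMIN = 3
    OWNER = 4


def has_role(actual: str, required: str) -> bool:
    """True if the member's role is at least as privileged as the required one."""
    return Role[actual.upper()] >= Role[required.upper()]
```

A check such as `has_role(membership_role, "admin")` is the kind of comparison a role-validation decorator could perform before allowing member management.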
todos_convonet
- • id: UUID (PK)
- • title: String
- • description: String
- • completed: Boolean
- • priority: Enum
- • due_date: DateTime
- • creator_id: UUID (FK)
- • assignee_id: UUID (FK)
- • team_id: UUID (FK)
- • is_private: Boolean
- • google_calendar_event_id
reminders_convonet
- • id: UUID (PK)
- • reminder_text: String
- • importance: Enum
- • reminder_date: DateTime
- • google_calendar_event_id
- • created_at: DateTime
- • updated_at: DateTime
calendar_events_convonet
- • id: UUID (PK)
- • title: String
- • description: String
- • event_from: DateTime
- • event_to: DateTime
- • google_calendar_event_id
- • created_at: DateTime
- • updated_at: DateTime
call_recordings_convonet
- • id: UUID (PK)
- • call_sid: String (unique)
- • from_number: String
- • to_number: String
- • recording_path: String
- • transcription: Text
- • status: String
- • created_at: DateTime
LangGraph Agent Architecture
LangGraph Workflow Diagram
LangGraph Workflow: The agent can either continue to use tools or end the conversation based on user input and context.
Agent Components
- • TodoAgent Class: Main agent orchestrator with lazy initialization
- • StateGraph: Manages conversation flow and state
- • Assistant Node: Claude 3.5 Sonnet reasoning and response generation
- • Tool Node: Executes 38 MCP tools
- • Conditional Edges: Routes between nodes based on tool calls
- • InMemorySaver: Checkpointer for state persistence
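The conditional edge between the assistant node and the tool node boils down to one decision: did the last assistant message request any tools? The dependency-free sketch below mirrors that decision with plain dictionaries. It is a simplified stand-in for illustration, not LangGraph's own `tools_condition` implementation, and the message shape is an assumption.

```python
END = "__end__"  # sentinel for "finish this turn"


def route_after_assistant(state: dict) -> str:
    """Route to the tool node if the last message carries tool calls, else end."""
    messages = state.get("messages", [])
    last = messages[-1] if messages else {}
    return "tools" if last.get("tool_calls") else END
```

After the tool node runs, control returns to the assistant node, so the loop continues until the model replies without requesting a tool.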
State Management
- • AgentState: Conversation state with message history
- • Message History: Maintains context across turns
- • Customer ID: User identification for multi-tenant
- • Thread ID: Conversation thread tracking
- • Lazy Loading: Prevents circular imports
- • ExceptionGroup Handling: Robust error recovery
Model Context Protocol (MCP) Integration
MCP provides a standardized way for AI agents to interact with external tools and services. The Convonet system uses MCP to expose 38 tools for database operations, team management, call transfer to FusionPBX, Google Calendar integration, and WebRTC voice processing with Deepgram STT.
MCP Server Configuration
{
  "mcpServers": {
    "db": {
      "command": "python",
      "args": ["./convonet/mcps/local_servers/db_todo.py"],
      "transport": "stdio",
      "env": {
        "DB_URI": "${DB_URI}",
        "GOOGLE_OAUTH2_TOKEN_B64": "${GOOGLE_OAUTH2_TOKEN_B64}",
        "GOOGLE_CLIENT_ID": "${GOOGLE_CLIENT_ID}",
        "GOOGLE_CLIENT_SECRET": "${GOOGLE_CLIENT_SECRET}"
      }
    }
  }
}
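The `${VAR}` placeholders in this configuration need to be resolved from the environment before the server process is launched. How the actual loader does this is not shown in this document; the sketch below is one possible approach using `string.Template` from the standard library, with `expand_env` as a hypothetical helper name.

```python
import os
from string import Template


def expand_env(value, env=None):
    """Recursively replace ${VAR} placeholders in a parsed JSON config."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # substitute() raises KeyError when a referenced variable is missing,
        # which surfaces configuration mistakes early.
        return Template(value).substitute(env)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value
```

Failing fast on a missing variable is a deliberate choice here: an MCP server started with an empty `DB_URI` would otherwise fail later and less clearly.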
Available MCP Tools (38)
Todo Management (5)
- • create_todo
- • get_todos
- • complete_todo
- • update_todo
- • delete_todo
Team Tools (8)
- • create_team
- • get_teams
- • get_team_members
- • create_team_todo
- • add_team_member
- • remove_team_member
- • change_member_role
- • search_users
Reminders (4)
- • create_reminder
- • get_reminders
- • update_reminder
- • delete_reminder
Calendar (6)
- • create_calendar_event
- • get_calendar_events
- • update_calendar_event
- • delete_calendar_event
- • sync_google_calendar_events
- • test_google_calendar
Call Transfer (2)
- • transfer_to_agent
- • get_available_departments
Enhanced LangGraph Tool Calls
The LangGraph implementation provides intelligent tool calling capabilities with dynamic tool selection and error handling. The agent automatically chooses appropriate tools based on user intent and maintains conversation context for seamless interactions.
Tool Calls Flow Diagram
Tool Calls Flow: LangGraph implementation showing dynamic tool selection and intelligent orchestration of MCP tools.
Tool Calls Features
MCP Integration
- • Database operations via MCP servers
- • Google Calendar synchronization
- • Team collaboration tools
- • Real-time tool discovery (38 tools)
- • Secure tool communication via stdio
- • Lazy loading for performance
Error Handling
- • Graceful tool failure recovery
- • ExceptionGroup unwrapping
- • 8s timeout per tool
- • 10s overall agent timeout
- • Fallback strategies
- • User-friendly error messages
Tool Features: Intelligent tool calling system with error recovery, timeout management, and seamless MCP integration.
Core Tool Calling Capabilities
- • Dynamic Tool Selection: LLM intelligently chooses appropriate tools based on user intent
- • Error Recovery: Graceful handling of tool failures with fallback strategies
- • Context Awareness: Tools access conversation history and maintain state
- • Streaming Responses: Real-time tool execution updates for better user experience
- • Async Execution: Non-blocking tool calls with proper timeout management
- • ExceptionGroup Handling: Unwraps and logs complex async exceptions
JWT Authentication System
Authentication Flow
Security Features
- • Password Hashing: Bcrypt with automatic salt
- • JWT Tokens: HS256 algorithm with secret key
- • Token Expiry: 30 min access, 7 day refresh
- • Authorization: @require_auth decorator
- • Role Validation: @require_role decorator
- • Team Membership: @require_team_member decorator
- • Auto Logout: Frontend handles expired tokens
JWT Token Structure
{
"user_id": "uuid",
"email": "user@example.com",
"roles": ["user"],
"team_id": "uuid",
"type": "access",
"exp": 1728589200, // 30 minutes from issue
"iat": 1728587400 // issued at timestamp
}
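To make the token structure concrete, here is a minimal HS256 sign-and-verify sketch using only the standard library. It illustrates the mechanics and is not the system's `auth.py`; a maintained JWT library is the usual choice in production. The function names are hypothetical, and the sketch deliberately skips checks a real verifier needs, such as validating the header's `alg` field.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_token(payload: dict, secret: str) -> str:
    """Build header.payload.signature signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_token(token: str, secret: str, now=None) -> dict:
    """Verify the signature and expiry, then return the payload."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(_b64url(expected).encode(), signature.encode()):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

With `iat = 1728587400` and `exp = iat + 1800`, the token matches the 30-minute access lifetime shown in the structure above.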
New Features Technical Details
Redis Session Management & Caching
Session Management
- • Session creation, retrieval, updates, deletion
- • Audio buffer storage (base64 encoded)
- • User authentication state management
- • Session expiration handling
Real-time Features
- • Pub/Sub notifications for team updates
- • Rate limiting (requests per minute)
- • User activity tracking & analytics
- • Cache invalidation strategies
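# Redis connection with environment config
import os
import time
REDIS_URL = os.getenv('REDIS_URL', 'redis://localhost:6379')
# Session data structure (example values; the one-hour TTL is an assumed placeholder)
SESSION_TTL_SECONDS = 3600
created_at = time.time()
session_data = {
    'user_id': 'user-123', 'audio_buffer': 'base64...',
    'created_at': created_at, 'expires_at': created_at + SESSION_TTL_SECONDS
}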
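The per-minute rate limiting listed under Real-time Features can be illustrated with a fixed-window counter. The sketch below keeps counts in a Python dictionary so it runs without a Redis server; with Redis, the same pattern is commonly built from an atomic increment plus a key expiry. The class name and the limit are hypothetical.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit_per_minute` requests per key in each clock minute."""

    def __init__(self, limit_per_minute: int):
        self.limit = limit_per_minute
        # (key, minute window index) -> request count; old windows are never
        # purged in this sketch, which a Redis key expiry would handle.
        self._counts = defaultdict(int)

    def allow(self, key: str, now=None) -> bool:
        window = int((time.time() if now is None else now) // 60)
        self._counts[(key, window)] += 1
        return self._counts[(key, window)] <= self.limit
```

A fixed window is simple but lets a burst straddle a window boundary; a sliding window would be stricter at the cost of more state.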
Composio External Tool Integration
Supported Platforms
- • Slack: Message sending, channel management
- • GitHub: Repository operations, issue tracking
- • Gmail: Email composition, sending, management
- • Notion: Page creation, database operations
- • Jira: Ticket creation, project management
Integration Features
- • OAuth2 authentication for each platform
- • Robust method discovery for API compatibility
- • Error handling for missing methods
- • Tool orchestration with LangGraph
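# Composio tool loading with fallback methods
# (toolset is an already-initialized Composio toolset object)
if hasattr(toolset, 'get_tools'):
    tools = toolset.get_tools(apps=["slack"])
elif hasattr(toolset, 'get_actions'):
    tools = toolset.get_actions(apps=["slack"])
else:
    tools = []  # neither method is available; continue without Composio tools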
Audio Stream Player & WebM Support
Audio Processing
- • WebM format detection & handling
- • Base64 audio buffer decoding
- • Real-time audio streaming from Redis
- • Audio format conversion (WebM → WAV fallback)
Player Features
- • Session-based audio playback
- • Audio file download (WebM/WAV)
- • Audio buffer analysis & debugging
- • Flask-SocketIO real-time updates
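import base64
from flask import Response

def serve_session_audio(session_data):  # illustrative name; session_data is the session dict from Redis
    # Audio buffer from Redis
    audio_buffer = session_data.get('audio_buffer', '')
    audio_data = base64.b64decode(audio_buffer)
    # WebM format detection (EBML magic bytes)
    if audio_data.startswith(b'\x1a\x45\xdf\xa3'):
        return Response(audio_data, mimetype='audio/webm')
    return None  # not WebM; the caller handles the WAV fallback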
API Endpoints
Authentication
- • POST /api/auth/register
- • POST /api/auth/login
- • POST /api/auth/refresh
- • GET /api/auth/profile
- • PUT /api/auth/profile
Team Management
- • POST /api/teams/
- • GET /api/teams/
- • GET /api/teams/{id}
- • POST /api/teams/{id}/members
- • DELETE /api/teams/{id}/members/{user_id}
- • PUT /api/teams/{id}/members/{user_id}/role
Twilio Voice
- • POST /convonet_todo/twilio/call
- • POST /convonet_todo/twilio/process_audio
- • POST /convonet_todo/twilio/transfer
Webhook URL:
https://hjlees.com/convonet_todo/twilio/call
WebRTC Voice
- • GET /convonet_todo/webrtc/voice-assistant
- • WebSocket /voice
- • GET /audio-player/
Features: Deepgram STT transcription, Redis audio buffer storage, Socket.IO real-time communication
Redis Audio Player
- • GET /audio-player/
- • GET /audio-player/sessions
- • GET /audio-player/audio/{session_id}
- • WebSocket /audio-stream
Audio Player Dashboard:
Features: View and analyze audio streams from Redis audio buffers, real-time session monitoring, audio playback and download
Team Dashboard
- • GET /team-dashboard
- • POST /api/teams/
- • GET /api/teams/{id}/todos
- • GET /api/teams/{id}/calendar
Team Dashboard:
Features: Team collaboration dashboard with todo management, calendar integration, member management, real-time updates, and role-based access control
Demo Credentials: admin@convonet.com / admin123
Call Center Agent Dashboard
A complete browser-based SIP phone client with ACD (Automatic Call Distribution) capabilities, providing enterprise-grade call center management features for handling voice assistant transfers and customer support calls.
Agent Management
- ✓ Secure agent authentication
- ✓ SIP credential management
- ✓ Session management
- ✓ Agent state tracking (Ready/Not Ready/On Call/Wrap Up)
- ✓ Time-in-state tracking
- ✓ Activity logging
Call Handling
- ✓ Incoming call notifications
- ✓ Caller ID display
- ✓ Answer/Reject controls
- ✓ Call hold/unhold
- ✓ Call transfer (blind & attended)
- ✓ Outbound dialing
- ✓ Call duration tracking
Customer Data Popup
- ✓ Automatic customer info display
- ✓ Customer ID & contact info
- ✓ Account status & tier
- ✓ Last contact date
- ✓ Open tickets/cases
- ✓ Lifetime value
- ✓ Agent notes
SIP Integration
- ✓ Browser-based SIP client (JsSIP)
- ✓ WebRTC audio support
- ✓ WebSocket Secure (WSS)
- ✓ RFC 3261 compliant
- ✓ Multiple codec support (G.711, Opus, G.722)
- ✓ NAT traversal (STUN/TURN)
Dashboard Interface
- ✓ Agent status panel
- ✓ Call control panel
- ✓ 12-key dialpad
- ✓ Call history display
- ✓ Real-time status updates
- ✓ Responsive design (desktop/tablet/mobile)
Monitoring & Reporting
- ✓ Agent metrics (calls handled, duration)
- ✓ Call metrics (answer rate, wait time)
- ✓ Real-time monitoring
- ✓ Activity timeline
- ✓ Availability percentage
Agent States
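The four agent states tracked by the dashboard (Ready, Not Ready, On Call, Wrap Up) can be modeled as a small state machine. The allowed transitions in the sketch below are assumptions based on common call center practice and are not taken from the dashboard's implementation; `AgentState`, `ALLOWED`, and `transition` are illustrative names.

```python
from enum import Enum


class AgentState(Enum):
    NOT_READY = "Not Ready"
    READY = "Ready"
    ON_CALL = "On Call"
    WRAP_UP = "Wrap Up"


# Assumed transition table: only a Ready agent can receive a call, and every
# call passes through Wrap Up before the agent becomes available again.
ALLOWED = {
    AgentState.NOT_READY: {AgentState.READY},
    AgentState.READY: {AgentState.NOT_READY, AgentState.ON_CALL},
    AgentState.ON_CALL: {AgentState.WRAP_UP},
    AgentState.WRAP_UP: {AgentState.READY, AgentState.NOT_READY},
}


def transition(current: AgentState, target: AgentState) -> AgentState:
    """Return the new state, or raise ValueError for a disallowed change."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

Recording a timestamp at each successful transition would be enough to derive the time-in-state figures mentioned under Agent Management.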
Voice Assistant Transfer Integration
When a user requests to speak with a human agent during a WebRTC voice assistant session, the system automatically transfers the call to the Call Center Agent Dashboard.
API Endpoints
Agent Management
- • POST /call-center/api/agent/login
- • POST /call-center/api/agent/logout
- • POST /call-center/api/agent/ready
- • POST /call-center/api/agent/not-ready
- • GET /call-center/api/agent/status
Call Handling
- • POST /call-center/api/call/ringing
- • POST /call-center/api/call/answer
- • POST /call-center/api/call/drop
- • POST /call-center/api/call/hold
- • POST /call-center/api/call/transfer
Customer Data
GET /call-center/api/customer/{id}
Access: /call-center/
Call Center Agent Dashboard:
Browser-based SIP client requiring no installation. Compatible with Chrome, Firefox, Edge, and Safari. Integrates with FusionPBX for call routing and transfer from WebRTC voice assistant sessions.
Twilio Voice Interface
The Convonet assistant provides complete voice interaction through Twilio integration. It supports natural language commands for team collaboration and task management, speaks with the Amazon Polly.Amy voice, and allows barge-in so callers can interrupt the assistant mid-response.
Voice Processing Pipeline
Voice Pipeline: Complete end-to-end voice processing from speech input to AI-generated voice response with real-time streaming.
Voice Command Examples
Personal Productivity
- 🗣️ "Create a high priority todo to review the quarterly report"
- 🗣️ "Add a reminder to call the dentist tomorrow at 2 PM"
- 🗣️ "Schedule a meeting for next Friday from 2 to 3 PM"
- 🗣️ "Show me all my pending todos"
- 🗣️ "Mark the grocery shopping todo as completed"
Team Collaboration
- 🗣️ "Create a hackathon team"
- 🗣️ "What teams are available?"
- 🗣️ "Who are the members of the development team?"
- 🗣️ "Create a high priority todo for the dev team"
- 🗣️ "Add admin@convonet.com to the hackathon team as owner"
- 🗣️ "Assign a code review task to John in the dev team"
- 🗣️ "Change john@example.com to admin role in the dev team"
Voice Components
- • Speech-to-Text: Twilio speech recognition engine
- • Text-to-Speech: Amazon Polly.Amy voice
- • Barge-in: Interrupt AI while speaking
- • Speech Timeout: Auto detection (10s)
- • Agent Timeout: 10s processing time
- • Tool Timeout: 8s per MCP tool
- • Continuation: Multi-turn conversations
- • Exit Detection: Natural conversation ending
Processing Flow
- 1. User calls Twilio number
- 2. Twilio webhook triggers /convonet_todo/twilio/call
- 3. TwiML with <Gather> returned (10s timeout)
- 4. User speaks, Twilio recognizes speech
- 5. SpeechResult posted to /convonet_todo/twilio/process_audio
- 6. LangGraph agent processes (10s timeout)
- 7. MCP tools execute (8s per tool)
- 8. Database & Calendar operations
- 9. Response converted to TwiML
- 10. Polly.Amy speaks response with barge-in
- 11. Conversation continues or ends
Twilio Phone Integration
The Convonet assistant can be accessed via phone calls through Twilio integration, allowing users to interact with the AI assistant through voice calls from any phone number. The system uses TwiML and Gather for speech recognition with barge-in capability.
Twilio Call Flow Architecture
Call Flow: Phone → Twilio → Flask Webhook → LangGraph Agent → MCP Tools → Database/Calendar → TwiML Response
Twilio Components
- • Twilio Voice API: Handles incoming/outgoing calls
- • TwiML: XML-based call flow instructions
- • Gather Element: Collects speech input with barge-in
- • Say Element: Text-to-speech via Polly.Amy
- • Redirect Element: Call continuation logic
- • Webhook Endpoints: Flask routes for call handling
Call Flow Process
- 1. User calls Twilio phone number
- 2. Twilio webhook triggers /twilio/call
- 3. TwiML with <Gather> returned
- 4. Greeting: "Hello! I'm your Convonet productivity assistant"
- 5. User speaks command
- 6. Speech posted to /process_audio
- 7. LangGraph agent processes request
- 8. MCP tools execute database operations
- 9. TwiML response with confirmation
- 10. Call continues or ends based on user
Twilio Configuration
# Environment Variables
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890
# Webhook Endpoints
POST https://hjlees.com/convonet_todo/twilio/call
POST https://hjlees.com/convonet_todo/twilio/process_audio
# TwiML Response Example
<Response>
  <Gather action="/convonet_todo/twilio/process_audio"
          method="POST"
          input="speech"
          speechTimeout="auto"
          timeout="10"
          bargeIn="true">
    <Say voice="Polly.Amy">Hello! I'm your Convonet productivity assistant...</Say>
  </Gather>
  <Say voice="Polly.Amy">I didn't hear anything. Please try again.</Say>
  <Redirect>/convonet_todo/twilio/call?is_continuation=true</Redirect>
</Response>
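The TwiML response above can be generated programmatically rather than written by hand. The sketch below builds the same document with the standard library's `xml.etree.ElementTree`, which escapes special characters in the prompt automatically. The function name is hypothetical, and the Flask routes may well use Twilio's helper library instead; this only shows the structure.

```python
import xml.etree.ElementTree as ET


def build_gather_twiml(prompt: str,
                       action: str = "/convonet_todo/twilio/process_audio",
                       voice: str = "Polly.Amy") -> str:
    """Return TwiML that speaks `prompt` while listening for speech input."""
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {
        "action": action,
        "method": "POST",
        "input": "speech",
        "speechTimeout": "auto",
        "timeout": "10",
        "bargeIn": "true",  # the caller may interrupt the prompt
    })
    ET.SubElement(gather, "Say", {"voice": voice}).text = prompt
    # Reached only if <Gather> finishes without capturing any speech.
    fallback = ET.SubElement(response, "Say", {"voice": voice})
    fallback.text = "I didn't hear anything. Please try again."
    redirect = ET.SubElement(response, "Redirect")
    redirect.text = "/convonet_todo/twilio/call?is_continuation=true"
    return ET.tostring(response, encoding="unicode")
```

Nesting the `<Say>` inside `<Gather>` is what enables barge-in: speech recognition is active while the prompt is still playing.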
Twilio Setup Instructions
Step 1: Configure Voice Settings
In Twilio Console → Phone Numbers → Active Numbers → Select your number
Step 2: Set Webhook URL
A CALL COMES IN: https://hjlees.com/convonet_todo/twilio/call (HTTP POST)
Step 3: Test Integration
Call your Twilio number and speak a command
Test Case Demo
Watch a demonstration of the Convonet voice assistant in action, showcasing the complete workflow from user interaction to call transfer to a human agent.
Video Format: M4V (MPEG-4) | Demo: Complete voice assistant workflow including call transfer
Audio Note: If you don't hear sound, please check:
- Ensure the video volume control is not muted (check the volume icon in the video player)
- Check your browser's audio settings and system volume
- Some browsers may have limited support for MP2 audio codec in M4V files
- Try using Chrome or Firefox for best compatibility
🎤 Voice Interaction
Demonstrates natural language voice commands and real-time transcription
🤖 AI Processing
Shows LangGraph agent processing requests and tool calling capabilities
📞 Call Transfer
Illustrates seamless transfer from AI assistant to human agent dashboard