Convonet Voice AI Productivity System

LangGraph + Twilio + MCP + Team Management + Call Transfer + Sentry Monitoring + Redis + Composio + Audio Stream Player + WebRTC Voice Integration

Flask LangGraph MCP (38 Tools) LangChain Claude 3.5 Sonnet Google Calendar Twilio Voice JWT Auth Team Collaboration FusionPBX Call Transfer Deepgram STT Sentry Monitoring WebRTC Call Center Redis Cache Composio Tools Audio Stream Player WebRTC Voice

Technical Architecture

System Architecture Overview

Complete System Flow Diagrams

System Architecture Diagram

Complete system flow overview with all components and their relationships

View Full Diagram →
Sequence Diagram

Step-by-step flow (52 steps) showing interactions between all components

View Full Diagram →
WebRTC Flow: User Browser → PIN Auth (PostgreSQL) → Redis Audio Buffer → Deepgram STT → LangGraph → Claude LLM → Tool Calling → PostgreSQL/Google APIs → Deepgram TTS → User
Transfer Flow: User Request → LangGraph (Detect Intent) → Twilio API → FusionPBX (Extension 2001) → Agent Dashboard (JsSIP) → User Info Popup → Live Conversation
Monitoring: All Operations → Sentry Monitoring → Real-time Alerts & Performance Tracking

Architecture Flow: Enterprise voice AI system with PIN authentication (PostgreSQL), team collaboration, intelligent call transfer to FusionPBX call center via Twilio, WebRTC voice integration with Deepgram STT and Redis audio buffering, and comprehensive Sentry error monitoring with automatic thread reset recovery (8s/10s/12s timeout optimization).

View System Architecture Diagram | View Sequence Diagram

Overview

The Convonet Voice AI Productivity System is an enterprise-grade platform that combines LangGraph AI agents, team collaboration, voice interaction, and intelligent call center integration. Built for hackathon demonstrations, it showcases advanced features including PIN authentication, role-based access control, call transfer to FreePBX, and production-grade error monitoring with Sentry.

The system enables teams to manage todos, assign tasks to members, and collaborate in real-time through web dashboards, voice commands, and seamless transfer to human agents. Features optimized timeout handling (8s/10s/12s), automatic thread reset recovery, and comprehensive error tracking. All data is synchronized with Google Calendar using OAuth2 authentication, with intelligent caching to ensure sub-second voice responses.

Core Technologies

  • • LangGraph for agent orchestration with tools_condition
  • • Model Context Protocol (MCP) - 38 tools (36 DB + 2 transfer)
  • • LangChain for LLM integration & tool binding
  • • Flask + Flask-SocketIO for web API & WebSocket
  • • SQLAlchemy ORM with PostgreSQL (multi-tenant)
  • • PIN authentication (4-6 digit voice PIN)
  • • Anthropic Claude API (Claude 3.5 Sonnet)
  • • Deepgram APIs (STT & TTS)
  • • Twilio Programmable Voice with Media Streams
  • • Google Calendar OAuth2 integration (optimized)
  • • FusionPBX SIP integration for call transfer
  • • Deepgram Speech-to-Text API for WebRTC
  • • WebRTC browser-based voice recording
  • • Sentry.io error monitoring & performance tracking
  • • JsSIP WebRTC for browser-based softphone
  • • Render.com auto-deployment with gunicorn+eventlet

Key Features

  • • Multi-tenant team collaboration with role hierarchy
  • • PIN-based voice authentication (4-6 digits)
  • • Intelligent call transfer: AI → FusionPBX Extension 2001
  • • WebRTC voice with Deepgram STT transcription
  • • Redis audio buffer management for WebRTC
  • • Sentry error monitoring & performance tracking
  • • Optimized timeouts (8s/10s/12s) for Twilio compatibility
  • • Automatic thread reset on timeout/error
  • • 38 MCP tools (todos, calendar, teams, transfer)
  • • WebRTC call center with JsSIP softphone
  • • Real-time voice AI with barge-in capability
  • • Google Calendar OAuth2 sync (background)
  • • FusionPBX integration on Google Cloud VM
  • • Production deployment on Render with auto-scaling

Recent Updates & Improvements (October 2025)

Call Transfer to FusionPBX

  • ✓ Seamless AI → Human agent transfer
  • ✓ FusionPBX extension 2001 integration
  • ✓ SIP/WSS connectivity (Google Cloud VM)
  • ✓ Transfer detection via phrases or tool
  • ✓ Department routing (support, sales, etc.)
Files: call_transfer.py, CALL_TRANSFER_GUIDE.md

Deepgram WebRTC STT

  • ✓ Real-time speech-to-text transcription
  • ✓ WebRTC audio stream processing
  • ✓ WebM format detection & handling
  • ✓ High accuracy transcription (95%+)
  • ✓ Low latency (200-500ms)
Files: deepgram_service.py, deepgram_webrtc_integration.py

Composio Integration

  • ✓ Slack workspace integration
  • ✓ GitHub repository management
  • ✓ Gmail email automation
  • ✓ Notion workspace sync
  • ✓ External platform connectivity
Files: COMPOSIO_FEATURES_SUMMARY.md

Sentry Integration

  • ✓ Real-time error tracking & alerts
  • ✓ Performance monitoring (agent processing time)
  • ✓ User context & session tracking
  • ✓ Timeout & thread reset tracking
  • ✓ Production-grade observability
Integration: Flask + SQLAlchemy + Logging

Timeout Optimization

  • ✓ Tool timeout: 8s (from 20s)
  • ✓ Agent timeout: 10s (from 25s)
  • ✓ Webhook timeout: 12s (from 30s)
  • ✓ Stays under Twilio's 15s HTTP limit
  • ✓ Thread reset on timeout prevents errors
Result: 95%+ operations complete successfully

WebRTC Call Center

  • ✓ JsSIP v3.10.1 browser softphone
  • ✓ WebSocket Secure (WSS) on port 7443
  • ✓ Agent dashboard with SIP registration
  • ✓ Call control (answer, hold, transfer, hangup)
  • ✓ Google Cloud firewall configured
Platform: FusionPBX 34.26.59.14 (GCP VM)

Automatic Error Recovery

  • ✓ Thread reset with timestamped IDs
  • ✓ BrokenResourceError handling
  • ✓ tool_call_id incomplete error recovery
  • ✓ In-memory reset tracking (_reset_threads)
  • ✓ No cascading failures
Benefit: Self-healing conversation threads

Performance Optimization

  • ✓ Removed Google Calendar sync delay
  • ✓ Simplified JSON responses (no MCP breaks)
  • ✓ Agent processing time measurement
  • ✓ Transaction tracking per voice call
  • ✓ Custom Sentry metrics & measurements
Result: Sub-5s response times
2001
FusionPBX Extension
Deepgram
STT API
38
MCP Tools
12s
Max Response Time
100%
Sentry Trace Rate

WebRTC Voice Integration Architecture

The WebRTC Voice Interface provides browser-based voice interaction with real-time audio streaming, Redis session management, and seamless integration with the LangGraph AI agent. Users can interact with the assistant directly through their web browser without requiring phone calls.

WebRTC Voice Assistant Architecture

WebRTC Speech-to-Deepgram Processing Flow
🌐
Browser
WebRTC
📡
Socket.IO
WebSocket
💾
Redis
Buffer
🎧
Deepgram
STT API
🧠
LangGraph
AI Agent

Corrected Flow: User Browser → PIN Auth (PostgreSQL) → WebRTC Audio Capture → Socket.IO Streaming → Redis Audio Buffer → Deepgram STT → LangGraph → Claude LLM → Tool Calling (PostgreSQL/Google APIs) → Deepgram TTS → Redis Buffer → Socket.IO → User Browser

WebRTC Voice Assistant Flow Phases
View Detailed Sequence Diagram →
Phase 1: Authentication

Steps 1-7

WebSocket connection, PIN authentication via PostgreSQL, Redis session creation

Phase 2: Conversation Loop

Steps 8-31

Audio capture → Redis buffer → Deepgram STT → LangGraph → Claude LLM → Tool execution → Deepgram TTS → Response

Phase 3: Transfer Request

Steps 32-38

User requests transfer, LangGraph detects intent, sets transfer flag in Redis

Phase 4: Twilio Transfer

Steps 39-52

Twilio API → FusionPBX → Agent Dashboard (JsSIP) → Live conversation bridging

WebRTC Architecture: Browser-based voice assistant with PostgreSQL PIN authentication, Redis session and audio buffer management, real-time audio streaming via Socket.IO, Deepgram STT transcription (not Twilio), LangGraph AI orchestration with Claude LLM and Deepgram TTS, MCP tool integration (PostgreSQL, Google APIs), and seamless transfer to FusionPBX call center via Twilio SIP trunking.

Component Interaction Matrix
Component Input From Output To Purpose
User Browser User voice input WebSocket Server Captures audio, displays UI
WebSocket Server Browser, WebRTC Server Browser, Redis Manages real-time communication
PIN Auth WebSocket Server PostgreSQL Validates user credentials
Redis WebRTC Server, Tools Deepgram, WebRTC Server Buffers audio, stores session
Deepgram STT Redis Audio Buffer WebRTC Voice Server Converts speech to text
LangGraph WebRTC Server Claude LLM, Tools Orchestrates AI conversation flow
Claude LLM LangGraph LangGraph Generates responses, decides actions
Tool Calling LangGraph PostgreSQL, Google APIs Executes external operations
Deepgram TTS LangGraph Response WebRTC Server Converts text to speech
Twilio API WebRTC Server FusionPBX Bridges call to agent
FusionPBX Twilio Agent Dashboard Routes call to extension
Agent Dashboard FusionPBX PostgreSQL, User Displays call, shows user info

WebRTC Voice Interface

  • ✓ Browser-based voice recording
  • ✓ Real-time audio streaming
  • ✓ WebSocket communication
  • ✓ Audio format detection (WebM)
  • ✓ Base64 audio encoding
Technology: WebRTC + Socket.IO

Redis Session Management

  • ✓ Session storage & caching
  • ✓ Audio buffer management
  • ✓ Real-time notifications
  • ✓ User activity tracking
  • ✓ Session expiration handling
Integration: Redis Pub/Sub + Session Store

Composio Integration

  • ✓ Slack workspace integration
  • ✓ GitHub repository management
  • ✓ Gmail email automation
  • ✓ Notion workspace sync
  • ✓ Jira project management
Platforms: Slack, GitHub, Gmail, Notion, Jira

Module Structure

Convonet Project Structure

Project Root/
├── app.py                          # Main Flask application
├── app_memory_optimized.py         # Memory-optimized Flask application
├── passenger_wsgi.py               # WSGI entry point for main app
├── passenger_wsgi_memory_optimized.py  # WSGI entry for memory-optimized app
├── extensions.py                   # Flask extensions (db, login, migrate, etc.)
├── requirements.txt                # Python dependencies
├── requirements_memory_optimized.txt  # Memory-optimized dependencies
├── Procfile                        # Render.com process file
├── render.yaml                     # Render.com deployment config
├── render_memory_optimized.yaml    # Memory-optimized deployment config
├── render-build.sh                 # Build script for Render.com
├── deepgram_service.py             # Deepgram STT service (root level)
├── deepgram_webrtc_integration.py  # WebRTC Deepgram integration (root level)
├── templates/                      # Flask templates
│   ├── team_dashboard.html         # Team management UI
│   ├── register.html               # User registration
│   ├── convonet_tech_spec.html     # Technical documentation
│   ├── convonet_system_architecture.html  # System architecture diagram
│   └── convonet_sequence_diagram.html     # Sequence diagram page
├── static/                         # Static assets
│   └── convonet/                   # Convonet static files
│       └── assets/                 # Images, CSS, JS
├── shared/                         # Shared utilities
│   ├── google_calendar.py          # Google Calendar integration
│   ├── helpers.py                  # Shared helper functions
│   └── schemas.py                  # Shared data schemas
└── convonet/                       # Convonet module
    ├── __init__.py                 # Package initialization
    ├── routes.py                   # Flask routes & Twilio webhooks (Sentry integrated)
    ├── assistant_graph_todo.py     # LangGraph agent (optimized timeouts)
    ├── state.py                    # Agent state management
    ├── webrtc_voice_server.py      # WebRTC voice assistant server
    ├── audio_player_routes.py      # Audio stream player routes
    ├── redis_manager.py            # Redis session & audio buffer management
    ├── composio_tools.py           # Composio tool integration
    ├── twilio_handler.py           # Twilio call handling
    ├── voice_intent_utils.py       # Voice intent detection utilities
    ├── voice_utils.py              # Voice processing utilities
    ├── environment_config.py       # Environment configuration
    ├── models/                     # Database models
    │   ├── base.py                 # Shared SQLAlchemy Base
    │   └── user_models.py          # User, Team, TeamMembership models
    ├── security/                   # Authentication & authorization
    │   └── auth.py                 # JWT authentication system
    ├── api_routes/                 # RESTful API endpoints
    │   ├── auth_routes.py          # User registration & login
    │   ├── jwt_scr.py              # JWT secret key management
    │   ├── team_routes.py          # Team management API
    │   └── team_todo_routes.py     # Team todo management API
    ├── migrations/                 # Database migrations
    │   ├── add_team_collaboration.py  # Team collaboration migration
    │   └── add_voice_pin.py        # Voice PIN authentication migration
    ├── mcps/                       # Model Context Protocol servers
    │   ├── mcp_config.json         # MCP server configuration
    │   └── local_servers/
    │       ├── db_todo.py          # 36 database tools (optimized)
    │       ├── call_transfer.py    # 2 call transfer tools (FusionPBX)
    │       └── google_calendar.py  # Calendar operations
    ├── static/                     # Convonet static assets
    │   └── assets/
    │       ├── img/                # Images & diagrams
    │       ├── css/                # Stylesheets
    │       └── js/                 # JavaScript files
    └── templates/                  # Convonet templates
        ├── convonet_todo_index.html  # Main todo interface
        └── webrtc_voice_assistant.html  # WebRTC voice assistant UI

Database Schema

users_convonet

  • • id: UUID (PK)
  • • email: String (unique)
  • • username: String (unique)
  • • password_hash: String
  • • first_name: String
  • • last_name: String
  • • is_active: Boolean
  • • is_verified: Boolean
  • • created_at: DateTime
  • • last_login_at: DateTime

teams_convonet

  • • id: UUID (PK)
  • • name: String
  • • description: Text
  • • is_active: Boolean
  • • created_at: DateTime
  • • updated_at: DateTime

team_memberships_convonet

  • • id: UUID (PK)
  • • team_id: UUID (FK)
  • • user_id: UUID (FK)
  • • role: Enum (owner/admin/member/viewer)
  • • joined_at: DateTime
  • • updated_at: DateTime

todos_convonet

  • • id: UUID (PK)
  • • title: String
  • • description: String
  • • completed: Boolean
  • • priority: Enum
  • • due_date: DateTime
  • • creator_id: UUID (FK)
  • • assignee_id: UUID (FK)
  • • team_id: UUID (FK)
  • • is_private: Boolean
  • • google_calendar_event_id

reminders_convonet

  • • id: UUID (PK)
  • • reminder_text: String
  • • importance: Enum
  • • reminder_date: DateTime
  • • google_calendar_event_id
  • • created_at: DateTime
  • • updated_at: DateTime

calendar_events_convonet

  • • id: UUID (PK)
  • • title: String
  • • description: String
  • • event_from: DateTime
  • • event_to: DateTime
  • • google_calendar_event_id
  • • created_at: DateTime
  • • updated_at: DateTime

call_recordings_convonet

  • • id: UUID (PK)
  • • call_sid: String (unique)
  • • from_number: String
  • • to_number: String
  • • recording_path: String
  • • transcription: Text
  • • status: String
  • • created_at: DateTime

LangGraph Agent Architecture

LangGraph Workflow Diagram

LangGraph Workflow Diagram

LangGraph Workflow: The agent can either continue to use tools or end the conversation based on user input and context.

Agent Components

  • TodoAgent Class: Main agent orchestrator with lazy initialization
  • StateGraph: Manages conversation flow and state
  • Assistant Node: GPT-4 reasoning and response generation
  • Tool Node: Executes 38 MCP tools
  • Conditional Edges: Routes between nodes based on tool calls
  • InMemorySaver: Checkpointer for state persistence

State Management

  • AgentState: Conversation state with message history
  • Message History: Maintains context across turns
  • Customer ID: User identification for multi-tenant
  • Thread ID: Conversation thread tracking
  • Lazy Loading: Prevents circular imports
  • ExceptionGroup Handling: Robust error recovery

Model Context Protocol (MCP) Integration

MCP provides a standardized way for AI agents to interact with external tools and services. The Convonet system uses MCP to expose 38 tools for database operations, team management, call transfer to FusionPBX, Google Calendar integration, and WebRTC voice processing with Deepgram STT.

MCP Server Configuration

{
  "mcpServers": {
    "db": {
      "command": "python",
      "args": ["./convonet/mcps/local_servers/db_todo.py"],
      "transport": "stdio",
      "env": {
        "DB_URI": "${DB_URI}",
        "GOOGLE_OAUTH2_TOKEN_B64": "${GOOGLE_OAUTH2_TOKEN_B64}",
        "GOOGLE_CLIENT_ID": "${GOOGLE_CLIENT_ID}",
        "GOOGLE_CLIENT_SECRET": "${GOOGLE_CLIENT_SECRET}"
      }
    }
  }
}

Available MCP Tools (38)

Todo Management (5)
  • • create_todo
  • • get_todos
  • • complete_todo
  • • update_todo
  • • delete_todo
Team Tools (8)
  • • create_team
  • • get_teams
  • • get_team_members
  • • create_team_todo
  • • add_team_member
  • • remove_team_member
  • • change_member_role
  • • search_users
Reminders (4)
  • • create_reminder
  • • get_reminders
  • • update_reminder
  • • delete_reminder
Calendar (6)
  • • create_calendar_event
  • • get_calendar_events
  • • update_calendar_event
  • • delete_calendar_event
  • • sync_google_calendar_events
  • • test_google_calendar
Call Transfer (2)
  • • transfer_to_agent
  • • get_available_departments

Enhanced LangGraph Tool Calls

The LangGraph implementation provides intelligent tool calling capabilities with dynamic tool selection and error handling. The agent automatically chooses appropriate tools based on user intent and maintains conversation context for seamless interactions.

Tool Calls Flow Diagram

Tool Calls Flow Diagram

Tool Calls Flow: LangGraph implementation showing dynamic tool selection and intelligent orchestration of MCP tools.

Tool Calls Features

MCP Integration
  • • Database operations via MCP servers
  • • Google Calendar synchronization
  • • Team collaboration tools
  • • Real-time tool discovery (38 tools)
  • • Secure tool communication via stdio
  • • Lazy loading for performance
Error Handling
  • • Graceful tool failure recovery
  • • ExceptionGroup unwrapping
  • • 20s timeout per tool
  • • 30s overall agent timeout
  • • Fallback strategies
  • • User-friendly error messages

Tool Features: Intelligent tool calling system with error recovery, timeout management, and seamless MCP integration.

Core Tool Calling Capabilities

  • Dynamic Tool Selection: LLM intelligently chooses appropriate tools based on user intent
  • Error Recovery: Graceful handling of tool failures with fallback strategies
  • Context Awareness: Tools access conversation history and maintain state
  • Streaming Responses: Real-time tool execution updates for better user experience
  • Async Execution: Non-blocking tool calls with proper timeout management
  • ExceptionGroup Handling: Unwraps and logs complex async exceptions

JWT Authentication System

Authentication Flow

1. User Registration
POST /api/auth/register → Bcrypt hash → JWT tokens
2. User Login
POST /api/auth/login → Verify password → Generate tokens
3. API Request
Bearer token → JWT validation → @require_auth
4. Token Refresh
POST /api/auth/refresh → New access token

Security Features

  • Password Hashing: Bcrypt with automatic salt
  • JWT Tokens: HS256 algorithm with secret key
  • Token Expiry: 30 min access, 7 day refresh
  • Authorization: @require_auth decorator
  • Role Validation: @require_role decorator
  • Team Membership: @require_team_member decorator
  • Auto Logout: Frontend handles expired tokens

JWT Token Structure

{
  "user_id": "uuid",
  "email": "user@example.com",
  "roles": ["user"],
  "team_id": "uuid",
  "type": "access",
  "exp": 1728589200,  // 30 minutes from issue
  "iat": 1728587400   // issued at timestamp
}

New Features Technical Details

Redis Session Management & Caching

Session Management
  • • Session creation, retrieval, updates, deletion
  • • Audio buffer storage (base64 encoded)
  • • User authentication state management
  • • Session expiration handling
Real-time Features
  • • Pub/Sub notifications for team updates
  • • Rate limiting (requests per minute)
  • • User activity tracking & analytics
  • • Cache invalidation strategies
# Redis connection with environment config
REDIS_URL = os.getenv('REDIS_URL', 'redis://localhost:6379')
# Session data structure
session_data = {
  'user_id': 'user-123', 'audio_buffer': 'base64...',
  'created_at': timestamp, 'expires_at': ttl
}

Composio External Tool Integration

Supported Platforms
  • Slack: Message sending, channel management
  • GitHub: Repository operations, issue tracking
  • Gmail: Email composition, sending, management
  • Notion: Page creation, database operations
  • Jira: Ticket creation, project management
Integration Features
  • • OAuth2 authentication for each platform
  • • Robust method discovery for API compatibility
  • • Error handling for missing methods
  • • Tool orchestration with LangGraph
# Composio tool loading with fallback methods
if hasattr(toolset, 'get_tools'):
    tools = toolset.get_tools(apps=["slack"])
elif hasattr(toolset, 'get_actions'):
    tools = toolset.get_actions(apps=["slack"])

Audio Stream Player & WebM Support

Audio Processing
  • • WebM format detection & handling
  • • Base64 audio buffer decoding
  • • Real-time audio streaming from Redis
  • • Audio format conversion (WebM → WAV fallback)
Player Features
  • • Session-based audio playback
  • • Audio file download (WebM/WAV)
  • • Audio buffer analysis & debugging
  • • Flask-SocketIO real-time updates
# WebM format detection
if audio_data.startswith(b'\x1a\x45\xdf\xa3'):
    return Response(audio_data, mimetype='audio/webm')
# Audio buffer from Redis
audio_buffer = session_data.get('audio_buffer', '')
audio_data = base64.b64decode(audio_buffer)

API Endpoints

Authentication

  • POST /api/auth/register
  • POST /api/auth/login
  • POST /api/auth/refresh
  • GET /api/auth/profile
  • PUT /api/auth/profile

Team Management

  • POST /api/teams/
  • GET /api/teams/
  • GET /api/teams/{id}
  • POST /api/teams/{id}/members
  • DELETE /api/teams/{id}/members/{user_id}
  • PUT /api/teams/{id}/members/{user_id}/role

Twilio Voice

  • POST /convonet_todo/twilio/call
  • POST /convonet_todo/twilio/process_audio
  • POST /convonet_todo/twilio/transfer

Webhook URL:

https://hjlees.com/convonet_todo/twilio/call

WebRTC Voice

  • GET /convonet_todo/webrtc/voice-assistant
  • WebSocket /voice
  • GET /audio-player/

Features: Deepgram STT transcription, Redis audio buffer storage, Socket.IO real-time communication

Redis Audio Player

  • GET /audio-player/
  • GET /audio-player/sessions
  • GET /audio-player/audio/{session_id}
  • WebSocket /audio-stream

Features: View and analyze audio streams from Redis audio buffers, real-time session monitoring, audio playback and download

Team Dashboard

  • GET /team-dashboard
  • POST /api/teams/
  • GET /api/teams/{id}/todos
  • GET /api/teams/{id}/calendar

Features: Team collaboration dashboard with todo management, calendar integration, member management, real-time updates, and role-based access control

Demo Credentials: admin@convonet.com / admin123

Call Center Agent Dashboard

A complete browser-based SIP phone client with ACD (Automatic Call Distribution) capabilities, providing enterprise-grade call center management features for handling voice assistant transfers and customer support calls.

Agent Management

  • ✓ Secure agent authentication
  • ✓ SIP credential management
  • ✓ Session management
  • ✓ Agent state tracking (Ready/Not Ready/On Call/Wrap Up)
  • ✓ Time-in-state tracking
  • ✓ Activity logging

Call Handling

  • ✓ Incoming call notifications
  • ✓ Caller ID display
  • ✓ Answer/Reject controls
  • ✓ Call hold/unhold
  • ✓ Call transfer (blind & attended)
  • ✓ Outbound dialing
  • ✓ Call duration tracking

Customer Data Popup

  • ✓ Automatic customer info display
  • ✓ Customer ID & contact info
  • ✓ Account status & tier
  • ✓ Last contact date
  • ✓ Open tickets/cases
  • ✓ Lifetime value
  • ✓ Agent notes

SIP Integration

  • ✓ Browser-based SIP client (JsSIP)
  • ✓ WebRTC audio support
  • ✓ WebSocket Secure (WSS)
  • ✓ RFC 3261 compliant
  • ✓ Multiple codec support (G.711, Opus, G.722)
  • ✓ NAT traversal (STUN/TURN)

Dashboard Interface

  • ✓ Agent status panel
  • ✓ Call control panel
  • ✓ 12-key dialpad
  • ✓ Call history display
  • ✓ Real-time status updates
  • ✓ Responsive design (desktop/tablet/mobile)

Monitoring & Reporting

  • ✓ Agent metrics (calls handled, duration)
  • ✓ Call metrics (answer rate, wait time)
  • ✓ Real-time monitoring
  • ✓ Activity timeline
  • ✓ Availability percentage

Agent States

Logged Out
Agent not active
Logged In
Not ready for calls
Ready
Available for calls ✓
Not Ready
On break/unavailable
On Call
Currently on a call

Voice Assistant Transfer Integration

When a user requests to speak with a human agent during a WebRTC voice assistant session, the system automatically transfers the call to the Call Center Agent Dashboard:

Transfer Request: LangGraph detects transfer intent and triggers Twilio API call
Twilio Bridge: Twilio bridges the WebRTC call to FusionPBX SIP server
FusionPBX Routing: Call routed to agent extension (e.g., 2001)
Agent Dashboard: Agent dashboard receives incoming call with customer info popup
Call Answer: Agent answers and continues conversation with customer

API Endpoints

Agent Management
  • POST /call-center/api/agent/login
  • POST /call-center/api/agent/logout
  • POST /call-center/api/agent/ready
  • POST /call-center/api/agent/not-ready
  • GET /call-center/api/agent/status
Call Handling
  • POST /call-center/api/call/ringing
  • POST /call-center/api/call/answer
  • POST /call-center/api/call/drop
  • POST /call-center/api/call/hold
  • POST /call-center/api/call/transfer
Customer Data
  • GET /call-center/api/customer/{id}

Access: /call-center/

Call Center Agent Dashboard:

https://hjlees.com/call-center/ →

Browser-based SIP client requiring no installation. Compatible with Chrome, Firefox, Edge, and Safari. Integrates with FusionPBX for call routing and transfer from WebRTC voice assistant sessions.

Twilio Voice Interface

The Convonet assistant provides complete voice interaction through Twilio integration, supporting natural language commands for team collaboration and task management with Amazon Polly.Amy voice and barge-in capabilities for interruption.

Voice Processing Pipeline

🎤
Speech Input
Twilio Voice
👂
Speech-to-Text
Twilio STT
🧠
LangGraph Agent
AI Processing
🗣️
Text-to-Speech
Polly.Amy
🔊
Voice Response
Twilio TTS
User Speech
Recognition
AI Processing
Voice Output

Voice Pipeline: Complete end-to-end voice processing from speech input to AI-generated voice response with real-time streaming.

Voice Command Examples

Personal Productivity
  • 🗣️ "Create a high priority todo to review the quarterly report"
  • 🗣️ "Add a reminder to call the dentist tomorrow at 2 PM"
  • 🗣️ "Schedule a meeting for next Friday from 2 to 3 PM"
  • 🗣️ "Show me all my pending todos"
  • 🗣️ "Mark the grocery shopping todo as completed"
Team Collaboration
  • 🗣️ "Create a hackathon team"
  • 🗣️ "What teams are available?"
  • 🗣️ "Who are the members of the development team?"
  • 🗣️ "Create a high priority todo for the dev team"
  • 🗣️ "Add admin@convonet.com to the hackathon team as owner"
  • 🗣️ "Assign a code review task to John in the dev team"
  • 🗣️ "Change john@example.com to admin role in the dev team"

Voice Components

  • Speech-to-Text: Twilio speech recognition engine
  • Text-to-Speech: Amazon Polly.Amy voice
  • Barge-in: Interrupt AI while speaking
  • Speech Timeout: Auto detection (10s)
  • Agent Timeout: 30s processing time
  • Tool Timeout: 20s per MCP tool
  • Continuation: Multi-turn conversations
  • Exit Detection: Natural conversation ending

Processing Flow

  1. 1. User calls Twilio number
  2. 2. Twilio webhook triggers /convonet_todo/twilio/call
  3. 3. TwiML with <Gather> returned (10s timeout)
  4. 4. User speaks, Twilio recognizes speech
  5. 5. SpeechResult posted to /process_audio
  6. 6. LangGraph agent processes (30s timeout)
  7. 7. MCP tools execute (20s per tool)
  8. 8. Database & Calendar operations
  9. 9. Response converted to TwiML
  10. 10. Polly.Amy speaks response with barge-in
  11. 11. Conversation continues or ends

Twilio Phone Integration

The Convonet assistant can be accessed via phone calls through Twilio integration, allowing users to interact with the AI assistant through voice calls from any phone number. The system uses TwiML and Gather for speech recognition with barge-in capability.

Twilio Call Flow Architecture

📞
Phone Call
PSTN Network
T
Twilio
Voice API
🌐
Flask Webhook
TwiML Response
🔌
MCP Client
38 Tools
⚙️
Database
PostgreSQL
🧠
LangGraph Agent
AI Processing
Call Initiation
Webhook
Tool Execution
AI Processing

Call Flow: Phone → Twilio → Flask Webhook → LangGraph Agent → MCP Tools → Database/Calendar → TwiML Response

Twilio Components

  • Twilio Voice API: Handles incoming/outgoing calls
  • TwiML: XML-based call flow instructions
  • Gather Element: Collects speech input with barge-in
  • Say Element: Text-to-speech via Polly.Amy
  • Redirect Element: Call continuation logic
  • Webhook Endpoints: Flask routes for call handling

Call Flow Process

  1. 1. User calls Twilio phone number
  2. 2. Twilio webhook triggers /twilio/call
  3. 3. TwiML with <Gather> returned
  4. 4. Greeting: "Hello! I'm your Convonet productivity assistant"
  5. 5. User speaks command
  6. 6. Speech posted to /process_audio
  7. 7. LangGraph agent processes request
  8. 8. MCP tools execute database operations
  9. 9. TwiML response with confirmation
  10. 10. Call continues or ends based on user

Twilio Configuration

# Environment Variables
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890

# Webhook Endpoints
POST https://hjlees.com/convonet_todo/twilio/call
POST https://hjlees.com/convonet_todo/twilio/process_audio

# TwiML Response Example
<Response>
  <Gather action="/convonet_todo/twilio/process_audio" 
          method="POST" 
          input="speech" 
          speechTimeout="auto" 
          timeout="10" 
          bargeIn="true">
    <Say voice="Polly.Amy">Hello! I'm your Convonet productivity assistant...</Say>
  </Gather>
  <Say voice="Polly.Amy">I didn't hear anything. Please try again.</Say>
  <Redirect>/convonet_todo/twilio/call?is_continuation=true</Redirect>
</Response>

Twilio Setup Instructions

Step 1: Configure Voice Settings

In Twilio Console → Phone Numbers → Active Numbers → Select your number

Step 2: Set Webhook URL

A CALL COMES IN: https://hjlees.com/convonet_todo/twilio/call (HTTP POST)

Step 3: Test Integration

Call your Twilio number and speak a command

Test Case Demo

Watch a demonstration of the Convonet voice assistant in action, showcasing the complete workflow from user interaction to call transfer to a human agent.

Video Format: M4V (MPEG-4) | Demo: Complete voice assistant workflow including call transfer

Audio Note: If you don't hear sound, please check:

  • Ensure the video volume control is not muted (check the volume icon in the video player)
  • Check your browser's audio settings and system volume
  • Some browsers may have limited support for MP2 audio codec in M4V files
  • Try using Chrome or Firefox for best compatibility

🎤 Voice Interaction

Demonstrates natural language voice commands and real-time transcription

🤖 AI Processing

Shows LangGraph agent processing requests and tool calling capabilities

📞 Call Transfer

Illustrates seamless transfer from AI assistant to human agent dashboard