Convonet Sequence Diagram

Step-by-Step Flow with All Components

Detailed Sequence Diagram

Sequence Phases

  • Phase 1: Authentication (Steps 1-7)
  • Phase 2: Conversation Loop (Steps 8-31)
  • Phase 3: Transfer Request (Steps 32-38)
  • Phase 4: Twilio Transfer (Steps 39-52)

Phase 1: Authentication (Steps 1-7)

User authentication and session creation

1

User Browser → WebSocket Server: Connect WebSocket

User opens the Convonet WebRTC voice assistant UI and establishes a WebSocket connection via Socket.IO to the Flask server.

2

WebSocket Server → PIN Auth: Request Authentication

The WebSocket server requests PIN authentication from the authentication module.

3

PIN Auth → PostgreSQL: Validate PIN

The PIN authentication module validates the user's PIN against the PostgreSQL database using SQLAlchemy. Note: Authentication is handled via PostgreSQL, not FusionPBX.

4

PostgreSQL → PIN Auth: User Data

PostgreSQL returns user data including user ID, name, and team associations.

5

PIN Auth → Redis: Create Session

A new session is created in Redis with a unique session ID, storing user context and authentication status.

6

PIN Auth → WebSocket Server: Authenticated

Authentication module confirms successful authentication to the WebSocket server.

7

WebSocket Server → User Browser: Session ID

WebSocket server sends the session ID to the user browser. The user is now authenticated and ready to interact with the voice assistant.
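Steps 3-7 can be sketched roughly as follows. This is a minimal illustration, not Convonet's actual code: the in-memory USERS list stands in for the PostgreSQL table, the SESSIONS dict stands in for Redis, and the hash_pin scheme, field names, and authenticate function are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def hash_pin(pin):
    """Hypothetical PIN hashing; the real storage scheme is not described."""
    return hashlib.sha256(pin.encode("utf-8")).hexdigest()


# Stand-in for the PostgreSQL users table (queried via SQLAlchemy in the source).
USERS = [
    {"user_id": 1, "name": "Alice", "teams": ["support"], "pin_hash": hash_pin("1234")},
]

# Stand-in for Redis: session ID -> session data.
SESSIONS = {}


def authenticate(pin):
    """Validate a PIN and create a session; return the session ID or None."""
    candidate = hash_pin(pin)
    for user in USERS:
        # Constant-time comparison avoids leaking match position via timing.
        if hmac.compare_digest(user["pin_hash"], candidate):
            session_id = secrets.token_urlsafe(16)
            SESSIONS[session_id] = {
                "user_id": user["user_id"],
                "name": user["name"],
                "teams": user["teams"],
                "authenticated": True,
            }
            return session_id
    return None
```

A successful call returns an opaque session ID that the WebSocket server would forward to the browser; a failed call returns None and creates no session.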

Phase 2: Normal Conversation Loop (Steps 8-31)

Audio capture, transcription, AI processing, and response generation

8

User Browser → WebSocket Server: Start Recording

User initiates voice recording in the browser, beginning the audio capture process.

9

WebSocket Server → WebRTC Voice Server: Audio Chunks (WebRTC)

Audio chunks are streamed from the browser via WebRTC through the WebSocket server to the WebRTC Voice Server.

10

WebRTC Voice Server → Redis: Buffer Audio Data

Each session's audio chunks are appended to a Redis buffer (convonet.redis_manager) so downstream workers can read them for processing.
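The buffering step might look something like the sketch below. It uses an in-memory class as a stand-in for the Redis buffer so the example runs on its own; the AudioBuffer name and its append and drain methods are hypothetical and do not reflect the actual convonet.redis_manager interface.

```python
from collections import defaultdict


class AudioBuffer:
    """In-memory stand-in for a per-session audio buffer kept in Redis."""

    def __init__(self):
        self._chunks = defaultdict(list)

    def append(self, session_id, chunk):
        """Append one audio chunk and return the number of chunks buffered."""
        self._chunks[session_id].append(chunk)
        return len(self._chunks[session_id])

    def drain(self, session_id):
        """Return all buffered audio for the session as bytes and clear it."""
        chunks = self._chunks.pop(session_id, [])
        return b"".join(chunks)
```

Keying the buffer by session ID keeps concurrent callers separate, and draining on read means a worker never sends the same audio for transcription twice.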

11-12

Redis → Deepgram STT: Send Audio Buffer → Transcribed Text

The WebRTC worker reads the current buffer and forwards it to Deepgram (not Twilio) for high-fidelity transcription. Deepgram returns the recognized text to the WebRTC Voice Server.

Important: Deepgram STT is used for WebRTC audio, not Twilio transcription.

13-15

WebRTC Voice Server → LangGraph → Claude LLM: Process Intent

The transcript enters the LangGraph assistant (Convonet's LangGraph state machine). LangGraph calls Claude (the LLM) for reasoning and response generation. Claude returns its response, along with any tool calls, to LangGraph.

16-21

Tool Execution (If Needed)

If the LLM decides to execute an action, LangGraph invokes registered tools:

  • Database Operations: PostgreSQL queries/updates via SQLAlchemy
  • Calendar Operations: Google Calendar/OAuth workflows
  • PBX Metadata: FusionPBX lookups for call metadata

Tool results are injected back into the LangGraph state, and Claude composes the final reply.
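The tool-execution step can be illustrated with a small dispatcher. This is a plain-Python sketch of the idea and does not use the LangGraph API; the tool names, the shape of a tool call ({"name": ..., "args": ...}), and the "tool_results" state key are assumptions chosen for the example.

```python
# Hypothetical registry mapping tool names to callables.
TOOLS = {}


def tool(name):
    """Decorator that registers a function under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("create_todo")
def create_todo(title):
    # A real tool would write to PostgreSQL; this stub only echoes the input.
    return {"status": "created", "title": title}


@tool("lookup_extension")
def lookup_extension(extension):
    # A real tool would query FusionPBX; this stub returns fixed metadata.
    return {"extension": extension, "registered": True}


def run_tool_calls(state, tool_calls):
    """Execute each requested tool and return a new state with the results."""
    results = []
    for call in tool_calls:
        fn = TOOLS.get(call["name"])
        if fn is None:
            results.append({"name": call["name"], "error": "unknown tool"})
            continue
        results.append({"name": call["name"], "result": fn(**call.get("args", {}))})
    return {**state, "tool_results": state.get("tool_results", []) + results}
```

Returning a new state rather than mutating the old one mirrors the way results are injected back into the graph state before the LLM composes its final reply.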

22-24

LangGraph → Claude LLM → LangGraph: Generate Final Response

LangGraph sends the accumulated context to the Claude LLM for final response generation. The LLM returns the response text to LangGraph, which passes it to the WebRTC Voice Server.

25-31

Deepgram TTS → Redis → WebSocket → User: Audio Response

The final text is synthesized via Deepgram TTS. The resulting audio is buffered in Redis to ensure smooth playback, then streamed over Socket.IO to the browser.

Loop: Steps 8-31 repeat for each user utterance, creating a continuous conversation loop until the user requests transfer or ends the session.
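One pass through the loop can be summarized as a single function. The sketch below takes the three stages as plain callables so it stays independent of any vendor SDK; handle_utterance and its parameter names are hypothetical, and the real services (Deepgram STT, LangGraph with Claude, Deepgram TTS) would be wired in behind those callables.

```python
def handle_utterance(audio, transcribe, respond, synthesize):
    """Run one conversation turn: audio in, synthesized audio out.

    transcribe: bytes -> str   (speech-to-text stage)
    respond:    str -> str     (assistant stage)
    synthesize: str -> bytes   (text-to-speech stage)
    """
    text = transcribe(audio)
    if not text.strip():
        # Nothing was recognized, so skip the assistant and TTS stages.
        return None
    reply = respond(text)
    return synthesize(reply)
```

Passing the stages in as arguments also makes the turn easy to exercise with stubs, without network access to any of the services.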

Phase 3: Transfer Request (Steps 32-38)

User requests transfer and system initiates transfer process

30

User Browser → WebSocket Server: "Transfer to Agent"

User says "I need a human" or requests transfer to a human agent.

31-32

WebSocket Server → WebRTC Voice Server → LangGraph: Transfer Intent

Transfer intent is passed through the WebRTC Voice Server to LangGraph for detection and processing.

33-34

LangGraph → Claude LLM: Detect Transfer Intent → Transfer Command

LangGraph detects the transfer intent and confirms it with the Claude LLM. The LLM returns a transfer command to LangGraph.

35

LangGraph → Redis: Set Transfer Flag

LangGraph sets a transfer flag in Redis to indicate that a transfer is in progress.

36-38

LangGraph → WebRTC Voice Server → WebSocket Server → User: Transfer Initiated

The transfer-initiated signal flows back through the system: the WebRTC Voice Server sends a transfer event to the WebSocket Server, which notifies the user's browser, and the user sees a transfer status update.
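Setting the flag and notifying the browser could be sketched as below. The store argument is any dict-like object standing in for Redis, and emit stands in for the Socket.IO emit call; the "transfer:<session_id>" key format, the "transfer_initiated" event name, and the payload fields are assumptions for illustration.

```python
import json


def initiate_transfer(store, session_id, extension, emit):
    """Mark a transfer as in progress and notify the client.

    Returns False without emitting if a transfer is already flagged.
    """
    key = f"transfer:{session_id}"  # hypothetical key format
    if key in store:
        return False
    store[key] = json.dumps({"status": "in_progress", "extension": extension})
    emit("transfer_initiated", {"session_id": session_id, "extension": extension})
    return True
```

Checking the flag before setting it guards against a second transfer request for the same session while the first is still being bridged.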

Phase 4: Twilio Transfer Flow (Steps 39-52)

Call bridging to FusionPBX and agent dashboard connection

39

WebRTC Voice Server → Twilio API: POST /voice_assistant/transfer_bridge

The WebRTC backend calls the Convonet /twilio/voice_assistant/transfer_bridge endpoint. Twilio then dials either the provided SIP URI (sip:2001@FREEPBX_DOMAIN;transport=udp) or a trunk number, depending on the .env configuration.
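Assuming the bridge endpoint answers Twilio with TwiML, the choice between SIP URI and trunk number might be expressed as in this sketch. The Dial verb with its Sip and Number nouns is standard TwiML; the build_transfer_twiml function and the TRANSFER_TRUNK_NUMBER variable name are hypothetical, while FREEPBX_DOMAIN is taken from the URI shown above.

```python
import xml.etree.ElementTree as ET


def build_transfer_twiml(extension, env):
    """Build a TwiML document that dials the agent over SIP or a trunk number."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    domain = env.get("FREEPBX_DOMAIN")
    if domain:
        # Preferred path: dial the PBX extension directly over SIP.
        sip = ET.SubElement(dial, "Sip")
        sip.text = f"sip:{extension}@{domain};transport=udp"
    else:
        # Fallback path: dial a trunk number (hypothetical variable name).
        number = ET.SubElement(dial, "Number")
        number.text = env["TRANSFER_TRUNK_NUMBER"]
    return ET.tostring(response, encoding="unicode")
```

In a deployment the env argument would typically be os.environ; passing it in explicitly keeps the function easy to test with different configurations.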

40

Twilio → FusionPBX: SIP INVITE to Extension 2001

Twilio dials FusionPBX (running on Google Cloud) via SIP INVITE. FusionPBX routes the call to the target agent extension (e.g., 2001).

41

FusionPBX → Agent Dashboard: Ring Extension 2001

FusionPBX rings extension 2001. The agent dashboard (JsSIP client) registers with FusionPBX over WSS (wss://<fusionpbx>:7443) and receives the incoming call notification.

42-44

Agent Dashboard → PostgreSQL: Fetch User Info → Show User Info Popup

The agent dashboard fetches the caller record from PostgreSQL via a REST API, then displays it in a popup alongside the ringing controls, with user information such as name, context, and call history.

45-48

Agent Answers → Conversation Begins

The agent answers the call via the JsSIP client. FusionPBX notifies Twilio that the call is connected, and Twilio bridges the audio between the user leg and the agent leg. Live conversation begins between the user and the agent.

Note: Once Twilio bridges the call to FusionPBX, the audio travels over a PSTN/SIP leg (not the original WebRTC stream). Deepgram STT may still be used on the server side for context logging if needed.

49-52

Alternative: Agent Rejects/Timeout

If the agent rejects the call or it times out:

  • FusionPBX notifies Twilio that the call failed
  • Twilio sends transfer failed notification to WebRTC Voice Server
  • WebRTC Voice Server sends transfer error event to WebSocket Server
  • User browser displays transfer failed message

Transfer Callback: Convonet logs the transfer outcome via the /transfer_callback webhook for monitoring and Sentry alerts.
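If the callback is wired as the action URL of a Twilio Dial verb, Twilio posts a DialCallStatus parameter that can be mapped to an outcome, roughly as sketched here. Whether /transfer_callback actually receives this parameter is an assumption, and the classify_transfer function, the returned dictionary shape, and the "transfer_failed" event name are hypothetical.

```python
# Dial outcomes that indicate the agent leg was never connected.
FAILURE_STATUSES = {"busy", "no-answer", "failed", "canceled"}
SUCCESS_STATUSES = {"completed", "answered"}


def classify_transfer(form):
    """Map the posted callback form data to a transfer outcome."""
    status = form.get("DialCallStatus", "").lower()
    if status in SUCCESS_STATUSES:
        return {"ok": True, "status": status}
    if status in FAILURE_STATUSES:
        return {"ok": False, "status": status, "event": "transfer_failed"}
    # Missing or unrecognized status: treat as a failure worth alerting on.
    return {"ok": False, "status": status or "unknown", "event": "transfer_failed"}
```

The "event" field is what the WebRTC Voice Server could forward to the WebSocket Server so the browser can display the failure message.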

Key Sequence Points

Authentication Path

PIN authentication uses PostgreSQL (not FusionPBX). Sessions are stored in Redis for fast access.

Audio Processing

Deepgram STT is used for WebRTC audio transcription, not Twilio. Redis buffers audio chunks for processing.

AI Orchestration

LangGraph coordinates between the Claude LLM and MCP tools. Tool execution happens before final response generation.

Transfer Mechanism

Twilio bridges the WebRTC user leg to the SIP agent leg. Audio transitions from WebRTC to PSTN/SIP stream.