Overview

When someone calls your 46elks number, 46elks opens a WebSocket connection to your server. Your server receives audio from the caller, forwards it to an AI model, and streams the model's response back — in real time.

Caller  ↔  46elks  ↔  Your server (WebSocket)  ↔  AI model

Prerequisites

1. Point your numbers at your server

In the 46elks dashboard, set the virtual number's voice_start to {"connect":"YOUR-WEBSOCKET-NUMBER"}.

Then set the websocket-number's voice_start to a WebSocket URL pointing at your server:

ws://YOUR-SERVER-IP:8095

When someone calls, 46elks will connect to this address. If the connection fails (server down, port closed, wrong URL), the caller hears a busy tone.
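If you prefer to script this step instead of using the dashboard, the 46elks REST API can update a number's voice_start with an authenticated POST. A minimal standard-library sketch — the number ID and credentials below are placeholders, and for the websocket-number the voice_start value is the ws:// URL itself:

```python
import base64
import urllib.parse
import urllib.request

API_USER = "u_xxx"   # placeholder: your 46elks API username
API_PASS = "xxx"     # placeholder: your 46elks API password

def set_voice_start(number_id: str, voice_start: str) -> urllib.request.Request:
    """Build the POST request that points a number's voice_start somewhere."""
    body = urllib.parse.urlencode({"voice_start": voice_start}).encode()
    req = urllib.request.Request(
        f"https://api.46elks.com/a1/numbers/{number_id}",
        data=body,
        method="POST",
    )
    token = base64.b64encode(f"{API_USER}:{API_PASS}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

# To actually apply it:
# urllib.request.urlopen(set_voice_start("nXXXX", "ws://YOUR-SERVER-IP:8095"))
```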

2. Understand the message protocol

All messages are JSON. The field t identifies the message type. The session follows a strict lifecycle:

  1. 46elks sends hello — call has started.
  2. Your server declares audio formats with sending and listening.
  3. Audio streams bidirectionally via audio messages.
  4. Either side sends bye to end the call.
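The lifecycle above can be sketched as a minimal echo handler. This sketch assumes `ws` is a connection object with async recv()/send() and async iteration (like those from the websockets library) and simply plays the caller's audio back to them:

```python
import json

async def echo_handler(ws):
    # 1. 46elks opens the connection and sends hello
    hello = json.loads(await ws.recv())
    if hello.get("t") != "hello":
        return

    # 2. Declare both audio formats before streaming anything
    await ws.send(json.dumps({"t": "sending", "format": "ulaw"}))
    await ws.send(json.dumps({"t": "listening", "format": "ulaw"}))

    # 3./4. Stream until either side says bye
    async for raw in ws:
        msg = json.loads(raw)
        if msg.get("t") == "audio":
            await ws.send(raw)   # echo the caller back to themselves
        elif msg.get("t") == "bye":
            break
```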

3. Message reference

From 46elks → your server

Type    Fields                           Description
hello   callid, from, to                 Call started.
audio   data (base64)                    Audio from caller.
sync    —                                Buffer checkpoint acknowledgment.
bye     reason (done / hangup / error)   Call ended.
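One way to consume these messages is a small dispatch table keyed on t; unknown types are ignored, which keeps the server forward-compatible if new message types appear. The handler names in the usage example are illustrative:

```python
import json

def dispatch(raw: str, handlers: dict):
    """Route one incoming 46elks message to a handler by its `t` field."""
    msg = json.loads(raw)
    handler = handlers.get(msg.get("t"))
    return handler(msg) if handler else None
```

For example, `dispatch(raw, {"hello": on_hello, "audio": on_audio, "bye": on_bye})` inside your receive loop.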

From your server → 46elks

Type        Fields          Description
sending     format          Declare outbound audio format (agent → caller).
listening   format          Declare inbound audio format (caller → agent).
audio       data (base64)   Audio to play to caller.
interrupt   —               Clear playback buffer. Must send sending again before resuming.
sync        —               Request buffer checkpoint.
bye         —               End the call.
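The interrupt rule above is easy to get wrong: after clearing the playback buffer you must re-declare the outbound format before sending more audio. A small barge-in helper, assuming the same codec you declared originally:

```python
import json

async def barge_in(elks_ws, codec: str = "pcm_24000"):
    # Drop any audio still queued for playback on the 46elks side...
    await elks_ws.send(json.dumps({"t": "interrupt"}))
    # ...then re-declare the outbound format, since interrupt
    # invalidates the earlier sending declaration.
    await elks_ws.send(json.dumps({"t": "sending", "format": codec}))
```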

Supported audio formats

Format string                      Description
pcm_8000 / pcm_16000 / pcm_24000   16-bit PCM mono at 8, 16, or 24 kHz.
alaw / ulaw                        G.711, 8 kHz.
g722                               Wideband, 16 kHz.
ogg                                Opus in an Ogg container.
wav / mp3                          Outbound (sending) only.
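If you chunk outbound PCM audio yourself, it helps to know how many bytes a given slice of time occupies. For the 16-bit mono PCM formats above, that is a one-line calculation (the 20 ms chunk size is just an illustration, not a protocol requirement):

```python
def pcm_chunk_bytes(sample_rate: int, duration_ms: int) -> int:
    """Bytes in `duration_ms` of 16-bit (2-byte-per-sample) mono PCM."""
    return sample_rate * 2 * duration_ms // 1000

# e.g. 20 ms of pcm_24000 audio is 960 bytes
assert pcm_chunk_bytes(24000, 20) == 960
```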

4. Prompt for Claude or Lovable

Paste the prompt below into Claude, Lovable, or any AI assistant to generate a custom voice agent. Fill in your assistant's task at the bottom.

Build a Python WebSocket server that accepts incoming phone calls from 46elks and bridges audio to OpenAI Realtime API.

46elks WebSocket protocol (use these exact field names):

  • All messages are JSON with field t (not type) as the message type.
  • First message from 46elks: {"t": "hello", "callid": "…", "from": "…", "to": "…"}
  • Declare outbound format: {"t": "sending", "format": "pcm_24000"}
  • Declare inbound format: {"t": "listening", "format": "pcm_24000"}
  • Audio from 46elks: {"t": "audio", "data": "<base64>"}
  • Send audio to 46elks: {"t": "audio", "data": "<base64>"}
  • Call ended: {"t": "bye", "reason": "hangup"}

OpenAI Realtime API:

  • Model: gpt-4o-realtime-preview
  • Header: OpenAI-Beta: realtime=v1
  • Use input_audio_format: "pcm16" and output_audio_format: "pcm16" — these match pcm_24000.
  • Forward caller audio via input_audio_buffer.append.
  • Play AI responses when response.audio.delta arrives.

The assistant's task: [describe what your assistant should do]
Listen on port 8095. Use websockets and python-dotenv.

5. Example: bridge to OpenAI Realtime

The example below creates a WebSocket server that accepts calls from 46elks and bridges audio to the OpenAI Realtime API. Use pcm_24000 on the 46elks side — it matches OpenAI's PCM16 format at 24 kHz.

#!/usr/bin/env python3
# pip install websockets python-dotenv

import asyncio, json, logging, os, sys
import websockets
from websockets.asyncio.client import connect as ws_connect
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
OPENAI_MODEL   = "gpt-4o-realtime-preview"
CODEC          = "pcm_24000"

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

INSTRUCTIONS = "You are a helpful voice assistant. Keep answers short and clear."


async def handle_call(elks_ws):
    # 1. Receive hello from 46elks
    data = json.loads(await elks_ws.recv())
    if data.get("t") != "hello":
        log.error("Expected hello, got: %s", data)
        return

    call_id = data.get("callid", "?")
    caller  = data.get("from", "?")
    log.info("Call %s from %s", call_id, caller)

    # 2. Connect to OpenAI Realtime
    async with ws_connect(
        f"wss://api.openai.com/v1/realtime?model={OPENAI_MODEL}",
        additional_headers={
            "Authorization": f"Bearer {OPENAI_API_KEY}",
            "OpenAI-Beta": "realtime=v1",
        },
    ) as openai_ws:

        # Configure session
        await openai_ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "instructions": INSTRUCTIONS,
                "voice": "shimmer",
                "input_audio_format":  "pcm16",
                "output_audio_format": "pcm16",
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.9,
                    "silence_duration_ms": 800,
                },
            },
        }))

        # Send greeting
        await openai_ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the caller briefly.",
            },
        }))

        # 3. Declare audio formats to 46elks
        await elks_ws.send(json.dumps({"t": "sending",   "format": CODEC}))
        await elks_ws.send(json.dumps({"t": "listening", "format": CODEC}))

        is_speaking = False

        # 4a. Caller audio -> OpenAI
        async def elks_to_openai():
            async for message in elks_ws:
                msg = json.loads(message)
                if msg.get("t") == "audio" and not is_speaking:
                    await openai_ws.send(json.dumps({
                        "type": "input_audio_buffer.append",
                        "audio": msg["data"],
                    }))
                elif msg.get("t") == "bye":
                    log.info("Call ended: %s", msg.get("reason"))
                    break

        # 4b. OpenAI audio -> caller
        async def openai_to_elks():
            nonlocal is_speaking
            async for message in openai_ws:
                msg = json.loads(message)
                t   = msg.get("type")
                if t == "response.created":
                    is_speaking = True
                elif t == "response.done":
                    is_speaking = False
                elif t == "response.audio.delta":
                    await elks_ws.send(json.dumps({
                        "t": "audio", "data": msg["delta"],
                    }))

        # End the bridge as soon as either direction finishes,
        # so a bye from 46elks also tears down the OpenAI side.
        pumps = [asyncio.create_task(elks_to_openai()),
                 asyncio.create_task(openai_to_elks())]
        done, pending = await asyncio.wait(pumps, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()


async def main():
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8095
    log.info("Listening on port %d", port)
    async with websockets.serve(handle_call, "0.0.0.0", port):
        await asyncio.Future()

if __name__ == "__main__":
    asyncio.run(main())

6. Run as a systemd service

Create /etc/systemd/system/voice-agent.service:

[Unit]
Description=46elks Voice Agent
After=network.target

[Service]
Type=simple
User=www-data
WorkingDirectory=/var/www/apps/voice-agent
ExecStart=/var/www/apps/voice-agent/venv/bin/python voice_agent.py 8095
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Then enable and start it:

systemctl daemon-reload
systemctl enable --now voice-agent
journalctl -u voice-agent -f   # follow logs

Troubleshooting

Busy tone immediately
  Likely cause: server down, port not open, or wrong URL in the dashboard.
  Fix: check systemctl status voice-agent and open the port in your firewall.

Call connects, then drops within seconds
  Likely cause: wrong OpenAI model name or invalid API key.
  Fix: use gpt-4o-realtime-preview exactly and verify your API key.

No audio heard
  Likely cause: sending / listening not sent, or sent before the OpenAI session is ready.
  Fix: send both declarations after the OpenAI session is configured.

Echo / AI interrupts itself
  Likely cause: caller audio fed back to OpenAI while the AI is speaking.
  Fix: track an is_speaking flag and skip input_audio_buffer.append while it is true.
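To rule out the first symptom (server down or port closed) without placing a call, a plain TCP reachability check is enough — it confirms the port accepts connections, though not that the WebSocket handshake succeeds:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("YOUR-SERVER-IP", 8095)
```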

Pre-launch checklist