Overview

When someone calls your 46elks number, 46elks opens a WebSocket connection to your server. Your server receives audio from the caller, forwards it to an AI model, and streams the model's response back — in real time.

Caller  ↔  46elks  ↔  Your server (WebSocket)  ↔  AI model

Prerequisites

1. Point your numbers at your server

In the 46elks dashboard, set the virtual number's voice_start to {"connect":"YOUR-WEBSOCKET-NUMBER"}.

Then set the websocket-number's voice_start to a WebSocket URL pointing at your server:

ws://YOUR-SERVER-IP:8095

When someone calls, 46elks will connect to this address. If the connection fails (server down, port closed, wrong URL), the caller hears a busy tone.
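If you prefer to script this step instead of using the dashboard, the 46elks REST API can update a number's voice_start with an authenticated POST. A minimal standard-library sketch — the number ID and credentials below are placeholders, and for the websocket-number the voice_start value is the ws:// URL itself:

```python
import base64
import urllib.parse
import urllib.request

API_USER = "u_xxx"   # placeholder: your 46elks API username
API_PASS = "xxx"     # placeholder: your 46elks API password

def set_voice_start(number_id: str, voice_start: str) -> urllib.request.Request:
    """Build the POST request that points a number's voice_start somewhere."""
    body = urllib.parse.urlencode({"voice_start": voice_start}).encode()
    req = urllib.request.Request(
        f"https://api.46elks.com/a1/numbers/{number_id}",
        data=body,
        method="POST",
    )
    token = base64.b64encode(f"{API_USER}:{API_PASS}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

# To actually apply it:
# urllib.request.urlopen(set_voice_start("nXXXX", "ws://YOUR-SERVER-IP:8095"))
```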

2. Understand the message protocol

All messages are JSON. The field t identifies the message type. The session follows a strict lifecycle:

  1. 46elks sends hello — call has started.
  2. Your server declares audio formats with sending and listening.
  3. Audio streams bidirectionally via audio messages.
  4. Either side sends bye to end the call.
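The lifecycle above can be sketched as a minimal echo handler. This sketch assumes `ws` is a connection object with async recv()/send() and async iteration (like those from the websockets library) and simply plays the caller's audio back to them:

```python
import json

async def echo_handler(ws):
    # 1. 46elks opens the connection and sends hello
    hello = json.loads(await ws.recv())
    if hello.get("t") != "hello":
        return

    # 2. Declare both audio formats before streaming anything
    await ws.send(json.dumps({"t": "sending", "format": "ulaw"}))
    await ws.send(json.dumps({"t": "listening", "format": "ulaw"}))

    # 3./4. Stream until either side says bye
    async for raw in ws:
        msg = json.loads(raw)
        if msg.get("t") == "audio":
            await ws.send(raw)   # echo the caller back to themselves
        elif msg.get("t") == "bye":
            break
```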

3. Message reference

From 46elks → your server

Type    Fields                           Description
hello   callid, from, to                 Call started.
audio   data (base64)                    Audio from caller.
sync    —                                Buffer checkpoint acknowledgment.
bye     reason (done / hangup / error)   Call ended.
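One way to consume these messages is a small dispatch table keyed on t; unknown types are ignored, which keeps the server forward-compatible if new message types appear. The handler names in the usage example are illustrative:

```python
import json

def dispatch(raw: str, handlers: dict):
    """Route one incoming 46elks message to a handler by its `t` field."""
    msg = json.loads(raw)
    handler = handlers.get(msg.get("t"))
    return handler(msg) if handler else None
```

For example, `dispatch(raw, {"hello": on_hello, "audio": on_audio, "bye": on_bye})` inside your receive loop.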

From your server → 46elks

Type        Fields          Description
sending     format          Declare outbound audio format (agent → caller).
listening   format          Declare inbound audio format (caller → agent).
audio       data (base64)   Audio to play to caller.
interrupt   —               Clear playback buffer. Must send sending again before resuming.
sync        —               Request buffer checkpoint.
bye         —               End the call.
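The interrupt rule above is easy to get wrong: after clearing the playback buffer you must re-declare the outbound format before sending more audio. A small barge-in helper, assuming the same codec you declared originally:

```python
import json

async def barge_in(elks_ws, codec: str = "pcm_24000"):
    # Drop any audio still queued for playback on the 46elks side...
    await elks_ws.send(json.dumps({"t": "interrupt"}))
    # ...then re-declare the outbound format, since interrupt
    # invalidates the earlier sending declaration.
    await elks_ws.send(json.dumps({"t": "sending", "format": codec}))
```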

Supported audio formats

Format string                      Description
pcm_8000 / pcm_16000 / pcm_24000   16-bit PCM mono at 8, 16, or 24 kHz.
alaw / ulaw                        G.711, 8 kHz.
g722                               Wideband, 16 kHz.
ogg                                Opus in an Ogg container.
wav / mp3                          Outbound (sending) only.
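If you chunk outbound PCM audio yourself, it helps to know how many bytes a given slice of time occupies. For the 16-bit mono PCM formats above, that is a one-line calculation (the 20 ms chunk size is just an illustration, not a protocol requirement):

```python
def pcm_chunk_bytes(sample_rate: int, duration_ms: int) -> int:
    """Bytes in `duration_ms` of 16-bit (2-byte-per-sample) mono PCM."""
    return sample_rate * 2 * duration_ms // 1000

# e.g. 20 ms of pcm_24000 audio is 960 bytes
assert pcm_chunk_bytes(24000, 20) == 960
```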

4. Prompt for Claude or Lovable

Paste the prompt below into Claude, Lovable, or any AI assistant to generate a custom voice agent. Fill in your assistant's task at the bottom.

Build a Python WebSocket server that accepts incoming phone calls from 46elks and bridges audio to OpenAI Realtime API.

46elks WebSocket protocol (use these exact field names):

  • All messages are JSON with field t (not type) as the message type.
  • First message from 46elks: {"t": "hello", "callid": "…", "from": "…", "to": "…"}
  • Declare outbound format: {"t": "sending", "format": "pcm_24000"}
  • Declare inbound format: {"t": "listening", "format": "pcm_24000"}
  • Audio from 46elks: {"t": "audio", "data": "<base64>"}
  • Send audio to 46elks: {"t": "audio", "data": "<base64>"}
  • Call ended: {"t": "bye", "reason": "hangup"}

OpenAI Realtime API:

  • Model: gpt-4o-realtime-preview
  • Header: OpenAI-Beta: realtime=v1
  • Use input_audio_format: "pcm16" and output_audio_format: "pcm16" — these match pcm_24000.
  • Forward caller audio via input_audio_buffer.append.
  • Play AI responses when response.audio.delta arrives.

The assistant's task: [describe what your assistant should do]
Listen on port 8095. Use websockets and python-dotenv.

5. Example: bridge to OpenAI Realtime

The example below creates a WebSocket server that accepts calls from 46elks and bridges audio to the OpenAI Realtime API. Use pcm_24000 on the 46elks side — it matches OpenAI's PCM16 format at 24 kHz.

#!/usr/bin/env python3
# pip install websockets python-dotenv

import asyncio, json, logging, os, sys
import websockets
from websockets.asyncio.client import connect as ws_connect
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
OPENAI_MODEL   = "gpt-4o-realtime-preview"
CODEC          = "pcm_24000"

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

INSTRUCTIONS = "You are a helpful voice assistant. Keep answers short and clear."


async def handle_call(elks_ws):
    # 1. Receive hello from 46elks
    data = json.loads(await elks_ws.recv())
    if data.get("t") != "hello":
        log.error("Expected hello, got: %s", data)
        return

    call_id = data.get("callid", "?")
    caller  = data.get("from", "?")
    log.info("Call %s from %s", call_id, caller)

    # 2. Connect to OpenAI Realtime
    async with ws_connect(
        f"wss://api.openai.com/v1/realtime?model={OPENAI_MODEL}",
        additional_headers={
            "Authorization": f"Bearer {OPENAI_API_KEY}",
            "OpenAI-Beta": "realtime=v1",
        },
    ) as openai_ws:

        # Configure session
        await openai_ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "instructions": INSTRUCTIONS,
                "voice": "shimmer",
                "input_audio_format":  "pcm16",
                "output_audio_format": "pcm16",
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.9,
                    "silence_duration_ms": 800,
                },
            },
        }))

        # Send greeting
        await openai_ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the caller briefly.",
            },
        }))

        # 3. Declare audio formats to 46elks
        await elks_ws.send(json.dumps({"t": "sending",   "format": CODEC}))
        await elks_ws.send(json.dumps({"t": "listening", "format": CODEC}))

        is_speaking = False

        # 4a. Caller audio -> OpenAI
        async def elks_to_openai():
            async for message in elks_ws:
                msg = json.loads(message)
                if msg.get("t") == "audio" and not is_speaking:
                    await openai_ws.send(json.dumps({
                        "type": "input_audio_buffer.append",
                        "audio": msg["data"],
                    }))
                elif msg.get("t") == "bye":
                    log.info("Call ended: %s", msg.get("reason"))
                    break

        # 4b. OpenAI audio -> caller
        async def openai_to_elks():
            nonlocal is_speaking
            async for message in openai_ws:
                msg = json.loads(message)
                t   = msg.get("type")
                if t == "response.created":
                    is_speaking = True
                elif t == "response.done":
                    is_speaking = False
                elif t == "response.audio.delta":
                    await elks_ws.send(json.dumps({
                        "t": "audio", "data": msg["delta"],
                    }))

        # End the bridge as soon as either direction finishes,
        # so a bye from 46elks also tears down the OpenAI side.
        pumps = [asyncio.create_task(elks_to_openai()),
                 asyncio.create_task(openai_to_elks())]
        done, pending = await asyncio.wait(pumps, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()


async def main():
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8095
    log.info("Listening on port %d", port)
    async with websockets.serve(handle_call, "0.0.0.0", port):
        await asyncio.Future()

if __name__ == "__main__":
    asyncio.run(main())

6. Run as a systemd service

Create /etc/systemd/system/voice-agent.service:

[Unit]
Description=46elks Voice Agent
After=network.target

[Service]
Type=simple
User=www-data
WorkingDirectory=/var/www/apps/voice-agent
ExecStart=/var/www/apps/voice-agent/venv/bin/python voice_agent.py 8095
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Then enable and start it:

systemctl daemon-reload
systemctl enable --now voice-agent
journalctl -u voice-agent -f   # follow logs

Troubleshooting

Busy tone immediately
  Likely cause: server down, port not open, or wrong URL in the dashboard.
  Fix: check systemctl status voice-agent and open the port in your firewall.

Call connects, then drops within seconds
  Likely cause: wrong OpenAI model name or invalid API key.
  Fix: use gpt-4o-realtime-preview exactly and verify your API key.

No audio heard
  Likely cause: sending / listening not sent, or sent before the OpenAI session is ready.
  Fix: send both declarations after the OpenAI session is configured.

Echo / AI interrupts itself
  Likely cause: caller audio fed back to OpenAI while the AI is speaking.
  Fix: track an is_speaking flag and skip input_audio_buffer.append while it is true.
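To rule out the first symptom (server down or port closed) without placing a call, a plain TCP reachability check is enough — it confirms the port accepts connections, though not that the WebSocket handshake succeeds:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("YOUR-SERVER-IP", 8095)
```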

Pre-launch checklist