OpenAI Realtime Conversation API

Page Overview

Official Documentation

OpenAI Realtime WebRTC
OpenAI Realtime WebSocket

📝 Overview

Introduction✅

The OpenAI Realtime API provides two connection methods:

WebRTC - for real-time audio and video interaction in browsers and mobile clients
WebSocket - for server-to-server application integration

Use Cases✅

Real-time voice conversations
Audio and video conferencing
Real-time translation
Speech-to-text transcription
Real-time code generation
Server-side real-time integration

Key Features✅

Bidirectional audio streaming
Mixed text and audio conversations
Function calling support
Automatic voice activity detection (VAD)
Audio transcription
Server-side WebSocket integration

🔐 Authentication and Security✅

Authentication Methods✅

Standard API key (server-side use only)
Ephemeral token (client-side use)

Ephemeral Token✅

Validity: 1 minute
Usage limit: single connection
How to obtain: created through the server-side API

POST https://4All API地址/v1/realtime/sessions
Content-Type: application/json
Authorization: Bearer $4All API_API_KEY

{
  "model": "gpt-4o-realtime-preview-2024-12-17",
  "voice": "verse"
}

Security Recommendations✅

Never expose your standard API key on the client side
Use HTTPS/WSS for communication
Implement proper access control
Monitor for suspicious activity

🔌 Establishing a Connection✅

WebRTC Connection✅

URL: https://4All API地址/v1/realtime
Query parameter: model
Request headers:
Authorization: Bearer EPHEMERAL_KEY
Content-Type: application/sdp

WebSocket Connection

URL: wss://4All API地址/v1/realtime
Query parameter: model
Request headers:
Authorization: Bearer YOUR_API_KEY
OpenAI-Beta: realtime=v1

Connection Flow✅

Data Channel✅

Name: oai-events
Purpose: event transmission
Format: JSON

Audio Stream✅

Input: addTrack()
Output: ontrack event

💬 Conversation Interaction

Conversation Modes✅

Text-only conversation
Voice conversation
Mixed conversation

Session Management✅

Create session
Update session
End session
Session configuration

Event Types✅

Text events
Audio events
Function calls
Status updates
Error events

⚙️ Configuration Options✅

Audio Configuration[¶]✅

Input format
pcm16
g711_ulaw
g711_alaw
Output format
pcm16
g711_ulaw
g711_alaw
Voice type
alloy
echo
shimmer

Model Configuration✅

Temperature
Maximum output length
System prompt
Tool configuration

VAD Configuration✅

Threshold
Silence duration
Prefix padding

💡 Request Examples✅

WebRTC Connection ❌

Client Implementation (Browser)✅

async function init() {
  // Get an ephemeral key from the server - see the server code below
  const tokenResponse = await fetch("/session");
  const data = await tokenResponse.json();
  const EPHEMERAL_KEY = data.client_secret.value;

  // Create the peer connection
  const pc = new RTCPeerConnection();

  // Set up playback for remote audio returned by the model
  const audioEl = document.createElement("audio");
  audioEl.autoplay = true;
  pc.ontrack = e => audioEl.srcObject = e.streams[0];

  // Add local audio track from the browser microphone
  const ms = await navigator.mediaDevices.getUserMedia({
    audio: true
  });
  pc.addTrack(ms.getTracks()[0]);

  // Set up a data channel for sending and receiving events
  const dc = pc.createDataChannel("oai-events");
  dc.addEventListener("message", (e) => {
    // Real-time server events are received here!
    console.log(e);
  });

  // Start the session using Session Description Protocol (SDP)
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const baseUrl = "https://4All API地址/v1/realtime";
  const model = "gpt-4o-realtime-preview-2024-12-17";
  const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
    method: "POST",
    body: offer.sdp,
    headers: {
      Authorization: `Bearer ${EPHEMERAL_KEY}`,
      "Content-Type": "application/sdp"
    },
  });

  const answer = {
    type: "answer",
    sdp: await sdpResponse.text(),
  };
  await pc.setRemoteDescription(answer);
}

init();

Server-Side Implementation (Node.js)✅

import express from "express";

const app = express();

// Create an endpoint to generate ephemeral tokens
// This endpoint is used together with the client code above
app.get("/session", async (req, res) => {
  const r = await fetch("https://4All API地址/v1/realtime/sessions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.4All API_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview-2024-12-17",
      voice: "verse",
    }),
  });
  const data = await r.json();

  // Send the JSON received from the OpenAI REST API back to the client
  res.send(data);
});

app.listen(3000);

WebRTC Event Send/Receive Example✅

// Create a data channel from the peer connection
const dc = pc.createDataChannel("oai-events");

// Listen for server events on the data channel
// Event data needs to be parsed from a JSON string
dc.addEventListener("message", (e) => {
  const realtimeEvent = JSON.parse(e.data);
  console.log(realtimeEvent);
});

// Send a client event: serialize a valid client event into
// JSON and send it through the data channel
const responseCreate = {
  type: "response.create",
  response: {
    modalities: ["text"],
    instructions: "Write a haiku about code",
  },
};
dc.send(JSON.stringify(responseCreate));

WebSocket Connection ✅

Node.js (ws module)✅

import WebSocket from "ws";

const url = "wss://4All API地址/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
  headers: {
    "Authorization": "Bearer " + process.env.4All API_API_KEY,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", function open() {
  console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
  console.log(JSON.parse(message.toString()));
});

Python (websocket-client)✅

# websocket-client library required:
# pip install websocket-client

import os
import json
import websocket

NEW_API_KEY = os.environ.get("4All API_API_KEY")

url = "wss://4All API地址/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
headers = [
    "Authorization: Bearer " + 4All API_API_KEY,
    "OpenAI-Beta: realtime=v1"
]

def on_open(ws):
    print("Connected to server.")

def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

ws = websocket.WebSocketApp(
    url,
    header=headers,
    on_open=on_open,
    on_message=on_message,
)

ws.run_forever()

Browser (Standard WebSocket)✅

/*
Note: In browser client environments, we recommend using WebRTC.
However, in browser-like environments such as Deno and Cloudflare Workers,
you can also use the standard WebSocket interface.
*/

const ws = new WebSocket(
  "wss://4All API地址/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17",
  [
    "realtime",
    // Authentication
    "openai-insecure-api-key." + 4All API_API_KEY,
    // Optional
    "openai-organization." + OPENAI_ORG_ID,
    "openai-project." + OPENAI_PROJECT_ID,
    // Beta protocol, required
    "openai-beta.realtime-v1"
  ]
);

ws.on("open", function open() {
  console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
  console.log(message.data);
});

Message Send/Receive Example✅

Node.js/Browser✅

// Receive server events
ws.on("message", function incoming(message) {
  // The message data needs to be parsed from JSON
  const serverEvent = JSON.parse(message.data)
  console.log(serverEvent);
});

// Send an event by creating a JSON data structure
// that conforms to the client event format
const event = {
  type: "response.create",
  response: {
    modalities: ["audio", "text"],
    instructions: "Give me a haiku about code.",
  }
};
ws.send(JSON.stringify(event));

Python✅

// Send a client event by serializing a dictionary into JSON
def on_open(ws):
    print("Connected to server.")

    event = {
        "type": "response.create",
        "response": {
            "modalities": ["text"],
            "instructions": "Please assist the user."
        }
    }
    ws.send(json.dumps(event))

# Received messages need to be parsed from the JSON payload
def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

⚠️ Error Handling✅

Common Errors✅

Connection errors
Network issues
Authentication failures
Configuration errors
Audio errors
Device permissions
Unsupported formats
Codec issues
Session errors
Token expiration
Session timeout
Concurrency limits

Error Recovery✅

Automatic reconnection
Session recovery
Error retries
Graceful degradation

📝 Event Reference✅

Common Request Headers✅

All events must include the following request headers:

Request Header	Type	Description	Example Value
Authorization	String	Authentication token	Bearer $NEW_API_KEY
OpenAI-Beta	String	API version	realtime=v1

Client Events✅

session.update✅

Update the session’s default configuration.

Parameter	Type	Required	Description	Example Value / Optional Values
event_id	String	No	Client-generated event identifier	event_123
type	String	No	Event type	session.update
modalities	String array	No	Modalities the model can respond with	[“text”, “audio”]
instructions	String	No	System instructions prepended before model calls	”Your knowledge cutoff is 2023-10…“
voice	String	No	Voice type used by the model	alloy、echo、shimmer
input_audio_format	String	No	Input audio format	pcm16、g711_ulaw、g711_alaw
output_audio_format	String	No	Output audio format	pcm16、g711_ulaw、g711_alaw
input_audio_transcription.model	String	No	Model used for transcription	whisper-1
turn_detection.type	String	No	Voice activity detection type	server_vad
turn_detection.threshold	Number	No	VAD activation threshold (0.0-1.0)	0.8
turn_detection.prefix_padding_ms	Integer	No	Amount of audio included before speech begins	500
turn_detection.silence_duration_ms	Integer	No	Silence duration used to detect the end of speech	1000
tools	Array	No	List of tools available to the model	[]
tool_choice	String	No	How the model chooses tools	auto/none/required
temperature	Number	No	Model sampling temperature	0.8
max_output_tokens	String/Integer	No	Maximum tokens for a single response	”inf”/4096

input_audio_buffer.append✅

Append audio data to the input audio buffer.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Client-generated event identifier	event_456
type	String	No	Event type	input_audio_buffer.append
audio	String	No	Base64-encoded audio data	Base64EncodedAudioData

input_audio_buffer.commit✅

Commit the audio data in the buffer as a user message.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Client-generated event identifier	event_789
type	String	No	Event type	input_audio_buffer.commit

input_audio_buffer.clear✅

Clear all audio data from the input audio buffer.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Client-generated event identifier	event_012
type	String	No	Event type	input_audio_buffer.clear

conversation.item.create✅

Add a new conversation item to the conversation.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Client-generated event identifier	event_345
type	String	No	Event type	conversation.item.create
previous_item_id	String	No	The new conversation item will be inserted after this ID	null
item.id	String	No	Unique identifier of the conversation item	msg_001
item.type	String	No	Conversation item type	message/function_call/function_call_output
item.status	String	No	Conversation item status	completed/in_progress/incomplete
item.role	String	No	Role of the message sender	user/assistant/system
item.content	Array	No	Message content	[text/audio/transcript]
item.call_id	String	No	Function call ID	call_001
item.name	String	No	Name of the function being called	function_name
item.arguments	String	No	Function call arguments	{“param”: “value”}
item.output	String	No	Function call output	{“result”: “value”}

conversation.item.truncate✅

Truncate the audio content in an assistant message.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Client-generated event identifier	event_678
type	String	No	Event type	conversation.item.truncate
item_id	String	No	ID of the assistant message item to truncate	msg_002
content_index	Integer	No	Index of the content segment to truncate	0
audio_end_ms	Integer	No	End time of the audio truncation	1500

conversation.item.delete✅

Delete a specified conversation item from the conversation history.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Client-generated event identifier	event_901
type	String	No	Event type	conversation.item.delete
item_id	String	No	ID of the conversation item to delete	msg_003

response.create✅

Trigger response generation.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Client-generated event identifier	event_234
type	String	No	Event type	response.create
response.modalities	String array	No	Modalities for the response	[“text”, “audio”]
response.instructions	String	No	Instructions for the model	”Please assist the user.”
response.voice	String	No	Voice type used by the model	alloy/echo/shimmer
response.output_audio_format	String	No	Output audio format	pcm16
response.tools	Array	No	List of tools available to the model	[“type”, “name”, “description”]
response.tool_choice	String	No	How the model chooses tools	auto
response.temperature	Number	No	Sampling temperature	0.7
response.max_output_tokens	Integer/String	No	Maximum output tokens	150/“inf”

response.cancel✅

Cancel an in-progress response generation.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Client-generated event identifier	event_567
type	String	No	Event type	response.cancel

Server Events✅

error✅

Event returned when an error occurs.

Parameter	Type	Required	Description	Example Value
event_id	String array	No	Unique identifier of the server event	[“event_890”]
type	String	No	Event type	error
error.type	String	No	Error type	invalid_request_error/server_error
error.code	String	No	Error code	invalid_event
error.message	String	No	Human-readable error message	”The ‘type’ field is missing.”
error.param	String	No	Parameter associated with the error	null
error.event_id	String	No	ID of the related event	event_567

conversation.item.input_audio_transcription.completed✅

Returned when input audio transcription is enabled and transcription succeeds.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Unique identifier of the server event	event_2122
type	String	No	Event type	conversation.item.input_audio_transcription.completed
item_id	String	No	ID of the user message item	msg_003
content_index	Integer	No	Index of the content segment containing the audio	0
transcript	String	No	Transcribed text	”Hello, how are you?“

conversation.item.input_audio_transcription.failed✅

Returned when input audio transcription is configured, but transcription of the user message fails.

Parameter	Type	Required	Description	Example Value
event_id	String	No	Unique identifier of the server event	event_2324
type	String array	No	Event class