OpenAI Realtime Conversation API
OpenAI Realtime Conversation API
Section titled “OpenAI Realtime Conversation API”Page Overview
Official Documentation
- OpenAI Realtime WebRTC
- OpenAI Realtime WebSocket
📝 Overview
Section titled “📝 Overview”Introduction✅
Section titled “Introduction✅”The OpenAI Realtime API provides two connection methods:
- WebRTC - for real-time audio and video interaction in browsers and mobile clients
- WebSocket - for server-to-server application integration
Use Cases✅
Section titled “Use Cases✅”- Real-time voice conversations
- Audio and video conferencing
- Real-time translation
- Speech-to-text transcription
- Real-time code generation
- Server-side real-time integration
Key Features✅
Section titled “Key Features✅”- Bidirectional audio streaming
- Mixed text and audio conversations
- Function calling support
- Automatic voice activity detection (VAD)
- Audio transcription
- Server-side WebSocket integration
🔐 Authentication and Security✅
Section titled “🔐 Authentication and Security✅”Authentication Methods✅
Section titled “Authentication Methods✅”- Standard API key (server-side use only)
- Ephemeral token (client-side use)
Ephemeral Token✅
Section titled “Ephemeral Token✅”- Validity: 1 minute
- Usage limit: single connection
- How to obtain: created through the server-side API
POST https://4All API地址/v1/realtime/sessionsContent-Type: application/jsonAuthorization: Bearer $4All API_API_KEY
{ "model": "gpt-4o-realtime-preview-2024-12-17", "voice": "verse"}Security Recommendations✅
Section titled “Security Recommendations✅”- Never expose your standard API key on the client side
- Use HTTPS/WSS for communication
- Implement proper access control
- Monitor for suspicious activity
🔌 Establishing a Connection✅
Section titled “🔌 Establishing a Connection✅”WebRTC Connection✅
Section titled “WebRTC Connection✅”- URL: https://4All API地址/v1/realtime
- Query parameter: model
- Request headers:
- Authorization: Bearer EPHEMERAL_KEY
- Content-Type: application/sdp
WebSocket Connection
Section titled “WebSocket Connection”- URL: wss://4All API地址/v1/realtime
- Query parameter: model
- Request headers:
- Authorization: Bearer YOUR_API_KEY
- OpenAI-Beta: realtime=v1
Connection Flow✅
Section titled “Connection Flow✅”Data Channel✅
Section titled “Data Channel✅”- Name: oai-events
- Purpose: event transmission
- Format: JSON
Audio Stream✅
Section titled “Audio Stream✅”- Input: addTrack()
- Output: ontrack event
💬 Conversation Interaction
Section titled “💬 Conversation Interaction”Conversation Modes✅
Section titled “Conversation Modes✅”- Text-only conversation
- Voice conversation
- Mixed conversation
Session Management✅
Section titled “Session Management✅”- Create session
- Update session
- End session
- Session configuration
Event Types✅
Section titled “Event Types✅”- Text events
- Audio events
- Function calls
- Status updates
- Error events
⚙️ Configuration Options✅
Section titled “⚙️ Configuration Options✅”Audio Configuration[¶]✅
Section titled “Audio Configuration[¶]✅”- Input format
- pcm16
- g711_ulaw
- g711_alaw
- Output format
- pcm16
- g711_ulaw
- g711_alaw
- Voice type
- alloy
- echo
- shimmer
Model Configuration✅
Section titled “Model Configuration✅”- Temperature
- Maximum output length
- System prompt
- Tool configuration
VAD Configuration✅
Section titled “VAD Configuration✅”- Threshold
- Silence duration
- Prefix padding
💡 Request Examples✅
Section titled “💡 Request Examples✅”WebRTC Connection ❌
Section titled “WebRTC Connection ❌”Client Implementation (Browser)✅
Section titled “Client Implementation (Browser)✅”async function init() { // Get an ephemeral key from the server - see the server code below const tokenResponse = await fetch("/session"); const data = await tokenResponse.json(); const EPHEMERAL_KEY = data.client_secret.value;
// Create the peer connection const pc = new RTCPeerConnection();
// Set up playback for remote audio returned by the model const audioEl = document.createElement("audio"); audioEl.autoplay = true; pc.ontrack = e => audioEl.srcObject = e.streams[0];
// Add local audio track from the browser microphone const ms = await navigator.mediaDevices.getUserMedia({ audio: true }); pc.addTrack(ms.getTracks()[0]);
// Set up a data channel for sending and receiving events const dc = pc.createDataChannel("oai-events"); dc.addEventListener("message", (e) => { // Real-time server events are received here! console.log(e); });
// Start the session using Session Description Protocol (SDP) const offer = await pc.createOffer(); await pc.setLocalDescription(offer);
const baseUrl = "https://4All API地址/v1/realtime"; const model = "gpt-4o-realtime-preview-2024-12-17"; const sdpResponse = await fetch(`${baseUrl}?model=${model}`, { method: "POST", body: offer.sdp, headers: { Authorization: `Bearer ${EPHEMERAL_KEY}`, "Content-Type": "application/sdp" }, });
const answer = { type: "answer", sdp: await sdpResponse.text(), }; await pc.setRemoteDescription(answer);}
init();Server-Side Implementation (Node.js)✅
Section titled “Server-Side Implementation (Node.js)✅”import express from "express";
const app = express();
// Create an endpoint to generate ephemeral tokens// This endpoint is used together with the client code aboveapp.get("/session", async (req, res) => { const r = await fetch("https://4All API地址/v1/realtime/sessions", { method: "POST", headers: { "Authorization": `Bearer ${process.env.4All API_API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ model: "gpt-4o-realtime-preview-2024-12-17", voice: "verse", }), }); const data = await r.json();
// Send the JSON received from the OpenAI REST API back to the client res.send(data);});
app.listen(3000);WebRTC Event Send/Receive Example✅
Section titled “WebRTC Event Send/Receive Example✅”// Create a data channel from the peer connectionconst dc = pc.createDataChannel("oai-events");
// Listen for server events on the data channel// Event data needs to be parsed from a JSON stringdc.addEventListener("message", (e) => { const realtimeEvent = JSON.parse(e.data); console.log(realtimeEvent);});
// Send a client event: serialize a valid client event into// JSON and send it through the data channelconst responseCreate = { type: "response.create", response: { modalities: ["text"], instructions: "Write a haiku about code", },};dc.send(JSON.stringify(responseCreate));WebSocket Connection ✅
Section titled “WebSocket Connection ✅”Node.js (ws module)✅
Section titled “Node.js (ws module)✅”import WebSocket from "ws";
const url = "wss://4All API地址/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17";const ws = new WebSocket(url, { headers: { "Authorization": "Bearer " + process.env.4All API_API_KEY, "OpenAI-Beta": "realtime=v1", },});
ws.on("open", function open() { console.log("Connected to server.");});
ws.on("message", function incoming(message) { console.log(JSON.parse(message.toString()));});Python (websocket-client)✅
Section titled “Python (websocket-client)✅”# websocket-client library required:# pip install websocket-client
import osimport jsonimport websocket
NEW_API_KEY = os.environ.get("4All API_API_KEY")
url = "wss://4All API地址/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"headers = [ "Authorization: Bearer " + 4All API_API_KEY, "OpenAI-Beta: realtime=v1"]
def on_open(ws): print("Connected to server.")
def on_message(ws, message): data = json.loads(message) print("Received event:", json.dumps(data, indent=2))
ws = websocket.WebSocketApp( url, header=headers, on_open=on_open, on_message=on_message,)
ws.run_forever()Browser (Standard WebSocket)✅
Section titled “Browser (Standard WebSocket)✅”/*Note: In browser client environments, we recommend using WebRTC.However, in browser-like environments such as Deno and Cloudflare Workers,you can also use the standard WebSocket interface.*/
const ws = new WebSocket( "wss://4All API地址/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17", [ "realtime", // Authentication "openai-insecure-api-key." + 4All API_API_KEY, // Optional "openai-organization." + OPENAI_ORG_ID, "openai-project." + OPENAI_PROJECT_ID, // Beta protocol, required "openai-beta.realtime-v1" ]);
ws.on("open", function open() { console.log("Connected to server.");});
ws.on("message", function incoming(message) { console.log(message.data);});Message Send/Receive Example✅
Section titled “Message Send/Receive Example✅”Node.js/Browser✅
Section titled “Node.js/Browser✅”// Receive server eventsws.on("message", function incoming(message) { // The message data needs to be parsed from JSON const serverEvent = JSON.parse(message.data) console.log(serverEvent);});
// Send an event by creating a JSON data structure// that conforms to the client event formatconst event = { type: "response.create", response: { modalities: ["audio", "text"], instructions: "Give me a haiku about code.", }};ws.send(JSON.stringify(event));Python✅
Section titled “Python✅”// Send a client event by serializing a dictionary into JSONdef on_open(ws): print("Connected to server.")
event = { "type": "response.create", "response": { "modalities": ["text"], "instructions": "Please assist the user." } } ws.send(json.dumps(event))
# Received messages need to be parsed from the JSON payloaddef on_message(ws, message): data = json.loads(message) print("Received event:", json.dumps(data, indent=2))⚠️ Error Handling✅
Section titled “⚠️ Error Handling✅”Common Errors✅
Section titled “Common Errors✅”- Connection errors
- Network issues
- Authentication failures
- Configuration errors
- Audio errors
- Device permissions
- Unsupported formats
- Codec issues
- Session errors
- Token expiration
- Session timeout
- Concurrency limits
Error Recovery✅
Section titled “Error Recovery✅”- Automatic reconnection
- Session recovery
- Error retries
- Graceful degradation
📝 Event Reference✅
Section titled “📝 Event Reference✅”Common Request Headers✅
Section titled “Common Request Headers✅”All events must include the following request headers:
| Request Header | Type | Description | Example Value |
|---|---|---|---|
| Authorization | String | Authentication token | Bearer $NEW_API_KEY |
| OpenAI-Beta | String | API version | realtime=v1 |
Client Events✅
Section titled “Client Events✅”session.update✅
Section titled “session.update✅”Update the session’s default configuration.
| Parameter | Type | Required | Description | Example Value / Optional Values |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_123 |
| type | String | No | Event type | session.update |
| modalities | String array | No | Modalities the model can respond with | [“text”, “audio”] |
| instructions | String | No | System instructions prepended before model calls | ”Your knowledge cutoff is 2023-10…“ |
| voice | String | No | Voice type used by the model | alloy、echo、shimmer |
| input_audio_format | String | No | Input audio format | pcm16、g711_ulaw、g711_alaw |
| output_audio_format | String | No | Output audio format | pcm16、g711_ulaw、g711_alaw |
| input_audio_transcription.model | String | No | Model used for transcription | whisper-1 |
| turn_detection.type | String | No | Voice activity detection type | server_vad |
| turn_detection.threshold | Number | No | VAD activation threshold (0.0-1.0) | 0.8 |
| turn_detection.prefix_padding_ms | Integer | No | Amount of audio included before speech begins | 500 |
| turn_detection.silence_duration_ms | Integer | No | Silence duration used to detect the end of speech | 1000 |
| tools | Array | No | List of tools available to the model | [] |
| tool_choice | String | No | How the model chooses tools | auto/none/required |
| temperature | Number | No | Model sampling temperature | 0.8 |
| max_output_tokens | String/Integer | No | Maximum tokens for a single response | ”inf”/4096 |
input_audio_buffer.append✅
Section titled “input_audio_buffer.append✅”Append audio data to the input audio buffer.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_456 |
| type | String | No | Event type | input_audio_buffer.append |
| audio | String | No | Base64-encoded audio data | Base64EncodedAudioData |
input_audio_buffer.commit✅
Section titled “input_audio_buffer.commit✅”Commit the audio data in the buffer as a user message.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_789 |
| type | String | No | Event type | input_audio_buffer.commit |
input_audio_buffer.clear✅
Section titled “input_audio_buffer.clear✅”Clear all audio data from the input audio buffer.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_012 |
| type | String | No | Event type | input_audio_buffer.clear |
conversation.item.create✅
Section titled “conversation.item.create✅”Add a new conversation item to the conversation.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_345 |
| type | String | No | Event type | conversation.item.create |
| previous_item_id | String | No | The new conversation item will be inserted after this ID | null |
| item.id | String | No | Unique identifier of the conversation item | msg_001 |
| item.type | String | No | Conversation item type | message/function_call/function_call_output |
| item.status | String | No | Conversation item status | completed/in_progress/incomplete |
| item.role | String | No | Role of the message sender | user/assistant/system |
| item.content | Array | No | Message content | [text/audio/transcript] |
| item.call_id | String | No | Function call ID | call_001 |
| item.name | String | No | Name of the function being called | function_name |
| item.arguments | String | No | Function call arguments | {“param”: “value”} |
| item.output | String | No | Function call output | {“result”: “value”} |
conversation.item.truncate✅
Section titled “conversation.item.truncate✅”Truncate the audio content in an assistant message.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_678 |
| type | String | No | Event type | conversation.item.truncate |
| item_id | String | No | ID of the assistant message item to truncate | msg_002 |
| content_index | Integer | No | Index of the content segment to truncate | 0 |
| audio_end_ms | Integer | No | End time of the audio truncation | 1500 |
conversation.item.delete✅
Section titled “conversation.item.delete✅”Delete a specified conversation item from the conversation history.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_901 |
| type | String | No | Event type | conversation.item.delete |
| item_id | String | No | ID of the conversation item to delete | msg_003 |
response.create✅
Section titled “response.create✅”Trigger response generation.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_234 |
| type | String | No | Event type | response.create |
| response.modalities | String array | No | Modalities for the response | [“text”, “audio”] |
| response.instructions | String | No | Instructions for the model | ”Please assist the user.” |
| response.voice | String | No | Voice type used by the model | alloy/echo/shimmer |
| response.output_audio_format | String | No | Output audio format | pcm16 |
| response.tools | Array | No | List of tools available to the model | [“type”, “name”, “description”] |
| response.tool_choice | String | No | How the model chooses tools | auto |
| response.temperature | Number | No | Sampling temperature | 0.7 |
| response.max_output_tokens | Integer/String | No | Maximum output tokens | 150/“inf” |
response.cancel✅
Section titled “response.cancel✅”Cancel an in-progress response generation.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_567 |
| type | String | No | Event type | response.cancel |
Server Events✅
Section titled “Server Events✅”error✅
Section titled “error✅”Event returned when an error occurs.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String array | No | Unique identifier of the server event | [“event_890”] |
| type | String | No | Event type | error |
| error.type | String | No | Error type | invalid_request_error/server_error |
| error.code | String | No | Error code | invalid_event |
| error.message | String | No | Human-readable error message | ”The ‘type’ field is missing.” |
| error.param | String | No | Parameter associated with the error | null |
| error.event_id | String | No | ID of the related event | event_567 |
conversation.item.input_audio_transcription.completed✅
Section titled “conversation.item.input_audio_transcription.completed✅”Returned when input audio transcription is enabled and transcription succeeds.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier of the server event | event_2122 |
| type | String | No | Event type | conversation.item.input_audio_transcription.completed |
| item_id | String | No | ID of the user message item | msg_003 |
| content_index | Integer | No | Index of the content segment containing the audio | 0 |
| transcript | String | No | Transcribed text | ”Hello, how are you?“ |
conversation.item.input_audio_transcription.failed✅
Section titled “conversation.item.input_audio_transcription.failed✅”Returned when input audio transcription is configured, but transcription of the user message fails.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier of the server event | event_2324 |
| type | String array | No | Event class |