Skip to content

OpenAI Realtime Conversation API

Page Overview

Official Documentation

  • OpenAI Realtime WebRTC
  • OpenAI Realtime WebSocket

The OpenAI Realtime API provides two connection methods:

  • WebRTC - for real-time audio and video interaction in browsers and mobile clients
  • WebSocket - for server-to-server application integration
  • Real-time voice conversations
  • Audio and video conferencing
  • Real-time translation
  • Speech-to-text transcription
  • Real-time code generation
  • Server-side real-time integration
  • Bidirectional audio streaming
  • Mixed text and audio conversations
  • Function calling support
  • Automatic voice activity detection (VAD)
  • Audio transcription
  • Server-side WebSocket integration
  • Standard API key (server-side use only)
  • Ephemeral token (client-side use)
  • Validity: 1 minute
  • Usage limit: single connection
  • How to obtain: created through the server-side API
POST https://4All API地址/v1/realtime/sessions
Content-Type: application/json
Authorization: Bearer $4All API_API_KEY
{
"model": "gpt-4o-realtime-preview-2024-12-17",
"voice": "verse"
}
  • Never expose your standard API key on the client side
  • Use HTTPS/WSS for communication
  • Implement proper access control
  • Monitor for suspicious activity
  • URL: https://4All API地址/v1/realtime
  • Query parameter: model
  • Request headers:
  • Authorization: Bearer EPHEMERAL_KEY
  • Content-Type: application/sdp
  • URL: wss://4All API地址/v1/realtime
  • Query parameter: model
  • Request headers:
  • Authorization: Bearer YOUR_API_KEY
  • OpenAI-Beta: realtime=v1
  • Name: oai-events
  • Purpose: event transmission
  • Format: JSON
  • Input: addTrack()
  • Output: ontrack event
  • Text-only conversation
  • Voice conversation
  • Mixed conversation
  • Create session
  • Update session
  • End session
  • Session configuration
  • Text events
  • Audio events
  • Function calls
  • Status updates
  • Error events
  • Input format
  • pcm16
  • g711_ulaw
  • g711_alaw
  • Output format
  • pcm16
  • g711_ulaw
  • g711_alaw
  • Voice type
  • alloy
  • echo
  • shimmer
  • Temperature
  • Maximum output length
  • System prompt
  • Tool configuration
  • Threshold
  • Silence duration
  • Prefix padding
async function init() {
// Get an ephemeral key from the server - see the server code below
const tokenResponse = await fetch("/session");
const data = await tokenResponse.json();
const EPHEMERAL_KEY = data.client_secret.value;
// Create the peer connection
const pc = new RTCPeerConnection();
// Set up playback for remote audio returned by the model
const audioEl = document.createElement("audio");
audioEl.autoplay = true;
pc.ontrack = e => audioEl.srcObject = e.streams[0];
// Add local audio track from the browser microphone
const ms = await navigator.mediaDevices.getUserMedia({
audio: true
});
pc.addTrack(ms.getTracks()[0]);
// Set up a data channel for sending and receiving events
const dc = pc.createDataChannel("oai-events");
dc.addEventListener("message", (e) => {
// Real-time server events are received here!
console.log(e);
});
// Start the session using Session Description Protocol (SDP)
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
const baseUrl = "https://4All API地址/v1/realtime";
const model = "gpt-4o-realtime-preview-2024-12-17";
const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
method: "POST",
body: offer.sdp,
headers: {
Authorization: `Bearer ${EPHEMERAL_KEY}`,
"Content-Type": "application/sdp"
},
});
const answer = {
type: "answer",
sdp: await sdpResponse.text(),
};
await pc.setRemoteDescription(answer);
}
init();
import express from "express";
const app = express();
// Create an endpoint to generate ephemeral tokens
// This endpoint is used together with the client code above
app.get("/session", async (req, res) => {
const r = await fetch("https://4All API地址/v1/realtime/sessions", {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.4All API_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "gpt-4o-realtime-preview-2024-12-17",
voice: "verse",
}),
});
const data = await r.json();
// Send the JSON received from the OpenAI REST API back to the client
res.send(data);
});
app.listen(3000);
// Create a data channel from the peer connection
const dc = pc.createDataChannel("oai-events");
// Listen for server events on the data channel
// Event data needs to be parsed from a JSON string
dc.addEventListener("message", (e) => {
const realtimeEvent = JSON.parse(e.data);
console.log(realtimeEvent);
});
// Send a client event: serialize a valid client event into
// JSON and send it through the data channel
const responseCreate = {
type: "response.create",
response: {
modalities: ["text"],
instructions: "Write a haiku about code",
},
};
dc.send(JSON.stringify(responseCreate));
import WebSocket from "ws";
const url = "wss://4All API地址/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
headers: {
"Authorization": "Bearer " + process.env.4All API_API_KEY,
"OpenAI-Beta": "realtime=v1",
},
});
ws.on("open", function open() {
console.log("Connected to server.");
});
ws.on("message", function incoming(message) {
console.log(JSON.parse(message.toString()));
});
# websocket-client library required:
# pip install websocket-client
import os
import json
import websocket
NEW_API_KEY = os.environ.get("4All API_API_KEY")
url = "wss://4All API地址/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
headers = [
"Authorization: Bearer " + 4All API_API_KEY,
"OpenAI-Beta: realtime=v1"
]
def on_open(ws):
print("Connected to server.")
def on_message(ws, message):
data = json.loads(message)
print("Received event:", json.dumps(data, indent=2))
ws = websocket.WebSocketApp(
url,
header=headers,
on_open=on_open,
on_message=on_message,
)
ws.run_forever()
/*
Note: In browser client environments, we recommend using WebRTC.
However, in browser-like environments such as Deno and Cloudflare Workers,
you can also use the standard WebSocket interface.
*/
const ws = new WebSocket(
"wss://4All API地址/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17",
[
"realtime",
// Authentication
"openai-insecure-api-key." + 4All API_API_KEY,
// Optional
"openai-organization." + OPENAI_ORG_ID,
"openai-project." + OPENAI_PROJECT_ID,
// Beta protocol, required
"openai-beta.realtime-v1"
]
);
ws.on("open", function open() {
console.log("Connected to server.");
});
ws.on("message", function incoming(message) {
console.log(message.data);
});
// Receive server events
ws.on("message", function incoming(message) {
// The message data needs to be parsed from JSON
const serverEvent = JSON.parse(message.data)
console.log(serverEvent);
});
// Send an event by creating a JSON data structure
// that conforms to the client event format
const event = {
type: "response.create",
response: {
modalities: ["audio", "text"],
instructions: "Give me a haiku about code.",
}
};
ws.send(JSON.stringify(event));
// Send a client event by serializing a dictionary into JSON
def on_open(ws):
print("Connected to server.")
event = {
"type": "response.create",
"response": {
"modalities": ["text"],
"instructions": "Please assist the user."
}
}
ws.send(json.dumps(event))
# Received messages need to be parsed from the JSON payload
def on_message(ws, message):
data = json.loads(message)
print("Received event:", json.dumps(data, indent=2))
  • Connection errors
  • Network issues
  • Authentication failures
  • Configuration errors
  • Audio errors
  • Device permissions
  • Unsupported formats
  • Codec issues
  • Session errors
  • Token expiration
  • Session timeout
  • Concurrency limits
  • Automatic reconnection
  • Session recovery
  • Error retries
  • Graceful degradation

All events must include the following request headers:

Request HeaderTypeDescriptionExample Value
AuthorizationStringAuthentication tokenBearer $NEW_API_KEY
OpenAI-BetaStringAPI versionrealtime=v1

Update the session’s default configuration.

ParameterTypeRequiredDescriptionExample Value / Optional Values
event_idStringNoClient-generated event identifierevent_123
typeStringNoEvent typesession.update
modalitiesString arrayNoModalities the model can respond with[“text”, “audio”]
instructionsStringNoSystem instructions prepended before model calls”Your knowledge cutoff is 2023-10…“
voiceStringNoVoice type used by the modelalloy、echo、shimmer
input_audio_formatStringNoInput audio formatpcm16、g711_ulaw、g711_alaw
output_audio_formatStringNoOutput audio formatpcm16、g711_ulaw、g711_alaw
input_audio_transcription.modelStringNoModel used for transcriptionwhisper-1
turn_detection.typeStringNoVoice activity detection typeserver_vad
turn_detection.thresholdNumberNoVAD activation threshold (0.0-1.0)0.8
turn_detection.prefix_padding_msIntegerNoAmount of audio included before speech begins500
turn_detection.silence_duration_msIntegerNoSilence duration used to detect the end of speech1000
toolsArrayNoList of tools available to the model[]
tool_choiceStringNoHow the model chooses toolsauto/none/required
temperatureNumberNoModel sampling temperature0.8
max_output_tokensString/IntegerNoMaximum tokens for a single response”inf”/4096

Append audio data to the input audio buffer.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoClient-generated event identifierevent_456
typeStringNoEvent typeinput_audio_buffer.append
audioStringNoBase64-encoded audio dataBase64EncodedAudioData

Commit the audio data in the buffer as a user message.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoClient-generated event identifierevent_789
typeStringNoEvent typeinput_audio_buffer.commit

Clear all audio data from the input audio buffer.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoClient-generated event identifierevent_012
typeStringNoEvent typeinput_audio_buffer.clear

Add a new conversation item to the conversation.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoClient-generated event identifierevent_345
typeStringNoEvent typeconversation.item.create
previous_item_idStringNoThe new conversation item will be inserted after this IDnull
item.idStringNoUnique identifier of the conversation itemmsg_001
item.typeStringNoConversation item typemessage/function_call/function_call_output
item.statusStringNoConversation item statuscompleted/in_progress/incomplete
item.roleStringNoRole of the message senderuser/assistant/system
item.contentArrayNoMessage content[text/audio/transcript]
item.call_idStringNoFunction call IDcall_001
item.nameStringNoName of the function being calledfunction_name
item.argumentsStringNoFunction call arguments{“param”: “value”}
item.outputStringNoFunction call output{“result”: “value”}

Truncate the audio content in an assistant message.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoClient-generated event identifierevent_678
typeStringNoEvent typeconversation.item.truncate
item_idStringNoID of the assistant message item to truncatemsg_002
content_indexIntegerNoIndex of the content segment to truncate0
audio_end_msIntegerNoEnd time of the audio truncation1500

Delete a specified conversation item from the conversation history.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoClient-generated event identifierevent_901
typeStringNoEvent typeconversation.item.delete
item_idStringNoID of the conversation item to deletemsg_003

Trigger response generation.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoClient-generated event identifierevent_234
typeStringNoEvent typeresponse.create
response.modalitiesString arrayNoModalities for the response[“text”, “audio”]
response.instructionsStringNoInstructions for the model”Please assist the user.”
response.voiceStringNoVoice type used by the modelalloy/echo/shimmer
response.output_audio_formatStringNoOutput audio formatpcm16
response.toolsArrayNoList of tools available to the model[“type”, “name”, “description”]
response.tool_choiceStringNoHow the model chooses toolsauto
response.temperatureNumberNoSampling temperature0.7
response.max_output_tokensInteger/StringNoMaximum output tokens150/“inf”

Cancel an in-progress response generation.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoClient-generated event identifierevent_567
typeStringNoEvent typeresponse.cancel

Event returned when an error occurs.

ParameterTypeRequiredDescriptionExample Value
event_idString arrayNoUnique identifier of the server event[“event_890”]
typeStringNoEvent typeerror
error.typeStringNoError typeinvalid_request_error/server_error
error.codeStringNoError codeinvalid_event
error.messageStringNoHuman-readable error message”The ‘type’ field is missing.”
error.paramStringNoParameter associated with the errornull
error.event_idStringNoID of the related eventevent_567

conversation.item.input_audio_transcription.completed✅

Section titled “conversation.item.input_audio_transcription.completed✅”

Returned when input audio transcription is enabled and transcription succeeds.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoUnique identifier of the server eventevent_2122
typeStringNoEvent typeconversation.item.input_audio_transcription.completed
item_idStringNoID of the user message itemmsg_003
content_indexIntegerNoIndex of the content segment containing the audio0
transcriptStringNoTranscribed text”Hello, how are you?“

conversation.item.input_audio_transcription.failed✅

Section titled “conversation.item.input_audio_transcription.failed✅”

Returned when input audio transcription is configured, but transcription of the user message fails.

ParameterTypeRequiredDescriptionExample Value
event_idStringNoUnique identifier of the server eventevent_2324
typeString arrayNoEvent class