OpenAI Chat Format (Chat Completions)

Given a list of messages that make up a conversation, the model returns a response. For related guidance, see OpenAI’s official Chat Completions documentation.

Request example (basic conversation):

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

Response example:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm glad to help you. What can I assist you with?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
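
To read the reply programmatically, a client typically takes the first entry of choices and looks at its message and finish_reason. The following is a minimal Python sketch of that step, assuming the response body has already been received as JSON text. The helper name first_reply is hypothetical and not part of any SDK.

```python
import json
from typing import Optional, Tuple

# Abbreviated copy of a basic chat completion response.
RESPONSE_TEXT = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm glad to help you. What can I assist you with?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
}
"""


def first_reply(response: dict) -> Tuple[Optional[str], str]:
    """Return (content, finish_reason) of the first choice (hypothetical helper)."""
    choice = response["choices"][0]
    # content can be null (None) when the model answers with tool calls instead.
    return choice["message"]["content"], choice["finish_reason"]


response = json.loads(RESPONSE_TEXT)
content, finish_reason = first_reply(response)
print(content)
print(finish_reason)
```
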

Request example (image input):

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

Response example:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image shows a wooden boardwalk running through a lush green wetland. The boardwalk appears to stretch into the distance, with verdant vegetation on both sides."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
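
In the image request, the user message's content is an array of typed parts rather than a plain string. As a rough sketch, the same payload could be assembled in Python as follows. The helper name image_message is hypothetical; only the field names come from the request example.

```python
import json


def image_message(question: str, image_url: str) -> dict:
    """Build a user message with one text part and one image part (hypothetical helper)."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


payload = {
    "model": "gpt-4o",
    "messages": [image_message("What is in this image?", "https://example.com/image.jpg")],
    "max_tokens": 300,
}

# This string is what would be sent as the POST body.
body = json.dumps(payload, indent=2)
print(body)
```
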

Request example (streaming):

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Tell me a story"
      }
    ],
    "stream": true
  }'

Streaming response example:

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{"content":"Once upon a time"},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{"content":" there was a"},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{"content":" little rabbit"},"logprobs":null,"finish_reason":null}]}
// ... more chunks ...
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
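
Each chunk carries only an increment in choices[0].delta, so the client has to concatenate the content pieces itself. The sketch below shows one way to do that in Python. It assumes one chunk per line; the handling of a "data:" prefix and a final "[DONE]" marker is an assumption based on the server-sent events framing that OpenAI-compatible endpoints commonly use, and the function name assemble_stream is hypothetical.

```python
import json


def assemble_stream(lines):
    """Concatenate delta.content pieces from chat.completion.chunk lines."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):  # assumption: SSE framing may be present
            line = line[len("data:"):].strip()
        if not line or line == "[DONE]":
            continue
        choices = json.loads(line).get("choices") or []
        if not choices:  # e.g. a trailing chunk that carries no choice
            continue
        choice = choices[0]
        # The first chunk has an empty string, the last chunk has no content key.
        parts.append(choice["delta"].get("content") or "")
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


chunks = [
    '{"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":"Once upon a time"},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":" there was a"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
text, reason = assemble_stream(chunks)
print(text)
print(reason)
```
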

Request example (function calling):

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Beijing today?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a specified location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name, e.g. Beijing"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

Response example:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699896916,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Beijing\", \"unit\": \"celsius\"}"
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 82,
    "completion_tokens": 17,
    "total_tokens": 99
  }
}
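
When finish_reason is "tool_calls", the model has not answered yet: it is asking the caller to run a function. Note that function.arguments is a JSON string, not an object, so it must be decoded first. The sketch below, modeled on the response example, decodes the arguments, runs a local function, and builds the follow-up message. It assumes the OpenAI convention of returning results as "tool" role messages keyed by tool_call_id; get_weather here is a stand-in with fixed data, and LOCAL_TOOLS and run_tool_calls are hypothetical names.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in for a real weather lookup; returns fixed data (hypothetical)."""
    return {"location": location, "temperature": 22, "unit": unit}


# Maps the function names declared in "tools" to local callables.
LOCAL_TOOLS = {"get_weather": get_weather}

assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": '{"location": "Beijing", "unit": "celsius"}',
            },
        }
    ],
}


def run_tool_calls(message: dict) -> list:
    """Execute each requested function and build the matching 'tool' messages."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # JSON string -> dict
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(fn(**args)),
            }
        )
    return results


tool_messages = run_tool_calls(assistant_message)
print(tool_messages)
```

Under the same assumption, the next request would append assistant_message and then tool_messages to the original messages list so the model can produce its final answer.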

Request example (JSON mode):

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a JSON assistant. Please respond in JSON format."
      },
      {
        "role": "user",
        "content": "Give me an example of user information"
      }
    ],
    "response_format": { "type": "json_object" }
  }'

Response example:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"user\":{\"id\":1,\"name\":\"Zhang San\",\"age\":28,\"email\":\"[email protected]\",\"interests\":[\"reading\",\"travel\",\"photography\"]}}"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
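
In JSON mode the JSON document arrives as a string inside message.content, so it needs a second decoding step. The following is a small Python sketch of that step. It also checks finish_reason first, on the assumption (taken from the OpenAI API) that the value "length" means the output was cut off at the token limit and may be incomplete JSON. The helper name parse_json_reply is hypothetical.

```python
import json


def parse_json_reply(response: dict) -> dict:
    """Decode the assistant's JSON content, refusing truncated output."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        # Output hit the token limit, so the JSON is probably cut off.
        raise ValueError("completion was truncated; JSON may be incomplete")
    return json.loads(choice["message"]["content"])


response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": '{"user":{"id":1,"name":"Zhang San","age":28}}',
            },
            "finish_reason": "stop",
        }
    ]
}
user = parse_json_reply(response)["user"]
print(user["name"], user["age"])
```
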

POST /v1/chat/completions

Creates a model response for the given chat conversation. For more details, see the Text Generation, Vision, and Audio guides.

Include the following header to authenticate with your API key:

Authorization: Bearer $4ALLAPI_API_KEY

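Here $4ALLAPI_API_KEY stands for your API key, which you can find or generate on the API Keys page of the 4All API platform. Note that POSIX shell variable names cannot begin with a digit (the shell reads $4 as the fourth positional parameter), so in the curl examples either paste the key in place of $4ALLAPI_API_KEY or use a variable whose name starts with a letter.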
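
As an illustration of the header in use, here is a minimal Python sketch that builds (but does not send) an authenticated request using only the standard library. The environment variable name FOURALL_API_KEY and the helper build_request are hypothetical, and a dummy key is substituted when the variable is unset.

```python
import json
import os
import urllib.request

API_URL = "https://api.4allapi.com/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Create (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# FOURALL_API_KEY is a hypothetical variable name; a dummy value is used if unset.
api_key = os.environ.get("FOURALL_API_KEY", "sk-example")
payload = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}
req = build_request(api_key, payload)
print(req.get_method(), req.full_url)

# To actually send it:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```
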

messages
  • Type: array
  • Required: yes
  • Description: A list of messages comprising the conversation so far. Depending on the model used, different message types are supported, such as text, images, and audio.

model
  • Type: string
  • Required: yes
  • Description: The ID of the model to use. For details on which models are compatible with the Chat API, see the model endpoint compatibility table.

store
  • Type: boolean or null
  • Required: no
  • Default: false
  • Description: Whether to store the output of this chat completion request for use in model distillation or evaluation products.

reasoning_effort
  • Type: string or null
  • Required: no
  • Default: medium
  • Applies to: o1 and o3-mini models only
  • Description: Constrains how much effort reasoning models spend on reasoning. Supported values are currently low, medium, and high. Reducing reasoning effort can speed up responses and reduce the number of tokens used for reasoning.

metadata
  • Type: map
  • Required: no
  • Description: A set of up to 16 key-value pairs that can be attached to an object. This is useful for storing additional information about the object in a structured format, and it can be queried via the API or dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
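
The size limits on these key-value pairs are easy to check on the client before sending a request. A minimal sketch in Python, assuming the limits stated above (16 pairs, 64-character keys, 512-character values); validate_metadata is a hypothetical helper, not an API call.

```python
def validate_metadata(metadata: dict) -> None:
    """Raise ValueError if metadata exceeds the documented limits."""
    if len(metadata) > 16:
        raise ValueError("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError("metadata keys and values must be strings")
        if len(key) > 64:
            raise ValueError(f"key is {len(key)} characters long (maximum 64)")
        if len(value) > 512:
            raise ValueError(f"value for {key!r} is {len(value)} characters long (maximum 512)")


# Passes silently: two short string pairs.
validate_metadata({"ticket": "A-1042", "team": "support"})
print("metadata ok")
```
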

modalities
  • Type: array or null
  • Required: no
  • Description: The output types you want the model to generate for this request. Most models can generate text, which is the default: ["text"]. Some models can also generate audio; to request both text and audio responses, use ["text", "audio"].

prediction
  • Type: object
  • Required: no
  • Description: Configuration for predicted outputs, which can significantly improve response time when most of the model’s response is already known in advance. This is most commonly used when making small edits to files.

audio
  • Type: object or null
  • Required: no
  • Description: Parameters for audio output. Required when audio output is requested with modalities: ["audio"].

temperature
  • Type: number or null
  • Required: no
  • Default: 1
  • Description: Sampling temperature to use, between 0 and 2. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic. We generally recommend altering this value or top_p, but not both.

top_p
  • Type: number or null
  • Required: no
  • Default: 1
  • Description: An alternative to sampling with temperature, called nucleus sampling, in which the model considers only the tokens that make up the top top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered. We generally recommend altering this value or temperature, but not both.
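
To make the two sampling controls concrete, the toy example below applies temperature scaling and nucleus (top_p) filtering to four made-up token scores. It is an illustration of the general technique under simplified assumptions, not the server's actual implementation; the logits and function names are invented for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; this toy version needs temperature > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for four candidate tokens

sharp = softmax(logits, temperature=0.2)  # low temperature: mass concentrates
flat = softmax(logits, temperature=1.5)   # high temperature: mass spreads out
print(max(sharp) > max(flat))

probs = softmax(logits)
print(sorted(top_p_filter(probs, 0.5)))  # indices of tokens kept at top_p = 0.5
print(sorted(top_p_filter(probs, 0.8)))  # indices of tokens kept at top_p = 0.8
```

With these scores the most likely token alone holds about 61% of the probability mass, so top_p = 0.5 keeps only that token, while top_p = 0.8 also admits the second one.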

n
  • Type: integer or null
  • Required: no
  • Default: 1
  • Description: How many chat completion choices to generate for each input message. Note that you are charged for the tokens generated across all choices. Keep n set to 1 to minimize costs.

stop
  • Type: string, array, or null
  • Required: no
  • Default: null
  • Description: Up to 4 sequences at which the API will stop generating further tokens.

max_tokens
  • Type: integer or null
  • Required: no
  • Description: The maximum number of tokens that can be generated in the chat completion. This value can be used to control the cost of text generated via the API. It is now deprecated in favor of max_completion_tokens and is incompatible with o1 series models.

presence_penalty
  • Type: number or null
  • Required: no
  • Default: 0
  • Description: A number between -2.0 and 2.0. Positive values penalize new tokens based on whether they have appeared in the text so far, increasing the likelihood that the model talks about new topics.

frequency_penalty
  • Type: number or null
  • Required: no
  • Default: 0
  • Description: A number between -2.0 and 2.0. Positive values penalize new tokens based on how often they have already appeared in the text so far, decreasing the likelihood that the model repeats the same line verbatim.

logit_bias
  • Type: map
  • Required: no
  • Default: null
  • Description: Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens, specified by their token IDs in the tokenizer, to a bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model before sampling. The exact effect varies by model, but values between -1 and 1 should decrease or increase the likelihood of selection, while values like -100 or 100 should result in the corresponding token being prohibited or exclusively selected.
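
The statement that the bias is added to the logits before sampling can be illustrated with a toy calculation. The sketch below uses three invented token IDs and logits (real IDs come from the model's tokenizer) and shows why a bias of -100 effectively bans a token while +100 effectively forces it. The function name apply_logit_bias is hypothetical.

```python
import math


def apply_logit_bias(logits: dict, logit_bias: dict) -> dict:
    """Add each bias to the matching token's logit, then softmax over all tokens."""
    biased = {tok: logit + logit_bias.get(tok, 0.0) for tok, logit in logits.items()}
    m = max(biased.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in biased.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Toy token IDs and logits, invented for this example.
logits = {101: 2.0, 102: 1.5, 103: 0.0}

base = apply_logit_bias(logits, {})
banned = apply_logit_bias(logits, {101: -100})  # token 101 becomes vanishingly unlikely
forced = apply_logit_bias(logits, {103: 100})   # token 103 takes nearly all the mass

print(round(base[101], 3))
print(banned[101] < 1e-40)
print(forced[103] > 0.999)
```
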

user
  • Type: string
  • Required: no
  • Description: A unique identifier representing your end user, which can help 4All API monitor and detect abuse.

service_tier
  • Type: string or null
  • Required: no
  • Default: auto
  • Description: Specifies the latency tier to use for processing the request. This parameter is relevant to customers subscribed to the Scale tier service:
      • If set to ‘auto’ and the project has Scale tier enabled, the system uses Scale tier credits until they run out.
      • If set to ‘auto’ and the project does not have Scale tier enabled, requests are processed with the default service tier, which has a lower uptime SLA and no latency guarantee.
      • If set to ‘default’, requests are processed with the default service tier, which has a lower uptime SLA and no latency guarantee.
      • If not set, the default behavior is ‘auto’.

stream_options
  • Type: object or null
  • Required: no
  • Default: null
  • Description: Options for streaming responses. Only used when stream: true is set.

response_format
  • Type: object
  • Required: no
  • Description: Specifies the format the model must output.
      • Set to { "type": "json_schema", "json_schema": {...} } to enable structured outputs and ensure the model's output matches the JSON schema you provide.
      • Set to { "type": "json_object" } to enable JSON mode and ensure the model generates valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Otherwise, the model may generate an unending stream of whitespace until generation reaches the token limit.

seed
  • Type: integer or null
  • Required: no
  • Description: Beta feature. If specified, the system makes a best effort to sample deterministically, so repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed; refer to the system_fingerprint response field to monitor backend changes.

tools
  • Type: array
  • Required: no
  • Description: A list of tools the model may call. Currently only functions are supported as tools. Use this parameter to provide a list of functions the model may generate JSON inputs for. Up to 128 functions are supported.

tool_choice
  • Type: string or object
  • Required: no
  • Default: none when no tools are present, auto when tools are present
  • Description: Controls which tool the model calls, if any:
      • none: the model will not call any tools and will instead generate a message.
      • auto: the model can choose between generating a message and calling one or more tools.
      • required: the model must call one or more tools.
      • {"type": "function", "function": {"name": "my_function"}}: forces the model to call that specific tool.

parallel_tool_calls
  • Type: boolean
  • Required: no
  • Default: true
  • Description: Whether to enable parallel function calling during tool use.

Returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.

id
  • Type: string
  • Description: Unique identifier for the response

object
  • Type: string
  • Description: Object type, always "chat.completion"

created
  • Type: integer
  • Description: Unix timestamp (in seconds) of when the response was created

model
  • Type: string
  • Description: The name of the model used

system_fingerprint
  • Type: string
  • Description: System fingerprint identifier

choices
  • Type: array
  • Description: The generated response options
  • Properties:
      • index: option index
      • message: message object containing role and content
      • logprobs: log probability information
      • finish_reason: reason the generation finished

usage
  • Type: object
  • Description: Token usage statistics
  • Properties:
      • prompt_tokens: number of tokens used by the prompt
      • completion_tokens: number of tokens used by the completion
      • total_tokens: total token count
      • completion_tokens_details: token details
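
The usage statistics are handy for simple cost tracking. In the response examples on this page, total_tokens equals prompt_tokens plus completion_tokens; the Python sketch below checks that relationship and sums usage over several responses. The helper names check_usage and total_across are hypothetical.

```python
def check_usage(usage: dict) -> bool:
    """Return True when total_tokens equals prompt_tokens + completion_tokens."""
    return usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]


def total_across(responses) -> dict:
    """Sum token counts over several chat completion objects."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for response in responses:
        for key in totals:
            totals[key] += response["usage"].get(key, 0)
    return totals


# Usage blocks taken from the basic and function-calling response examples.
examples = [
    {"usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}},
    {"usage": {"prompt_tokens": 82, "completion_tokens": 17, "total_tokens": 99}},
]
print(all(check_usage(r["usage"]) for r in examples))
print(total_across(examples))
```
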