OpenAI Chat Format (Chat Completions)

Official documentation: OpenAI Chat

📝 Introduction

Given a list of messages that make up a conversation, the model returns a response. For related guidance, see the Chat Completions page in OpenAI’s official documentation.

💡 Request Examples

Basic Text Chat ✅

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

Response example:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm glad to help you. What can I assist you with?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
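The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names `build_chat_request` and `extract_reply` are hypothetical, and the request is built but not sent so the snippet runs without network access.

```python
import json
import urllib.request

API_URL = "https://api.4allapi.com/v1/chat/completions"

def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def extract_reply(response_json):
    """Return the assistant text of the first choice in a chat.completion object."""
    return response_json["choices"][0]["message"]["content"]

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_chat_request(key, "gpt-4o", msgs)) as resp:
#       print(extract_reply(json.load(resp)))
```

Keeping request construction separate from sending makes the payload easy to inspect before any tokens are billed.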

Image Analysis Chat ✅

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

Response example:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image shows a wooden boardwalk running through a lush green wetland. The boardwalk appears to stretch into the distance, with verdant vegetation on both sides."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
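The mixed text-and-image `content` array above can be assembled with a small helper. The sketch below is illustrative: `image_message` and `data_url` are hypothetical names, and passing a base64 data URL in place of an HTTPS URL follows OpenAI's vision guide, so whether this endpoint accepts it for every model is an assumption.

```python
import base64

def image_message(text, image_url):
    """Build a user message with one text part and one image_url part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

def data_url(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a data URL (assumed to be accepted by the endpoint)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

For a local file, `image_message("What is in this image?", data_url(open(path, "rb").read()))` would produce the same message shape as the curl example.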

Streaming Response ✅

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "Tell me a story"
      }
    ],
    "stream": true
  }'

Streaming response example:

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{"content":"Once upon a time"},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{"content":" there was a"},"logprobs":null,"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{"content":" little rabbit"},"logprobs":null,"finish_reason":null}]}
// ... more chunks ...
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"gpt-4o","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
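A client reassembles the reply by concatenating each chunk's `delta.content`. The sketch below assumes the chunks arrive one per line; on the wire, OpenAI-compatible endpoints usually send them as server-sent events prefixed with `data: ` and terminated by `data: [DONE]`, so the hypothetical helper `collect_stream_text` accepts both forms.

```python
import json

def collect_stream_text(lines):
    """Concatenate delta.content for choice 0; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        if not line:
            continue  # blank separator lines between events
        if line == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(line)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            delta = choice.get("delta", {})
            parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason
```

The first chunk carries only the role and an empty string, and the last carries an empty `delta` with `finish_reason` set, so both contribute nothing to the text.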

Function Calling ✅

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Beijing today?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a specified location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name, e.g. Beijing"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'

Response example:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699896916,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Beijing\", \"unit\": \"celsius\"}"
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 82,
    "completion_tokens": 17,
    "total_tokens": 99
  }
}
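When `finish_reason` is `tool_calls`, the caller runs the requested function and sends the result back as a message with role `tool`. The sketch below shows that step under stated assumptions: the local `get_weather` implementation returns canned data, and `LOCAL_TOOLS` and `run_tool_calls` are hypothetical names.

```python
import json

def get_weather(location, unit="celsius"):
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"location": location, "temperature": 22, "unit": unit}

# Maps tool names declared in the request to local callables.
LOCAL_TOOLS = {"get_weather": get_weather}

def run_tool_calls(assistant_message):
    """Execute each requested tool call and return the follow-up 'tool' messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = LOCAL_TOOLS[call["function"]["name"]]
        # The arguments field is a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results
```

To continue the conversation, append the assistant message and the returned tool messages to `messages` and call the endpoint again.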

JSON Mode Output ✅

curl https://api.4allapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4ALLAPI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a JSON assistant. Please respond in JSON format."
      },
      {
        "role": "user",
        "content": "Give me an example of user information"
      }
    ],
    "response_format": { "type": "json_object" }
  }'

Response example:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"user\":{\"id\":1,\"name\":\"Zhang San\",\"age\":28,\"email\":\"user@example.com\",\"interests\":[\"reading\",\"travel\",\"photography\"]}}"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}
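In JSON mode the `content` field is still a string, so the caller has to decode it. A minimal sketch follows; the helper name `parse_json_reply` is hypothetical, and the `length` check reflects the fact that output cut off at the token limit may not be complete JSON.

```python
import json

def parse_json_reply(response_json):
    """Decode the JSON document carried in the first choice's content string."""
    choice = response_json["choices"][0]
    if choice["finish_reason"] == "length":
        raise ValueError("output was truncated; the JSON may be incomplete")
    return json.loads(choice["message"]["content"])
```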

📮 Request

Endpoint

POST /v1/chat/completions
Creates a model response for the given chat conversation. For more details, see the Text Generation, Vision, and Audio guides.

Authentication

Include the following header to authenticate with your API key:

Authorization: Bearer $4ALLAPI_API_KEY

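Where $4ALLAPI_API_KEY is your API key. You can find or generate your API key on the API Keys page in the 4All API platform. Because shell variable names cannot begin with a digit, treat $4ALLAPI_API_KEY in the examples as a placeholder: paste the key in directly, or export it under a valid variable name and reference that name instead.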
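In application code the key is typically read from the environment rather than hard-coded. The sketch below assumes a variable named `FOURALL_API_KEY`; that name and the helper `auth_headers` are hypothetical, chosen only because environment variable names cannot start with a digit.

```python
import os

def auth_headers(env_var="FOURALL_API_KEY", environ=os.environ):
    """Return request headers carrying the bearer token read from the environment."""
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set {env_var} to your API key")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```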

Request Body Parameters

messages

Type: array
Required: yes
A list of messages containing the conversation so far. Different message types are supported depending on the model used, such as text, images, and audio.

model

Type: string
Required: yes
The model ID to use. For details on which models are compatible with the Chat API, see the model endpoint compatibility table.

store

Type: boolean or null
Required: no
Default: false
Whether to store the output of this chat completion request for use in model distillation or evaluation products.

reasoning_effort

Type: string or null
Required: no
Default: medium
Only applies to o1 and o3-mini models.
Constrains how much effort reasoning models spend on reasoning. Supported values are currently low, medium, and high. Reducing reasoning effort can speed up responses and reduce the number of tokens used for reasoning in the response.

metadata

Type: map
Required: no
A set of up to 16 key-value pairs that can be attached to an object. This is useful for storing additional information about the object in a structured format, and the pairs can be queried via the API or dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
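The metadata limits can be checked on the client before a request is sent. This is a small sketch of such a check; `validate_metadata` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
MAX_PAIRS = 16
MAX_KEY_LEN = 64
MAX_VALUE_LEN = 512

def validate_metadata(metadata):
    """Raise ValueError if metadata exceeds the documented limits; else return it."""
    if len(metadata) > MAX_PAIRS:
        raise ValueError(f"at most {MAX_PAIRS} key-value pairs are allowed")
    for key, value in metadata.items():
        if not isinstance(key, str) or len(key) > MAX_KEY_LEN:
            raise ValueError(f"key {key!r} must be a string of at most {MAX_KEY_LEN} characters")
        if not isinstance(value, str) or len(value) > MAX_VALUE_LEN:
            raise ValueError(f"value for {key!r} must be a string of at most {MAX_VALUE_LEN} characters")
    return metadata
```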

modalities

Type: array or null
Required: no
The output types you want the model to generate for this request. Most models can generate text, which is the default: ["text"]. Some models can also generate audio. To request both text and audio responses, use: ["text", "audio"].

prediction

Type: object
Required: no
Configuration for predicted outputs, which can significantly improve response time when most of the model’s response is already known in advance. This is most commonly used when making small edits to files.

audio

Type: object or null
Required: no
Parameters for audio output. Required when requesting audio output with modalities: ["audio"].

temperature

Type: number or null
Required: no
Default: 1
Sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. We generally recommend altering this value or top_p, but not both.

top_p

Type: number or null
Required: no
Default: 1
An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens that make up the top top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this value or temperature, but not both.
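To make the effect of temperature and top_p concrete, the sketch below applies both to a toy distribution. It illustrates the general technique only; the server-side sampler is not documented here, the function names are hypothetical, and the temperature scaling assumes a value greater than 0.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits / temperature (temperature must be > 0 in this sketch)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    """Indices of the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept
```

With probabilities `[0.5, 0.3, 0.15, 0.05]`, a top_p of 0.1 keeps only the most likely token, while a lower temperature concentrates more probability on that token before any truncation happens.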

n

Type: integer or null
Required: no
Default: 1
How many chat completion choices to generate for each input message. Note that you are charged for the tokens generated across all choices. Keeping n set to 1 minimizes costs.

stop

Type: string/array/null
Required: no
Default: null
Up to 4 sequences where the API will stop generating further tokens.

max_tokens

Type: integer or null
Required: no
The maximum number of tokens that can be generated in the chat completion. This value can be used to control the cost of text generated via the API. This parameter is deprecated in favor of max_completion_tokens and is incompatible with o1 series models.

presence_penalty

Type: number or null
Required: no
Default: 0
A number between -2.0 and 2.0. Positive values penalize new tokens based on whether they have appeared in the text so far, increasing the likelihood that the model talks about new topics.

frequency_penalty

Type: number or null
Required: no
Default: 0
A number between -2.0 and 2.0. Positive values penalize new tokens based on how often they have already appeared in the text so far, decreasing the likelihood of the model repeating the same line verbatim.
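The two penalties differ in how they scale: the presence penalty is a flat deduction for any token that has already appeared, while the frequency penalty grows with the number of occurrences. The sketch below follows the adjustment described in OpenAI's parameter guide (logit minus count times frequency_penalty minus presence_penalty if the count is positive); that every model behind this endpoint applies exactly this rule is an assumption, and `penalize` is a hypothetical name.

```python
from collections import Counter

def penalize(logits, generated_token_ids, presence_penalty=0.0, frequency_penalty=0.0):
    """Return a copy of logits (token_id -> value) with both penalties applied."""
    counts = Counter(generated_token_ids)
    adjusted = dict(logits)
    for token_id, count in counts.items():
        if token_id in adjusted:
            # Flat deduction for having appeared at all, plus one per occurrence.
            adjusted[token_id] -= presence_penalty + count * frequency_penalty
    return adjusted
```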

logit_bias

Type: map
Required: no
Default: null
Modifies the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens, specified by their token IDs in the tokenizer, to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model before sampling. The exact effect varies by model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in the corresponding token being prohibited or exclusively selected.
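The following sketch shows why a bias of -100 effectively bans a token: the bias is added to the logit before the softmax, so the token's probability becomes vanishingly small. The token IDs and logits here are made up for illustration, and `apply_logit_bias` and `softmax` are hypothetical helpers; in the JSON request body the keys of logit_bias are token IDs written as strings.

```python
import math

def apply_logit_bias(logits, logit_bias):
    """Add each bias to the matching logit; logit_bias keys are token IDs as strings."""
    return {tok: value + logit_bias.get(str(tok), 0.0) for tok, value in logits.items()}

def softmax(logits):
    """Convert a token_id -> logit mapping into probabilities."""
    peak = max(logits.values())
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}
```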

user

Type: string
Required: no
A unique identifier representing your end user, which can help 4All API monitor and detect abuse.

service_tier

Type: string or null
Required: no
Default: auto
Specifies the latency tier to use for processing the request. This parameter is relevant to customers subscribed to the Scale tier service:
- If set to ‘auto’ and the project has Scale tier enabled, the system will use Scale tier credits until they run out
- If set to ‘auto’ and the project does not have Scale tier enabled, requests will be processed with the default service tier, which has a lower uptime SLA and no latency guarantee
- If set to ‘default’, requests will be processed with the default service tier, which has a lower uptime SLA and no latency guarantee
- If not set, the default behavior is ‘auto’

stream_options

Type: object or null
Required: no
Default: null
Options for streaming responses. Only used when stream: true is set.

response_format

Type: object
Required: no
Specifies the format the model must output.
Set to { "type": "json_schema", "json_schema": {...} } to enable structured outputs, which ensures the model's output matches the JSON schema you provide.
Set to { "type": "json_object" } to enable JSON mode, which ensures the model generates valid JSON. Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Otherwise, the model may generate endless whitespace until token generation reaches the limit.
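Both modes can be prepared with small helpers. The sketch below assumes the `json_schema` object takes `name`, `schema`, and `strict` fields, as in OpenAI's structured outputs guide; the schema itself and the helper names are hypothetical. `mentions_json` is a rough client-side check for the JSON mode requirement that a message asks for JSON.

```python
def json_schema_format(name, schema, strict=True):
    """Build a response_format value for structured outputs."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": strict},
    }

def mentions_json(messages):
    """True if any plain-text message mentions JSON, as JSON mode expects."""
    return any(
        isinstance(m.get("content"), str) and "json" in m["content"].lower()
        for m in messages
    )

# Hypothetical schema, for illustration only.
USER_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
    "additionalProperties": False,
}
```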

seed

Type: integer or null
Required: no
Beta feature. If specified, the system makes a best effort to sample deterministically, so repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed; refer to the system_fingerprint response parameter to monitor backend changes.
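When comparing two seeded runs, it helps to first confirm that both were served by the same backend configuration. A minimal sketch, assuming the fingerprint is exposed as `system_fingerprint` on each response as in the examples in this document; `same_backend` is a hypothetical helper.

```python
def same_backend(response_a, response_b):
    """True only if both responses report the same, non-missing system_fingerprint."""
    fp_a = response_a.get("system_fingerprint")
    fp_b = response_b.get("system_fingerprint")
    return fp_a is not None and fp_a == fp_b
```

If this returns False, differing outputs for the same seed may reflect a backend change rather than sampling noise.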

tools

Type: array
Required: no
A list of tools the model may call. Currently only functions are supported as tools. Use this parameter to provide a list of functions the model may generate JSON inputs for. Up to 128 functions are supported.

tool_choice

Type: string or object
Required: no
Controls which tool the model calls, if any:
- none: the model will not call any tools and will instead generate a message
- auto: the model can choose between generating a message or calling one or more tools
- required: the model must call one or more tools
- {"type": "function", "function": {"name": "my_function"}}: forces the model to call a specific tool
Defaults to none when no tools are present, and to auto when tools are present.

parallel_tool_calls

Type: boolean
Required: no
Default: true
Whether to enable parallel function calling during tool use.

📥 Response

Successful Response

Returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.

id

Type: string
Description: Unique identifier for the response

object

Type: string
Description: Object type, with the value "chat.completion"

created

Type: integer
Description: Unix timestamp (in seconds) of when the response was created

model

Type: string
Description: The model name used

system_fingerprint

Type: string
Description: Fingerprint identifying the backend configuration the model ran with

choices

Type: array
Description: Contains the generated response options
Properties:
index: option index
message: message object containing role and content
logprobs: log probability information
finish_reason: reason the generation finished
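Callers usually branch on `finish_reason` before using a choice. The sketch below maps the values commonly returned by OpenAI-compatible endpoints (`stop`, `length`, `tool_calls`, `content_filter`) to next steps; the labels on the right and the helper name are hypothetical.

```python
NEXT_STEP = {
    "stop": "complete",                  # natural end or a stop sequence was hit
    "length": "truncated",               # token limit reached; output may be cut off
    "tool_calls": "needs_tool_results",  # run the tools, then send their results back
    "content_filter": "filtered",        # content was omitted by a filter
}

def interpret_finish_reason(choice):
    """Map a choice's finish_reason to a coarse next step for the caller."""
    return NEXT_STEP.get(choice.get("finish_reason"), "unknown")
```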

usage

Type: object
Description: Token usage statistics
Properties:
prompt_tokens: number of tokens used by the prompt
completion_tokens: number of tokens used by the completion
total_tokens: total token count
completion_tokens_details: breakdown of the tokens used in the completion
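The usage fields make it straightforward to track consumption across calls. A small sketch, using the field names listed above; `usage_is_consistent` and `total_tokens_used` are hypothetical helpers.

```python
def usage_is_consistent(usage):
    """Check that total_tokens equals prompt_tokens plus completion_tokens."""
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]

def total_tokens_used(responses):
    """Sum total_tokens over responses, skipping any without a usage object."""
    return sum(r["usage"]["total_tokens"] for r in responses if r.get("usage"))
```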