DeepSeek Reasoning Conversation Format (Chat Completions-style)

Overview

Official Documentation

Reasoning model (deepseek-reasoner)

The deepseek-reasoner model is a reasoning model from DeepSeek. Before producing the final answer, it first generates a chain of thought, which improves the accuracy of the final response. The API exposes this chain of thought in the reasoning_content field so that users can view, display, and distill it.

curl "https://<4All API address>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4All API_API_KEY" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [
      {
        "role": "user",
        "content": "9.11 and 9.8, which is greater?"
      }
    ],
    "max_tokens": 4096
  }'

Response example:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "deepseek-reasoner",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "reasoning_content": "Let me think step by step:\n1. We need to compare 9.11 and 9.8\n2. Both numbers are decimals, so we can compare them directly\n3. 9.8 = 9.80\n4. 9.11 < 9.80\n5. So 9.8 is greater",
      "content": "9.8 is greater than 9.11."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 15,
    "total_tokens": 25
  }
}

Streaming request example:

curl "https://<4All API address>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $4All API_API_KEY" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [
      {
        "role": "user",
        "content": "9.11 and 9.8, which is greater?"
      }
    ],
    "stream": true
  }'

Streaming response example:

{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{"role":"assistant","reasoning_content":"Let me"},"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{"reasoning_content":" think step by step"},"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{"reasoning_content":":"},"finish_reason":null}]}
// ... more chain-of-thought content ...
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{"content":"9.8"},"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{"content":" is greater"},"finish_reason":null}]}
// ... more final answer content ...
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1694268190,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
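
The chunk sequence above can be reassembled on the client side by concatenating the reasoning_content deltas and the content deltas separately. The following is a minimal sketch, not part of the official documentation: the function name accumulate_chunks is hypothetical, and the handling of a "data: " prefix and a "[DONE]" terminator is an assumption about how the raw stream is framed on the wire.

```python
import json


def accumulate_chunks(lines):
    """Merge chat.completion.chunk JSON lines into (reasoning, answer, finish_reason)."""
    reasoning_parts = []
    answer_parts = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        # Assumption: a raw event stream may prefix payloads with "data:"
        # and end with "[DONE]"; the examples above show bare JSON lines.
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        if not line or line == "[DONE]":
            continue
        choice = json.loads(line)["choices"][0]
        delta = choice.get("delta", {})
        # Chain-of-thought and final-answer text arrive in different fields.
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
        # The last chunk has an empty delta and a non-null finish_reason.
        if choice.get("finish_reason"):
            finish_reason = choice["finish_reason"]
    return "".join(reasoning_parts), "".join(answer_parts), finish_reason
```

Keeping the two buffers separate matters because only the final answer should be sent back in later rounds of the conversation.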

Endpoint

POST /v1/chat/completions

Authentication

Include the following header in each request to authenticate with your API key:

Authorization: Bearer $4All API_API_KEY

Here, $4All API_API_KEY stands for your API key.

Request body parameters

messages

  • Type: array
  • Required: Yes

The list of messages containing the conversation so far. Note that if you pass reasoning_content in the input messages sequence, the API will return a 400 error.

model

  • Type: string
  • Required: Yes
  • Value: deepseek-reasoner

The model ID to use. Currently only deepseek-reasoner is supported.

max_tokens

  • Type: integer
  • Required: No
  • Default: 4096
  • Maximum: 8192

The maximum length of the final answer, excluding the chain-of-thought output. Note that chain-of-thought output can reach up to 32K tokens.

stream

  • Type: boolean
  • Required: No
  • Default: false

Whether to use streaming responses.

The following parameters are currently not supported:

  • temperature
  • top_p
  • presence_penalty
  • frequency_penalty
  • logprobs
  • top_logprobs

Note: To remain compatible with existing software, setting temperature, top_p, presence_penalty, or frequency_penalty will not produce an error, but they will also not take effect. Setting logprobs or top_logprobs will produce an error.
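
The parameter rules above can be enforced on the client before a request is sent. The sketch below is illustrative only: build_payload and the three constants are hypothetical names, and dropping the ignored parameters locally is a design choice made for this example (the API itself accepts them silently).

```python
# Parameters the API accepts but ignores, per the note above.
IGNORED_PARAMS = {"temperature", "top_p", "presence_penalty", "frequency_penalty"}
# Parameters that cause the API to return an error.
REJECTED_PARAMS = {"logprobs", "top_logprobs"}
# Documented maximum for max_tokens (final answer only).
MAX_TOKENS_LIMIT = 8192


def build_payload(messages, max_tokens=4096, stream=False, **extra):
    """Build a request body for deepseek-reasoner, failing early on bad input."""
    rejected = sorted(REJECTED_PARAMS.intersection(extra))
    if rejected:
        raise ValueError(f"unsupported parameters: {', '.join(rejected)}")
    if max_tokens > MAX_TOKENS_LIMIT:
        raise ValueError(f"max_tokens must not exceed {MAX_TOKENS_LIMIT}")
    payload = {
        "model": "deepseek-reasoner",
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    # Drop parameters that would have no effect, so the payload reflects
    # what the model actually uses.
    payload.update({k: v for k, v in extra.items() if k not in IGNORED_PARAMS})
    return payload
```

Failing locally on logprobs or top_logprobs surfaces the problem before a network round trip, while silently dropping temperature and similar settings mirrors the documented server behavior.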

  • Chat completion
  • Prefix continuation in chat (Beta)
  • Function call
  • JSON output
  • FIM completion (Beta)

Returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.

id

  • Type: string
  • Description: Unique identifier for the response

object

  • Type: string
  • Description: Object type; the value is "chat.completion"

created

  • Type: integer
  • Description: Response creation timestamp

model

  • Type: string
  • Description: The model name used; the value is "deepseek-reasoner"

choices

  • Type: array
  • Description: Contains the generated response options
  • Properties:
      • index: Option index
      • message: Message object containing the role, chain-of-thought content, and final answer
          • role: Role; the value is "assistant"
          • reasoning_content: Chain-of-thought content
          • content: Final answer content
      • finish_reason: Completion reason

usage

  • Type: object
  • Description: Token usage statistics
  • Properties:
      • prompt_tokens: Number of tokens used by the prompt
      • completion_tokens: Number of tokens used by the completion
      • total_tokens: Total number of tokens
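
To make the response layout concrete, the following sketch pulls the commonly used fields out of a decoded non-streaming response body. ReasonerResult and parse_response are hypothetical names introduced for illustration; the field paths follow the response example earlier on this page.

```python
from dataclasses import dataclass


@dataclass
class ReasonerResult:
    reasoning: str      # choices[i].message.reasoning_content
    answer: str         # choices[i].message.content
    finish_reason: str  # choices[i].finish_reason
    total_tokens: int   # usage.total_tokens


def parse_response(body: dict, choice_index: int = 0) -> ReasonerResult:
    """Extract the chain of thought, final answer, and usage from a response dict."""
    choice = body["choices"][choice_index]
    message = choice["message"]
    return ReasonerResult(
        reasoning=message.get("reasoning_content") or "",
        answer=message.get("content") or "",
        finish_reason=choice["finish_reason"],
        total_tokens=body["usage"]["total_tokens"],
    )
```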

In each round of the conversation, the model outputs chain-of-thought content (reasoning_content) and a final answer (content). The chain-of-thought content from a previous round is not concatenated into the context of the next round; only the final answer is carried forward.

Note

If you pass reasoning_content in the input messages sequence, the API will return a 400 error. Therefore, please remove the reasoning_content field from the API response before making the next API request, as shown in the example below.

Example:

from openai import OpenAI

client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://<4All API address>")

# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
)
# The chain of thought and the final answer are returned in separate fields.
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content

# Round 2 - only concatenate the final answer content
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "How many Rs are there in the word 'strawberry'?"})
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
)

Streaming response example:

# Reuses the client created in the previous example.
# Round 1
messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=True,
)
reasoning_content = ""
content = ""
for chunk in response:
    delta = chunk.choices[0].delta
    # The final chunk carries an empty delta, so check each field before appending.
    if getattr(delta, "reasoning_content", None):
        reasoning_content += delta.reasoning_content
    elif delta.content:
        content += delta.content

# Round 2 - only concatenate the final answer content
messages.append({"role": "assistant", "content": content})
messages.append({"role": "user", "content": "How many Rs are there in the word 'strawberry'?"})
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
    stream=True,
)
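
The examples above avoid the 400 error by appending only content to the history. If an application stores whole assistant message dictionaries instead, a small filter can remove reasoning_content before each request. This is a sketch under that assumption; strip_reasoning is a hypothetical helper name, not part of any SDK.

```python
def strip_reasoning(messages):
    """Return a copy of the message list with every reasoning_content key removed."""
    return [
        {key: value for key, value in message.items() if key != "reasoning_content"}
        for message in messages
    ]


# Example history in which a full assistant message was stored as-is.
history = [
    {"role": "user", "content": "9.11 and 9.8, which is greater?"},
    {
        "role": "assistant",
        "reasoning_content": "9.8 = 9.80 and 9.11 < 9.80",
        "content": "9.8 is greater than 9.11.",
    },
    {"role": "user", "content": "How many Rs are there in the word 'strawberry'?"},
]
safe_history = strip_reasoning(history)
```

The function returns new dictionaries rather than mutating the originals, so the stored history can still keep the chain of thought for display or logging.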