OpenAI Audio Format

Overview

Official Documentation

OpenAI Audio

The OpenAI Audio API provides three main capabilities:

  • Text-to-Speech (TTS) - Converts text into natural-sounding speech
  • Speech-to-Text (STT) - Transcribes audio into text
  • Audio Translation - Translates non-English audio into English text

Text-to-Speech request example:

curl "https://<4All API address>/v1/audio/speech" \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, world!",
    "voice": "alloy"
  }' \
  --output speech.mp3
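
For readers who prefer Python over curl, the same Text-to-Speech call can be sketched with the standard library alone. This is a minimal sketch, not an official client: `BASE_URL` is a hypothetical placeholder for your 4All API address, and the helper names `build_speech_request` and `save_speech` are illustrative only.

```python
import json
import os
import urllib.request

# Hypothetical placeholder: replace with your actual 4All API address.
BASE_URL = "https://api.example.com"


def build_speech_request(text, model="tts-1", voice="alloy", api_key=None):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    api_key = api_key or os.environ.get("NEWAPI_API_KEY", "")
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request, path="speech.mp3"):
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `save_speech(build_speech_request("Hello, world!"))` would mirror the curl command above, including writing the result to speech.mp3.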

Speech-to-Text request example:

curl "https://<4All API address>/v1/audio/transcriptions" \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-1"

Response example:

{
  "text": "Hello, world!"
}
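
The transcription and translation endpoints take multipart/form-data rather than JSON, which curl's `-F` flag handles automatically. As an illustration of what is sent on the wire, here is a minimal sketch of a multipart encoder using only the Python standard library. The function name `encode_multipart` and its parameters are hypothetical, and the sketch assumes field names and the filename contain no quotes or line breaks.

```python
import uuid


def encode_multipart(fields, filename, file_bytes, file_field="file"):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields such as model="whisper-1".
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The audio file itself, sent as raw bytes.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    # Closing boundary marks the end of the body.
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body would be posted to /v1/audio/transcriptions or /v1/audio/translations with the returned value as the Content-Type header, alongside the Authorization header.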

Audio Translation request example:

curl "https://<4All API address>/v1/audio/translations" \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/chinese.mp3" \
  -F model="whisper-1"

Response example:

{
  "text": "Hello, world!"
}

POST /v1/audio/speech

Converts text to speech.

POST /v1/audio/transcriptions

Transcribes audio into text in the input language.

POST /v1/audio/translations

Translates audio into English text.
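
One way to keep the three endpoint paths in a single place in client code is a small lookup table. This is only a sketch: the task keys and the `audio_url` helper are hypothetical names, and the base URL in the usage note is a placeholder.

```python
# Endpoint paths as listed above, keyed by a short task name.
AUDIO_ENDPOINTS = {
    "speech": "/v1/audio/speech",
    "transcriptions": "/v1/audio/transcriptions",
    "translations": "/v1/audio/translations",
}


def audio_url(base_url, task):
    """Join a base URL and the path for the given audio task."""
    try:
        path = AUDIO_ENDPOINTS[task]
    except KeyError:
        raise ValueError(f"unknown audio task: {task!r}") from None
    # Strip a trailing slash so the result never contains "//v1".
    return base_url.rstrip("/") + path
```

For example, `audio_url("https://api.example.com/", "translations")` yields the full URL for the translation endpoint. All three endpoints use POST.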

Use the following header for API key authentication:

Authorization: Bearer $NEWAPI_API_KEY

Where $NEWAPI_API_KEY is your API key.
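
In client code, the key is typically read from the environment rather than hard-coded. The sketch below assumes the key lives in an environment variable named NEWAPI_API_KEY, matching the curl examples. The `auth_headers` helper is a hypothetical name, and failing fast on a missing key is a design choice of this sketch, not a requirement of the API.

```python
import os


def auth_headers(env=None):
    """Return the Authorization header built from NEWAPI_API_KEY."""
    env = os.environ if env is None else env
    key = env.get("NEWAPI_API_KEY", "").strip()
    if not key:
        # Fail early instead of sending a request that would return 401.
        raise RuntimeError("NEWAPI_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```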

Text-to-Speech request parameters (POST /v1/audio/speech)

model

  • Type: string
  • Required: Yes
  • Allowed values: tts-1, tts-1-hd
  • Description: The TTS model to use

input

  • Type: string
  • Required: Yes
  • Maximum length: 4096 characters
  • Description: The text to convert to speech

voice

  • Type: string
  • Required: Yes
  • Allowed values: alloy, echo, fable, onyx, nova, shimmer
  • Description: The voice to use when generating speech

response_format

  • Type: string
  • Required: No
  • Default: mp3
  • Allowed values: mp3, opus, aac, flac, wav, pcm
  • Description: The audio output format

speed

  • Type: number
  • Required: No
  • Default: 1.0
  • Range: 0.25 - 4.0
  • Description: The speaking speed of the generated audio
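
The constraints above can be checked client-side before a request is sent. The following is a minimal sketch of such a check. The field names `model`, `input`, `voice`, `response_format`, and `speed` are assumed to follow the upstream OpenAI API, and `build_speech_payload` is a hypothetical helper name.

```python
TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(text, model="tts-1", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Validate the documented constraints and return the JSON payload."""
    if model not in TTS_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1 to {MAX_INPUT_CHARS} characters")
    if voice not in TTS_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in TTS_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Validating locally only saves a round trip; the server remains the authority on which values it accepts.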

Speech-to-Text request parameters (POST /v1/audio/transcriptions)

file

  • Type: file
  • Required: Yes
  • Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
  • Description: The audio file to transcribe

model

  • Type: string
  • Required: Yes
  • Currently supported: whisper-1
  • Description: The model ID to use

language

  • Type: string
  • Required: No
  • Format: ISO-639-1 (e.g. “en”)
  • Description: The language of the audio; providing it can improve accuracy

prompt

  • Type: string
  • Required: No
  • Description: Text used to guide the model’s style or to continue a previous segment of audio

response_format

  • Type: string
  • Required: No
  • Default: json
  • Allowed values: json, text, srt, verbose_json, vtt
  • Description: The output format

temperature

  • Type: number
  • Required: No
  • Default: 0
  • Range: 0 - 1
  • Description: Sampling temperature; higher values make the output more random

timestamp_granularities[]

  • Type: array
  • Required: No
  • Default: segment
  • Allowed values: word, segment
  • Description: Timestamp granularity for the transcription
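
The optional transcription parameters travel as extra form fields next to the file. The sketch below assembles them as a list of (name, value) pairs, since an array parameter repeats its field name once per value. It assumes the field names follow the upstream OpenAI API (`model`, `language`, `prompt`, `response_format`, `temperature`, `timestamp_granularities[]`); `transcription_fields` is a hypothetical helper name.

```python
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def transcription_fields(model="whisper-1", language=None, prompt=None,
                         response_format="json", temperature=0.0,
                         timestamp_granularities=None):
    """Return the non-file form fields as a list of (name, value) pairs."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    fields = [
        ("model", model),
        ("response_format", response_format),
        ("temperature", str(temperature)),
    ]
    # Optional fields are only sent when the caller supplies them.
    if language:
        fields.append(("language", language))
    if prompt:
        fields.append(("prompt", prompt))
    # Array values repeat the same field name, one entry per granularity.
    for granularity in timestamp_granularities or []:
        if granularity not in GRANULARITIES:
            raise ValueError(f"unsupported granularity: {granularity!r}")
        fields.append(("timestamp_granularities[]", granularity))
    return fields
```

The translation endpoint accepts a subset of these fields (no language or timestamp granularity), so the same approach applies there with fewer options.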

Audio Translation request parameters (POST /v1/audio/translations)

file

  • Type: file
  • Required: Yes
  • Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
  • Description: The audio file to translate

model

  • Type: string
  • Required: Yes
  • Currently supported: whisper-1
  • Description: The model ID to use

prompt

  • Type: string
  • Required: No
  • Description: English text used to guide the model’s style

response_format

  • Type: string
  • Required: No
  • Default: json
  • Allowed values: json, text, srt, verbose_json, vtt
  • Description: The output format

temperature

  • Type: number
  • Required: No
  • Default: 0
  • Range: 0 - 1
  • Description: Sampling temperature; higher values make the output more random

Text-to-Speech response: returns the binary audio file content.

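Transcription response (json format):

{
  "text": "The transcribed text"
}

Transcription response (verbose_json format):

{
  "task": "transcribe",
  "language": "english",
  "duration": 8.47,
  "text": "The full transcribed text",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 3.32,
      "text": "The transcribed text of this segment",
      "tokens": [50364, 440, 7534],
      "temperature": 0.0,
      "avg_logprob": -0.286,
      "compression_ratio": 1.236,
      "no_speech_prob": 0.009
    }
  ]
}

Translation response:

{
  "text": "The translated English text"
}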
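
The verbose response carries per-segment timing that a client can turn into a simple timestamped transcript. This is a minimal sketch that relies only on the `segments`, `start`, `end`, and `text` keys shown in the verbose example; `segments_to_lines` is a hypothetical helper name and the output layout is arbitrary.

```python
import json


def segments_to_lines(verbose_json_text):
    """Format each segment as "[start - end] text" with times in seconds."""
    data = json.loads(verbose_json_text)
    return [
        f"[{seg['start']:.2f} - {seg['end']:.2f}] {seg['text'].strip()}"
        for seg in data.get("segments", [])
    ]
```

A response without a `segments` key, such as the plain json format, yields an empty list.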

When a request fails, the API returns an error response object with an HTTP status code in the 4XX-5XX range.

  • 400 Bad Request: Invalid request parameters
  • 401 Unauthorized: API key is invalid or missing
  • 429 Too Many Requests: API rate limit exceeded
  • 500 Internal Server Error: An unexpected error occurred on the server

Error response example:

{
  "error": {
    "message": "Unsupported file format",
    "type": "invalid_request_error",
    "param": "file",
    "code": "invalid_file_format"
  }
}
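
A client can wrap this error object in an exception so that callers see the status code and the machine-readable `code` together. The sketch below is one possible shape, not part of the API: `AudioAPIError` and `parse_error` are hypothetical names, and treating 429 and 500 as retryable is an assumption of this sketch.

```python
import json

# Assumption of this sketch: rate-limit and server errors are worth retrying.
RETRYABLE_STATUSES = {429, 500}


class AudioAPIError(Exception):
    """Carries the HTTP status plus the fields of the error object."""

    def __init__(self, status, message, error_type=None, param=None, code=None):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.error_type = error_type
        self.param = param
        self.code = code

    @property
    def retryable(self):
        return self.status in RETRYABLE_STATUSES


def parse_error(status, body):
    """Build an AudioAPIError from a status code and a response body."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None  # Body was not valid JSON.
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    return AudioAPIError(
        status,
        err.get("message", "unknown error"),
        error_type=err.get("type"),
        param=err.get("param"),
        code=err.get("code"),
    )
```

Falling back to "unknown error" keeps the helper usable when a proxy or gateway returns a non-JSON body.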