OpenAI Audio Format

📝 Introduction

The OpenAI Audio API provides three main capabilities:

Text-to-Speech (TTS) - Converts text into natural-sounding speech
Speech-to-Text (STT) - Transcribes audio into text
Audio Translation - Translates non-English audio into English text

💡 Request Examples

Text-to-Speech ✅

curl https://YOUR_4ALL_API_ADDRESS/v1/audio/speech \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, world!",
    "voice": "alloy"
  }' \
  --output speech.mp3

Speech-to-Text ✅

curl https://YOUR_4ALL_API_ADDRESS/v1/audio/transcriptions \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-1"

Response example:

{
  "text": "Hello, world!"
}

Audio Translation ✅

curl https://YOUR_4ALL_API_ADDRESS/v1/audio/translations \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/chinese.mp3" \
  -F model="whisper-1"

Response example:

{
  "text": "Hello, world!"
}
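
The two upload endpoints above take a multipart/form-data body, which curl assembles from the -F flags. As a rough illustration of what that body looks like on the wire, here is a minimal Python sketch using only the standard library. The helper name encode_multipart is hypothetical, and sending every file as application/octet-stream is an assumption; a real client might set a more specific media type.

```python
import uuid


def encode_multipart(fields, filename, file_bytes, file_field="file"):
    """Return (body, content_type) for a multipart/form-data upload.

    fields is an iterable of (name, value) string pairs.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    # Plain text fields such as model=whisper-1.
    for name, value in fields:
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The file part. Assumption: a generic media type is acceptable.
    chunks.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # Closing delimiter.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    [("model", "whisper-1")], "audio.mp3", b"<audio bytes>"
)
print(content_type.split(";")[0])  # multipart/form-data
```

The returned content_type string carries the boundary, so it should be sent as the Content-Type header together with the body.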

📮 Requests

Endpoints

Text-to-Speech

POST /v1/audio/speech

Converts text to speech.

Speech-to-Text

POST /v1/audio/transcriptions

Transcribes audio into text in the input language.

Audio Translation

POST /v1/audio/translations

Translates audio into English text.

Authentication

Use the following header for API key authentication:

Authorization: Bearer $NEWAPI_API_KEY

Where $NEWAPI_API_KEY is your API key.
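
For readers calling the API from code rather than curl, the header can be attached as in the following sketch, which builds (but does not send) an authenticated JSON request with Python's standard library. BASE_URL and the helper name build_json_request are placeholders introduced for illustration, not part of the API.

```python
import json
import os
import urllib.request

# Assumption: placeholder address; substitute your own API address.
BASE_URL = "https://api.example.com"


def build_json_request(path, payload, api_key):
    """Build an authenticated JSON POST request without sending it."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Read the key from the same environment variable the curl examples use.
api_key = os.environ.get("NEWAPI_API_KEY", "sk-placeholder")
request = build_json_request(
    "/v1/audio/speech",
    {"model": "tts-1", "input": "Hello, world!", "voice": "alloy"},
    api_key,
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; that step is left out here so the sketch runs without network access.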

Request Body Parameters

Text-to-Speech

model

Type: string
Required: Yes
Allowed values: tts-1, tts-1-hd
Description: The TTS model to use

input

Type: string
Required: Yes
Maximum length: 4096 characters
Description: The text to convert to speech

voice

Type: string
Required: Yes
Allowed values: alloy, echo, fable, onyx, nova, shimmer
Description: The voice to use when generating speech

response_format

Type: string
Required: No
Default: mp3
Allowed values: mp3, opus, aac, flac, wav, pcm
Description: Audio output format

speed

Type: number
Required: No
Default: 1.0
Range: 0.25 - 4.0
Description: The speaking speed of the generated audio
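
The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch of such a check; the function name build_speech_payload is hypothetical, and the allowed values are copied from the parameter list above rather than queried from the server, so they may lag behind what a given deployment accepts.

```python
MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(text, voice, model="tts-1",
                         response_format="mp3", speed=1.0):
    """Validate text-to-speech parameters and return the JSON payload."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


payload = build_speech_payload("Hello, world!", "alloy", speed=1.25)
print(payload["voice"], payload["speed"])  # alloy 1.25
```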

Speech-to-Text

file

Type: file
Required: Yes
Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
Description: The audio file to transcribe

model

Type: string
Required: Yes
Currently supported: whisper-1
Description: The model ID to use

language

Type: string
Required: No
Format: ISO 639-1 (e.g. "en")
Description: The language of the audio; providing it can improve accuracy

prompt

Type: string
Required: No
Description: Text used to guide the model’s style or to continue a previous segment of audio

response_format

Type: string
Required: No
Default: json
Allowed values: json, text, srt, verbose_json, vtt
Description: Output format

temperature

Type: number
Required: No
Default: 0
Range: 0 - 1
Description: Sampling temperature; higher values make the output more random

timestamp_granularities

Type: array
Required: No
Default: ["segment"]
Allowed values: word, segment
Description: Timestamp granularities to include in the transcription; response_format must be set to verbose_json for this parameter to take effect
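
To show how these parameters combine into form fields, here is a small sketch that validates them and returns (name, value) pairs ready for a multipart upload. The function name transcription_fields is hypothetical. Two points are assumptions carried over from OpenAI's upstream API rather than stated on this page: array values are sent as repeated timestamp_granularities[] fields, and timestamp granularities are only accepted together with verbose_json.

```python
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def transcription_fields(model="whisper-1", language=None, prompt=None,
                         response_format="json", temperature=0.0,
                         timestamp_granularities=None):
    """Return the non-file form fields for /v1/audio/transcriptions."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    fields = [
        ("model", model),
        ("response_format", response_format),
        ("temperature", str(temperature)),
    ]
    if language:
        fields.append(("language", language))
    if prompt:
        fields.append(("prompt", prompt))
    for granularity in timestamp_granularities or []:
        if granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {granularity!r}")
        # Assumption: mirrors the upstream rule that timestamps need verbose_json.
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities requires verbose_json")
        # Assumption: arrays are encoded as repeated "name[]" fields.
        fields.append(("timestamp_granularities[]", granularity))
    return fields


for name, value in transcription_fields(
    language="en",
    response_format="verbose_json",
    timestamp_granularities=["word", "segment"],
):
    print(f"{name}={value}")
```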

Audio Translation

file

Type: file
Required: Yes
Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
Description: The audio file to translate

model

Type: string
Required: Yes
Currently supported: whisper-1
Description: The model ID to use

prompt

Type: string
Required: No
Description: English text used to guide the model’s style

response_format

Type: string
Required: No
Default: json
Allowed values: json, text, srt, verbose_json, vtt
Description: Output format

temperature

Type: number
Required: No
Default: 0
Range: 0 - 1
Description: Sampling temperature; higher values make the output more random

📥 Responses

Successful Response

Text-to-Speech

Returns the binary audio file content.
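
Because the body is raw audio rather than JSON, a client typically streams it straight to disk, as curl does with --output. A minimal sketch follows; the helper name save_audio is hypothetical, and using the response_format value as the file extension is an assumption made for convenience.

```python
import shutil
from pathlib import Path


def save_audio(response, directory, response_format="mp3", stem="speech"):
    """Copy a binary file-like response body to <directory>/<stem>.<format>."""
    path = Path(directory) / f"{stem}.{response_format}"
    with open(path, "wb") as out:
        # Copies in chunks, so large audio files are not held in memory.
        shutil.copyfileobj(response, out)
    return path
```

Any object with a read() method works as the response argument, including the object returned by urllib.request.urlopen.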

Speech-to-Text

Basic JSON Format

{
  "text": "Transcribed text content"
}

Detailed JSON Format

{
  "task": "transcribe",
  "language": "english",
  "duration": 8.47,
  "text": "Full transcribed text",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 3.32,
      "text": "Transcribed text of this segment",
      "tokens": [50364, 440, 7534],
      "temperature": 0.0,
      "avg_logprob": -0.286,
      "compression_ratio": 1.236,
      "no_speech_prob": 0.009
    }
  ]
}
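
The segments array in the detailed format carries start and end times in seconds, which is enough to build subtitles on the client. The sketch below converts a parsed response into SRT-style text; the function names are hypothetical, and in practice requesting response_format=srt from the API is likely simpler.

```python
def format_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (the SRT timestamp style)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(response):
    """Render the segments of a verbose_json response as SRT blocks."""
    blocks = []
    for index, segment in enumerate(response.get("segments", []), start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)


sample = {
    "task": "transcribe",
    "duration": 8.47,
    "segments": [{"id": 0, "start": 0.0, "end": 3.32, "text": " Hello, world!"}],
}
print(segments_to_srt(sample))
```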

Audio Translation

{
  "text": "Translated English text"
}

Error Response

When a request fails, the API returns an error object along with an HTTP status code in the 4XX-5XX range.

Common Error Status Codes

400 Bad Request: Invalid request parameters
401 Unauthorized: API key is invalid or missing
429 Too Many Requests: API rate limit exceeded
500 Internal Server Error: An unexpected error occurred on the server

Error response example:

{
  "error": {
    "message": "Unsupported file format",
    "type": "invalid_request_error",
    "param": "file",
    "code": "invalid_file_format"
  }
}
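
A client can unpack this structure to decide how to react. The following sketch parses an error body defensively and flags whether a retry seems worthwhile. The helper name parse_error is hypothetical, and treating 429 and 5XX responses as retryable is a common convention assumed here, not something this API specifies.

```python
import json


def parse_error(status, body):
    """Summarise an error response as a flat dict."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        # Body was not JSON, or not a JSON object.
        error = {}
    if not isinstance(error, dict):
        error = {"message": str(error)}
    return {
        "status": status,
        "message": error.get("message", body),
        "type": error.get("type"),
        "param": error.get("param"),
        "code": error.get("code"),
        # Assumption: rate limits and server errors are worth retrying.
        "retryable": status == 429 or status >= 500,
    }


example = (
    '{"error": {"message": "Unsupported file format", '
    '"type": "invalid_request_error", "param": "file", '
    '"code": "invalid_file_format"}}'
)
info = parse_error(400, example)
print(info["code"], info["retryable"])  # invalid_file_format False
```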