OpenAI Audio Formats
Overview
Official Documentation
OpenAI Audio
📝 Introduction
The OpenAI Audio API provides three main capabilities:
- Text-to-Speech (TTS) - Converts text into natural-sounding speech
- Speech-to-Text (STT) - Transcribes audio into text
- Audio Translation - Translates non-English audio into English text
💡 Request Examples
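Text-to-Speech ✅
The following request converts the Chinese text “你好,世界!” (“Hello, world!”) into speech using the alloy voice and saves the audio as speech.mp3. Replace <4All API address> with the base address of your 4All API.
```bash
curl "https://<4All API address>/v1/audio/speech" \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "你好,世界!",
    "voice": "alloy"
  }' \
  --output speech.mp3
```
Speech-to-Text ✅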
```bash
curl "https://<4All API address>/v1/audio/transcriptions" \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-1"
```
Response example (transcription keeps the spoken language, so Chinese audio yields Chinese text):
```json
{ "text": "你好,世界!" }
```
Audio Translation ✅
```bash
curl "https://<4All API address>/v1/audio/translations" \
  -H "Authorization: Bearer $NEWAPI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/chinese.mp3" \
  -F model="whisper-1"
```
Response example:
```json
{ "text": "Hello, world!" }
```
📮 Requests
Endpoints
Text-to-Speech
POST /v1/audio/speech
Converts text to speech.
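For callers working from code rather than curl, here is a minimal Python sketch that uses only the standard library. It builds, but does not send, the same JSON request as the curl example. BASE_URL is a placeholder for your 4All API address, and the function name build_speech_request is hypothetical.

```python
import json
import os
import urllib.request
from typing import Optional

# Placeholder: replace with your 4All API address (assumption, not a real host).
BASE_URL = "https://your-4all-api-address"


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1",
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a POST /v1/audio/speech request with a JSON body."""
    # Fall back to the NEWAPI_API_KEY environment variable used in the curl examples.
    key = api_key if api_key is not None else os.environ.get("NEWAPI_API_KEY", "")
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


# Usage: send the request and save the binary audio response.
# req = build_speech_request("Hello, world!")
# with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as out:
#     out.write(resp.read())
```

Keeping construction separate from sending makes the request easy to inspect or log before any network call is made.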
Speech-to-Text
POST /v1/audio/transcriptions
Transcribes audio into text in the input language.
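Unlike text-to-speech, this endpoint takes a multipart/form-data body, the format curl produces with -F. The sketch below assembles that body by hand with the Python standard library so the wire format is visible. BASE_URL is a placeholder, and encode_multipart and build_audio_request are hypothetical helper names. The same request shape works for /v1/audio/translations by passing endpoint="translations".

```python
import os
import urllib.request
import uuid
from typing import Dict, Optional, Tuple

# Placeholder: replace with your 4All API address (assumption).
BASE_URL = "https://your-4all-api-address"


def encode_multipart(fields: Dict[str, str], file_field: str, filename: str,
                     file_bytes: bytes, boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((f"--{boundary}\r\n"
                      f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                      f"{value}\r\n").encode("utf-8"))
    # The file part carries a filename so the server treats it as an upload.
    parts.append((f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
                  "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8"))
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_audio_request(audio_path: str, endpoint: str = "transcriptions",
                        model: str = "whisper-1",
                        api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) POST /v1/audio/{endpoint} with the file and model fields."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart({"model": model}, "file",
                                          os.path.basename(audio_path), audio)
    key = api_key if api_key is not None else os.environ.get("NEWAPI_API_KEY", "")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/{endpoint}",
        data=body,
        headers={"Authorization": f"Bearer {key}", "Content-Type": content_type},
        method="POST",
    )
```

In practice an HTTP client library can build this body for you; the manual version only shows what curl's -F flags send.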
Audio Translation
POST /v1/audio/translations
Translates audio into English text.
Authentication
Use the following header for API key authentication:
Authorization: Bearer $NEWAPI_API_KEY
where $NEWAPI_API_KEY is your API key.
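A small Python sketch of building this header from the same NEWAPI_API_KEY environment variable. The helper name auth_headers is hypothetical, and raising locally on a missing key is a design choice, not API behavior.

```python
import os
from typing import Dict, Mapping, Optional


def auth_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the Authorization header from NEWAPI_API_KEY (read from os.environ by default)."""
    source = os.environ if env is None else env
    key = source.get("NEWAPI_API_KEY", "").strip()
    if not key:
        # Failing early avoids sending a request that would return 401 Unauthorized.
        raise RuntimeError("NEWAPI_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

Accepting an optional mapping instead of always reading os.environ keeps the function easy to test.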
Request Body Parameters
Text-to-Speech
model
- Type: string
- Required: Yes
- Allowed values: tts-1, tts-1-hd
- Description: The TTS model to use
input
- Type: string
- Required: Yes
- Maximum length: 4096 characters
- Description: The text to convert to speech
voice
- Type: string
- Required: Yes
- Allowed values: alloy, echo, fable, onyx, nova, shimmer
- Description: The voice to use when generating speech
response_format
- Type: string
- Required: No
- Default: mp3
- Allowed values: mp3, opus, aac, flac, wav, pcm
- Description: Audio output format
speed
- Type: number
- Required: No
- Default: 1.0
- Range: 0.25 - 4.0
- Description: The speaking speed of the generated audio
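The constraints listed above can be checked on the client before a request is sent. Here is a minimal Python sketch, assuming the allowed values in this list are exhaustive for the deployment. The function name validate_speech_params is hypothetical, and len() counts Unicode code points, which may not match exactly how the server counts characters.

```python
from typing import Any, Dict, List

TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def validate_speech_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a /v1/audio/speech body (empty if none)."""
    errors: List[str] = []
    if params.get("model") not in TTS_MODELS:
        errors.append("model must be one of tts-1, tts-1-hd")
    text = params.get("input")
    if not isinstance(text, str) or not text:
        errors.append("input is required")
    elif len(text) > MAX_INPUT_CHARS:
        errors.append(f"input exceeds {MAX_INPUT_CHARS} characters")
    if params.get("voice") not in TTS_VOICES:
        errors.append("voice is not a supported value")
    # Optional fields are checked against their documented defaults.
    if params.get("response_format", "mp3") not in TTS_FORMATS:
        errors.append("response_format is not supported")
    speed = params.get("speed", 1.0)
    if not isinstance(speed, (int, float)) or not 0.25 <= speed <= 4.0:
        errors.append("speed must be between 0.25 and 4.0")
    return errors
```

Returning every problem at once, rather than raising on the first, gives callers a complete report in one pass.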
Speech-to-Text
file
- Type: file
- Required: Yes
- Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
- Description: The audio file to transcribe
model
- Type: string
- Required: Yes
- Currently supported: whisper-1
- Description: The model ID to use
language
- Type: string
- Required: No
- Format: ISO-639-1 (e.g. “en”)
- Description: The language of the audio; providing it can improve accuracy
prompt
- Type: string
- Required: No
- Description: Text used to guide the model’s style or to continue a previous audio segment; it should be in the same language as the audio
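One way to use the prompt parameter for continuation is to transcribe a long recording in consecutive chunks and pass the end of each transcript as the prompt for the next chunk. The Python sketch below assumes a hypothetical transcribe(path, prompt) callable that wraps one /v1/audio/transcriptions call. The 200-character cutoff is an arbitrary assumption, not a documented limit.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical callable: transcribe(path, prompt) -> transcript text, e.g. a wrapper
# that POSTs one chunk to /v1/audio/transcriptions with an optional "prompt" field.
Transcriber = Callable[[str, Optional[str]], str]


def transcribe_in_chunks(chunk_paths: Iterable[str], transcribe: Transcriber,
                         prompt_chars: int = 200) -> str:
    """Transcribe audio chunks in order, prompting each with the tail of the previous transcript."""
    pieces: List[str] = []
    previous: Optional[str] = None
    for path in chunk_paths:
        # Only the end of the previous text is sent; prompt_chars is an arbitrary cutoff.
        prompt = previous[-prompt_chars:] if previous else None
        text = transcribe(path, prompt)
        pieces.append(text.strip())
        previous = text
    return " ".join(piece for piece in pieces if piece)
```

Injecting the transcriber as a parameter keeps the chunking logic independent of any particular HTTP client.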
response_format
- Type: string
- Required: No
- Default: json
- Allowed values: json, text, srt, verbose_json, vtt
- Description: Output format
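If you already request verbose_json to get segment metadata, you can derive subtitles locally instead of making a second call with response_format set to srt. Here is a minimal Python sketch, assuming each segment has start, end (in seconds), and text fields, as in the detailed JSON example in this document. Its output may differ in detail from the server's own srt format.

```python
import json
from typing import Any, Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def verbose_json_to_srt(payload: str) -> str:
    """Convert a verbose_json transcription body into numbered SRT cues."""
    data: Dict[str, Any] = json.loads(payload)
    blocks: List[str] = []
    for index, seg in enumerate(data.get("segments", []), start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)
```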
temperature
- Type: number
- Required: No
- Default: 0
- Range: 0 - 1
- Description: Sampling temperature; higher values make the output more random
timestamp_granularities
- Type: array
- Required: No
- Default: segment
- Allowed values: word, segment
- Description: Timestamp granularity for the transcription; requires response_format to be verbose_json
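The transcription parameters interact: timestamp granularities only take effect with verbose_json output. The Python sketch below checks the listed constraints before upload. The verbose_json requirement follows OpenAI's upstream documentation and is assumed to hold for this deployment. The function name validate_transcription_params is hypothetical.

```python
from typing import Any, Dict, List

STT_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def validate_transcription_params(params: Dict[str, Any]) -> List[str]:
    """Return problems found in transcription form fields (empty list if none)."""
    errors: List[str] = []
    if params.get("model") != "whisper-1":
        errors.append("model must be whisper-1")
    language = params.get("language")
    # ISO-639-1 codes are two ASCII letters, e.g. "en".
    if language is not None and not (len(language) == 2 and language.isascii()
                                     and language.isalpha()):
        errors.append("language must be a two-letter ISO-639-1 code such as 'en'")
    fmt = params.get("response_format", "json")
    if fmt not in STT_FORMATS:
        errors.append(f"unsupported response_format: {fmt}")
    temperature = params.get("temperature", 0)
    if not 0 <= temperature <= 1:
        errors.append("temperature must be between 0 and 1")
    if "timestamp_granularities" in params:
        if not set(params["timestamp_granularities"]) <= GRANULARITIES:
            errors.append("timestamp_granularities accepts only word and segment")
        if fmt != "verbose_json":
            errors.append("timestamp_granularities requires response_format=verbose_json")
    return errors
```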
Audio Translation
file
- Type: file
- Required: Yes
- Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
- Description: The audio file to translate
model
- Type: string
- Required: Yes
- Currently supported: whisper-1
- Description: The model ID to use
prompt
- Type: string
- Required: No
- Description: English text used to guide the model’s style
response_format
- Type: string
- Required: No
- Default: json
- Allowed values: json, text, srt, verbose_json, vtt
- Description: Output format
temperature
- Type: number
- Required: No
- Default: 0
- Range: 0 - 1
- Description: Sampling temperature; higher values make the output more random
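The translation parameter list is shorter than the transcription list (it has no language or timestamp_granularities). A Python sketch can encode that with keyword-only arguments, so an option that is not listed fails immediately with a TypeError. The function name translation_form_fields is hypothetical. It returns only the text fields; the audio file is attached separately as the file part.

```python
from typing import Dict, Optional

TRANSLATION_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def translation_form_fields(*, model: str = "whisper-1", prompt: Optional[str] = None,
                            response_format: str = "json",
                            temperature: float = 0.0) -> Dict[str, str]:
    """Return the non-file multipart fields for POST /v1/audio/translations.

    Keyword-only arguments mirror the parameter list for this endpoint, so an
    option it does not list (such as language) fails with a TypeError.
    """
    if model != "whisper-1":
        raise ValueError("whisper-1 is the only listed model")
    if response_format not in TRANSLATION_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    # Multipart form values are strings, so the temperature is serialized with str().
    fields = {"model": model, "response_format": response_format,
              "temperature": str(temperature)}
    if prompt:
        # The prompt for translation is English text that guides style.
        fields["prompt"] = prompt
    return fields
```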
📥 Responses
Successful Response
Text-to-Speech
Section titled “Text-to-Speech”Returns the binary audio file content.
Speech-to-Text
Basic JSON Format
```json
{ "text": "Transcribed text" }
```
Detailed JSON Format (response_format: verbose_json)
```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 8.47,
  "text": "Full transcribed text",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 3.32,
      "text": "Transcribed text of this segment",
      "tokens": [50364, 440, 7534],
      "temperature": 0.0,
      "avg_logprob": -0.286,
      "compression_ratio": 1.236,
      "no_speech_prob": 0.009
    }
  ]
}
```
Audio Translation
```json
{ "text": "Translated English text" }
```
Error Response
If a request fails, the API returns an error response object with an HTTP status code in the 4xx or 5xx range.
Common Error Status Codes
- 400 Bad Request: invalid request parameters
- 401 Unauthorized: API key is invalid or missing
- 429 Too Many Requests: API rate limit exceeded
- 500 Internal Server Error: internal server error
Error response example:
```json
{
  "error": {
    "message": "Unsupported file format",
    "type": "invalid_request_error",
    "param": "file",
    "code": "invalid_file_format"
  }
}
```
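A Python sketch of client-side error handling: it parses the error object shown above and decides whether a retry is worthwhile. Treating 429 and 5xx as retryable and 400/401 as permanent is a common convention, not documented API behavior. APIError, parse_error, and backoff_seconds are hypothetical names, and the backoff base and cap are arbitrary.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class APIError:
    status: int
    message: str
    error_type: Optional[str] = None
    param: Optional[str] = None
    code: Optional[str] = None

    @property
    def retryable(self) -> bool:
        # 429 (rate limit) and 5xx (server errors) are usually transient;
        # 400 and 401 indicate a problem with the request itself.
        return self.status == 429 or self.status >= 500


def parse_error(status: int, body: str) -> APIError:
    """Parse the {"error": {...}} object, falling back to the raw body if it is not JSON."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    return APIError(status=status, message=err.get("message", body),
                    error_type=err.get("type"), param=err.get("param"),
                    code=err.get("code"))


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff delay before retry number `attempt` (0-based), capped at `cap`."""
    return min(cap, base * 2 ** attempt)
```

A caller would sleep for backoff_seconds(attempt) between attempts only while parse_error(...).retryable is true, and surface the message and param fields otherwise.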