
TTS Speech Synthesis

The Speech Synthesis (Text-to-Speech, TTS) API lets you convert text into natural, fluent speech. This API is compatible with the OpenAI standard and also supports the Tongyi Qianwen qwen-tts model family, delivering high-quality Chinese and English speech synthesis.

  • Endpoint: /v1/audio/speech
  • Method: POST
  • Content Type: application/json
  • Authentication: Bearer Token

This system fully supports the Tongyi Qianwen qwen-tts model family:

| Model Name | Description | Features |
|---|---|---|
| qwen-tts | Basic version | Standard audio quality, suitable for general use |
| qwen-tts-latest | Latest version | Better audio quality, supports more voices |
| qwen-tts-2025-05-22 | Specific version | Stable release, suitable for production environments |

General Voices (Supported by All Versions)

| Voice Code | Voice Name | Gender | Features |
|---|---|---|---|
| Cherry | Sweet female voice | Female | Sweet and pleasant, suitable for warm, friendly scenarios |
| Serena | Gentle female voice | Female | Soft and gentle, suitable for professional announcements |
| Ethan | Steady male voice | Male | Calm and composed, suitable for business scenarios |
| Chelsie | Lively female voice | Female | Energetic and lively, suitable for youthful content |

Premium Voices (Supported by qwen-tts-latest and qwen-tts-2025-05-22)

| Voice Code | Voice Name | Gender | Features |
|---|---|---|---|
| Dylan | Beijing dialect | Male | Youthful and energetic, suitable for trendy content |
| Jada | Wu dialect | Female | Intelligent and elegant, suitable for educational content |
| Sunny | Sichuan dialect | Female | Bright and cheerful, suitable for children’s content |

{
  "model": "qwen-tts",
  "input": "Hello, welcome to the speech synthesis service!",
  "voice": "Cherry"
}

{
  "model": "qwen-tts-latest",
  "input": "This is a piece of text that needs to be converted into speech. It supports Chinese, English, and mixed Chinese-English input.",
  "voice": "Serena",
  "speed": 1.0,
  "response_format": "wav"
}

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The TTS model to use; supports OpenAI tts models and the qwen-tts family |
| input | string | Yes | Text to convert into speech, up to 512 tokens |
| voice | string | Yes | Voice selection; see the supported voice list |
| speed | number | No | Speech speed, range 0.25-4.0, default 1.0 |
| response_format | string | No | Audio format; currently supports wav |
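
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: `build_payload` is a hypothetical helper, the voice sets are copied from the voice tables in this document, and voice checks are applied only to qwen-tts models because the voices accepted for OpenAI tts models are not listed here.

```python
GENERAL_VOICES = {"Cherry", "Serena", "Ethan", "Chelsie"}
PREMIUM_VOICES = {"Dylan", "Jada", "Sunny"}
# Models that the voice tables list as supporting the premium voices.
PREMIUM_MODELS = {"qwen-tts-latest", "qwen-tts-2025-05-22"}


def build_payload(model, text, voice, speed=1.0, response_format="wav"):
    """Return a request body for /v1/audio/speech; raise ValueError on bad input."""
    if not text:
        raise ValueError("input must not be empty")
    if model.startswith("qwen-tts"):
        if voice not in GENERAL_VOICES | PREMIUM_VOICES:
            raise ValueError(f"unknown voice {voice!r}")
        if voice in PREMIUM_VOICES and model not in PREMIUM_MODELS:
            raise ValueError(f"voice {voice!r} is not available on {model!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    # Assumption: wav is the only accepted value, since it is the only one documented.
    if response_format != "wav":
        raise ValueError("response_format must be 'wav'")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
```

The helper does not enforce the 512-token limit, because the tokenizer used for counting is not specified in this document.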

The API returns the audio file content directly, with the following response headers:

Content-Type: audio/wav
Content-Disposition: attachment; filename="audio.wav"

Audio format specifications:

  • Format: WAV (RIFF)
  • Encoding: 16-bit PCM
  • Channels: Mono
  • Sample Rate: 24000 Hz
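
To confirm that a downloaded file matches these specifications, the header can be read with Python's standard `wave` module. This is a sketch only; `check_wav` is a hypothetical helper name, and it assumes the response is a complete (non-streamed) WAV file.

```python
import io
import wave


def check_wav(data: bytes) -> dict:
    """Read the WAV header from raw bytes and compare it with the documented format."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
            "duration_seconds": wav.getnframes() / wav.getframerate(),
        }
    # Documented format: mono, 16-bit PCM, 24000 Hz.
    info["matches_spec"] = (
        info["channels"] == 1
        and info["sample_width_bits"] == 16
        and info["sample_rate"] == 24000
    )
    return info
```

For example, `check_wav(open("audio.wav", "rb").read())` returns the header fields plus a `matches_spec` flag.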
{
  "error": {
    "message": "Error description",
    "type": "invalid_request_error",
    "code": "error_code"
  }
}

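
A client can turn an error body of this shape into a single log line. The sketch below assumes only the three fields shown above (`message`, `type`, `code`); `describe_error` is a hypothetical helper, and it falls back to the raw text when the body is not JSON in that shape.

```python
import json


def describe_error(body: str) -> str:
    """Summarize an error response body as 'code (type): message'."""
    try:
        err = json.loads(body)["error"]
        code = err.get("code", "unknown")
        kind = err.get("type", "unknown")
        return f"{code} ({kind}): {err.get('message', '')}"
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not JSON, or not the documented error shape: show a truncated raw body.
        return f"unparseable error body: {body[:200]}"
```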
curl -X POST "https://api.4allapi.com/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-tts",
    "input": "Hello, this is a speech synthesis test",
    "voice": "Cherry"
  }' \
  --output audio.wav

curl -X POST "https://api.4allapi.com/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-tts-latest",
    "input": "Welcome to the Tongyi Qianwen speech synthesis service, I’m Jada!",
    "voice": "Jada",
    "speed": 1.2
  }' \
  --output audio_jada.wav

// Browser example. The relative URL assumes the page is served from,
// or proxied through, the API host.
async function generateSpeech(text, voice = 'Cherry') {
  const response = await fetch('/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'qwen-tts-latest',
      input: text,
      voice: voice
    })
  });

  if (response.ok) {
    // The response body is the WAV file itself
    const audioBlob = await response.blob();
    const audioUrl = URL.createObjectURL(audioBlob);

    // Play audio
    const audio = new Audio(audioUrl);
    audio.play();
    return audioUrl;
  } else {
    // Error responses are JSON: { error: { message, type, code } }
    const error = await response.json();
    throw new Error(error.error.message);
  }
}

// Usage example
generateSpeech('Hello, world!', 'Serena');

# Python example (requires the requests package)
import requests


def generate_speech(text, voice='Cherry', model='qwen-tts-latest'):
    """Request speech for the given text and return the raw WAV bytes."""
    url = 'https://api.4allapi.com/v1/audio/speech'
    headers = {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
    }
    data = {
        'model': model,
        'input': text,
        'voice': voice
    }

    # A timeout keeps the call from hanging indefinitely on network problems
    response = requests.post(url, headers=headers, json=data, timeout=60)

    if response.status_code == 200:
        return response.content
    else:
        raise Exception(f"API Error: {response.status_code} - {response.text}")


# Usage example
audio_content = generate_speech('Hello, this is a Python call example!', 'Ethan')

# Save the audio file
with open('output.wav', 'wb') as f:
    f.write(audio_content)

// Node.js example. require() and response.buffer() need node-fetch v2;
// node-fetch v3 is ESM-only.
const fs = require('fs');
const fetch = require('node-fetch');

async function generateSpeech(text, voice = 'Cherry') {
  try {
    const response = await fetch('https://api.4allapi.com/v1/audio/speech', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'qwen-tts-latest',
        input: text,
        voice: voice
      })
    });

    if (response.ok) {
      // Write the returned WAV bytes to disk
      const buffer = await response.buffer();
      fs.writeFileSync('audio.wav', buffer);
      console.log('Audio file saved as audio.wav');
    } else {
      const error = await response.json();
      console.error('API error:', error);
    }
  } catch (error) {
    console.error('Request failed:', error);
  }
}

// Usage example
generateSpeech('Welcome to Node.js speech synthesis!', 'Dylan');

For long text, you can use a streaming response to get a faster time to first byte:

curl -X POST "https://api.4allapi.com/v1/audio/speech" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-SSE: enable" \
  -d '{
    "model": "qwen-tts",
    "input": "This is a longer piece of text; using a streaming response can provide a better experience...",
    "voice": "Chelsie"
  }' \
  --no-buffer

  • Maximum text length per request: 512 tokens
  • Supported languages: Chinese, English, and mixed Chinese-English text
  • Special characters are handled automatically
  • Rate limits: Based on your subscription plan
  • Concurrency limits: Multiple concurrent requests are supported by default
  • File size: The generated audio file size depends on text length
  • Maximum audio duration: About 5 minutes (depending on text length)
  • Audio quality: 24kHz, 16-bit PCM
  • Output format: WAV
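
Because the output is uncompressed PCM, file size follows directly from duration: 24000 samples per second at 2 bytes per sample in one channel is 48000 bytes per second. The sketch below works through that arithmetic. It assumes a canonical 44-byte WAV header with no extra chunks, which this document does not confirm, and both function names are hypothetical.

```python
SAMPLE_RATE = 24_000      # Hz, from the audio specification
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # mono
WAV_HEADER_BYTES = 44     # assumption: canonical PCM header, no extra chunks

BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS


def wav_size_bytes(seconds: float) -> int:
    """Estimate the WAV file size for audio of the given duration."""
    return WAV_HEADER_BYTES + round(seconds * BYTES_PER_SECOND)


def wav_duration_seconds(size_bytes: int) -> float:
    """Estimate the audio duration from a WAV file size."""
    return max(size_bytes - WAV_HEADER_BYTES, 0) / BYTES_PER_SECOND
```

Under these assumptions, the roughly 5-minute maximum corresponds to a file of about 14.4 MB.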
| Error Code | Description | Solution |
|---|---|---|
| invalid_api_key | Invalid API key | Check the API key in the Authorization header |
| model_not_found | Model does not exist | Make sure you are using the correct qwen-tts model name |
| invalid_voice | Unsupported voice | Check whether the voice parameter is in the supported list |
| text_too_long | Text too long | Reduce the input text to within 512 tokens |
| quota_exceeded | Insufficient quota | Check your account balance or request rate limits |

  1. Empty or corrupted audio file
     • Check whether the API key is valid
     • Make sure the channel configuration is correct
     • Verify the model name and voice parameter
  2. Request timeout
     • Check your network connection
     • Reduce the text length
     • Retry the request
  3. Voice does not take effect
     • Confirm that the model version being used supports the voice
     • Check the case of the voice parameter
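
The "retry the request" advice for timeouts is usually combined with exponential backoff so that retries do not add to the load. Here is a minimal sketch; `with_retries` is a hypothetical helper, and the attempt count and delays are illustrative values rather than recommendations from the service. It retries on any exception for brevity, whereas a real client would likely retry only on timeouts and connection errors.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**n seconds and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

It wraps any zero-argument callable, for example `with_retries(lambda: generate_speech("Hello"))`, where `generate_speech` stands for whatever function sends the request.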

The qwen-tts model family is billed by character count:

  • Billing unit: Calculated based on input character count
  • Billing method: Prepaid model, deducted from account balance
  • Price: See the pricing configuration in the admin console
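
Since billing is based on input character count, the cost of a request can be estimated before it is sent. The sketch below is illustrative only: the price is a placeholder to be read from the admin console, the per-1000-characters unit is an assumption, and `len()` counts Unicode code points, which may differ from how the platform counts characters.

```python
from decimal import Decimal


def estimate_cost(text: str, price_per_1k_chars: Decimal) -> Decimal:
    """Estimate the charge for synthesizing text at a given price per 1000 characters."""
    # Assumption: every code point, including punctuation and spaces, is billed.
    return Decimal(len(text)) * price_per_1k_chars / Decimal(1000)
```

Using `Decimal` rather than floats avoids rounding drift when many small charges are summed.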
  1. Punctuation: Using punctuation properly can improve speech rhythm
  2. Numbers: It is recommended to write numbers in Chinese form (for example: 123 → 一百二十三)
  3. English words: In mixed Chinese-English text, English words will be pronounced according to Chinese speech rules

  1. Scenario matching: Choose the right voice based on the content type
  2. Consistency: For the same application, it is recommended to use a consistent voice
  3. Testing: Test different voices first before making a final choice

  1. Caching: Cache audio files for repeated text
  2. Chunking: For long text, process it in segments
  3. Concurrency: Control the number of concurrent requests appropriately
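
The caching and chunking suggestions can be sketched as follows. Both helpers are hypothetical. `split_text` packs whole sentences into segments under a character budget, where `max_chars` is a rough stand-in for the 512-token limit because the tokenizer is not documented here. `cache_key` derives a stable key from everything that affects the audio, so repeated text can reuse a stored file.

```python
import hashlib
import re

# A sentence is a run of text ending in Chinese or English end punctuation
# (plus trailing whitespace), or whatever remains at the end of the text.
_SENTENCE = re.compile(r".+?(?:[。！？.!?]+\s*|$)", re.S)


def split_text(text: str, max_chars: int = 200) -> list:
    """Split text into segments of at most max_chars, breaking at sentence ends."""
    chunks, current = [], ""
    for piece in _SENTENCE.findall(text):
        # A single sentence longer than the budget is cut at the limit.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        if len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


def cache_key(model: str, voice: str, text: str, speed: float = 1.0) -> str:
    """Return a hex digest identifying one synthesis request."""
    raw = "\x1f".join([model, voice, f"{speed:g}", text])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Joining the segments returned by `split_text` reproduces the original text, so each segment can be synthesized separately and the audio played back in order.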

  • Added support for the qwen-tts model family
  • Supports 7 different voices
  • Compatible with the OpenAI standard API format
  • Supports both streaming and non-streaming responses
  • Complete error handling mechanism
  • SDK examples in multiple languages

If you run into any issues while using the service, please:

  1. Check the troubleshooting section in this document
  2. Review the error message in the API response
  3. Contact the technical support team

Note: This API is fully compatible with OpenAI’s /v1/audio/speech interface specification, so it can directly replace existing OpenAI TTS calls.



© 2025 4All API. All rights reserved.