[OpenAI] TTS Text-to-Speech Python Script


Below is a fairly comprehensive and practical example of an OpenAI text-to-speech (TTS) Python script.

1. Feature Overview

Purpose: convert input text into natural-sounding speech for voiceovers, audio announcements, spoken responses in conversational products, and more.
Features: low latency, multilingual support, multiple preset voices, multiple audio output formats, and streaming playback.
Typical latency: short sentences can usually produce playable audio within a few hundred milliseconds to about 1–2 seconds; this depends on network conditions, the model, and the format.
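
Because the latency figures above depend on your own network, model, and format, it can help to measure them. The following is a minimal sketch, not part of the original script: the helper name time_to_first_audio is hypothetical, and it assumes you record a start time just before sending the request and then pass in any iterable of byte chunks (for example response.iter_content(...) from requests).

```python
import time
from typing import Callable, Iterable


def time_to_first_audio(
    chunks: Iterable[bytes],
    started_at: float,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the seconds between started_at and the first non-empty chunk."""
    for chunk in chunks:
        if chunk:  # ignore empty keep-alive chunks
            return clock() - started_at
    raise ValueError("the stream produced no audio data")
```

To use it, call time.perf_counter() immediately before the HTTP request and pass that value as started_at, so the result includes the request round trip as well as the download of the first chunk.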

2. Available Models and Differences

gpt-4o-mini-tts (recommended default): fast, cost-effective, and suitable for most real-time and batch scenarios.
tts-1: an earlier general-purpose TTS model with balanced quality and speed.
tts-1-hd: a higher-quality version, suitable for long-form narration where audio fidelity matters more and latency is less critical.
Tip: if you want the lowest latency and cost, prioritize gpt-4o-mini-tts; if you want the best possible audio quality, try tts-1-hd. The older tts-1 and tts-1-hd models are still available, but official guidance generally recommends starting new projects with gpt-4o-mini-tts.
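
The selection advice above can be captured in a small lookup. This is only a sketch of that rule of thumb: the priority labels ("latency", "cost", "balanced", "quality") and the function name pick_model are illustrative assumptions, while the model names are the three listed in this section.

```python
from typing import Optional

# Mapping from what you care about most to the model suggested above.
MODEL_BY_PRIORITY = {
    "latency": "gpt-4o-mini-tts",
    "cost": "gpt-4o-mini-tts",
    "balanced": "tts-1",
    "quality": "tts-1-hd",
}
DEFAULT_MODEL = "gpt-4o-mini-tts"  # recommended default for new projects


def pick_model(priority: Optional[str] = None) -> str:
    """Return a model name for the given priority, or the default if none is given."""
    if priority is None:
        return DEFAULT_MODEL
    try:
        return MODEL_BY_PRIORITY[priority]
    except KeyError:
        known = ", ".join(sorted(MODEL_BY_PRIORITY))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {known}") from None
```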

3. Obtaining and Securely Using an API Key (Two Connectivity Options)


How do you get an API key for OpenAI TTS? There are two options.

Option A: Official channel
Features: cumbersome process, special network environment requirements, and beginners are likely to run into obstacles during registration and use.
Suitable for: advanced users who are familiar with overseas service registration processes and have good network conditions.

Option B: Domestic acceleration (for convenient developer access)
Features: by using a professional relay service (such as 4All API.com), the connection is stable, fast, and easy to activate, so you can get started immediately.
Suitable for: all developers who want stable and efficient access and hope to get started quickly; it is also the choice of many advanced users.

How to Call It (Overview)

Step 1: Use a .env file to securely manage API keys

Sensitive information such as keys and passwords should never be written directly into code. The standard practice is to manage it through environment variables, and a .env file is the most popular approach for local development.

Install the python-dotenv library (the final script also needs requests)

pip install python-dotenv requests

Create a .env file

Create a file named .env in the root directory of your project, at the same level as your Python script, and define your key in it. Environment variable names cannot contain spaces and should not start with a digit, so we use the name TTS_API_KEY. The script must read the key under exactly the same name.

Contents of the .env file:

# This is the environment variable file, used to store sensitive information
# TTS_API_KEY: enter the key you got from 4All API.com or the official OpenAI service
TTS_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
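
Before wiring the key into a script, it can be worth checking two things: that the variable name is one a .env file and a shell will accept, and that a value is actually present. The sketch below uses only the standard library and reads from the process environment (so it works once load_dotenv() or your shell has populated it). The name TTS_API_KEY and the helper names are assumptions for illustration; use whichever variable name your own .env file defines.

```python
import os
import re
from typing import Mapping, Optional

# Letters, digits and underscores only, and not starting with a digit.
_ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_portable_env_name(name: str) -> bool:
    """True if the name has no spaces or other characters that .env files and shells reject."""
    return _ENV_NAME.fullmatch(name) is not None


def read_api_key(var_name: str = "TTS_API_KEY",
                 environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the key stored under var_name, or raise with a clear message."""
    if not is_portable_env_name(var_name):
        raise ValueError(f"{var_name!r} is not a valid environment variable name")
    env = os.environ if environ is None else environ
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; define it in your .env file")
    return key


def mask_key(key: str) -> str:
    """Shorten a key for log output so the full secret is never printed."""
    return f"{key[:3]}...{key[-4:]}" if len(key) > 8 else "***"
```

Logging mask_key(key) instead of the key itself keeps the secret out of terminal history and log files.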

Step 2: Parameterization and model selection

A good script should be flexible. TTS APIs typically offer several voices to choose from, so we make the input text and the voice parameters rather than hard-coding them.

Taking OpenAI’s TTS models as an example, tts-1 and tts-1-hd launched with six built-in voices. The descriptions below are informal impressions rather than official labels:

alloy (balanced male voice)
echo (warm male voice)
fable (steady male voice)
onyx (deep male voice)
nova (lively female voice)
shimmer (professional female voice)

We can switch between them easily in code:

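# Text to convert (Chinese sample, roughly: "Hello, world! As I write this
# simple greeting, I bring curiosity, respect, and hope.")
input_text = "你好，世界！提笔写下这句简单的问候，我带着好奇、敬意与希望。"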

# Select a voice
selected_voice = "nova"

# Build the request payload
data = {
    "model": "tts-1",
    "input": input_text,
    "voice": selected_voice
}
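
To catch typos before a request is sent, the payload can be built by a small function that validates its arguments. This is a sketch under a few assumptions: the function name build_payload is hypothetical, the voice set is the six voices listed above, and the response_format field does not appear in this article's payload. It is included on the assumption that your endpoint accepts it, as the official OpenAI speech endpoint does.

```python
from typing import Dict

KNOWN_VOICES = frozenset({"alloy", "echo", "fable", "onyx", "nova", "shimmer"})
KNOWN_FORMATS = frozenset({"mp3", "opus", "aac", "flac", "wav", "pcm"})


def build_payload(text: str, voice: str = "alloy", model: str = "tts-1",
                  response_format: str = "mp3") -> Dict[str, str]:
    """Validate the arguments and return the JSON body for the speech request."""
    if not text.strip():
        raise ValueError("input text must not be empty")
    if voice not in KNOWN_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(KNOWN_VOICES)}")
    if response_format not in KNOWN_FORMATS:
        raise ValueError(f"unknown format {response_format!r}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
```

Failing locally with a clear message is cheaper than waiting for the server to reject the request with an HTTP 400.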

Final Version: A Professional-Grade TTS API Calling Script

By combining all of the best practices above, we get the final Python script. It is secure, robust, flexible, and easy to maintain.

import os
import requests
from dotenv import load_dotenv

def generate_speech(text: str, voice: str = "alloy", output_filename: str = "speech.mp3"):
    """
    Call the text-to-speech API to generate an audio file.

    Args:
        text (str): The text to convert to speech.
        voice (str): The name of the voice model to use.
        output_filename (str): The output audio filename.
    """
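    # --- 1. Load and validate configuration ---
    load_dotenv()
    # The name must match the variable defined in your .env file; environment
    # variable names cannot contain spaces or start with a digit.
    api_key = os.environ.get("TTS_API_KEY")
    if not api_key:
        raise ValueError("API key not found. Please make sure 'TTS_API_KEY' is set correctly in the .env file")

    # NOTE: a URL cannot contain a space. Replace this host with your provider's
    # real endpoint (the official OpenAI one is https://api.openai.com/v1/audio/speech).
    url = "https://sg.4All API.com/v1/audio/speech"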

    # --- 2. Prepare the request payload ---
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    data = {
        "model": "tts-1-hd",
        "input": text,
        "voice": voice
    }

    # --- 3. Send the request and handle the response ---
    try:
        print(f"Generating speech using voice '{voice}'...")
        # stream=True downloads the body in chunks; timeout=(connect, read) in
        # seconds keeps the call from hanging indefinitely
        response = requests.post(url, headers=headers, json=data, stream=True, timeout=(10, 60))

        # Raise requests.exceptions.HTTPError for 4xx/5xx responses
        response.raise_for_status()

        print(f"Request successful, writing audio to file: {output_filename}")

        # Write the file in binary chunks, suitable for large files
        with open(output_filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)

        print("Audio file saved successfully!")

    except requests.exceptions.HTTPError as e:
        # Catch and print more detailed HTTP error information (such as 401, 404, 500, etc.)
        print(f"Request failed, HTTP error: {e}")
        print(f"Response content: {e.response.text}")
    except requests.exceptions.RequestException as e:
        # Catch network, connection, or timeout errors
        print(f"Request failed, network or connection error: {e}")
    except Exception as e:
        # Catch any other unknown errors
        print(f"An unknown error occurred: {e}")

if __name__ == '__main__':
    # --- Usage example ---
    long_text = "你好，世界！提笔写下这句简单的问候，我带着好奇、敬意与希望：无论经纬如何交错，我们共享同一片天空与明月。我愿倾听你每个角落的故事，珍视差异，守护脆弱的美好。"

    # Use the lively female voice 'nova'
    generate_speech(long_text, voice="nova", output_filename="speech_nova.mp3")

    # Use the deep male voice 'onyx'
    generate_speech("欢迎体验我们的文本转语音服务。", voice="onyx", output_filename="speech_onyx.mp3")
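
The script above reports errors but does not retry, which matters for transient failures such as a dropped connection. Below is a generic retry helper with exponential backoff, offered as a sketch rather than as part of the original script. The name call_with_retries and its parameters are hypothetical, and it assumes the wrapped call raises on failure; generate_speech as written prints errors instead, so it would need to re-raise them for this helper to take effect.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, waiting base_delay, 2x, 4x, ... between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In practice you would pass a narrow retry_on tuple, for example connection and timeout errors only, so that an invalid key (HTTP 401) fails immediately instead of being retried.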

Jiezhitong (jieagi) process summary:

1. Create a folder, for example: openaitts
2. In the openaitts folder directory, create a .env file to store the key.
3. In the openaitts folder directory, create a Python script file, for example: openai-tts.py, and put the Python script into the file you created.

After completing the steps, run your script file from the openaitts folder, for example: python openai-tts.py
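
The three setup steps can also be scripted. The sketch below creates the folder, a placeholder .env file, and an empty script file using only the standard library. The function name scaffold_project, the variable name TTS_API_KEY, and the placeholder key are assumptions, and the .gitignore entry is an extra precaution that is not part of the steps above.

```python
from pathlib import Path


def scaffold_project(root: str = "openaitts",
                     script_name: str = "openai-tts.py",
                     env_var: str = "TTS_API_KEY") -> Path:
    """Create the project folder with a .env placeholder and an empty script file."""
    project = Path(root)
    project.mkdir(parents=True, exist_ok=True)

    env_file = project / ".env"
    if not env_file.exists():  # never overwrite a real key
        env_file.write_text(f'{env_var}="sk-your-key-here"\n', encoding="utf-8")

    # Extra step, not in the list above: keep the key file out of version control.
    gitignore = project / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text(".env\n", encoding="utf-8")

    # Empty file to paste the final script into.
    (project / script_name).touch(exist_ok=True)
    return project
```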

Starting from a simple curl command, we introduced the requests library, protected the key with a .env file, parameterized the API call, and added robust error handling, arriving at a professional-grade Python script.

This process reflects the software engineering mindset of moving from “works” to “works well.” Based on this script, you can further encapsulate it into a class, build a command-line tool (CLI), or even integrate it into a large web application, giving your project powerful speech capabilities.
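
As one possible starting point for the command-line tool mentioned above, here is a minimal argparse sketch. It is an illustration, not the article's code: build_parser and main are hypothetical names, and the synthesis function is passed in as an argument, on the assumption that it has the same signature as generate_speech(text, voice, output_filename).

```python
import argparse
from typing import Callable, Optional, Sequence

VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def build_parser() -> argparse.ArgumentParser:
    """Define the command-line interface: text, --voice and --output."""
    parser = argparse.ArgumentParser(description="Convert text to speech.")
    parser.add_argument("text", help="the text to convert to speech")
    parser.add_argument("--voice", default="alloy", choices=VOICES,
                        help="voice to use (default: alloy)")
    parser.add_argument("--output", default="speech.mp3",
                        help="output audio filename (default: speech.mp3)")
    return parser


def main(synthesize: Callable[..., None],
         argv: Optional[Sequence[str]] = None) -> int:
    """Parse argv (or sys.argv when None) and call the synthesis function."""
    args = build_parser().parse_args(argv)
    synthesize(args.text, voice=args.voice, output_filename=args.output)
    return 0
```

With the final script in the same file, the entry point would be main(generate_speech), after which a call such as python openai-tts.py "Hello" --voice nova --output hello.mp3 selects the voice and filename from the command line.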