text-to-speech

ElevenLabs Text-to-Speech

Generate natural speech from text - supports 74+ languages, multiple models for quality vs latency tradeoffs.

Setup: See Installation Guide. For JavaScript, use `@elevenlabs/*` packages only.

Quick Start

Python

```python
from elevenlabs.client import ElevenLabs

client = ElevenLabs()

audio = client.text_to_speech.convert(
    text="Hello, welcome to ElevenLabs!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2"
)

with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```

JavaScript

```javascript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createWriteStream } from "fs";

const client = new ElevenLabsClient();
const audio = await client.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "Hello, welcome to ElevenLabs!",
  modelId: "eleven_multilingual_v2",
});
audio.pipe(createWriteStream("output.mp3"));
```

cURL

```bash
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "model_id": "eleven_multilingual_v2"}' --output output.mp3
```

Models

| Model ID | Languages | Latency | Best For |
|---|---|---|---|
| `eleven_v3` | 74 | Standard | Highest quality, emotional range |
| `eleven_multilingual_v2` | 29 | Standard | High quality, most use cases |
| `eleven_flash_v2_5` | 32 | ~75ms | Ultra-low latency, real-time |
| `eleven_flash_v2` | English | ~75ms | English-only, fastest |
| `eleven_turbo_v2_5` | 32 | Low | Balanced quality/speed |
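
The trade-offs in the table can be captured in a small lookup. This is only a sketch: `pick_model` and its parameters are hypothetical names, not part of the ElevenLabs SDK, and the choices simply mirror the "Best For" column above.

```python
# Model IDs and traits are taken from the table above.
# pick_model is a hypothetical helper, not an SDK function.
def pick_model(realtime: bool = False,
               english_only: bool = False,
               max_expressiveness: bool = False) -> str:
    """Return a model ID that matches the given constraints."""
    if realtime:
        # Both flash models are listed at ~75ms; flash_v2 is English-only.
        return "eleven_flash_v2" if english_only else "eleven_flash_v2_5"
    if max_expressiveness:
        return "eleven_v3"
    # Listed as the high-quality choice for most use cases.
    return "eleven_multilingual_v2"


print(pick_model(realtime=True))
print(pick_model(max_expressiveness=True))
```

The returned string can be passed as `model_id` to `client.text_to_speech.convert`.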

Voice IDs

Use pre-made voices or create custom voices in the dashboard.

Popular voices:

- `JBFqnCBsd6RMkjVDRZzb` - George (male, narrative)
- `EXAVITQu4vr4xnSDxMaL` - Sarah (female, soft)
- `onwK4e9ZLuTAKqWW03F9` - Daniel (male, authoritative)
- `XB0fDUnXU5powFXDhCwa` - Charlotte (female, conversational)

```python
# List every voice available to the account
voices = client.voices.get_all()
for voice in voices.voices:
    print(f"{voice.voice_id}: {voice.name}")
```

Voice Settings

Fine-tune how the voice sounds:

- Stability: How consistent the voice stays. Lower values give more emotional range and variation but can sound unstable; higher values give steady, predictable delivery.
- Similarity boost: How closely to match the original voice sample. Higher values sound more like the original but may amplify audio artifacts.
- Style: Exaggerates the voice's unique style characteristics (only works with v2+ models).
- Speaker boost: Post-processing that enhances clarity and voice similarity.

```python
from elevenlabs import VoiceSettings

audio = client.text_to_speech.convert(
    text="Customize my voice settings.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.5,
        use_speaker_boost=True
    )
)
```

Language Enforcement

Force a specific language for pronunciation:

```python
audio = client.text_to_speech.convert(
    text="Bonjour, comment allez-vous?",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    language_code="fr"  # ISO 639-1 code
)
```

Text Normalization

Controls how numbers, dates, and abbreviations are converted to spoken words. For example, "01/15/2026" becomes "January fifteenth, twenty twenty-six":

- `"auto"` (default): Model decides based on context
- `"on"`: Always normalize (use when you want natural speech)
- `"off"`: Speak literally (use when you want "zero one slash one five...")

```python
audio = client.text_to_speech.convert(
    text="Call 1-800-555-0123 on 01/15/2026",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    apply_text_normalization="on"
)
```

Request Stitching

When generating long audio in multiple requests, the audio can have pops, unnatural pauses, or tone shifts at the boundaries. Request stitching solves this by letting each request know what comes before/after it:

```python
# First request
audio1 = client.text_to_speech.convert(
    text="This is the first part.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    next_text="And this continues the story."
)

# Second request using previous context
audio2 = client.text_to_speech.convert(
    text="And this continues the story.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    previous_text="This is the first part."
)
```
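
For more than two segments, the neighbouring text for each request can be worked out in a loop. The sketch below only builds the keyword arguments; `stitching_kwargs` is a hypothetical helper name, and it assumes each request takes the segment immediately before and after it as `previous_text` and `next_text`.

```python
from typing import Dict, List


def stitching_kwargs(parts: List[str]) -> List[Dict[str, str]]:
    """Build per-request keyword arguments with neighbouring text as context."""
    requests = []
    for i, text in enumerate(parts):
        kwargs = {"text": text}
        if i > 0:
            kwargs["previous_text"] = parts[i - 1]  # segment just before
        if i < len(parts) - 1:
            kwargs["next_text"] = parts[i + 1]      # segment just after
        requests.append(kwargs)
    return requests


parts = ["This is the first part.", "And this continues the story.", "The end."]
for kwargs in stitching_kwargs(parts):
    print(kwargs)
    # With a configured client, each request could then be sent as:
    # client.text_to_speech.convert(voice_id="JBFqnCBsd6RMkjVDRZzb", **kwargs)
```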

Output Formats

| Format | Description |
|---|---|
| `mp3_44100_128` | MP3 44.1kHz 128kbps (default) - compressed, good for web/apps |
| `mp3_44100_192` | MP3 44.1kHz 192kbps (Creator+) - higher quality compressed |
| `pcm_16000` | Raw uncompressed audio at 16kHz - use for real-time processing |
| `pcm_22050` | Raw uncompressed audio at 22.05kHz |
| `pcm_24000` | Raw uncompressed audio at 24kHz - good balance for streaming |
| `pcm_44100` | Raw uncompressed audio at 44.1kHz (Pro+) - CD quality |
| `ulaw_8000` | μ-law compressed 8kHz - standard for phone systems (Twilio, telephony) |
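
Raw PCM output has no header, so most players will not open it directly. A minimal sketch of wrapping it in a WAV container with the standard-library `wave` module follows. It assumes the PCM data is 16-bit mono, which this document does not state, and `pcm_to_wav` and `sample_rate_of` are hypothetical helper names.

```python
import io
import wave


def sample_rate_of(output_format: str) -> int:
    """Extract the sample rate from a format name such as "pcm_16000"."""
    return int(output_format.split("_")[1])


def pcm_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap raw PCM samples in a WAV container (assumes 16-bit mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono
        wav.setsampwidth(2)   # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# One second of silence at 16 kHz stands in for real API output.
silence = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav(silence, sample_rate_of("pcm_16000"))
print(len(wav_bytes), "bytes of WAV data")
```

With real output, the chunks returned for a `pcm_*` format would be joined with `b"".join(...)` before being passed to `pcm_to_wav`.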

Streaming

For real-time applications:

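```python
# stream() yields audio chunks as they are generated
audio_stream = client.text_to_speech.stream(
    text="This text will be streamed as audio.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5"  # Ultra-low latency
)

for chunk in audio_stream:
    play_audio(chunk)  # play_audio is a placeholder for your playback function
```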

See references/streaming.md for WebSocket streaming.
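
For real-time use it can help to check how quickly the first chunk arrives. The sketch below wraps any iterable of byte chunks and reports the elapsed time per chunk. `timed_chunks` is a hypothetical helper, and a plain list stands in for the audio stream so the example runs without an API key.

```python
import time
from typing import Iterable, Iterator, Tuple


def timed_chunks(chunks: Iterable[bytes]) -> Iterator[Tuple[float, bytes]]:
    """Yield (seconds_since_start, chunk) for each chunk of a stream."""
    start = time.perf_counter()
    for chunk in chunks:
        yield time.perf_counter() - start, chunk


# A list of byte strings stands in for the real audio stream here.
fake_stream = [b"abc", b"defgh"]
for elapsed, chunk in timed_chunks(fake_stream):
    print(f"{elapsed:.3f}s: {len(chunk)} bytes")
```

The elapsed time of the first yielded pair approximates time-to-first-audio as seen by the client, including network overhead.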

Error Handling

```python
try:
    audio = client.text_to_speech.convert(
        text="Generate speech",
        voice_id="invalid-voice-id"
    )
except Exception as e:
    print(f"API error: {e}")
```

Common errors:

- 401: Invalid API key
- 422: Invalid parameters (check voice_id, model_id)
- 429: Rate limit exceeded
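
Of the errors above, only 429 is usually worth retrying. The sketch below retries with exponential backoff. `with_retries` is a hypothetical helper, not an SDK feature, and it assumes the raised exception exposes the HTTP status as a `status_code` attribute, so check the exception type your SDK version raises.

```python
import time

RETRYABLE_STATUS = {429}  # rate limit, per the list above


def with_retries(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry with exponential backoff on retryable errors.

    Assumes the exception carries a `status_code` attribute; any other
    error, or the final failed attempt, is re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            last_attempt = attempt == max_attempts - 1
            if status not in RETRYABLE_STATUS or last_attempt:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# With a configured client, joining the chunks inside the callable also
# covers errors raised while the response is being read:
# audio = with_retries(lambda: b"".join(client.text_to_speech.convert(
#     text="Generate speech", voice_id="JBFqnCBsd6RMkjVDRZzb")))
```

Errors 401 and 422 are re-raised immediately, since retrying cannot fix an invalid key or invalid parameters.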

Tracking Costs

Monitor character usage via response headers (`x-character-count`, `request-id`):

```python
response = client.text_to_speech.convert.with_raw_response(
    text="Hello!", voice_id="JBFqnCBsd6RMkjVDRZzb", model_id="eleven_multilingual_v2"
)
audio = response.parse()
print(f"Characters used: {response.headers.get('x-character-count')}")
```

References

Installation Guide
Streaming Audio
Voice Settings
