ElevenLabs Text-to-Speech

Generate natural speech from text. Supports 74+ languages, with multiple models that trade off quality against latency.
Setup: see the Installation Guide. For JavaScript, use only the @elevenlabs/* packages.

Quick Start

Python

python
from elevenlabs.client import ElevenLabs

client = ElevenLabs()

audio = client.text_to_speech.convert(
    text="Hello, welcome to ElevenLabs!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2"
)

with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)

JavaScript

javascript
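import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { writeFile } from "fs/promises";

const client = new ElevenLabsClient();
const audio = await client.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "Hello, welcome to ElevenLabs!",
  modelId: "eleven_multilingual_v2",
});

// The SDK returns a stream of audio chunks; for await works on both
// Node streams and web ReadableStreams (Node 18+), so collect and write them.
const chunks = [];
for await (const chunk of audio) chunks.push(chunk);
await writeFile("output.mp3", Buffer.concat(chunks));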

cURL

bash
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "model_id": "eleven_multilingual_v2"}' --output output.mp3

Models

| Model ID | Languages | Latency | Best for |
|---|---|---|---|
| eleven_v3 | 74 | Standard | Highest quality, emotional range |
| eleven_multilingual_v2 | 29 | Standard | High quality, most use cases |
| eleven_flash_v2_5 | 32 | ~75ms | Ultra-low latency, real-time |
| eleven_flash_v2 | English only | ~75ms | English-only, fastest |
| eleven_turbo_v2_5 | 32 | Low | Balanced quality and speed |
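For an application that chooses a model at runtime, the table's guidance can be encoded as a small lookup. This is a minimal sketch. `choose_model` and its flags are hypothetical, and the rules simply restate the table.

```python
def choose_model(realtime: bool = False, english_only: bool = False,
                 max_quality: bool = False) -> str:
    """Pick a model ID following the Models table (hypothetical helper)."""
    if realtime:
        # Both Flash models are listed at ~75ms; the English-only one is the fastest
        return "eleven_flash_v2" if english_only else "eleven_flash_v2_5"
    if max_quality:
        return "eleven_v3"  # highest quality, widest emotional range
    return "eleven_multilingual_v2"  # high quality, most use cases


print(choose_model(realtime=True))                     # eleven_flash_v2_5
print(choose_model(realtime=True, english_only=True))  # eleven_flash_v2
print(choose_model(max_quality=True))                  # eleven_v3
```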

Voice IDs

Use pre-made voices or create custom voices in the dashboard.
Popular voices:
  • JBFqnCBsd6RMkjVDRZzb - George (male, narrative)
  • EXAVITQu4vr4xnSDxMaL - Sarah (female, soft)
  • onwK4e9ZLuTAKqWW03F9 - Daniel (male, authoritative)
  • XB0fDUnXU5powFXDhCwa - Charlotte (female, conversational)
To list every voice available to your account:
python
voices = client.voices.get_all()
for voice in voices.voices:
    print(f"{voice.voice_id}: {voice.name}")
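If you know a voice by name rather than by ID, you can search the listing. This is a minimal sketch. `find_voice_id` is a hypothetical helper that relies only on the `voice_id` and `name` attributes used in the loop above. The `SimpleNamespace` objects stand in for a real API response, and the names in your account may differ.

```python
from types import SimpleNamespace


def find_voice_id(voices, name):
    """Return the voice_id of the first voice whose name matches (case-insensitive), or None."""
    wanted = name.strip().lower()
    for voice in voices:
        if (voice.name or "").strip().lower() == wanted:
            return voice.voice_id
    return None


# Stand-in objects with the two attributes the listing loop uses
sample = [
    SimpleNamespace(voice_id="JBFqnCBsd6RMkjVDRZzb", name="George"),
    SimpleNamespace(voice_id="EXAVITQu4vr4xnSDxMaL", name="Sarah"),
]
print(find_voice_id(sample, "sarah"))  # EXAVITQu4vr4xnSDxMaL

# With a real client: find_voice_id(client.voices.get_all().voices, "George")
```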

Voice Settings

Fine-tune how the voice sounds:
  • Stability: How consistent the voice stays. Lower values give more emotional range and variation but can sound unstable. Higher values give steady, predictable delivery.
  • Similarity boost: How closely the output matches the original voice sample. Higher values sound more like the original but may amplify audio artifacts.
  • Style: Exaggerates the voice's distinctive style characteristics (only works with v2+ models).
  • Speaker boost: Post-processing that enhances clarity and similarity to the original voice.
python
from elevenlabs import VoiceSettings

audio = client.text_to_speech.convert(
    text="Customize my voice settings.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.5,
        use_speaker_boost=True
    )
)
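Over plain HTTP (as in the cURL quick start), the same settings go in a `voice_settings` JSON object in the request body. This is a minimal sketch that builds such a body with a range check. It assumes that stability, similarity boost, and style each accept values from 0.0 to 1.0. `build_tts_body` is a hypothetical helper, and its default values are illustrative.

```python
import json


def build_tts_body(text: str, model_id: str, stability: float = 0.5,
                   similarity_boost: float = 0.75, style: float = 0.0,
                   use_speaker_boost: bool = True) -> str:
    """Build a JSON body for POST /v1/text-to-speech/{voice_id} (hypothetical helper)."""
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost),
                        ("style", style)):
        # Assumption: each of these settings accepts values from 0.0 to 1.0
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return json.dumps({
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,
            "use_speaker_boost": use_speaker_boost,
        },
    })


print(build_tts_body("Customize my voice settings.", "eleven_multilingual_v2", style=0.5))
```

The printed string can be passed to cURL with `-d`, in place of the inline JSON in the quick start.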

Language Enforcement

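Force a specific language for pronunciation with language_code (an ISO 639-1 code). Not every model supports language enforcement (the Flash v2.5 and Turbo v2.5 models do). If the model does not support the given code, the API returns an error:
python
audio = client.text_to_speech.convert(
    text="Bonjour, comment allez-vous?",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",
    language_code="fr"  # ISO 639-1 code
)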

Text Normalization

Controls how numbers, dates, and abbreviations are converted to spoken words. For example, "01/15/2026" becomes "January fifteenth, twenty twenty-six":
  • "auto" (default): The model decides based on context
  • "on": Always normalize (use when you want natural speech)
  • "off": Speak literally (use when you want "zero one slash one five...")
python
audio = client.text_to_speech.convert(
    text="Call 1-800-555-0123 on 01/15/2026",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    apply_text_normalization="on"
)

Request Stitching

When long audio is generated in multiple requests, the output can have pops, unnatural pauses, or tone shifts at the boundaries. Request stitching addresses this by telling each request what text comes before and after it:
python
# First request
audio1 = client.text_to_speech.convert(
    text="This is the first part.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    next_text="And this continues the story."
)

# Second request using previous context
audio2 = client.text_to_speech.convert(
    text="And this continues the story.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    previous_text="This is the first part."
)
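With more than two parts, the same pattern generalizes. Split the text at sentence boundaries, then give each request its neighbors as `previous_text` and `next_text`. This is a minimal sketch. `split_sentences` and `stitched_requests` are hypothetical helpers, and the 800-character default chunk size is an arbitrary illustration, not a documented limit.

```python
import re


def split_sentences(text: str, max_chars: int = 800) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars (a longer sentence stays whole)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def stitched_requests(chunks):
    """Yield convert() keyword arguments for each chunk, with neighboring text as context."""
    for i, chunk in enumerate(chunks):
        kwargs = {"text": chunk}
        if i > 0:
            kwargs["previous_text"] = chunks[i - 1]
        if i + 1 < len(chunks):
            kwargs["next_text"] = chunks[i + 1]
        yield kwargs


story = "This is the first part. And this continues the story."
for kwargs in stitched_requests(split_sentences(story, max_chars=30)):
    print(kwargs)

# With the client from the Quick Start (requires an API key):
# with open("story.mp3", "wb") as f:
#     for kwargs in stitched_requests(split_sentences(story)):
#         for chunk in client.text_to_speech.convert(voice_id="JBFqnCBsd6RMkjVDRZzb", **kwargs):
#             f.write(chunk)
```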

Output Formats

| Format | Description |
|---|---|
| mp3_44100_128 | MP3 44.1kHz 128kbps (default) - compressed, good for web/apps |
| mp3_44100_192 | MP3 44.1kHz 192kbps (Creator+) - higher-quality compressed |
| pcm_16000 | Raw uncompressed audio at 16kHz - use for real-time processing |
| pcm_22050 | Raw uncompressed audio at 22.05kHz |
| pcm_24000 | Raw uncompressed audio at 24kHz - good balance for streaming |
| pcm_44100 | Raw uncompressed audio at 44.1kHz (Pro+) - CD quality |
| ulaw_8000 | μ-law compressed 8kHz - standard for phone systems (Twilio, telephony) |
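The format is chosen with the output_format parameter of convert(). Raw PCM has no file header, so most players will not open it directly. This minimal sketch wraps PCM bytes in a WAV container using Python's standard wave module. It assumes the PCM data is 16-bit signed little-endian mono; check the sample width and channel count against the API reference. `pcm_to_wav` is a hypothetical helper.

```python
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int, path: str) -> None:
    """Write raw PCM bytes to a WAV file (assumes 16-bit mono samples)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono
        wav.setsampwidth(2)   # assumption: 16-bit (2-byte) samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


# With the client from the Quick Start (requires an API key):
# pcm = b"".join(client.text_to_speech.convert(
#     text="Hello!", voice_id="JBFqnCBsd6RMkjVDRZzb", output_format="pcm_16000"))
# pcm_to_wav(pcm, 16000, "output.wav")
```

The sample rate passed to `pcm_to_wav` must match the one in the format name (16000 for pcm_16000); otherwise playback speed and pitch will be wrong.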

Streaming

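For real-time applications, use the streaming method so playback can begin before the full clip has been generated:
python
# stream() in SDK v2+; older releases name it convert_as_stream()
audio_stream = client.text_to_speech.stream(
    text="This text will be streamed as audio.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5"  # Ultra-low latency
)

for chunk in audio_stream:
    play_audio(chunk)  # play_audio: placeholder for your own playback function
See references/streaming.md for WebSocket streaming.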

Error Handling

python
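try:
    # convert() returns a lazy iterator, so the request may not be sent until it
    # is consumed; join it here so API errors are raised inside this block
    audio = b"".join(client.text_to_speech.convert(
        text="Generate speech",
        voice_id="invalid-voice-id"
    ))
except Exception as e:
    print(f"API error: {e}")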
Common errors:
  • 401: Invalid API key
  • 422: Invalid parameters (check voice_id, model_id)
  • 429: Rate limit exceeded
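Rate-limit errors (429) are usually worth retrying after a delay. Errors such as 401 and 422 are not, because retrying will not fix them. This is a minimal sketch of exponential backoff. It assumes the exception raised by the SDK exposes the HTTP status as a `status_code` attribute; check the error class in your SDK version. `with_retries` is a hypothetical helper.

```python
import random
import time


def with_retries(make_request, max_attempts: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Call make_request(), retrying with exponential backoff on HTTP 429 errors."""
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception as e:
            # Assumption: SDK errors carry the HTTP status as status_code
            status = getattr(e, "status_code", None)
            if status != 429 or attempt == max_attempts - 1:
                raise  # not a rate limit, or out of attempts
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


# Usage: join the chunks so the whole request runs inside the retry
# audio = with_retries(lambda: b"".join(client.text_to_speech.convert(
#     text="Generate speech", voice_id="JBFqnCBsd6RMkjVDRZzb")))
```

The `sleep` parameter is injectable, so the backoff schedule can be tested without actually waiting.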

Tracking Costs

Monitor character usage via the response headers x-character-count and request-id:
python
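# The raw-response variant is a context manager for streamed audio;
# headers are available immediately, the body is read from response.data
with client.text_to_speech.with_raw_response.convert(
    text="Hello!", voice_id="JBFqnCBsd6RMkjVDRZzb", model_id="eleven_multilingual_v2"
) as response:
    print(f"Characters used: {response.headers.get('x-character-count')}")
    print(f"Request ID: {response.headers.get('request-id')}")
    audio = b"".join(response.data)  # consume the body while the connection is open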
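To track spend across many requests, you can sum the header values. This is a minimal sketch. `CharacterTally` is a hypothetical helper. It assumes only that the headers support `.get()` like a mapping and that x-character-count holds an integer string.

```python
class CharacterTally:
    """Accumulate the x-character-count header across requests (hypothetical helper)."""

    def __init__(self) -> None:
        self.total = 0
        self.by_request: dict[str, int] = {}

    def record(self, headers) -> int:
        count = int(headers.get("x-character-count") or 0)  # missing header counts as 0
        request_id = headers.get("request-id")
        if request_id:
            self.by_request[request_id] = count
        self.total += count
        return count


tally = CharacterTally()
tally.record({"x-character-count": "6", "request-id": "req-1"})
tally.record({"x-character-count": "42", "request-id": "req-2"})
print(tally.total)  # 48
```

In practice, pass `response.headers` from each raw response to `tally.record()`.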

References

  • Installation Guide
  • Streaming Audio
  • Voice Settings