ElevenLabs Text-to-Speech
Generate natural speech from text - supports 74+ languages, multiple models for quality vs latency tradeoffs.
Setup: See Installation Guide. For JavaScript, use @elevenlabs/* packages only.
Quick Start
Python
python
from elevenlabs.client import ElevenLabs

# With no arguments, the client reads the API key from ELEVENLABS_API_KEY
client = ElevenLabs()

audio = client.text_to_speech.convert(
    text="Hello, welcome to ElevenLabs!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
    model_id="eleven_multilingual_v2",
)

# The result is iterated chunk by chunk and written to disk
with open("output.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
JavaScript
javascript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createWriteStream } from "fs";

const client = new ElevenLabsClient();

const audio = await client.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "Hello, welcome to ElevenLabs!",
  modelId: "eleven_multilingual_v2",
});

audio.pipe(createWriteStream("output.mp3"));
cURL
bash
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "model_id": "eleven_multilingual_v2"}' --output output.mp3
Models
| Model ID | Languages | Latency | Best For |
|---|---|---|---|
| | 74 | Standard | Highest quality, emotional range |
| | 29 | Standard | High quality, most use cases |
| | 32 | ~75ms | Ultra-low latency, real-time |
| | English | ~75ms | English-only, fastest |
| | 32 | Low | Balanced quality/speed |
Voice IDs
Use pre-made voices or create custom voices in the dashboard.
Popular voices:
- JBFqnCBsd6RMkjVDRZzb - George (male, narrative)
- EXAVITQu4vr4xnSDxMaL - Sarah (female, soft)
- onwK4e9ZLuTAKqWW03F9 - Daniel (male, authoritative)
- XB0fDUnXU5powFXDhCwa - Charlotte (female, conversational)
python
# List every voice available to the account, with its ID
voices = client.voices.get_all()
for voice in voices.voices:
    print(f"{voice.voice_id}: {voice.name}")
Voice Settings
Fine-tune how the voice sounds:
- Stability: How consistent the voice stays. Lower values = more emotional range and variation, but can sound unstable. Higher = steady, predictable delivery.
- Similarity boost: How closely to match the original voice sample. Higher values sound more like the original but may amplify audio artifacts.
- Style: Exaggerates the voice's unique style characteristics (only works with v2+ models).
- Speaker boost: Post-processing that enhances clarity and voice similarity.
python
from elevenlabs import VoiceSettings

audio = client.text_to_speech.convert(
    text="Customize my voice settings.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.5,
        use_speaker_boost=True,
    ),
)
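As a rough sketch of how the four settings fit together, the helper below builds a plain voice_settings dictionary and rejects out-of-range values before any request is made. The helper name build_voice_settings, its default values, and the 0.0 to 1.0 range check are assumptions made for illustration; they are not part of the SDK.

```python
def build_voice_settings(stability=0.5, similarity_boost=0.75,
                         style=0.0, use_speaker_boost=True):
    """Return a voice_settings dict (hypothetical helper, not an SDK call)."""
    numeric = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in numeric.items():
        # Assumption: each numeric setting is a float between 0.0 and 1.0.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {**numeric, "use_speaker_boost": bool(use_speaker_boost)}


# Lower stability for a more expressive read, higher for steady narration.
expressive = build_voice_settings(stability=0.3, style=0.6)
steady = build_voice_settings(stability=0.9)
print(expressive)
print(steady)
```

Validating locally like this surfaces a bad value as a ValueError at the call site instead of as an API error after a round trip.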
Language Enforcement
Force a specific language for pronunciation:
python
audio = client.text_to_speech.convert(
    text="Bonjour, comment allez-vous?",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    language_code="fr",  # ISO 639-1 code
)
Text Normalization
Controls how numbers, dates, and abbreviations are converted to spoken words. For example, "01/15/2026" becomes "January fifteenth, twenty twenty-six":
- "auto" (default): Model decides based on context
- "on": Always normalize (use when you want natural speech)
- "off": Speak literally (use when you want "zero one slash one five...")
python
audio = client.text_to_speech.convert(
    text="Call 1-800-555-0123 on 01/15/2026",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    apply_text_normalization="on",
)
Request Stitching
When generating long audio in multiple requests, the audio can have pops, unnatural pauses, or tone shifts at the boundaries. Request stitching solves this by letting each request know what comes before/after it:
python
# First request
audio1 = client.text_to_speech.convert(
    text="This is the first part.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    next_text="And this continues the story.",
)

# Second request using previous context
audio2 = client.text_to_speech.convert(
    text="And this continues the story.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    previous_text="This is the first part.",
)
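For more than two segments, the same idea can be applied in a loop. The sketch below pairs each chunk of text with its neighbours, so every request carries previous_text and next_text where they exist. The helper name stitch_plan is hypothetical, and the sketch only prepares keyword arguments; it does not call the API.

```python
def stitch_plan(chunks):
    """Pair each text chunk with its neighbours (hypothetical helper)."""
    plan = []
    for i, text in enumerate(chunks):
        kwargs = {"text": text}
        if i > 0:
            # Every chunk except the first knows what came before it.
            kwargs["previous_text"] = chunks[i - 1]
        if i + 1 < len(chunks):
            # Every chunk except the last knows what comes next.
            kwargs["next_text"] = chunks[i + 1]
        plan.append(kwargs)
    return plan


parts = [
    "This is the first part.",
    "And this continues the story.",
    "The end.",
]
plan = stitch_plan(parts)
for kwargs in plan:
    # Each dict could be unpacked into a convert() call alongside voice_id.
    print(kwargs)
```

Missing neighbours are left out of the dictionary instead of being set to None, so the first and last requests simply omit the parameter.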
Output Formats
| Format | Description |
|---|---|
| | MP3 44.1kHz 128kbps (default) - compressed, good for web/apps |
| | MP3 44.1kHz 192kbps (Creator+) - higher quality compressed |
| | Raw uncompressed audio at 16kHz - use for real-time processing |
| | Raw uncompressed audio at 22.05kHz |
| | Raw uncompressed audio at 24kHz - good balance for streaming |
| | Raw uncompressed audio at 44.1kHz (Pro+) - CD quality |
| | μ-law compressed 8kHz - standard for phone systems (Twilio, telephony) |
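Raw PCM output has no file header, so most players will not open it directly. The sketch below wraps raw samples in a WAV container using only the standard library. It assumes the samples are 16-bit signed, little-endian, mono; verify this against the format you request. The helper name pcm_to_wav is hypothetical, and silence stands in for real API output.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container (illustrative helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)   # must match the requested rate
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


# One second of silence at 16 kHz stands in for real audio here.
silence = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav(silence, sample_rate=16000)
print(len(wav_bytes))
```

The sample_rate argument has to match the rate of the PCM format that was requested, otherwise the audio plays back at the wrong speed.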
Streaming
For real-time applications:
python
audio_stream = client.text_to_speech.convert(
    text="This text will be streamed as audio.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",  # Ultra-low latency
)

for chunk in audio_stream:
    play_audio(chunk)  # play_audio stands in for your own playback function

See references/streaming.md for WebSocket streaming.
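For real-time use, the number that usually matters is how long the first chunk takes to arrive. The sketch below consumes any iterable of byte chunks, forwards each one to a callback, and records the time to the first chunk. The helper consume_stream and the fake generator are hypothetical stand-ins, not SDK functions.

```python
import time


def consume_stream(chunks, on_chunk, clock=time.perf_counter):
    """Forward chunks to on_chunk; return (time to first chunk, total bytes)."""
    start = clock()
    first_chunk_at = None
    total_bytes = 0
    for chunk in chunks:
        if first_chunk_at is None:
            # Latency as a listener would perceive it.
            first_chunk_at = clock() - start
        total_bytes += len(chunk)
        on_chunk(chunk)
    return first_chunk_at, total_bytes


def fake_stream():
    """Stand-in for an audio stream returned by the API."""
    yield b"abc"
    yield b"defgh"


received = []
latency, size = consume_stream(fake_stream(), received.append)
print(f"first chunk after {latency:.4f}s, {size} bytes total")
```

In a real application, on_chunk would hand the bytes to an audio player or a network socket instead of a list.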
Error Handling
python
try:
    audio = client.text_to_speech.convert(
        text="Generate speech",
        voice_id="invalid-voice-id",
    )
except Exception as e:
    print(f"API error: {e}")

Common errors:
- 401: Invalid API key
- 422: Invalid parameters (check voice_id, model_id)
- 429: Rate limit exceeded
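Of these, only 429 is worth retrying; a 401 or 422 will fail the same way every time. The sketch below retries a callable with exponential backoff when the raised error carries a status of 429. It assumes the exception exposes the HTTP status as a status_code attribute, and the names call_with_retry and FakeRateLimit are hypothetical.

```python
import time


def call_with_retry(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry request() on HTTP 429 with exponential backoff (illustrative)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception as exc:
            # Assumption: the error exposes the HTTP status as status_code.
            status = getattr(exc, "status_code", None)
            if status != 429 or attempt == max_attempts:
                raise  # 401 and 422 will not succeed on retry
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s by default


class FakeRateLimit(Exception):
    """Stand-in for a rate-limit error raised by an API client."""
    status_code = 429


attempts = []


def flaky_request():
    # Fails twice with a 429, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeRateLimit("rate limited")
    return b"audio-bytes"


delays = []
result = call_with_retry(flaky_request, sleep=delays.append)
print(result, delays)
```

Passing sleep in as a parameter keeps the example fast to run and easy to test; in production the default time.sleep applies.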
Tracking Costs
Monitor character usage via response headers (x-character-count, request-id):
python
response = client.text_to_speech.convert.with_raw_response(
    text="Hello!", voice_id="JBFqnCBsd6RMkjVDRZzb", model_id="eleven_multilingual_v2"
)
audio = response.parse()
print(f"Characters used: {response.headers.get('x-character-count')}")
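To track spend across many requests, the per-response count can be accumulated. The sketch below reads x-character-count from a header mapping and sums it over several responses. The helper name characters_used is hypothetical, and the header values shown are made up for illustration.

```python
def characters_used(headers):
    """Read x-character-count, returning 0 when the header is absent."""
    value = headers.get("x-character-count")
    return int(value) if value is not None else 0


# Plain dicts standing in for real response headers (made-up values).
responses = [
    {"x-character-count": "6", "request-id": "req-1"},
    {"x-character-count": "42", "request-id": "req-2"},
    {"request-id": "req-3"},  # header missing, counted as 0
]

total = sum(characters_used(headers) for headers in responses)
print(f"Total characters billed: {total}")
```

Plain dictionaries are case-sensitive, whereas HTTP header objects usually are not, so the lowercase key here is an assumption about how the headers are exposed.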
References
- Installation Guide
- Streaming Audio
- Voice Settings