ElevenLabs Voice Changer
Transform the voice in an audio recording into a different target voice. Voice Changer (previously called Speech-to-Speech — the API endpoint and SDK methods still use the `speech_to_speech` / `speechToSpeech` names) keeps the original performance — emotion, pacing, intonation, breaths, whispers, laughs, cries — and only swaps who is speaking.

Setup: See the Installation Guide. For JavaScript, use the `@elevenlabs/*` packages only.
Key Facts
- Maximum input length: 5 minutes per request — split longer recordings into chunks and stitch the outputs.
- Maximum file size: 50 MB per request — compress to MP3 if your source is larger.
- Pricing: 1,000 characters per minute of audio processed (duration-based, not text-based).
- Recommended model: `eleven_multilingual_sts_v2` — often outperforms `eleven_english_sts_v2` even for English-only content.
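The duration-based billing and the 5-minute input cap above make a simple pre-flight check possible before uploading anything. A minimal sketch (the `preflight` helper and its return shape are our own, not part of the SDK):

```python
import math

MAX_CHUNK_SECONDS = 5 * 60   # per-request input cap
CHARS_PER_MINUTE = 1000      # billing rate: 1,000 characters per minute of audio

def preflight(duration_seconds: float) -> dict:
    """Estimate billing and chunking for a recording before uploading it.

    Characters are billed per minute of audio processed; recordings longer
    than five minutes must be split into multiple requests.
    """
    minutes = duration_seconds / 60
    return {
        "estimated_characters": math.ceil(minutes * CHARS_PER_MINUTE),
        "chunks_needed": max(1, math.ceil(duration_seconds / MAX_CHUNK_SECONDS)),
    }

# A 12-minute recording is billed roughly 12,000 characters and needs 3 chunks.
print(preflight(12 * 60))
```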
Quick Start
Python

```python
from elevenlabs import ElevenLabs

client = ElevenLabs()

with open("source.mp3", "rb") as audio_file:
    audio_stream = client.speech_to_speech.convert(
        voice_id="JBFqnCBsd6RMkjVDRZzb",  # George
        audio=audio_file,
        model_id="eleven_multilingual_sts_v2",
        output_format="mp3_44100_128",
    )
    with open("converted.mp3", "wb") as f:
        for chunk in audio_stream:
            f.write(chunk)
```
JavaScript

```javascript
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createReadStream, createWriteStream } from "fs";

const client = new ElevenLabsClient();

const audioStream = await client.speechToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  audio: createReadStream("source.mp3"),
  modelId: "eleven_multilingual_sts_v2",
  outputFormat: "mp3_44100_128",
});

audioStream.pipe(createWriteStream("converted.mp3"));
```
cURL

```bash
curl -X POST "https://api.elevenlabs.io/v1/speech-to-speech/JBFqnCBsd6RMkjVDRZzb?output_format=mp3_44100_128" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -F "audio=@source.mp3" \
  -F "model_id=eleven_multilingual_sts_v2" \
  --output converted.mp3
```

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `voice_id` | string (required) | — | Target voice to speak in. Use a pre-made voice ID, a cloned voice, or a voice from the library |
| `audio` | file (required) | — | Source audio whose performance (emotion, timing, delivery) will be preserved |
| `model_id` | string | `eleven_english_sts_v2` | Model to use; see models table below |
| `output_format` | string | `mp3_44100_128` | See output formats table below |
| `voice_settings` | JSON string | — | Override stored voice settings for this request only |
| `seed` | integer | — | Best-effort deterministic sampling (0 – 4294967295) |
| `remove_background_noise` | boolean | `false` | Run the isolation model on the input before conversion |
| `file_format` | string | `other` | Set to `pcm_s16le_16` for raw 16-bit 16 kHz mono PCM input |
| `optimize_streaming_latency` | int (query) | — | 0–4. Trade quality for latency. |
| `enable_logging` | boolean (query) | `true` | Set to `false` to disable request logging |
Models
| Model ID | Languages | Best For |
|---|---|---|
| `eleven_multilingual_sts_v2` | 29 | Recommended for everything — often outperforms the English model even on English audio |
| `eleven_english_sts_v2` | English | API default — English-only fallback |

Only models whose `can_do_voice_conversion` property is `true` can be used here. Voice Changer does not currently have a low-latency "flash/turbo" tier — if you need one, keep `pcm_s16le_16` input, an `opus_*` / `mp3_*` low-bitrate output, and raise `optimize_streaming_latency`.

Languages (`eleven_multilingual_sts_v2`)
English (US, UK, AU, CA), Japanese, Chinese, German, Hindi, French (FR, CA), Korean, Portuguese (BR, PT), Italian, Spanish (ES, MX), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (SA, AE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Russian.
Target Voices
Use any voice ID from pre-made voices, your cloned voices, or the voice library.

Popular voices:

- `JBFqnCBsd6RMkjVDRZzb` — George (male, narrative)
- `EXAVITQu4vr4xnSDxMaL` — Sarah (female, soft)
- `onwK4e9ZLuTAKqWW03F9` — Daniel (male, authoritative)
- `XB0fDUnXU5powFXDhCwa` — Charlotte (female, conversational)

```python
voices = client.voices.get_all()
for voice in voices.voices:
    print(f"{voice.voice_id}: {voice.name}")
```

Converting from a URL
```python
import requests
from io import BytesIO
from elevenlabs import ElevenLabs

client = ElevenLabs()

audio_url = "https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
response = requests.get(audio_url)
audio_data = BytesIO(response.content)

audio_stream = client.speech_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    audio=audio_data,
    model_id="eleven_multilingual_sts_v2",
    output_format="mp3_44100_128",
)

with open("converted.mp3", "wb") as f:
    for chunk in audio_stream:
        f.write(chunk)
```

Voice Settings Override
Fine-tune the target voice for a single request without changing its stored defaults:

```python
from elevenlabs import VoiceSettings

audio_stream = client.speech_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    audio=audio_file,
    model_id="eleven_multilingual_sts_v2",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.0,
        use_speaker_boost=True,
    ),
)
```

- Stability: lower = more emotional range (follows the source more freely), higher = steadier delivery.
- Similarity boost: higher = closer to the target voice's timbre, may amplify source artifacts.
- Style: exaggerates the target voice's unique characteristics (v2+ models).
- Speaker boost: post-processing to sharpen clarity of the target voice.
Cleaning Up Noisy Source Audio
If the input recording is noisy, either pre-process with the voice-isolator skill or pass `remove_background_noise=True` to do it in a single call:

```python
audio_stream = client.speech_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    audio=audio_file,
    model_id="eleven_multilingual_sts_v2",
    remove_background_noise=True,
)
```

Cleaner input almost always produces better conversion — the model is trying to match phonemes and prosody, and background noise gets in the way.
Low-Latency PCM Input
If you already have raw 16-bit PCM mono @ 16kHz, passing `file_format="pcm_s16le_16"` skips decoding and reduces latency:

```python
audio_stream = client.speech_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    audio=pcm_bytes,
    model_id="eleven_multilingual_sts_v2",
    file_format="pcm_s16le_16",
)
```

Pair this with `optimize_streaming_latency` (0–4) as a query param for further latency reductions at some quality cost.
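If your source is a WAV file, the raw PCM bytes are just the frames without the header. A minimal sketch using the standard-library `wave` module, assuming the file is already 16-bit mono at 16 kHz (this does not resample or downmix; `wav_to_raw_pcm` is our own helper, not an SDK function):

```python
import wave

def wav_to_raw_pcm(path: str) -> bytes:
    """Extract raw 16-bit little-endian PCM frames from a WAV file.

    Only strips the WAV container; the file must already be mono,
    16-bit, 16 kHz, or the API will misinterpret the bytes.
    """
    with wave.open(path, "rb") as wav:
        if (wav.getnchannels(), wav.getsampwidth(), wav.getframerate()) != (1, 2, 16000):
            raise ValueError("expected 16-bit mono 16 kHz WAV")
        return wav.readframes(wav.getnframes())
```

The returned bytes can be passed directly as `audio` alongside `file_format="pcm_s16le_16"`.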
Output Formats
| Format | Description |
|---|---|
| `mp3_44100_128` | MP3 44.1kHz 128kbps (default) — good for web/apps |
| `mp3_44100_192` | MP3 44.1kHz 192kbps (Creator+) — higher quality |
| `mp3_44100_64` | MP3 44.1kHz 64kbps — smaller files |
| `mp3_22050_32` | MP3 22.05kHz 32kbps — smallest MP3 |
| `pcm_16000` | Raw PCM 16kHz — real-time pipelines |
| `pcm_24000` | Raw PCM 24kHz — good streaming balance |
| `pcm_44100` | Raw PCM 44.1kHz (Pro+) — CD quality |
| `pcm_48000` | Raw PCM 48kHz (Pro+) — highest quality |
| `ulaw_8000` | μ-law 8kHz — Twilio / telephony |
| `alaw_8000` | A-law 8kHz — telephony |
| `opus_48000_64` | Opus 48kHz 64kbps — efficient streaming |
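In code that serves several delivery targets, it can help to centralize the format choice. A hedged sketch: the format IDs come from the table above, but the use-case labels and the `pick_output_format` helper are our own convention, not API values:

```python
# Use-case labels are our own; the values are output_format strings from the table.
OUTPUT_FORMAT_FOR = {
    "web": "mp3_44100_128",        # default; good quality/size balance
    "telephony": "ulaw_8000",      # Twilio and most telephony gateways expect mu-law 8 kHz
    "realtime": "pcm_16000",       # raw PCM for low-latency pipelines
    "streaming": "opus_48000_64",  # efficient network streaming
}

def pick_output_format(use_case: str) -> str:
    """Map a deployment target to an output_format string, failing loudly
    on unknown labels rather than silently falling back."""
    try:
        return OUTPUT_FORMAT_FOR[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None
```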
Deterministic Output
Pass a `seed` to make repeated conversions of the same input return (best-effort) identical audio — useful for testing and A/B comparisons.

```python
audio_stream = client.speech_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    audio=audio_file,
    model_id="eleven_multilingual_sts_v2",
    seed=12345,
)
```

Input Audio Best Practices
The conversion quality is bounded by the input recording — the model can only swap the timbre, not rescue a bad source. A few practical rules:

- Be expressive. Whisper, shout, laugh, cry — the model preserves all of it. Flat input gives you flat output.
- Watch microphone gain. Too quiet and the model under-detects phonemes; too loud and clipping bleeds into the conversion. Aim for healthy peaks, no clipping.
- Accent and cadence transfer from the source, not the target. If you read in an American accent and target the British "George" voice, you get George's timbre with an American accent. To dub into a different accent or language, record someone speaking in that target accent/language and convert into a cloned/library voice.
- Clean up noise first. Either pass `remove_background_noise=True` or run the source through the voice-isolator skill before conversion. Noise hurts more here than in TTS.
- Split long recordings. Anything over 5 minutes must be chunked. Cut at natural pauses, convert each piece, and concatenate the resulting audio.
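The "cut at natural pauses" rule can be sketched as a greedy cut-point picker. This assumes you already have candidate pause timestamps from your own silence detection or VAD step (a hypothetical input, not something the API provides):

```python
MAX_CHUNK = 300.0  # the 5-minute per-request cap, in seconds

def pick_cut_points(pauses: list[float], total: float) -> list[float]:
    """Greedily choose cut points from detected pause timestamps so every
    chunk stays under the 5-minute cap.

    `pauses` are candidate cut times in seconds, ascending. Raises if two
    adjacent usable pauses are more than MAX_CHUNK apart, since no valid
    split exists there.
    """
    cuts: list[float] = []
    start = 0.0
    candidates = [p for p in pauses if 0 < p < total]
    i = 0
    while total - start > MAX_CHUNK:
        # advance to the last pause that keeps the current chunk under the cap
        last_usable = None
        while i < len(candidates) and candidates[i] - start <= MAX_CHUNK:
            last_usable = candidates[i]
            i += 1
        if last_usable is None:
            raise ValueError("no pause available within the 5-minute cap")
        cuts.append(last_usable)
        start = last_usable
    return cuts

# A 12-minute recording with pauses at 2:30, 4:50, 7:10, 10:00
print(pick_cut_points([150.0, 290.0, 430.0, 600.0], 720.0))  # [290.0, 430.0]
```

Convert each `[cut, next_cut)` slice separately, then concatenate the converted audio in order.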
Common Workflows
- Re-voice a narration — keep the performance of a scratch recording, swap in a different narrator voice.
- Localize / dub — convert a voice-over into the same speaker's cloned voice in another language (using `eleven_multilingual_sts_v2`).
- Create character voices — act out a line yourself, convert into a distinctive character voice for games or animation.
- Anonymize a speaker — replace a recognizable voice with a neutral pre-made voice while preserving what was said and how.
- Pair with voice-isolator — isolate the source voice first (or set `remove_background_noise=True`) for noisy field recordings before conversion.
- Pair with voice cloning — clone a target voice from a short sample, then use its `voice_id` here as the conversion target.
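Several of these workflows reduce to the same file-in, file-out step. A sketch of that step with the client injected so it stays testable; the `revoice_file` name and its signature are our own wrapper, not part of the SDK:

```python
def revoice_file(client, source_path: str, out_path: str, voice_id: str,
                 model_id: str = "eleven_multilingual_sts_v2",
                 clean_input: bool = False) -> str:
    """Convert one recording into the target voice and write the result.

    `client` is an ElevenLabs SDK client, or any object exposing a
    compatible `speech_to_speech.convert` method (handy for tests).
    Pass `clean_input=True` for noisy field recordings.
    """
    with open(source_path, "rb") as audio_file:
        stream = client.speech_to_speech.convert(
            voice_id=voice_id,
            audio=audio_file,
            model_id=model_id,
            remove_background_noise=clean_input,
        )
        with open(out_path, "wb") as out:
            for chunk in stream:
                out.write(chunk)
    return out_path
```

For the cloning workflow, create the clone first and pass its `voice_id` here; for anonymization, pass a neutral pre-made voice instead.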
Error Handling
```python
try:
    audio_stream = client.speech_to_speech.convert(
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        audio=audio_file,
        model_id="eleven_multilingual_sts_v2",
    )
except Exception as e:
    print(f"Voice changer failed: {e}")
```

Common errors:

- 401: Invalid API key
- 422: Invalid parameters (check `voice_id`, `model_id`, or `file_format` vs the supplied audio)
- 429: Rate limit exceeded
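429s are usually transient, so a backoff wrapper is often enough. A hedged sketch: rather than assuming a specific SDK exception class, it duck-types anything carrying `status_code == 429` as a rate limit (the `convert_with_retry` helper is our own):

```python
import time

def convert_with_retry(convert_fn, max_attempts: int = 4, base_delay: float = 1.0):
    """Call `convert_fn` (a zero-argument closure around the convert call),
    retrying with exponential backoff when the API reports a 429.

    Errors without `status_code == 429` propagate immediately; the last
    attempt re-raises rather than sleeping again.
    """
    for attempt in range(max_attempts):
        try:
            return convert_fn()
        except Exception as e:
            if getattr(e, "status_code", None) != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage: convert_with_retry(lambda: client.speech_to_speech.convert(...))
```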
References
- Installation Guide