# ElevenLabs Audio Generation

Requires `ELEVENLABS_API_KEY` in `.env`.

## Text-to-Speech
```python
from elevenlabs.client import ElevenLabs
from elevenlabs import save, VoiceSettings
import os

client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = client.text_to_speech.convert(
    text="Welcome to my video!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.5,
        speed=1.0,
    ),
)
save(audio, "voiceover.mp3")
```
## Models
| Model | Quality | SSML Support | Notes |
|---|---|---|---|
| `eleven_multilingual_v2` | Highest consistency | None | Stable, production-ready, 29 languages |
| `eleven_flash_v2_5` | Good | Yes (break, phoneme tags) | Fast, supports pause/pronunciation tags |
| `eleven_turbo_v2_5` | Good | Yes (break, phoneme tags) | Fastest latency |
| `eleven_v3` | Most expressive | None | Alpha; unreliable, needs prompt engineering |

Choose: `multilingual_v2` for reliability, `flash`/`turbo` for SSML control, `v3` for maximum expressiveness (expect retakes).
## Voice Settings by Style
| Style | stability | similarity | style | speed |
|---|---|---|---|---|
| Natural/professional | 0.75-0.85 | 0.9 | 0.0-0.1 | 1.0 |
| Conversational | 0.5-0.6 | 0.85 | 0.3-0.4 | 0.9-1.0 |
| Energetic/YouTuber | 0.3-0.5 | 0.75 | 0.5-0.7 | 1.0-1.1 |
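When selecting settings in code, the table above can be captured as presets. This is a minimal sketch; the preset names and the specific values (midpoints picked from each recommended range) are this example's own choices:

```python
# Presets drawn from the style table above; the exact values are
# midpoints chosen from each recommended range (an assumption).
VOICE_PRESETS = {
    "natural": {"stability": 0.80, "similarity_boost": 0.90, "style": 0.05, "speed": 1.0},
    "conversational": {"stability": 0.55, "similarity_boost": 0.85, "style": 0.35, "speed": 0.95},
    "energetic": {"stability": 0.40, "similarity_boost": 0.75, "style": 0.60, "speed": 1.05},
}

def settings_for(style_name: str) -> dict:
    """Return VoiceSettings keyword arguments for a named delivery style."""
    if style_name not in VOICE_PRESETS:
        raise ValueError(f"unknown style {style_name!r}; choose from {sorted(VOICE_PRESETS)}")
    return dict(VOICE_PRESETS[style_name])
```

The returned dict can be splatted into the SDK's settings object, e.g. `VoiceSettings(**settings_for("conversational"))`.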
## Pauses Between Sections
**With flash/turbo models:** Use SSML break tags inline:

```
...end of section. <break time="1.5s" /> Start of next...
```

Max 3 seconds per break. Excessive breaks can cause speed artifacts.

**With multilingual_v2 / v3:** No SSML support. Options:

- Paragraph breaks (blank lines) create a natural pause of roughly 0.3-0.5s
- Post-process with ffmpeg: split the audio and insert silence

**WARNING:** An ellipsis (`...`) is NOT a reliable pause; it can be vocalized as a word or sound. Do not use ellipses as a pause mechanism.
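For flash/turbo scripts, a small helper can join sections with break tags while enforcing the 3-second ceiling noted above. A sketch; the function name and default pause length are illustrative:

```python
MAX_BREAK_SECONDS = 3.0  # per-break SSML limit noted above

def join_with_breaks(sections: list[str], pause_seconds: float = 1.5) -> str:
    """Join script sections with SSML <break> tags, clamping each pause to 3s."""
    pause = min(pause_seconds, MAX_BREAK_SECONDS)
    tag = f'<break time="{pause}s" />'
    return f" {tag} ".join(s.strip() for s in sections)
```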
## Pronunciation Control
**Phonetic spelling (any model):** Write words as you want them pronounced:

- `Janus` → `Jan-us`
- `nginx` → `engine-x`
- Use dashes, capitals, and apostrophes to guide pronunciation

**SSML phoneme tags (flash/turbo only):**

```xml
<phoneme alphabet="ipa" ph="ˈdʒeɪnəs">Janus</phoneme>
```
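Phonetic respellings can also be applied programmatically before text is sent to the API. A sketch; the helper name and the substitution table are illustrative:

```python
import re

# Illustrative respelling table; extend with project-specific terms
PRONUNCIATIONS = {
    "Janus": "Jan-us",
    "nginx": "engine-x",
}

def apply_pronunciations(text: str, table: dict[str, str] = PRONUNCIATIONS) -> str:
    """Replace whole-word occurrences with their phonetic respellings."""
    for word, respelled in table.items():
        text = re.sub(rf"\b{re.escape(word)}\b", respelled, text)
    return text
```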
## Iterative Workflow
- Generate → listen → identify pronunciation/pacing issues
- Adjust: phonetic spellings, break tags, voice settings
- Regenerate. If pauses aren't precise enough, add silence in post with ffmpeg rather than fighting the TTS engine.
## Voice Cloning

### Instant Voice Clone
```python
with open("sample.mp3", "rb") as f:
    voice = client.voices.ivc.create(
        name="My Voice",
        files=[f],
        remove_background_noise=True,
    )
print(f"Voice ID: {voice.voice_id}")
```

- Use `client.voices.ivc.create()` (not `client.voices.clone()`)
- Pass file handles in binary mode (`"rb"`), not paths
- Convert m4a first: `ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3`
- Multiple samples (2-3 clips) improve accuracy
- Save the voice ID for reuse

**Professional Voice Clone:** Requires Creator plan+, 30+ min audio. See reference.md.
## Sound Effects
Max 22 seconds per generation.
```python
result = client.text_to_sound_effects.convert(
    text="Thunder rumbling followed by heavy rain",
    duration_seconds=10,
    prompt_influence=0.3,
)
with open("thunder.mp3", "wb") as f:
    for chunk in result:
        f.write(chunk)
```

**Prompt tips:** Be specific: "Heavy footsteps on wooden floorboards, slow and deliberate, with creaking"
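Because each generation is capped at 22 seconds, a longer ambience bed has to be generated in chunks and concatenated afterwards. A sketch of the duration math only; the helper name is illustrative:

```python
MAX_SFX_SECONDS = 22  # per-generation API limit noted above

def segment_durations(total_seconds: int) -> list[int]:
    """Split a long ambience bed into chunk durations of at most 22 seconds."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    chunks = []
    remaining = total_seconds
    while remaining > 0:
        chunks.append(min(remaining, MAX_SFX_SECONDS))
        remaining -= chunks[-1]
    return chunks
```

Each chunk duration can then be passed as `duration_seconds`, and the resulting clips joined with ffmpeg.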
## Music Generation
10 seconds to 5 minutes. Use `client.music.compose()` (not `.generate()`).

```python
result = client.music.compose(
    prompt="Upbeat indie rock, catchy guitar riff, energetic drums, travel vlog",
    music_length_ms=60000,
    force_instrumental=True,
)
with open("music.mp3", "wb") as f:
    for chunk in result:
        f.write(chunk)
```

**Prompt structure:** Genre, mood, instruments, tempo, use case. Add "no vocals" or use `force_instrumental=True` for background music.
## Remotion Integration
### Complete Workflow: Script to Synchronized Scene
```
VOICEOVER-SCRIPT.md  →  voiceover.py  →  public/audio/  →  Remotion composition
        ↓                    ↓                 ↓                    ↓
 Scene narration        Generate MP3      Audio files       <Audio> component
 with durations          per scene        with timing       synced to scenes
```

### Step 1: Generate Per-Scene Audio
Use the toolkit's voiceover tool to generate audio for each scene:

```bash
# Generate voiceover files for each scene
python tools/voiceover.py --scene-dir public/audio/scenes --json
```
Output:

```
public/audio/scenes/
├── scene-01-title.mp3
├── scene-02-problem.mp3
├── scene-03-solution.mp3
└── manifest.json (durations for each file)
```
The `manifest.json` contains timing info:

```json
{
  "scenes": [
    { "file": "scene-01-title.mp3", "duration": 4.2 },
    { "file": "scene-02-problem.mp3", "duration": 12.8 },
    { "file": "scene-03-solution.mp3", "duration": 15.3 }
  ],
  "totalDuration": 32.3
}
```
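The same seconds-to-frames conversion the composition performs can be sketched in Python, e.g. to sanity-check the manifest before rendering. The function name is illustrative:

```python
import math

def scene_frames(manifest: dict, fps: int = 30) -> dict:
    """Map each scene file in a manifest dict to its duration in frames."""
    return {s["file"]: math.ceil(s["duration"] * fps) for s in manifest["scenes"]}
```

Load the manifest with `json.load` first, then pass the resulting dict in.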
### Step 2: Use Audio in Remotion Composition
```tsx
// src/Composition.tsx
import { Audio, staticFile, Series } from 'remotion';

// Import scene components
import { TitleSlide } from './scenes/TitleSlide';
import { ProblemSlide } from './scenes/ProblemSlide';
import { SolutionSlide } from './scenes/SolutionSlide';

// Scene durations (from manifest.json, converted to frames at 30fps)
const SCENE_DURATIONS = {
  title: Math.ceil(4.2 * 30),     // 126 frames
  problem: Math.ceil(12.8 * 30),  // 384 frames
  solution: Math.ceil(15.3 * 30), // 459 frames
};

export const MainComposition: React.FC = () => {
  return (
    <>
      {/* Scene sequence */}
      <Series>
        <Series.Sequence durationInFrames={SCENE_DURATIONS.title}>
          <TitleSlide />
        </Series.Sequence>
        <Series.Sequence durationInFrames={SCENE_DURATIONS.problem}>
          <ProblemSlide />
        </Series.Sequence>
        <Series.Sequence durationInFrames={SCENE_DURATIONS.solution}>
          <SolutionSlide />
        </Series.Sequence>
      </Series>

      {/* Audio track - plays continuously across all scenes */}
      <Audio src={staticFile('audio/voiceover.mp3')} volume={1} />

      {/* Optional: Background music at lower volume */}
      <Audio src={staticFile('audio/music.mp3')} volume={0.15} />
    </>
  );
};
```
### Step 3: Per-Scene Audio (Alternative)
For more control, add audio to each scene individually:

```tsx
// src/scenes/ProblemSlide.tsx
import { Audio, staticFile } from 'remotion';

export const ProblemSlide: React.FC = () => {
  return (
    <div style={{ /* slide styles */ }}>
      <h1>The Problem</h1>
      {/* Scene content */}

      {/* Audio starts when this scene starts (frame 0 of this sequence) */}
      <Audio src={staticFile('audio/scenes/scene-02-problem.mp3')} />
    </div>
  );
};
```
### Syncing Visuals to Voiceover
Calculate scene durations from the audio, not the other way around:

```tsx
// src/config/timing.ts
import manifest from '../../public/audio/scenes/manifest.json';

const FPS = 30;

// Convert audio durations to frame counts
export const sceneDurations = manifest.scenes.reduce((acc, scene) => {
  const name = scene.file.replace(/^scene-\d+-/, '').replace('.mp3', '');
  acc[name] = Math.ceil(scene.duration * FPS);
  return acc;
}, {} as Record<string, number>);

// Usage in composition:
// <Series.Sequence durationInFrames={sceneDurations.title}>
```
### Audio Timing Patterns
```tsx
import { Audio, Sequence, interpolate, useCurrentFrame } from 'remotion';

// Fade in audio
export const FadeInAudio: React.FC<{ src: string; fadeFrames?: number }> = ({
  src,
  fadeFrames = 30,
}) => {
  const frame = useCurrentFrame();
  const volume = interpolate(frame, [0, fadeFrames], [0, 1], {
    extrapolateRight: 'clamp',
  });
  return <Audio src={src} volume={volume} />;
};

// Delayed audio start
export const DelayedAudio: React.FC<{ src: string; delayFrames: number }> = ({
  src,
  delayFrames,
}) => (
  <Sequence from={delayFrames}>
    <Audio src={src} />
  </Sequence>
);

// Usage:
// <FadeInAudio src={staticFile('audio/music.mp3')} fadeFrames={60} />
// <DelayedAudio src={staticFile('audio/sfx/whoosh.mp3')} delayFrames={45} />
```
### Voiceover + Demo Video Sync
When a scene has both voiceover and demo video:

```tsx
import { Audio, OffthreadVideo, staticFile, useVideoConfig } from 'remotion';

export const DemoScene: React.FC = () => {
  const { durationInFrames, fps } = useVideoConfig();

  // Calculate playback rate to fit the demo into the voiceover duration
  const demoDuration = 45; // seconds (original demo length)
  const sceneDuration = durationInFrames / fps; // seconds (from voiceover)
  const playbackRate = demoDuration / sceneDuration;

  return (
    <>
      <OffthreadVideo
        src={staticFile('demos/feature-demo.mp4')}
        playbackRate={playbackRate}
      />
      <Audio src={staticFile('audio/scenes/scene-04-demo.mp3')} />
    </>
  );
};
```
### Error Handling
```tsx
import { Audio, delayRender, continueRender } from 'remotion';
import { useEffect, useState } from 'react';

export const SafeAudio: React.FC<{ src: string }> = ({ src }) => {
  const [handle] = useState(() => delayRender());
  const [audioReady, setAudioReady] = useState(false);

  useEffect(() => {
    const audio = new window.Audio(src);
    audio.oncanplaythrough = () => {
      setAudioReady(true);
      continueRender(handle);
    };
    audio.onerror = () => {
      console.error(`Failed to load audio: ${src}`);
      continueRender(handle); // Continue without audio rather than hang
    };
  }, [src, handle]);

  if (!audioReady) return null;
  return <Audio src={src} />;
};
```
import { Audio, staticFile, delayRender, continueRender } from 'remotion';
import { useEffect, useState } from 'react';
export const SafeAudio: React.FC<{ src: string }> = ({ src }) => {
const [handle] = useState(() => delayRender());
const [audioReady, setAudioReady] = useState(false);
useEffect(() => {
const audio = new window.Audio(src);
audio.oncanplaythrough = () => {
setAudioReady(true);
continueRender(handle);
};
audio.onerror = () => {
console.error(`Failed to load audio: ${src}`);
continueRender(handle); // 即使音频加载失败也继续渲染,避免卡顿
};
}, [src, handle]);
if (!audioReady) return null;
return <Audio src={src} />;
};Toolkit Command: /generate-voiceover
工具包命令:/generate-voiceover
The `/generate-voiceover` command handles the full workflow:

1. Reads VOICEOVER-SCRIPT.md
2. Extracts narration for each scene
3. Generates audio via the ElevenLabs API
4. Saves to public/audio/scenes/
5. Creates manifest.json with durations
6. Updates project.json with timing info
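Step 5 might look like the following sketch, assuming per-scene durations have already been measured (the output shape matches the manifest.json example earlier; the function name is illustrative):

```python
import json

def build_manifest(scene_durations: list[tuple[str, float]]) -> dict:
    """Build the manifest.json payload from (filename, seconds) pairs."""
    scenes = [{"file": name, "duration": round(dur, 1)} for name, dur in scene_durations]
    total = round(sum(s["duration"] for s in scenes), 1)
    return {"scenes": scenes, "totalDuration": total}

# Serialize with json.dumps(build_manifest(pairs), indent=2) and write
# the result to public/audio/scenes/manifest.json.
```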
## Popular Voices
- George: `JBFqnCBsd6RMkjVDRZzb` (warm narrator)
- Rachel: `21m00Tcm4TlvDq8ikWAM` (clear female)
- Adam: `pNInz6obpgDQGcFmaJgB` (professional male)

List all voices: `client.voices.get_all()`

For full API docs, see reference.md.