
# ElevenLabs Audio Generation

Requires `ELEVENLABS_API_KEY` in `.env`.
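A minimal sketch of failing fast when the key is missing; the helper name is illustrative, not part of the SDK:

```python
import os

def require_key(env=os.environ) -> str:
    """Return the ElevenLabs API key, or fail with a clear message."""
    key = env.get("ELEVENLABS_API_KEY")
    if not key:
        raise RuntimeError("Set ELEVENLABS_API_KEY in .env")
    return key
```

Calling `require_key()` at startup gives a clearer error than a failed API call later.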

## Text-to-Speech

```python
from elevenlabs.client import ElevenLabs
from elevenlabs import save, VoiceSettings
import os

client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = client.text_to_speech.convert(
    text="Welcome to my video!",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.5,
        speed=1.0
    )
)
save(audio, "voiceover.mp3")
```

## Models

| Model | Quality | SSML Support | Notes |
| --- | --- | --- | --- |
| `eleven_multilingual_v2` | Highest consistency | None | Stable, production-ready, 29 languages |
| `eleven_flash_v2_5` | Good | `<break>`, `<phoneme>` | Fast, supports pause/pronunciation tags |
| `eleven_turbo_v2_5` | Good | `<break>`, `<phoneme>` | Fastest latency |
| `eleven_v3` | Most expressive | None | Alpha: unreliable, needs prompt engineering |

**Choose:** `multilingual_v2` for reliability, flash/turbo for SSML control, v3 for maximum expressiveness (expect retakes).
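One way to encode that selection advice in code; the helper and its flags are illustrative, not part of the SDK:

```python
def pick_model(need_ssml: bool = False, max_expressiveness: bool = False) -> str:
    """Map the selection advice above onto a model_id."""
    if max_expressiveness:
        return "eleven_v3"           # alpha: expect retakes
    if need_ssml:
        return "eleven_flash_v2_5"   # supports <break> / <phoneme>
    return "eleven_multilingual_v2"  # stable default
```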

## Voice Settings by Style

| Style | stability | similarity | style | speed |
| --- | --- | --- | --- | --- |
| Natural/professional | 0.75-0.85 | 0.9 | 0.0-0.1 | 1.0 |
| Conversational | 0.5-0.6 | 0.85 | 0.3-0.4 | 0.9-1.0 |
| Energetic/YouTuber | 0.3-0.5 | 0.75 | 0.5-0.7 | 1.0-1.1 |
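These rows can be kept as presets and splatted into `VoiceSettings`; the preset names and midpoint values here are illustrative, picked from within the table's ranges:

```python
# Midpoints of the ranges in the table above; tune per voice.
VOICE_PRESETS = {
    "natural":        dict(stability=0.80, similarity_boost=0.90, style=0.05, speed=1.00),
    "conversational": dict(stability=0.55, similarity_boost=0.85, style=0.35, speed=0.95),
    "energetic":      dict(stability=0.40, similarity_boost=0.75, style=0.60, speed=1.05),
}

# Usage (with the client from Text-to-Speech above):
# voice_settings=VoiceSettings(**VOICE_PRESETS["conversational"])
```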

## Pauses Between Sections

**With flash/turbo models:** Use SSML break tags inline: `...end of section. <break time="1.5s" /> Start of next...` Max 3 seconds per break. Excessive breaks can cause speed artifacts.

**With multilingual_v2 / v3:** No SSML support. Options:

- Paragraph breaks (blank lines) create a natural pause of roughly 0.3-0.5 s
- Post-process with ffmpeg: split the audio and insert silence

**Warning:** `...` (ellipsis) is NOT a reliable pause; it can be vocalized as a word or sound. Do not use ellipsis as a pause mechanism.
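For the ffmpeg route, one way to splice silence between two clips is the `anullsrc` source plus the `concat` filter. This sketch only builds the command (the filenames are placeholders); run it yourself once ffmpeg is on PATH:

```python
def insert_silence_cmd(part1: str, part2: str, out: str, seconds: float = 1.5) -> list:
    """Build an ffmpeg command: part1 + <seconds> of silence + part2."""
    return [
        "ffmpeg",
        "-i", part1,
        # anullsrc generates silence; -t caps its duration
        "-f", "lavfi", "-t", str(seconds), "-i", "anullsrc=r=44100:cl=stereo",
        "-i", part2,
        # join the three audio streams end to end
        "-filter_complex", "[0:a][1:a][2:a]concat=n=3:v=0:a=1",
        out,
    ]

cmd = insert_silence_cmd("intro.mp3", "body.mp3", "joined.mp3")
# e.g. subprocess.run(cmd, check=True)
```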

## Pronunciation Control

**Phonetic spelling (any model):** Write words as you want them pronounced:

- Janus → "Jan-us"
- nginx → "engine-x"
- Use dashes, capitals, and apostrophes to guide pronunciation

**SSML phoneme tags (flash/turbo only):** `<phoneme alphabet="ipa" ph="ˈdʒeɪnəs">Janus</phoneme>`
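A simple preprocessing pass can apply such respellings before the text reaches the API; the replacement table here is just the examples above:

```python
# Illustrative respelling table; extend per project.
PRONUNCIATIONS = {"Janus": "Jan-us", "nginx": "engine-x"}

def respell(text: str) -> str:
    """Swap tricky words for their phonetic spellings before TTS."""
    for word, spoken in PRONUNCIATIONS.items():
        text = text.replace(word, spoken)
    return text

# respell("Deploy nginx behind Janus") then pass the result to convert()
```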

## Iterative Workflow

  1. Generate → listen → identify pronunciation/pacing issues
  2. Adjust: phonetic spellings, break tags, voice settings
  3. Regenerate. If pauses aren't precise enough, add silence in post with ffmpeg rather than fighting the TTS engine.

## Voice Cloning

### Instant Voice Clone

```python
with open("sample.mp3", "rb") as f:
    voice = client.voices.ivc.create(
        name="My Voice",
        files=[f],
        remove_background_noise=True
    )
print(f"Voice ID: {voice.voice_id}")
```
- Use `client.voices.ivc.create()` (not `client.voices.clone()`)
- Pass file handles in binary mode (`"rb"`), not paths
- Convert m4a first: `ffmpeg -i input.m4a -codec:a libmp3lame -qscale:a 2 output.mp3`
- Multiple samples (2-3 clips) improve accuracy
- Save the voice ID for reuse

**Professional Voice Clone:** Requires Creator plan+ and 30+ minutes of audio. See reference.md.

## Sound Effects

Max 22 seconds per generation.

```python
result = client.text_to_sound_effects.convert(
    text="Thunder rumbling followed by heavy rain",
    duration_seconds=10,
    prompt_influence=0.3
)
with open("thunder.mp3", "wb") as f:
    for chunk in result:
        f.write(chunk)
```

**Prompt tips:** Be specific: "Heavy footsteps on wooden floorboards, slow and deliberate, with creaking"
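Since each generation is capped at 22 seconds, it can help to clamp requested durations up front rather than let the API reject them; this tiny helper is illustrative, not part of the SDK:

```python
MAX_SFX_SECONDS = 22  # per-generation cap stated above

def clamp_sfx_duration(seconds: float) -> float:
    """Keep duration_seconds within the API's 22-second cap."""
    return min(seconds, MAX_SFX_SECONDS)
```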

## Music Generation

10 seconds to 5 minutes. Use `client.music.compose()` (not `.generate()`).

```python
result = client.music.compose(
    prompt="Upbeat indie rock, catchy guitar riff, energetic drums, travel vlog",
    music_length_ms=60000,
    force_instrumental=True
)
with open("music.mp3", "wb") as f:
    for chunk in result:
        f.write(chunk)
```

**Prompt structure:** Genre, mood, instruments, tempo, use case. Add "no vocals" or use `force_instrumental=True` for background music.
## Remotion Integration

### Complete Workflow: Script to Synchronized Scene

```
VOICEOVER-SCRIPT.md → voiceover.py → public/audio/ → Remotion composition
        ↓                  ↓               ↓                 ↓
  Scene narration    Generate MP3    Audio files     <Audio> component
  with durations     per scene       with timing     synced to scenes
```

### Step 1: Generate Per-Scene Audio

Use the toolkit's voiceover tool to generate audio for each scene:

```bash
# Generate voiceover files for each scene
python tools/voiceover.py --scene-dir public/audio/scenes --json
```

Output:

```
public/audio/scenes/
├── scene-01-title.mp3
├── scene-02-problem.mp3
├── scene-03-solution.mp3
└── manifest.json (durations for each file)
```

The `manifest.json` contains timing info:

```json
{
  "scenes": [
    { "file": "scene-01-title.mp3", "duration": 4.2 },
    { "file": "scene-02-problem.mp3", "duration": 12.8 },
    { "file": "scene-03-solution.mp3", "duration": 15.3 }
  ],
  "totalDuration": 32.3
}
```
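Before wiring the manifest into a composition, a quick sanity check catches drift between per-scene durations and the total; the manifest literal here mirrors the example above:

```python
import json
import math

manifest = json.loads("""
{
  "scenes": [
    { "file": "scene-01-title.mp3", "duration": 4.2 },
    { "file": "scene-02-problem.mp3", "duration": 12.8 },
    { "file": "scene-03-solution.mp3", "duration": 15.3 }
  ],
  "totalDuration": 32.3
}
""")

# Per-scene durations should sum to totalDuration (within rounding slack)
total = sum(s["duration"] for s in manifest["scenes"])
assert math.isclose(total, manifest["totalDuration"], abs_tol=0.05)

# Frame counts at 30 fps, rounding up the same way the composition does
frames = {s["file"]: math.ceil(s["duration"] * 30) for s in manifest["scenes"]}
```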

### Step 2: Use Audio in Remotion Composition

```tsx
// src/Composition.tsx
import React from 'react';
import { Audio, staticFile, Series } from 'remotion';

// Import scene components
import { TitleSlide } from './scenes/TitleSlide';
import { ProblemSlide } from './scenes/ProblemSlide';
import { SolutionSlide } from './scenes/SolutionSlide';

// Scene durations (from manifest.json, converted to frames at 30fps)
const SCENE_DURATIONS = {
  title: Math.ceil(4.2 * 30),      // 126 frames
  problem: Math.ceil(12.8 * 30),   // 384 frames
  solution: Math.ceil(15.3 * 30),  // 459 frames
};

export const MainComposition: React.FC = () => {
  return (
    <>
      {/* Scene sequence */}
      <Series>
        <Series.Sequence durationInFrames={SCENE_DURATIONS.title}>
          <TitleSlide />
        </Series.Sequence>
        <Series.Sequence durationInFrames={SCENE_DURATIONS.problem}>
          <ProblemSlide />
        </Series.Sequence>
        <Series.Sequence durationInFrames={SCENE_DURATIONS.solution}>
          <SolutionSlide />
        </Series.Sequence>
      </Series>

      {/* Audio track - plays continuously across all scenes */}
      <Audio src={staticFile('audio/voiceover.mp3')} volume={1} />

      {/* Optional: Background music at lower volume */}
      <Audio src={staticFile('audio/music.mp3')} volume={0.15} />
    </>
  );
};
```

### Step 3: Per-Scene Audio (Alternative)

For more control, add audio to each scene individually:
```tsx
// src/scenes/ProblemSlide.tsx
import React from 'react';
import { Audio, staticFile } from 'remotion';

export const ProblemSlide: React.FC = () => {
  return (
    <div style={{ /* slide styles */ }}>
      <h1>The Problem</h1>
      {/* Scene content */}

      {/* Audio starts when this scene starts (frame 0 of this sequence) */}
      <Audio src={staticFile('audio/scenes/scene-02-problem.mp3')} />
    </div>
  );
};
```

### Syncing Visuals to Voiceover

Calculate scene duration from audio, not the other way around:
```tsx
// src/config/timing.ts
import manifest from '../../public/audio/scenes/manifest.json';

const FPS = 30;

// Convert audio durations to frame counts
export const sceneDurations = manifest.scenes.reduce((acc, scene) => {
  const name = scene.file.replace(/^scene-\d+-/, '').replace('.mp3', '');
  acc[name] = Math.ceil(scene.duration * FPS);
  return acc;
}, {} as Record<string, number>);

// Usage in composition:
// <Series.Sequence durationInFrames={sceneDurations.title}>
```

### Audio Timing Patterns

```tsx
import React from 'react';
import { Audio, Sequence, interpolate, useCurrentFrame } from 'remotion';

// Fade in audio
export const FadeInAudio: React.FC<{ src: string; fadeFrames?: number }> = ({
  src,
  fadeFrames = 30
}) => {
  const frame = useCurrentFrame();
  const volume = interpolate(frame, [0, fadeFrames], [0, 1], {
    extrapolateRight: 'clamp',
  });
  return <Audio src={src} volume={volume} />;
};

// Delayed audio start
export const DelayedAudio: React.FC<{ src: string; delayFrames: number }> = ({
  src,
  delayFrames
}) => (
  <Sequence from={delayFrames}>
    <Audio src={src} />
  </Sequence>
);

// Usage:
// <FadeInAudio src={staticFile('audio/music.mp3')} fadeFrames={60} />
// <DelayedAudio src={staticFile('audio/sfx/whoosh.mp3')} delayFrames={45} />
```

### Voiceover + Demo Video Sync

When a scene has both voiceover and demo video:
```tsx
import React from 'react';
import { Audio, OffthreadVideo, staticFile, useVideoConfig } from 'remotion';

export const DemoScene: React.FC = () => {
  const { durationInFrames, fps } = useVideoConfig();

  // Calculate playback rate to fit demo into voiceover duration
  const demoDuration = 45; // seconds (original demo length)
  const sceneDuration = durationInFrames / fps; // seconds (from voiceover)
  const playbackRate = demoDuration / sceneDuration;

  return (
    <>
      <OffthreadVideo
        src={staticFile('demos/feature-demo.mp4')}
        playbackRate={playbackRate}
      />
      <Audio src={staticFile('audio/scenes/scene-04-demo.mp3')} />
    </>
  );
};
```
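The playback-rate arithmetic works out as follows with the example numbers (a 45 s demo fit into 15.3 s of narration at 30 fps):

```python
fps = 30
duration_in_frames = 459          # scene length derived from the voiceover (15.3 s)
demo_duration = 45.0              # original demo length, seconds

scene_duration = duration_in_frames / fps        # 15.3 s
playback_rate = demo_duration / scene_duration   # demo must play ~2.94x faster
```

If the rate comes out far above 2-3x, the demo will look rushed; trim the demo footage or lengthen the narration instead.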

### Error Handling

```tsx
import React, { useEffect, useState } from 'react';
import { Audio, delayRender, continueRender } from 'remotion';

export const SafeAudio: React.FC<{ src: string }> = ({ src }) => {
  const [handle] = useState(() => delayRender());
  const [audioReady, setAudioReady] = useState(false);

  useEffect(() => {
    const audio = new window.Audio(src);
    audio.oncanplaythrough = () => {
      setAudioReady(true);
      continueRender(handle);
    };
    audio.onerror = () => {
      console.error(`Failed to load audio: ${src}`);
      continueRender(handle); // Continue without audio rather than hang
    };
  }, [src, handle]);

  if (!audioReady) return null;
  return <Audio src={src} />;
};
```

## Toolkit Command: /generate-voiceover

The `/generate-voiceover` command handles the full workflow:

1. Reads VOICEOVER-SCRIPT.md
2. Extracts narration for each scene
3. Generates audio via ElevenLabs API
4. Saves to public/audio/scenes/
5. Creates manifest.json with durations
6. Updates project.json with timing info

## Popular Voices

- George: `JBFqnCBsd6RMkjVDRZzb` (warm narrator)
- Rachel: `21m00Tcm4TlvDq8ikWAM` (clear female)
- Adam: `pNInz6obpgDQGcFmaJgB` (professional male)

List all: `client.voices.get_all()`

For full API docs, see reference.md.