elevenlabs-narration
ElevenLabs Narration for Video Production
Complete integration guide for using ElevenLabs text-to-speech in video production pipelines. Covers voice selection, timing calculations, API patterns, and cost optimization for professional narration.
Overview
- Generating narration audio for video segments
- Selecting appropriate voices for content type
- Calculating segment timing from frames to milliseconds
- Building script-to-audio pipelines
- Optimizing API usage and costs
- Handling rate limits and errors
ElevenLabs API Overview
Model Comparison (2026)
| Model | Latency | Quality | Cost | Best For |
|---|---|---|---|---|
| eleven_multilingual_v2 | Medium | Best | $0.30/1K chars | Production, multilingual |
| eleven_turbo_v2_5 | Low | Excellent | $0.18/1K chars | Real-time, drafts |
| eleven_flash_v2_5 | Lowest | Good | $0.08/1K chars | Previews, testing |
| eleven_english_sts_v2 | Medium | Best | $0.30/1K chars | Speech-to-speech |
API Endpoints
Base URL: https://api.elevenlabs.io/v1
POST /text-to-speech/{voice_id} # Generate audio
POST /text-to-speech/{voice_id}/stream # Stream audio
GET /voices # List voices
GET /voices/{voice_id} # Voice details
GET /user # Usage/quota
POST /speech-to-speech/{voice_id} # Voice conversion
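As a rough illustration of how these endpoints are called without the SDK, the sketch below builds a request for `POST /text-to-speech/{voice_id}`. It assumes authentication through the `xi-api-key` header and a JSON body carrying `text` and `model_id`. `buildTtsRequest` and `fetchNarration` are hypothetical helper names introduced here, not part of any library.

```typescript
const BASE_URL = 'https://api.elevenlabs.io/v1';

interface TtsRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles the URL, headers, and JSON body for one call.
function buildTtsRequest(
  voiceId: string,
  text: string,
  apiKey: string,
  modelId: string = 'eleven_multilingual_v2'
): TtsRequest {
  return {
    url: `${BASE_URL}/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: 'POST',
      headers: {
        'xi-api-key': apiKey, // assumption: API key passed in this header
        'Content-Type': 'application/json',
        Accept: 'audio/mpeg'
      },
      body: JSON.stringify({ text, model_id: modelId })
    }
  };
}

// Hypothetical helper: sends the request with the global fetch (Node 18+ or a
// browser) and returns the raw audio bytes.
async function fetchNarration(
  voiceId: string,
  text: string,
  apiKey: string
): Promise<ArrayBuffer> {
  const { url, init } = buildTtsRequest(voiceId, text, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`ElevenLabs request failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}
```

Keeping request construction separate from the network call makes the URL and body easy to inspect in a unit test without spending characters.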
Core Integration Pattern
Basic Text-to-Speech
```typescript
import { ElevenLabsClient } from 'elevenlabs';

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY
});

async function generateNarration(
  text: string,
  voiceId: string = 'Rachel'
): Promise<Buffer> {
  const audio = await client.generate({
    voice: voiceId,
    text: text,
    model_id: 'eleven_multilingual_v2',
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.8,
      style: 0.0,
      use_speaker_boost: true
    }
  });

  // Convert stream to buffer
  const chunks: Buffer[] = [];
  for await (const chunk of audio) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}
```
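The basic function above does not handle the rate limits and errors listed in the overview. One common approach, sketched here, is to retry with exponential backoff. The sketch assumes a failed call throws an error carrying a numeric `statusCode` or `status` property; that shape is an assumption and should be checked against the SDK version in use. `backoffDelayMs`, `isRetryable`, and `withRetry` are hypothetical helpers.

```typescript
// Delay before the given retry attempt (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 8000
): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Assumption: errors expose an HTTP status as `statusCode` or `status`.
// Rate limits (429) and server errors (5xx) are treated as retryable.
function isRetryable(error: unknown): boolean {
  const e = error as { statusCode?: number; status?: number } | null;
  const status = e?.statusCode ?? e?.status;
  return typeof status === 'number' && (status === 429 || status >= 500);
}

// Runs fn, retrying retryable failures until maxAttempts calls have been made.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 4
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isRetryable(error)) {
        throw error;
      }
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A call would then look like `await withRetry(() => generateNarration(script))`. Client errors such as 400 or 401 are rethrown immediately, since repeating them would only consume quota.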
Voice Selection Quick Reference
Pre-Built Voices for Video Narration
| Voice | ID | Characteristics | Use Case |
|---|---|---|---|
| Rachel | 21m00Tcm4TlvDq8ikWAM | Warm, conversational | General narration |
| Adam | pNInz6obpgDQGcFmaJgB | Deep, authoritative | Tech explainers |
| Antoni | ErXwobaYiN019PkySvjV | Energetic, youthful | Product demos |
| Bella | EXAVITQu4vr4xnSDxMaL | Friendly, engaging | Tutorials |
| Josh | TxGEqnHWrfWFTfGW9XjX | Deep, narrative | Documentaries |
Voice Settings Explained
```typescript
interface VoiceSettings {
  stability: number;          // 0.0-1.0 (lower = more expressive)
  similarity_boost: number;   // 0.0-1.0 (higher = closer to original)
  style: number;              // 0.0-1.0 (v2 models only)
  use_speaker_boost: boolean; // Clarity enhancement
}

// Recommended settings by content type
const VOICE_PRESETS = {
  narration:      { stability: 0.65, similarity_boost: 0.8,  style: 0.0 },
  conversational: { stability: 0.4,  similarity_boost: 0.75, style: 0.2 },
  dramatic:       { stability: 0.3,  similarity_boost: 0.9,  style: 0.5 },
  professional:   { stability: 0.8,  similarity_boost: 0.85, style: 0.0 },
  energetic:      { stability: 0.35, similarity_boost: 0.85, style: 0.4 }
};
```
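To show how the voice table and the presets might be combined, the following sketch resolves a content type to a voice ID plus a complete settings object. The pairing of content types with voices is an illustrative assumption based on the "Use Case" column, and `CONTENT_VOICES`, `PRESETS`, and `resolveVoice` are hypothetical names. Only three content types are included to keep the example short.

```typescript
type ContentType = 'narration' | 'professional' | 'energetic';

interface VoiceSettings {
  stability: number;
  similarity_boost: number;
  style: number;
  use_speaker_boost: boolean;
}

// Voice IDs copied from the pre-built voice table; the pairing is an assumption.
const CONTENT_VOICES: Record<ContentType, { name: string; voiceId: string }> = {
  narration: { name: 'Rachel', voiceId: '21m00Tcm4TlvDq8ikWAM' },
  professional: { name: 'Adam', voiceId: 'pNInz6obpgDQGcFmaJgB' },
  energetic: { name: 'Antoni', voiceId: 'ErXwobaYiN019PkySvjV' }
};

// Preset values copied from the recommended settings by content type.
const PRESETS: Record<ContentType, Omit<VoiceSettings, 'use_speaker_boost'>> = {
  narration: { stability: 0.65, similarity_boost: 0.8, style: 0.0 },
  professional: { stability: 0.8, similarity_boost: 0.85, style: 0.0 },
  energetic: { stability: 0.35, similarity_boost: 0.85, style: 0.4 }
};

// Keeps a numeric setting inside the documented 0.0-1.0 range.
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Merges the preset with optional overrides and clamps the numeric fields.
function resolveVoice(
  contentType: ContentType,
  overrides: Partial<VoiceSettings> = {}
): { voiceId: string; settings: VoiceSettings } {
  const merged = { ...PRESETS[contentType], use_speaker_boost: true, ...overrides };
  return {
    voiceId: CONTENT_VOICES[contentType].voiceId,
    settings: {
      stability: clamp01(merged.stability),
      similarity_boost: clamp01(merged.similarity_boost),
      style: clamp01(merged.style),
      use_speaker_boost: merged.use_speaker_boost
    }
  };
}
```

Clamping guards against out-of-range overrides, which are easy to introduce when settings come from a config file or a UI slider.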
Segment Timing Calculations
Frame-to-Milliseconds Conversion
```typescript
function framesToMs(frames: number, fps: number = 30): number {
  return Math.round((frames / fps) * 1000);
}

function msToFrames(ms: number, fps: number = 30): number {
  return Math.round((ms / 1000) * fps);
}

// Examples
framesToMs(90, 30);   // 3000ms (3 seconds at 30fps)
framesToMs(150, 30);  // 5000ms (5 seconds at 30fps)
msToFrames(2500, 30); // 75 frames
```
Words Per Minute Reference
| Speaking Speed | WPM | Words/30s | Use Case |
|---|---|---|---|
| Slow (dramatic) | 100 | 50 | Hooks, reveals |
| Normal narration | 130-150 | 65-75 | Standard content |
| Conversational | 150-170 | 75-85 | Tutorials, demos |
| Fast (excited) | 170-190 | 85-95 | Features, energy |
| Very fast | 200+ | 100+ | Avoid (unclear) |
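These WPM figures can be used to check whether a script fits its segment before any audio is generated. The sketch below estimates spoken duration from a word count and compares it with the segment length in frames. `countWords`, `estimateDurationMs`, and `checkScriptFit` are hypothetical helpers, and the 140 WPM default is simply a midpoint of the "Normal narration" range. The estimate ignores pauses and punctuation and only suits space-delimited languages, so treat it as a rough guide.

```typescript
// Counts whitespace-separated words; an empty or blank string has zero words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Estimated spoken duration in milliseconds at the given speaking rate.
function estimateDurationMs(text: string, wpm: number = 140): number {
  return Math.round((countWords(text) / wpm) * 60000);
}

interface FitResult {
  estimatedMs: number; // estimated narration length
  availableMs: number; // segment length derived from frames and fps
  fits: boolean;       // true when the narration fits inside the segment
  maxWords: number;    // largest word count that fits at this rate
}

// Compares the estimated narration length with the segment length.
function checkScriptFit(
  text: string,
  segmentFrames: number,
  fps: number = 30,
  wpm: number = 140
): FitResult {
  const availableMs = Math.round((segmentFrames / fps) * 1000);
  const estimatedMs = estimateDurationMs(text, wpm);
  return {
    estimatedMs,
    availableMs,
    fits: estimatedMs <= availableMs,
    maxWords: Math.floor((availableMs / 60000) * wpm)
  };
}
```

For a 150-frame segment at 30fps and 140 WPM, `maxWords` comes out at 11, so a 14-word script would need either trimming or a longer segment.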
Remotion Integration
Audio Component for Remotion
```typescript
import React from 'react';
import { Audio, Sequence } from 'remotion';

interface NarrationProps {
  audioUrl: string;
  startFrame: number; // Informational only: the parent <Sequence> controls when playback starts
  volume?: number;
}

export const Narration: React.FC<NarrationProps> = ({
  audioUrl,
  volume = 1
}) => {
  return (
    <Audio
      src={audioUrl}
      startFrom={0} // Play the clip from its beginning
      volume={volume}
    />
  );
};

// Usage in a scene (HookScene and DemoScene are your own scene components,
// assumed to be defined elsewhere)
export const NarratedScene: React.FC = () => {
  return (
    <>
      <Sequence from={0} durationInFrames={150}>
        <HookScene />
        <Narration audioUrl="/audio/hook-narration.mp3" startFrame={0} />
      </Sequence>
      <Sequence from={150} durationInFrames={300}>
        <DemoScene />
        <Narration audioUrl="/audio/demo-narration.mp3" startFrame={150} />
      </Sequence>
    </>
  );
};
```
Cost Optimization
Character Counting
```typescript
function estimateCost(
  text: string,
  model: 'multilingual' | 'turbo' | 'flash' = 'multilingual'
): number {
  const chars = text.length;
  const costPer1K = {
    multilingual: 0.30,
    turbo: 0.18,
    flash: 0.08
  };
  return (chars / 1000) * costPer1K[model];
}
```
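As a worked example of the arithmetic, the sketch below prices one script at all three rates and reports the saving relative to `eleven_multilingual_v2`. The rates are copied from the model comparison table and may not match current pricing. `compareModelCosts` is a hypothetical helper, and `text.length` counts UTF-16 code units, which is only an approximation of billed characters.

```typescript
// Rates in USD per 1,000 characters, copied from the model comparison table.
const COST_PER_1K = { multilingual: 0.30, turbo: 0.18, flash: 0.08 } as const;

type ModelTier = keyof typeof COST_PER_1K;

interface CostRow {
  model: ModelTier;
  costUsd: number;    // estimated cost, rounded to 4 decimal places
  savingsPct: number; // saving versus the multilingual rate, rounded to a whole percent
}

// Returns one row per model tier for the given script text.
function compareModelCosts(text: string): CostRow[] {
  const chars = text.length;
  const baseline = COST_PER_1K.multilingual;
  return (Object.keys(COST_PER_1K) as ModelTier[]).map((model) => ({
    model,
    costUsd: Number(((chars / 1000) * COST_PER_1K[model]).toFixed(4)),
    savingsPct: Math.round((1 - COST_PER_1K[model] / baseline) * 100)
  }));
}
```

For a 1,500-character script this gives $0.45, $0.27, and $0.12, with savings of 0%, 40%, and 73% respectively.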
Cost Optimization Strategies
| Strategy | Savings | Implementation |
|---|---|---|
| Use Turbo for drafts | 40% | Switch model_id during preview |
| Cache generated audio | 100% on repeated text | Hash text+voice, store locally |
| Batch similar requests | 20% | Group by voice, reduce overhead |
| Use Flash for previews | 73% | Draft with flash, final with v2 |
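The caching strategy can be sketched as follows: derive a key from the inputs that affect the audio, and reuse the stored file when the key matches. `cacheKey`, `cachePath`, and `getOrGenerate` are hypothetical names, and the `.cache/narration` directory is an arbitrary choice. The key here covers text, voice, and model; if voice settings vary between calls, they would need to be part of the key as well.

```typescript
import { createHash } from 'node:crypto';
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// Stable SHA-256 key over the inputs. Serializing as a JSON array avoids
// collisions such as ("ab", "c") versus ("a", "bc").
function cacheKey(text: string, voiceId: string, modelId: string): string {
  return createHash('sha256')
    .update(JSON.stringify([text, voiceId, modelId]))
    .digest('hex');
}

// File location for a given key inside the cache directory.
function cachePath(key: string, cacheDir: string = '.cache/narration'): string {
  return join(cacheDir, `${key}.mp3`);
}

// Returns cached audio when present; otherwise calls generate() and stores the result.
async function getOrGenerate(
  text: string,
  voiceId: string,
  modelId: string,
  generate: () => Promise<Buffer>,
  cacheDir: string = '.cache/narration'
): Promise<Buffer> {
  const file = cachePath(cacheKey(text, voiceId, modelId), cacheDir);
  if (existsSync(file)) {
    return readFileSync(file);
  }
  const audio = await generate();
  mkdirSync(cacheDir, { recursive: true });
  writeFileSync(file, audio);
  return audio;
}
```

Passing the generator as a callback keeps the cache independent of how the audio is produced, so the same wrapper works for SDK calls and raw HTTP requests.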
Environment Setup
```bash
# Required
ELEVENLABS_API_KEY=xi_xxxxxxxxxxxxxxxxxxxx

# Optional
ELEVENLABS_MODEL_ID=eleven_multilingual_v2
ELEVENLABS_DEFAULT_VOICE=21m00Tcm4TlvDq8ikWAM
```
Related Skills
- video-pacing: Video rhythm and timing rules
- video-storyboarding: Pre-production planning and scene structure
- audio-language-models: Broader TTS comparison (Gemini, OpenAI, etc.)
- remotion-composer: Programmatic video generation
References
- API Integration - Full API patterns, streaming, error handling
- Voice Selection - Complete voice catalog with characteristics
- Timing Calculation - Segment planning, pipeline implementation
Capability Details
elevenlabs-tts
Keywords: elevenlabs, tts, text-to-speech, narration, voice
Solves:
- Generate narration audio with ElevenLabs
- Configure voice settings for video
- Integrate TTS into video pipeline
voice-selection
Keywords: voice, rachel, adam, narrator, character
Solves:
- Choose the right voice for content type
- Configure voice settings (stability, similarity)
- Match voice to audience and tone
segment-timing
Keywords: timing, frames, milliseconds, duration, pacing
Solves:
- Convert frames to milliseconds
- Calculate WPM for narration
- Validate script fits video segment
cost-optimization
Keywords: cost, pricing, budget, optimize, characters
Solves:
- Estimate narration costs
- Reduce API usage with caching
- Choose cost-effective models