elevenlabs-narration

ElevenLabs Narration for Video Production

Complete integration guide for using ElevenLabs text-to-speech in video production pipelines. Covers voice selection, timing calculations, API patterns, and cost optimization for professional narration.

Overview

  • Generating narration audio for video segments
  • Selecting appropriate voices for content type
  • Calculating segment timing from frames to milliseconds
  • Building script-to-audio pipelines
  • Optimizing API usage and costs
  • Handling rate limits and errors

ElevenLabs API Overview

Model Comparison (2026)

Model                   Latency  Quality    Cost             Best For
--------------------------------------------------------------------------------
eleven_multilingual_v2  Medium   Best       $0.30/1K chars   Production, multilingual
eleven_turbo_v2_5       Low      Excellent  $0.18/1K chars   Real-time, drafts
eleven_flash_v2_5       Lowest   Good       $0.08/1K chars   Previews, testing
eleven_english_sts_v2   Medium   Best       $0.30/1K chars   Speech-to-speech

API Endpoints

Base URL: https://api.elevenlabs.io/v1

POST /text-to-speech/{voice_id}           # Generate audio
POST /text-to-speech/{voice_id}/stream    # Stream audio
GET  /voices                              # List voices
GET  /voices/{voice_id}                   # Voice details
GET  /user                                # Usage/quota
POST /speech-to-speech/{voice_id}         # Voice conversion
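
The endpoints above can also be called without the SDK. The following is a minimal sketch of assembling a request for POST /text-to-speech/{voice_id}, assuming the REST API's `xi-api-key` header and a JSON body with `text` and `model_id`. The helper name `buildTtsRequest` and the `TtsRequest` type are hypothetical and exist only for this illustration; the helper performs no network I/O itself.

```typescript
// Hypothetical helper: assembles the URL and fetch options for
// POST /text-to-speech/{voice_id}. It does not send anything.
const BASE_URL = 'https://api.elevenlabs.io/v1';

interface TtsRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: string;
  };
}

function buildTtsRequest(
  apiKey: string,
  voiceId: string,
  text: string,
  modelId: string = 'eleven_multilingual_v2'
): TtsRequest {
  return {
    url: `${BASE_URL}/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: 'POST',
      headers: {
        'xi-api-key': apiKey,
        'Content-Type': 'application/json',
        Accept: 'audio/mpeg'
      },
      body: JSON.stringify({ text, model_id: modelId })
    }
  };
}

// Usage (needs a real key and network access):
// const { url, init } = buildTtsRequest(key, '21m00Tcm4TlvDq8ikWAM', 'Hello');
// const res = await fetch(url, init);
// if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
// const audio = new Uint8Array(await res.arrayBuffer());
```

Keeping request construction separate from the `fetch` call makes the request shape easy to inspect in tests before any characters are billed.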

Core Integration Pattern

Basic Text-to-Speech

typescript
import { ElevenLabsClient } from 'elevenlabs';

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY
});

async function generateNarration(
  text: string,
  voiceId: string = 'Rachel'
): Promise<Buffer> {
  const audio = await client.generate({
    voice: voiceId,
    text: text,
    model_id: 'eleven_multilingual_v2',
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.8,
      style: 0.0,
      use_speaker_boost: true
    }
  });

  // Convert stream to buffer
  const chunks: Buffer[] = [];
  for await (const chunk of audio) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}
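
Calls such as generateNarration can fail transiently, for example with HTTP 429 when a rate limit is hit. A minimal sketch of retrying with exponential backoff is shown here. The helpers `withRetry`, `isRetryable`, and `backoffDelayMs` are hypothetical, and the `statusCode` field on thrown errors is an assumption about the client's error shape rather than a documented guarantee.

```typescript
// Assumption: a failed call throws an object with a numeric `statusCode`.
// Adapt this check to whatever your client actually throws.
function isRetryable(err: unknown): boolean {
  const status = (err as { statusCode?: number } | null)?.statusCode;
  return status === 429 || (status !== undefined && status >= 500);
}

// Exponential backoff: 500 ms, 1000 ms, 2000 ms, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `fn`, retrying retryable failures up to `maxAttempts` total attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isRetryable(err)) throw err;
      await new Promise<void>((resolve) =>
        setTimeout(resolve, backoffDelayMs(attempt))
      );
    }
  }
}

// Usage: const audio = await withRetry(() => generateNarration(script));
```

Non-retryable failures (for example an invalid API key) are rethrown immediately, so only rate limits and server errors consume retry attempts.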

Voice Selection Quick Reference

Pre-Built Voices for Video Narration

Voice    ID                     Characteristics        Use Case
----------------------------------------------------------------------
Rachel   21m00Tcm4TlvDq8ikWAM   Warm, conversational   General narration
Adam     pNInz6obpgDQGcFmaJgB   Deep, authoritative    Tech explainers
Antoni   ErXwobaYiN019PkySvjV   Energetic, youthful    Product demos
Bella    EXAVITQu4vr4xnSDxMaL   Friendly, engaging     Tutorials
Josh     TxGEqnHWrfWFTfGW9XjX   Deep, narrative        Documentaries
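
In a pipeline, the voice table above is convenient to hold as a typed lookup so that scripts refer to a use case rather than a raw ID. This is a small sketch; the voice IDs are copied from the table, while the use-case keys, `VOICE_BY_USE_CASE`, and `voiceIdFor` are hypothetical names.

```typescript
// Use-case keys are hypothetical; IDs come from the voice table.
type UseCase = 'general' | 'tech' | 'demo' | 'tutorial' | 'documentary';

const VOICE_BY_USE_CASE: Record<UseCase, { voiceName: string; id: string }> = {
  general: { voiceName: 'Rachel', id: '21m00Tcm4TlvDq8ikWAM' },
  tech: { voiceName: 'Adam', id: 'pNInz6obpgDQGcFmaJgB' },
  demo: { voiceName: 'Antoni', id: 'ErXwobaYiN019PkySvjV' },
  tutorial: { voiceName: 'Bella', id: 'EXAVITQu4vr4xnSDxMaL' },
  documentary: { voiceName: 'Josh', id: 'TxGEqnHWrfWFTfGW9XjX' }
};

// Returns the voice ID to pass as {voice_id} for a given kind of content.
function voiceIdFor(useCase: UseCase): string {
  return VOICE_BY_USE_CASE[useCase].id;
}
```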

Voice Settings Explained

typescript
interface VoiceSettings {
  stability: number;        // 0.0-1.0 (lower = more expressive)
  similarity_boost: number; // 0.0-1.0 (higher = closer to original)
  style: number;           // 0.0-1.0 (v2 models only)
  use_speaker_boost: boolean; // Clarity enhancement
}

// Recommended settings by content type
const VOICE_PRESETS = {
  narration: { stability: 0.65, similarity_boost: 0.8, style: 0.0 },
  conversational: { stability: 0.4, similarity_boost: 0.75, style: 0.2 },
  dramatic: { stability: 0.3, similarity_boost: 0.9, style: 0.5 },
  professional: { stability: 0.8, similarity_boost: 0.85, style: 0.0 },
  energetic: { stability: 0.35, similarity_boost: 0.85, style: 0.4 }
};
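
Presets like these are often adjusted per segment. The sketch below merges a preset with optional overrides and clamps each numeric field into the 0.0-1.0 range noted in the interface comments. `resolveVoiceSettings`, `clamp01`, and the `NarrationVoiceSettings` type are hypothetical, and defaulting `use_speaker_boost` to `true` is an assumption taken from the basic example.

```typescript
interface NarrationVoiceSettings {
  stability: number;
  similarity_boost: number;
  style: number;
  use_speaker_boost: boolean;
}

// Clamp a value into the 0.0-1.0 range.
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Start from a preset, apply per-segment overrides, and clamp the numeric
// fields so out-of-range values are corrected before the request is built.
function resolveVoiceSettings(
  preset: { stability: number; similarity_boost: number; style: number },
  overrides: Partial<NarrationVoiceSettings> = {}
): NarrationVoiceSettings {
  return {
    stability: clamp01(overrides.stability ?? preset.stability),
    similarity_boost: clamp01(
      overrides.similarity_boost ?? preset.similarity_boost
    ),
    style: clamp01(overrides.style ?? preset.style),
    // Assumption: speaker boost stays on unless explicitly disabled.
    use_speaker_boost: overrides.use_speaker_boost ?? true
  };
}

// Usage: resolveVoiceSettings(VOICE_PRESETS.narration, { style: 0.2 });
```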

Segment Timing Calculations

Frame-to-Milliseconds Conversion

typescript
function framesToMs(frames: number, fps: number = 30): number {
  return Math.round((frames / fps) * 1000);
}

function msToFrames(ms: number, fps: number = 30): number {
  return Math.round((ms / 1000) * fps);
}

// Examples
framesToMs(90, 30);   // 3000ms (3 seconds at 30fps)
framesToMs(150, 30);  // 5000ms (5 seconds at 30fps)
msToFrames(2500, 30); // 75 frames

Words Per Minute Reference

Speaking Speed       WPM     Words/30s    Use Case
----------------------------------------------------------
Slow (dramatic)      100     50           Hooks, reveals
Normal narration     130-150 65-75        Standard content
Conversational       150-170 75-85        Tutorials, demos
Fast (excited)       170-190 85-95        Features, energy
Very fast            200+    100+         Avoid (unclear)
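
The WPM figures above can be used to check whether a script fits its segment before any audio is generated. This is a rough sketch, assuming whitespace-separated English words and a constant speaking rate; `countWords`, `estimateSpeechMs`, and `fitsSegment` are hypothetical helpers, and real narration length will vary with the voice and punctuation.

```typescript
// Count whitespace-separated words; adequate for English scripts.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Estimated speaking time in milliseconds at a given words-per-minute rate.
function estimateSpeechMs(text: string, wpm: number = 150): number {
  return Math.round((countWords(text) / wpm) * 60000);
}

// Does the script fit a segment that is `frames` frames long?
function fitsSegment(
  text: string,
  frames: number,
  fps: number = 30,
  wpm: number = 150
): boolean {
  const segmentMs = (frames / fps) * 1000;
  return estimateSpeechMs(text, wpm) <= segmentMs;
}

// Example: a 150-frame segment at 30 fps lasts 5 seconds, which leaves
// room for 12 words at 150 WPM (12 / 150 * 60 = 4.8 s) but not 13.
```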

Remotion Integration

Audio Component for Remotion

typescript
import React from 'react';
import { Audio, Sequence } from 'remotion';

interface NarrationProps {
  audioUrl: string;
  volume?: number;
}

// Plays one narration clip. Timing comes from the enclosing <Sequence>:
// audio placed inside a Sequence starts at that Sequence's first frame,
// so the component needs no start-frame prop of its own.
export const Narration: React.FC<NarrationProps> = ({
  audioUrl,
  volume = 1
}) => {
  return <Audio src={audioUrl} volume={volume} />;
};

// Usage in a scene. HookScene and DemoScene are the project's own scene
// components, defined elsewhere.
export const NarratedScene: React.FC = () => {
  return (
    <>
      <Sequence from={0} durationInFrames={150}>
        <HookScene />
        <Narration audioUrl="/audio/hook-narration.mp3" />
      </Sequence>

      <Sequence from={150} durationInFrames={300}>
        <DemoScene />
        <Narration audioUrl="/audio/demo-narration.mp3" />
      </Sequence>
    </>
  );
};
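
The `from` and `durationInFrames` values passed to each Sequence can be derived from the narration clips rather than typed by hand. The following sketch lays clips out back to back, assuming the duration of each clip in milliseconds is already known; `layoutSegments` and `SegmentLayout` are hypothetical names.

```typescript
interface SegmentLayout {
  from: number;             // first frame of the segment
  durationInFrames: number; // segment length in frames
}

// Place narration clips back to back. Durations are rounded up so a clip
// is never cut off before it finishes. Multiplying before dividing keeps
// whole-frame durations exact (100 ms at 30 fps is 3 frames, not 4).
function layoutSegments(
  audioDurationsMs: number[],
  fps: number = 30
): SegmentLayout[] {
  const layout: SegmentLayout[] = [];
  let cursor = 0;
  for (const ms of audioDurationsMs) {
    const durationInFrames = Math.ceil((ms * fps) / 1000);
    layout.push({ from: cursor, durationInFrames });
    cursor += durationInFrames;
  }
  return layout;
}

// layoutSegments([5000, 10000], 30)
// → [{ from: 0, durationInFrames: 150 }, { from: 150, durationInFrames: 300 }]
```

The example output matches the two Sequences in the scene above: a 5-second hook followed by a 10-second demo at 30 fps.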

Cost Optimization

Character Counting

typescript
function estimateCost(
  text: string,
  model: 'multilingual' | 'turbo' | 'flash' = 'multilingual'
): number {
  const chars = text.length;
  const costPer1K = {
    multilingual: 0.30,
    turbo: 0.18,
    flash: 0.08
  };
  return (chars / 1000) * costPer1K[model];
}
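
The same per-1K prices extend naturally to a whole script and to comparing tiers. This sketch totals the cost of several segments and computes the percentage saved relative to the multilingual tier. `estimateScriptCost`, `savingsPercent`, and `PRICE_PER_1K` are hypothetical names, and the prices are the ones listed in the model comparison table, which may not match a given plan.

```typescript
type ModelTier = 'multilingual' | 'turbo' | 'flash';

// Prices per 1,000 characters, copied from the model comparison table.
const PRICE_PER_1K: Record<ModelTier, number> = {
  multilingual: 0.30,
  turbo: 0.18,
  flash: 0.08
};

// Total cost of a multi-segment script for one model tier.
function estimateScriptCost(segments: string[], model: ModelTier): number {
  const totalChars = segments.reduce((sum, s) => sum + s.length, 0);
  return (totalChars / 1000) * PRICE_PER_1K[model];
}

// Whole-number percentage saved versus the multilingual tier.
function savingsPercent(model: ModelTier): number {
  const full = PRICE_PER_1K.multilingual;
  return Math.round(((full - PRICE_PER_1K[model]) / full) * 100);
}

// savingsPercent('turbo') → 40
// savingsPercent('flash') → 73
```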

Cost Optimization Strategies

Strategy                 Savings  Implementation
----------------------------------------------------------------
Use Turbo for drafts     40%      Switch model_id during preview
Cache generated audio    100%     Hash text+voice, store locally
Batch similar requests   20%      Group by voice, reduce overhead
Use Flash for previews   73%      Draft with flash, final with v2
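
The caching strategy can be sketched as a wrapper around whatever function produces audio. This version keys entries on the voice, model, and text, and keeps them in memory. `withAudioCache`, `cacheKey`, and the `Synthesize` type are hypothetical; for an on-disk cache the key could instead be hashed (for example with SHA-256) to get a short file name.

```typescript
type Synthesize = (
  text: string,
  voiceId: string,
  modelId: string
) => Promise<Uint8Array>;

// Composite key: identical voice + model + text always maps to one entry.
function cacheKey(text: string, voiceId: string, modelId: string): string {
  return JSON.stringify([voiceId, modelId, text]);
}

// Returns cached audio when present; otherwise calls `synthesize` once and
// stores the pending result, so concurrent identical requests share a call.
function withAudioCache(synthesize: Synthesize): Synthesize {
  const cache = new Map<string, Promise<Uint8Array>>();
  return (text, voiceId, modelId) => {
    const key = cacheKey(text, voiceId, modelId);
    let entry = cache.get(key);
    if (entry === undefined) {
      entry = synthesize(text, voiceId, modelId);
      cache.set(key, entry);
      // Drop failed requests so a later call can try again.
      entry.catch(() => cache.delete(key));
    }
    return entry;
  };
}

// Usage: const cachedTts = withAudioCache(myTtsFunction);
```

Including the model in the key keeps a Flash preview from being served in place of a final render of the same text.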

Environment Setup

bash
# Required
ELEVENLABS_API_KEY=xi_xxxxxxxxxxxxxxxxxxxx

# Optional
ELEVENLABS_MODEL_ID=eleven_multilingual_v2
ELEVENLABS_DEFAULT_VOICE=21m00Tcm4TlvDq8ikWAM
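
These variables can be read once at startup and validated before any request is made. This is a minimal sketch, assuming the two optional values shown above are also sensible fallbacks; `loadNarrationConfig` and `NarrationConfig` are hypothetical names.

```typescript
interface NarrationConfig {
  apiKey: string;
  modelId: string;
  defaultVoice: string;
}

// Pass `process.env` in Node; a plain object works in tests.
function loadNarrationConfig(
  env: Record<string, string | undefined>
): NarrationConfig {
  const apiKey = env.ELEVENLABS_API_KEY;
  if (!apiKey) {
    throw new Error('ELEVENLABS_API_KEY is required');
  }
  return {
    apiKey,
    // Assumption: fall back to the values from the environment example.
    modelId: env.ELEVENLABS_MODEL_ID ?? 'eleven_multilingual_v2',
    defaultVoice: env.ELEVENLABS_DEFAULT_VOICE ?? '21m00Tcm4TlvDq8ikWAM'
  };
}

// Usage: const config = loadNarrationConfig(process.env);
```

Failing fast on a missing key surfaces configuration problems at startup instead of partway through a render.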

Related Skills

  • video-pacing: Video rhythm and timing rules
  • video-storyboarding: Pre-production planning and scene structure
  • audio-language-models: Broader TTS comparison (Gemini, OpenAI, etc.)
  • remotion-composer: Programmatic video generation

References

  • API Integration - Full API patterns, streaming, error handling
  • Voice Selection - Complete voice catalog with characteristics
  • Timing Calculation - Segment planning, pipeline implementation

Capability Details

elevenlabs-tts

Keywords: elevenlabs, tts, text-to-speech, narration, voice
Solves:
  • Generate narration audio with ElevenLabs
  • Configure voice settings for video
  • Integrate TTS into video pipeline

voice-selection

Keywords: voice, rachel, adam, narrator, character
Solves:
  • Choose the right voice for content type
  • Configure voice settings (stability, similarity)
  • Match voice to audience and tone

segment-timing

Keywords: timing, frames, milliseconds, duration, pacing
Solves:
  • Convert frames to milliseconds
  • Calculate WPM for narration
  • Validate script fits video segment

cost-optimization

Keywords: cost, pricing, budget, optimize, characters
Solves:
  • Estimate narration costs
  • Reduce API usage with caching
  • Choose cost-effective models