ElevenLabs TTS integration for video narration. Use it when generating voiceover audio, selecting voices, or building script-to-audio pipelines.
```bash
npx skill4agent add yonatangross/orchestkit elevenlabs-narration
```

| Model | Latency | Quality | Cost | Best For |
|---|---|---|---|---|
| eleven_multilingual_v2 | Medium | Best | $0.30/1K chars | Production, multilingual |
| eleven_turbo_v2_5 | Low | Excellent | $0.18/1K chars | Real-time, drafts |
| eleven_flash_v2_5 | Lowest | Good | $0.08/1K chars | Previews, testing |
| eleven_english_sts_v2 | Medium | Best | $0.30/1K chars | Speech-to-speech |
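One way to apply the table is to map pipeline stages to model IDs, so the same script can be rendered cheaply while iterating and at full quality for the final cut. This is a minimal sketch. The stage names, `MODEL_BY_STAGE` and `pickModel` are hypothetical; only the model IDs come from the table.

```typescript
// Hypothetical pipeline stages; the model IDs are the ones listed in the table.
type Stage = 'preview' | 'draft' | 'production' | 'voice-conversion';

const MODEL_BY_STAGE: Record<Stage, string> = {
  preview: 'eleven_flash_v2_5',     // lowest latency and cost
  draft: 'eleven_turbo_v2_5',       // low latency, excellent quality
  production: 'eleven_multilingual_v2',
  // Speech-to-speech model: pairs with POST /speech-to-speech, not text-to-speech
  'voice-conversion': 'eleven_english_sts_v2',
};

// An explicit override (for example the ELEVENLABS_MODEL_ID variable) wins over the stage default.
export function pickModel(stage: Stage, override?: string): string {
  return override || MODEL_BY_STAGE[stage];
}
```

For example, `pickModel('draft')` returns `'eleven_turbo_v2_5'` while previewing. Switching to `'production'` for the final render changes only the model ID.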
Base URL: `https://api.elevenlabs.io/v1`

```
POST /text-to-speech/{voice_id}          # Generate audio
POST /text-to-speech/{voice_id}/stream   # Stream audio
GET  /voices                             # List voices
GET  /voices/{voice_id}                  # Voice details
GET  /user                               # Usage/quota
POST /speech-to-speech/{voice_id}        # Voice conversion
```

```typescript
import { ElevenLabsClient } from 'elevenlabs';

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY
});

// Rachel's voice ID (see the voice table below for alternatives)
const DEFAULT_VOICE_ID = '21m00Tcm4TlvDq8ikWAM';

async function generateNarration(
  text: string,
  voiceId: string = DEFAULT_VOICE_ID
): Promise<Buffer> {
  const audio = await client.generate({
    voice: voiceId,
    text,
    model_id: 'eleven_multilingual_v2',
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.8,
      style: 0.0,
      use_speaker_boost: true
    }
  });

  // The SDK returns a readable stream of audio chunks; collect them into one Buffer
  const chunks: Buffer[] = [];
  for await (const chunk of audio) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}
```

| Voice | ID | Characteristics | Use Case |
|---|---|---|---|
| Rachel | 21m00Tcm4TlvDq8ikWAM | Warm, conversational | General narration |
| Adam | pNInz6obpgDQGcFmaJgB | Deep, authoritative | Tech explainers |
| Antoni | ErXwobaYiN019PkySvjV | Energetic, youthful | Product demos |
| Bella | EXAVITQu4vr4xnSDxMaL | Friendly, engaging | Tutorials |
| Josh | TxGEqnHWrfWFTfGW9XjX | Deep, narrative | Documentaries |
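The endpoint list and voice table can also be used without the SDK. The sketch below assumes that `POST /text-to-speech/{voice_id}` takes the API key in an `xi-api-key` header and a JSON body with `text` and `model_id`, and that it returns MP3 bytes. It also lets callers pass a voice name from the table instead of a raw ID. `VOICE_IDS`, `resolveVoiceId`, `buildTtsRequest` and `synthesize` are hypothetical helpers, not SDK functions.

```typescript
// Voice IDs copied from the table above
const VOICE_IDS: Record<string, string> = {
  Rachel: '21m00Tcm4TlvDq8ikWAM',
  Adam: 'pNInz6obpgDQGcFmaJgB',
  Antoni: 'ErXwobaYiN019PkySvjV',
  Bella: 'EXAVITQu4vr4xnSDxMaL',
  Josh: 'TxGEqnHWrfWFTfGW9XjX',
};

// Accepts a display name from the table or a raw voice ID (passed through unchanged)
export function resolveVoiceId(nameOrId: string): string {
  return VOICE_IDS[nameOrId] ?? nameOrId;
}

const BASE_URL = 'https://api.elevenlabs.io/v1';

interface TtsRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the request for POST /text-to-speech/{voice_id} without sending it
export function buildTtsRequest(
  apiKey: string,
  voice: string,
  text: string,
  modelId: string = 'eleven_multilingual_v2'
): TtsRequest {
  const voiceId = resolveVoiceId(voice);
  return {
    url: `${BASE_URL}/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: 'POST',
      headers: {
        'xi-api-key': apiKey,
        'Content-Type': 'application/json',
        Accept: 'audio/mpeg',
      },
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}

// Sends the request with the global fetch (Node 18+) and returns the audio bytes
export async function synthesize(apiKey: string, voice: string, text: string): Promise<Uint8Array> {
  const { url, init } = buildTtsRequest(apiKey, voice, text);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`ElevenLabs TTS failed: ${res.status} ${await res.text()}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

Building the request is kept separate from sending it. That way the URL, headers and body can be inspected or unit-tested without spending character quota.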
```typescript
interface VoiceSettings {
  stability: number;          // 0.0-1.0 (lower = more expressive)
  similarity_boost: number;   // 0.0-1.0 (higher = closer to the original voice)
  style: number;              // 0.0-1.0 (v2 models only)
  use_speaker_boost: boolean; // Clarity enhancement
}

// Recommended settings by content type
const VOICE_PRESETS = {
  narration: { stability: 0.65, similarity_boost: 0.8, style: 0.0 },
  conversational: { stability: 0.4, similarity_boost: 0.75, style: 0.2 },
  dramatic: { stability: 0.3, similarity_boost: 0.9, style: 0.5 },
  professional: { stability: 0.8, similarity_boost: 0.85, style: 0.0 },
  energetic: { stability: 0.35, similarity_boost: 0.85, style: 0.4 }
};
```

```typescript
function framesToMs(frames: number, fps: number = 30): number {
  return Math.round((frames / fps) * 1000);
}

function msToFrames(ms: number, fps: number = 30): number {
  return Math.round((ms / 1000) * fps);
}

// Examples
framesToMs(90, 30);   // 3000ms (3 seconds at 30fps)
framesToMs(150, 30);  // 5000ms (5 seconds at 30fps)
msToFrames(2500, 30); // 75 frames
```

| Speaking Speed | WPM | Words/30s | Use Case |
|---|---|---|---|
| Slow (dramatic) | 100 | 50 | Hooks, reveals |
| Normal narration | 130-150 | 65-75 | Standard content |
| Conversational | 150-170 | 75-85 | Tutorials, demos |
| Fast (excited) | 170-190 | 85-95 | Features, energy |
| Very fast | 200+ | 100+ | Avoid (unclear) |

```tsx
import React from 'react';
import { Audio, Sequence, staticFile } from 'remotion';
// HookScene and DemoScene stand in for your own scene components
import { HookScene, DemoScene } from './scenes';

interface NarrationProps {
  audioUrl: string;
  volume?: number;
}

// Timing comes from the enclosing <Sequence>: the clip starts on that Sequence's
// first frame and is cut off when the Sequence ends, so no start-frame prop is needed.
export const Narration: React.FC<NarrationProps> = ({ audioUrl, volume = 1 }) => {
  return <Audio src={audioUrl} volume={volume} />;
};

// Usage in a scene (audio files live in public/audio/)
export const NarratedScene: React.FC = () => {
  return (
    <>
      <Sequence from={0} durationInFrames={150}>
        <HookScene />
        <Narration audioUrl={staticFile('audio/hook-narration.mp3')} />
      </Sequence>
      <Sequence from={150} durationInFrames={300}>
        <DemoScene />
        <Narration audioUrl={staticFile('audio/demo-narration.mp3')} />
      </Sequence>
    </>
  );
};
```

```typescript
// Prices per 1K characters, taken from the model table above
function estimateCost(
  text: string,
  model: 'multilingual' | 'turbo' | 'flash' = 'multilingual'
): number {
  const chars = text.length;
  const costPer1K = {
    multilingual: 0.30,
    turbo: 0.18,
    flash: 0.08
  };
  return (chars / 1000) * costPer1K[model];
}
```

| Strategy | Savings | Implementation |
|---|---|---|
| Use Turbo for drafts | 40% | Switch model_id during preview |
| Cache generated audio | 100% on cache hits | Hash text+voice, store locally |
| Batch similar requests | 20% | Group by voice, reduce overhead |
| Use Flash for previews | 73% | Draft with Flash, final with Multilingual v2 |
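The "Cache generated audio" row can be sketched as a content-addressed cache on disk. This is a minimal version that assumes SHA-256 over the text, voice, model and settings is an adequate key. `NarrationRequest`, `cacheKey`, `getOrGenerate` and the `.narration-cache` directory are all hypothetical names. Any generator, such as a wrapper around `generateNarration`, can be passed in as `generate`.

```typescript
import { createHash } from 'node:crypto';
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

interface NarrationRequest {
  text: string;
  voiceId: string;
  modelId: string;
  settings?: Record<string, number | boolean>;
}

// SHA-256 over every input that changes the audio. Settings are serialized in
// insertion order, so build them consistently to avoid spurious cache misses.
export function cacheKey(req: NarrationRequest): string {
  const payload = JSON.stringify([req.text, req.voiceId, req.modelId, req.settings ?? null]);
  return createHash('sha256').update(payload).digest('hex');
}

// Returns cached audio when present; otherwise calls `generate` once and stores the result
export async function getOrGenerate(
  req: NarrationRequest,
  generate: (req: NarrationRequest) => Promise<Buffer>,
  cacheDir: string = '.narration-cache'
): Promise<Buffer> {
  const file = join(cacheDir, `${cacheKey(req)}.mp3`);
  try {
    return await readFile(file);
  } catch {
    // Cache miss (file absent or unreadable): fall through and generate
  }
  const audio = await generate(req);
  await mkdir(cacheDir, { recursive: true });
  await writeFile(file, audio);
  return audio;
}
```

Re-rendering a video with an unchanged script then costs no characters. Editing a single line invalidates only that line's clip.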
```bash
# Required
ELEVENLABS_API_KEY=xi_xxxxxxxxxxxxxxxxxxxx
# Optional
ELEVENLABS_MODEL_ID=eleven_multilingual_v2
ELEVENLABS_DEFAULT_VOICE=21m00Tcm4TlvDq8ikWAM
```

Related skills: video-pacing, video-storyboarding, audio-language-models, remotion-composer
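A small loader for the ElevenLabs environment variables fails fast when the required key is missing and applies the optional values as defaults. This is a sketch. `NarrationConfig` and `loadConfig` are hypothetical names, and the fallbacks simply mirror the optional values shown in the env block.

```typescript
interface NarrationConfig {
  apiKey: string;
  modelId: string;
  defaultVoiceId: string;
}

// Reads ELEVENLABS_* variables from the given environment object
export function loadConfig(env: Record<string, string | undefined>): NarrationConfig {
  const apiKey = env.ELEVENLABS_API_KEY;
  if (!apiKey) {
    // Required: without a key every API call would fail with an auth error
    throw new Error('ELEVENLABS_API_KEY is required');
  }
  return {
    apiKey,
    modelId: env.ELEVENLABS_MODEL_ID || 'eleven_multilingual_v2',
    defaultVoiceId: env.ELEVENLABS_DEFAULT_VOICE || '21m00Tcm4TlvDq8ikWAM',
  };
}
```

In Node, call it as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test with a plain object.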