voice-generation
Voice Generation Skill
Generate realistic speech using AI (Google Gemini TTS, ElevenLabs, OpenAI TTS).
Prerequisites
At least one API key is required:
- `GOOGLE_API_KEY` - For Google Gemini TTS (same key as video/image/music) ✅
- `ELEVENLABS_API_KEY` - For ElevenLabs high-quality voice synthesis
- `OPENAI_API_KEY` - For OpenAI TTS voices
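A minimal sketch of how the prerequisite check could be automated. The environment variable names come from the list above; the provider labels and the `available_providers` helper are hypothetical and not part of the skill's scripts.

```python
import os

# Environment variable names are taken from the prerequisites list;
# the provider labels on the left are illustrative only.
PROVIDER_KEYS = {
    "gemini": "GOOGLE_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def available_providers(env=None):
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        name
        for name, variable in PROVIDER_KEYS.items()
        if env.get(variable, "").strip()
    ]


if __name__ == "__main__":
    found = available_providers()
    if found:
        print("Usable providers:", ", ".join(found))
    else:
        print("No API key found; set at least one of:", ", ".join(PROVIDER_KEYS.values()))
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to test without touching the real environment.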
Available APIs
Google Gemini TTS (Recommended - Same API Key)
- Best for: Podcasts, dialogues, audiobooks with style control
- Voices: 30 voices with natural language style control
- Multi-speaker: Up to 2 speakers for dialogues ✅
- Languages: 24 languages (auto-detected)
- Features: Control style, accent, pace via prompts
- Output: 24kHz WAV
- API Key: Same `GOOGLE_API_KEY` as video/image/music ✅
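To illustrate the 24kHz WAV output, the sketch below wraps raw PCM samples in a WAV container using Python's standard `wave` module. Only the 24 kHz sample rate comes from the list above; the 16-bit mono layout and the `write_wav` helper name are assumptions and may not match what the skill's script does internally.

```python
import wave


def write_wav(path, pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container.

    Assumption: 16-bit (2-byte) mono samples. Only the 24 kHz rate is
    stated in the skill description.
    """
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


if __name__ == "__main__":
    # One second of silence: 24000 frames of 2 zero bytes each.
    write_wav("silence.wav", b"\x00\x00" * 24000)
```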
ElevenLabs (Best Quality)
- Best for: Natural-sounding voices, voice cloning, long-form content
- Voices: 100+ pre-made voices + custom voice cloning
- Languages: 29+ languages
- Models: Eleven Multilingual v2, Eleven Turbo v2
OpenAI TTS (Simplest)
- Best for: Quick, reliable text-to-speech with consistent quality
- Voices: alloy, echo, fable, onyx, nova, shimmer
- Models: tts-1 (fast), tts-1-hd (high quality)
- Output: MP3, Opus, AAC, FLAC
Workflow
Step 1: Understand the Request
Parse the user's voice request for:
- Text content: What should be spoken?
- Voice type: Male, female, specific character?
- Tone: Professional, casual, dramatic, cheerful?
- Use case: Narration, voiceover, audiobook, notification?
- Language: English, Spanish, other?
- Speed: Normal, slow, fast?
Step 2: Select Voice and API
Choose based on requirements:
| Use Case | Recommended API | Reason |
|---|---|---|
| Default / Same key as video | Gemini TTS | Same `GOOGLE_API_KEY` |
| Multi-speaker dialogue | Gemini TTS | Up to 2 speakers built-in |
| Style/accent control | Gemini TTS | Natural language prompts |
| Voice cloning | ElevenLabs | Only API of the three with voice cloning |
| 100+ voice options | ElevenLabs | Widest selection |
| Audiobook/podcast | ElevenLabs or Gemini | Both excellent for long content |
| Quick narration | OpenAI TTS | Fast, reliable |
| Budget-conscious | OpenAI TTS | Lower cost |
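The table can be read as a small decision procedure. The sketch below is one possible encoding; the `choose_api` function, its parameter names, and the priority order between rules are assumptions, since the table does not say which row wins when several apply.

```python
def choose_api(needs_cloning=False, speakers=1, needs_style_control=False,
               budget_conscious=False):
    """Map request requirements to one of the three APIs from the table.

    The rule order is an assumption: hard capability constraints first,
    then preferences, then the default.
    """
    if needs_cloning:
        return "ElevenLabs"   # only option with voice cloning
    if speakers > 2:
        return "ElevenLabs"   # beyond Gemini's 2-speaker limit: one call per speaker
    if speakers == 2:
        return "Gemini TTS"   # built-in dialogue for up to 2 speakers
    if needs_style_control:
        return "Gemini TTS"   # style and accent via natural language prompts
    if budget_conscious:
        return "OpenAI TTS"   # listed as the lower-cost option
    return "Gemini TTS"       # default: same key as video


if __name__ == "__main__":
    print(choose_api(speakers=2))
    print(choose_api(needs_cloning=True))
    print(choose_api(budget_conscious=True))
```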
Step 3: Prepare the Text
Optimize text for speech:
- Add pauses: Use commas, periods for natural rhythm
- Spell out numbers: "1,234" → "one thousand two hundred thirty-four" (if needed)
- Handle acronyms: "NASA" vs "N.A.S.A." depending on pronunciation
- Mark emphasis: Some APIs support emphasis markers
Example transformation:
- Original: "The Q4 2024 results show a 15% YoY increase."
- Optimized: "The Q4 2024 results show a fifteen percent year-over-year increase."
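The example transformation above could be automated roughly as follows. This is a sketch under narrow assumptions: it only spells out whole-number percentages from 0 to 99 and expands abbreviations listed in a small, hypothetical `ABBREVIATIONS` table. Anything else is left untouched.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Illustrative table; extend it with the abbreviations your text uses.
ABBREVIATIONS = {"YoY": "year-over-year"}


def number_to_words(n):
    """Spell out an integer from 0 to 99."""
    if not 0 <= n <= 99:
        raise ValueError("only 0-99 are supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def optimize_for_speech(text):
    """Expand simple percentages and known abbreviations for TTS input."""
    # Match one- or two-digit whole percentages only; the lookbehind skips
    # values such as "1.5%" or "115%" rather than mangling them.
    text = re.sub(
        r"(?<![\d.,])(\d{1,2})%",
        lambda m: f"{number_to_words(int(m.group(1)))} percent",
        text,
    )
    for short, spoken in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", spoken, text)
    return text


if __name__ == "__main__":
    print(optimize_for_speech("The Q4 2024 results show a 15% YoY increase."))
```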
Step 4: Generate the Audio
Execute the appropriate script from `${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/`:

For Google Gemini TTS (single speaker):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Welcome to our podcast!" \
  --voice "Charon"
```

Gemini TTS with style direction:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Have a wonderful day!" \
  --voice "Puck" \
  --style "Say cheerfully with a British accent:"
```

Gemini TTS multi-speaker (dialogue):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --multi \
  --speaker "Host:Charon" \
  --speaker "Guest:Aoede" \
  --text "Host: Welcome to the show!
Guest: Thanks for having me!"
```

For ElevenLabs:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/elevenlabs.py \
  --text "Your text here" \
  --voice "Rachel" \
  --model "eleven_multilingual_v2"
```

For OpenAI TTS:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/openai_tts.py \
  --text "Your text here" \
  --voice "nova" \
  --model "tts-1-hd"
```

List Gemini voices:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py --list-voices
```
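When the Gemini command is assembled programmatically rather than typed by hand, building an argument list avoids shell-quoting problems with multi-line dialogue text. The sketch below uses only the flags shown in the commands above (`--text`, `--voice`, `--style`, `--multi`, `--speaker`). The `build_gemini_command` helper is hypothetical, and whether `--style` can be combined with `--multi` is an assumption.

```python
import os


def build_gemini_command(text, voice=None, style=None, speakers=None,
                         plugin_root=None):
    """Assemble the argv list for gemini_tts.py.

    speakers: optional list of (name, voice) pairs; when given, the
    command uses --multi and one --speaker flag per pair.
    """
    root = plugin_root if plugin_root is not None else os.environ.get("CLAUDE_PLUGIN_ROOT", "")
    script = f"{root}/skills/voice-generation/scripts/gemini_tts.py"
    command = ["python3", script]

    if speakers:
        if len(speakers) > 2:
            raise ValueError("Gemini TTS supports at most 2 speakers")
        command.append("--multi")
        for name, speaker_voice in speakers:
            command += ["--speaker", f"{name}:{speaker_voice}"]

    command += ["--text", text]
    if voice and not speakers:
        command += ["--voice", voice]
    if style:
        # Assumption: --style is also accepted together with --multi.
        command += ["--style", style]
    return command


if __name__ == "__main__":
    print(build_gemini_command(
        "Host: Welcome to the show!\nGuest: Thanks for having me!",
        speakers=[("Host", "Charon"), ("Guest", "Aoede")],
    ))
```

The resulting list can be handed to `subprocess.run(command, check=True)`, which passes each argument through unchanged.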
Step 5: Deliver the Result
- Provide the generated audio file path
- Mention the voice and settings used
- Offer to:
  - Try a different voice
  - Adjust speed or tone
  - Use a different API
  - Generate in a different format
Error Handling
Missing API key: Inform the user which key is needed:
- Gemini TTS: Same `GOOGLE_API_KEY` as video/image - https://aistudio.google.com/apikey
- ElevenLabs: https://elevenlabs.io
- OpenAI: https://platform.openai.com/api-keys
Gemini TTS requires google-genai package: `pip install google-genai`
Text too long: Split into chunks and concatenate, or suggest shorter text.
Rate limit: Suggest waiting or trying a different API.
Unsupported language: Suggest an alternative API that supports the language.
Multi-speaker limit: Gemini TTS supports max 2 speakers. For more, use ElevenLabs with multiple calls.
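For the "text too long" case, one way to split text is at sentence boundaries, packing as many sentences as fit under a character limit. The sketch below assumes a character-based limit (such as the 5,000 and 4,096 character figures in the API comparison table); `split_into_chunks` is a hypothetical helper, and concatenating the resulting audio files is left out.

```python
import re


def split_into_chunks(text, max_chars):
    """Split text at sentence boundaries so each chunk fits within max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is hard-split as a last resort.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("One. Two. Three.", max_chars=10):
        print(repr(chunk))
```

Splitting on sentence boundaries keeps pauses natural at the joins; each chunk can then be sent as a separate request and the audio concatenated in order.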
Voice Selection Guide
Google Gemini TTS Voices (30 voices)
| Style | Voices | Best For |
|---|---|---|
| Bright/Upbeat | Zephyr, Puck, Aoede, Laomedeia | Marketing, cheerful content |
| Firm/Informative | Charon, Kore, Orus, Rasalgethi | News, tutorials, professional |
| Soft/Warm | Achernar, Sulafat, Vindemiatrix | Meditation, gentle narration |
| Smooth | Algieba, Despina, Callirrhoe | Audiobooks, storytelling |
| Clear | Erinome, Iapetus, Pulcherrima | Instructions, clarity |
| Character | Fenrir (excitable), Enceladus (breathy), Algenib (gravelly), Gacrux (mature) | Character voices, drama |
| Friendly | Achird, Zubenelgenubi (casual) | Casual, conversational |
Gemini TTS Style Tips:
- Use natural language: `--style "Say angrily:"` or `--style "Whisper mysteriously:"`
- Specify accents: `--style "Speak with a British accent from London:"`
- Control pace: `--style "Speak slowly and deliberately:"`
- Combine: `--style "Say excitedly with a Southern US accent:"`
OpenAI TTS Voices
| Voice | Description | Best For |
|---|---|---|
| alloy | Neutral, balanced | General purpose |
| echo | Warm, conversational | Podcasts, casual |
| fable | Expressive, British | Storytelling |
| onyx | Deep, authoritative | Narration, professional |
| nova | Friendly, upbeat | Marketing, tutorials |
| shimmer | Soft, gentle | Meditation, ASMR |
ElevenLabs Popular Voices
| Voice | Description | Best For |
|---|---|---|
| Rachel | Young female, American | Narration, audiobooks |
| Domi | Young female, energetic | Marketing, ads |
| Bella | Young female, soft | Storytelling |
| Antoni | Young male, well-rounded | Narration |
| Josh | Young male, deep | Audiobooks |
| Arnold | Mature male, authoritative | Documentary |
| Adam | Middle-aged male, deep | Narration |
| Sam | Young male, raspy | Character voices |
Best Practices
For Narration
- Use a consistent voice throughout
- Add natural pauses between paragraphs
- Consider pacing for the content type
For Dialogue
- Use different voices for different characters
- Match voice characteristics to character descriptions
- Adjust speed for emotional scenes
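To give each character its own voice, a dialogue script in the `Speaker: line` form used by the multi-speaker example can first be parsed into turns. The sketch below is one possible approach; `parse_dialogue` and `assign_voices` are hypothetical helpers, and the line format is assumed to be exactly one `Speaker: text` turn per line.

```python
def parse_dialogue(script):
    """Turn 'Speaker: line' text into a list of (speaker, line) pairs."""
    turns = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        speaker, separator, spoken = line.partition(":")
        if not separator or not spoken.strip():
            raise ValueError(f"Expected 'Speaker: text', got {line!r}")
        turns.append((speaker.strip(), spoken.strip()))
    return turns


def assign_voices(turns, voices):
    """Give each distinct speaker its own voice, in order of first appearance."""
    speakers = list(dict.fromkeys(speaker for speaker, _ in turns))
    if len(speakers) > len(voices):
        raise ValueError("Not enough voices for the number of speakers")
    return dict(zip(speakers, voices))


if __name__ == "__main__":
    turns = parse_dialogue("Host: Welcome to the show!\nGuest: Thanks for having me!")
    print(assign_voices(turns, ["Charon", "Aoede"]))
```

The number of distinct speakers returned here also indicates whether a script fits within a two-speaker limit or needs one call per speaker.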
For Accessibility
- Use clear, well-paced speech
- Avoid overly stylized voices
- Test with screen readers if applicable
API Comparison
| Feature | Gemini TTS | ElevenLabs | OpenAI TTS |
|---|---|---|---|
| API Key | `GOOGLE_API_KEY` | `ELEVENLABS_API_KEY` | `OPENAI_API_KEY` |
| Voice quality | Excellent | Excellent | Very good |
| Voice variety | 30 voices | 100+ voices | 6 voices |
| Multi-speaker | ✅ Up to 2 | ❌ No | ❌ No |
| Style control | ✅ Natural language | Limited | ❌ No |
| Voice cloning | ❌ No | ✅ Yes | ❌ No |
| Languages | 24 | 29+ | 50+ |
| Speed control | Via prompts | Yes | Yes (0.25-4x) |
| Max length | 32k tokens | 5,000 chars | 4,096 chars |
| Output format | WAV (24kHz) | MP3, WAV | MP3, Opus, AAC, FLAC |
| Same key as video/image | ✅ Yes | ❌ No | ❌ No |