gemini-tts

Gemini Text-to-Speech

Generate natural-sounding speech from text with Gemini's TTS models. The skill wraps them in an executable script that supports multiple voices and multi-speaker conversations.

When to Use This Skill

Use this skill when you need to:
  • Convert text to natural speech
  • Create audio for podcasts, audiobooks, or videos
  • Generate multi-speaker conversations
  • Stream audio for long content
  • Choose from multiple voice options
  • Create accessible audio content
  • Generate voiceovers for presentations
  • Batch convert text to audio files

Available Scripts

scripts/tts.py

Purpose: Convert text to speech using Gemini TTS models
When to use:
  • Any text-to-speech conversion
  • Multi-speaker conversation generation
  • Streaming audio for long texts
  • Voiceovers for content creation
  • Accessible audio generation
Key parameters:
| Parameter | Description | Example |
|---|---|---|
| text | Text to convert (required) | "Hello, world!" |
| --voice, -v | Voice name | Kore |
| --output, -o | Base name for output file | welcome |
| --output-dir | Output directory for audio | audio/ |
| --no-timestamp | Disable auto timestamp | Flag |
| --model, -m | TTS model | gemini-2.5-flash-preview-tts |
| --stream, -s | Enable streaming | Flag |
| --speakers | Multi-speaker mapping | "Joe:Kore,Jane:Puck" |
Output: WAV audio file path
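The parameters listed above map naturally onto an argparse definition. The sketch below is an assumption about how such a parser could look, not the actual contents of scripts/tts.py. The function name build_parser is hypothetical, and the defaults (tts_output, audio/, the flash model) are inferred from the examples in this document rather than confirmed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags of scripts/tts.py."""
    parser = argparse.ArgumentParser(description="Convert text to speech (sketch)")
    parser.add_argument("text", help="Text to convert (required)")
    parser.add_argument("--voice", "-v", default="Kore", help="Voice name")
    # Default base name is assumed from the tts_output_... filename in Workflow 1.
    parser.add_argument("--output", "-o", default="tts_output",
                        help="Base name for output file")
    parser.add_argument("--output-dir", default="audio/",
                        help="Output directory for audio")
    parser.add_argument("--no-timestamp", action="store_true",
                        help="Disable auto timestamp")
    # Default model is an assumption; the document only lists it as an example.
    parser.add_argument("--model", "-m", default="gemini-2.5-flash-preview-tts",
                        help="TTS model")
    parser.add_argument("--stream", "-s", action="store_true",
                        help="Enable streaming")
    parser.add_argument("--speakers", default=None,
                        help='Multi-speaker mapping, e.g. "Joe:Kore,Jane:Puck"')
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["Hello, world!", "-v", "Puck", "--output", "welcome", "--no-timestamp"]
)
print(args.voice, args.output, args.output_dir, args.no_timestamp)
```

Flags declared with action="store_true" take no value, which matches the "Flag" entries for --no-timestamp and --stream.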

Workflows

Workflow 1: Basic Text-to-Speech

bash
python scripts/tts.py "Hello, world! Have a wonderful day."
  • Best for: Quick audio generation, simple messages
  • Voice: Kore (default, clear and professional)
  • Output: audio/tts_output_YYYYMMDD_HHMMSS.wav (auto timestamp)

Workflow 2: Choose Different Voice

bash
python scripts/tts.py "Welcome to our podcast about technology trends" --voice Puck --output welcome
  • Best for: Friendly, conversational content
  • Voice options: Kore, Puck, Charon, Fenrir, Aoede, Zephyr, Sulafat
  • Output: audio/welcome_YYYYMMDD_HHMMSS.wav

Workflow 3: Multi-Speaker Conversation

bash
python scripts/tts.py "TTS the following conversation:
Joe: How's it going today?
Jane: Not too bad, how about you?
Joe: I'm working on a new project.
Jane: Sounds exciting, tell me more!" --speakers "Joe:Kore,Jane:Puck" --output conversation
  • Best for: Dialogues, interviews, role-playing content
  • Format: Marked conversation with speaker names
  • Script automatically routes text to appropriate voices
  • Output: audio/conversation_YYYYMMDD_HHMMSS.wav

Workflow 4: Long Content with Streaming

bash
python scripts/tts.py "This is a very long text that would benefit from streaming..." --stream --output long-form
  • Best for: Podcasts, audiobooks, long articles
  • Streaming: Processes audio in chunks for long texts
  • Output: audio/long-form_YYYYMMDD_HHMMSS.wav

Workflow 5: Professional Voiceover

bash
python scripts/tts.py "Welcome to our quarterly earnings presentation. Today we'll discuss our growth metrics and future plans." --voice Charon --output voiceover
  • Best for: Corporate content, presentations, formal announcements
  • Voice: Charon (deep, authoritative)
  • Use when: Professional, serious tone required

Workflow 6: Custom Output Directory

bash
python scripts/tts.py "Save to specific folder." --output-dir ./my-projects/podcasts/ --output episode1
  • Best for: Organized project structures
  • Directory created automatically if it doesn't exist
  • Output: ./my-projects/podcasts/episode1_YYYYMMDD_HHMMSS.wav

Workflow 7: Content Creation Pipeline (Text → Audio)

bash
# 1. Generate script (gemini-text skill)
python skills/gemini-text/scripts/generate.py "Write a 2-minute podcast intro about sustainable energy"
# 2. Generate audio (this skill)
python scripts/tts.py "[Paste generated script]" --voice Fenrir --output podcast-intro
# 3. Use in video or podcast
  • Best for: Podcasts, audiobooks, video narration
  • Combines with: gemini-text for script generation

Workflow 8: Accessible Content

bash
python scripts/tts.py "Welcome to our accessible website. This audio describes our main navigation options." --voice Aoede --output accessibility
  • Best for: Web accessibility, screen reader alternatives
  • Voice: Aoede (melodic, pleasant)
  • Use when: Making content accessible to visually impaired users

Workflow 9: Educational Content

bash
python scripts/tts.py "Chapter 1: Introduction to Quantum Computing. Let's explore the fundamental principles..." --voice Zephyr --output chapter1
  • Best for: Educational materials, tutorials, e-learning
  • Voice: Zephyr (light, airy)
  • Combines well with: gemini-text for content generation

Workflow 10: Disable Timestamp

bash
python scripts/tts.py "Fixed filename." --output my-audio --no-timestamp
  • Best for: When you want complete control over filename
  • Output: audio/my-audio.wav (no timestamp)
  • Use when: Generating files for specific naming schemes
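The naming rules that appear across the workflows (base name, optional timestamp, output directory) can be sketched as a small helper. This is an illustration under assumptions, not the script's actual code: build_output_path and its now parameter are hypothetical names, and the timestamp format is inferred from the YYYYMMDD_HHMMSS pattern shown in the output paths.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def build_output_path(base: str = "tts_output",
                      output_dir: str = "audio/",
                      timestamp: bool = True,
                      now: Optional[datetime] = None) -> Path:
    """Return <output_dir>/<base>[_YYYYMMDD_HHMMSS].wav (hypothetical helper)."""
    name = base
    if timestamp:
        # 'now' can be injected so the result is reproducible in tests.
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        name = f"{base}_{stamp}"
    return Path(output_dir) / f"{name}.wav"


# With --no-timestamp the name is fully determined by --output.
print(build_output_path("my-audio", "audio/", timestamp=False).as_posix())

# With the default timestamp, repeated runs produce distinct filenames.
print(build_output_path("welcome", "audio/").as_posix())
```

A caller that wants the "directory created automatically" behavior could run path.parent.mkdir(parents=True, exist_ok=True) before writing the file.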

Parameters Reference

Model Selection

| Model | Quality | Speed | Best For |
|---|---|---|---|
| gemini-2.5-flash-preview-tts | Good | Fast | General use, high volume |
| gemini-2.5-pro-preview-tts | Higher | Slower | Premium content, voiceovers |

Voice Selection

| Voice | Characteristics | Best For |
|---|---|---|
| Kore | Clear, professional | Announcements, general purpose (default) |
| Puck | Friendly, conversational | Casual content, interviews |
| Charon | Deep, authoritative | Corporate, serious content |
| Fenrir | Warm, expressive | Storytelling, narratives |
| Aoede | Melodic, pleasant | Educational, accessibility |
| Zephyr | Light, airy | Gentle content, tutorials |
| Sulafat | Neutral, balanced | Documentaries, factual content |

Audio Format

| Specification | Value |
|---|---|
| Format | WAV (PCM) |
| Sample rate | 24000 Hz |
| Channels | 1 (mono) |
| Bit depth | 16-bit |

Token Limits

| Limit | Type | Description |
|---|---|---|
| 8,192 | Input | Maximum input text tokens |
| 16,384 | Output | Maximum output audio tokens |

Output Interpretation

Audio File

  • Format: WAV (compatible with most players)
  • Mono channel (single audio track)
  • Sample rate: 24000 Hz (well suited to speech; lower than the 44.1 kHz or 48 kHz typical of broadcast audio)
  • Can be converted to MP3/AAC if needed
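To make the format concrete, the sketch below wraps raw PCM samples in a WAV container using Python's standard wave module with the values from the Audio Format table (24000 Hz, mono, 16-bit). It assumes the audio arrives as raw 16-bit PCM bytes; pcm_to_wav_bytes is a hypothetical helper name, and how scripts/tts.py actually writes its files is not shown in this document.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes,
                     rate: int = 24000,
                     channels: int = 1,
                     sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV header (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)      # 1 = mono
        wf.setsampwidth(sample_width)  # 2 bytes = 16-bit
        wf.setframerate(rate)          # 24000 Hz
        wf.writeframes(pcm)
    return buffer.getvalue()


# One second of silence: 24000 frames of 2 bytes each.
wav_data = pcm_to_wav_bytes(b"\x00\x00" * 24000)

# Read the header back to confirm the parameters and the duration.
with wave.open(io.BytesIO(wav_data), "rb") as wf:
    duration = wf.getnframes() / wf.getframerate()
    print(wf.getnchannels(), wf.getsampwidth() * 8, wf.getframerate(), duration)
```

At these settings one second of audio takes 48000 bytes of sample data, which is a quick way to estimate file size from duration.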

Multi-Speaker Files

  • Single WAV file with multiple voices
  • Voices separated by timing within file
  • Use --speakers parameter to map speakers to voices
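The --speakers value is a comma-separated list of Speaker:Voice pairs. A minimal sketch of parsing it into a dictionary is shown below; parse_speakers is a hypothetical name, and the script's real parsing and error messages may differ.

```python
from typing import Dict


def parse_speakers(mapping: str) -> Dict[str, str]:
    """Parse 'Joe:Kore,Jane:Puck' into {'Joe': 'Kore', 'Jane': 'Puck'}."""
    speakers: Dict[str, str] = {}
    for pair in mapping.split(","):
        # partition splits on the first colon only.
        name, sep, voice = pair.partition(":")
        name, voice = name.strip(), voice.strip()
        if not sep or not name or not voice:
            raise ValueError(f"Expected 'Speaker:Voice', got {pair!r}")
        speakers[name] = voice
    return speakers


print(parse_speakers("Joe:Kore,Jane:Puck"))
# → {'Joe': 'Kore', 'Jane': 'Puck'}
```

The speaker names in the mapping need to match the labels used in the conversation text, since those labels are what tie each line to a voice.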

Streaming Output

  • Audio processed in chunks during generation
  • Script shows "Streaming audio..." message
  • Useful for very long texts or real-time applications

Common Issues

"google-genai not installed"

bash
pip install google-genai

"Voice name not found"

  • Check voice name spelling
  • Use available voices: Kore, Puck, Charon, Fenrir, Aoede, Zephyr, Sulafat
  • Voice names are case-sensitive
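Because voice names are case-sensitive, a pre-flight check can catch a typo before any request is sent. The sketch below validates a name against the seven voices listed in this document and suggests the closest match; resolve_voice and VOICES are hypothetical names, and the suggestion logic is an illustration rather than something the script is known to do.

```python
import difflib

# Voices listed in this document.
VOICES = ("Kore", "Puck", "Charon", "Fenrir", "Aoede", "Zephyr", "Sulafat")


def resolve_voice(name: str) -> str:
    """Return the name if it is a known voice, else raise with a hint."""
    if name in VOICES:  # exact, case-sensitive match
        return name
    # Compare in lower case so 'kore' or 'PUCK' still produce a suggestion.
    by_lower = {voice.lower(): voice for voice in VOICES}
    matches = difflib.get_close_matches(name.lower(), list(by_lower), n=1, cutoff=0.6)
    hint = f" Did you mean {by_lower[matches[0]]!r}?" if matches else ""
    raise ValueError(f"Voice name not found: {name!r}.{hint}")


print(resolve_voice("Puck"))

try:
    resolve_voice("kore")
except ValueError as error:
    print(error)
```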

"No audio generated"

  • Check text is not empty
  • Verify text doesn't exceed token limit (8,192)
  • Try shorter text segments
  • Check API quota limits
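One way to act on "try shorter text segments" is to split the input at sentence boundaries before calling the script once per segment. The sketch below uses a character budget as a rough stand-in for the token limit; that proxy, the 2000-character default, and the name split_text are all assumptions, since the document does not say how tokens relate to characters.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 2000) -> List[str]:
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)  # close the full segment
            current = sentence        # start a new one
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


print(split_text("One. Two. Three.", max_chars=10))
# → ['One. Two.', 'Three.']
```

Splitting on sentence boundaries keeps each segment speakable on its own, so the resulting audio files can be concatenated without mid-sentence cuts.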

"Multi-speaker format error"

  • Format: SpeakerName:VoiceName,Speaker2:Voice2
  • Separate speakers with commas
  • Use colon between speaker and voice
  • Example: "Joe:Kore,Jane:Puck,Host:Charon"

"Output file already exists"

  • Script will overwrite existing files
  • Change --output filename to avoid conflicts
  • Use unique names for batch generation

Audio quality issues

  • Check input text for unusual characters
  • Try different voice for better pronunciation
  • Consider splitting long text into smaller segments
  • Verify audio playback software compatibility

Best Practices

Voice Selection

  • Kore: General purpose, clear articulation
  • Puck: Conversational, engaging tone
  • Charon: Professional, authoritative
  • Fenrir: Emotional, storytelling
  • Aoede: Soft, gentle for accessibility
  • Zephyr: Educational, clear explanations

Text Preparation

  • Use natural language and punctuation
  • Include pauses with commas and periods
  • Spell out difficult words if needed
  • Break very long text into logical segments
  • Add speaker labels for multi-speaker content

Performance Optimization

  • Use streaming for very long texts
  • Generate shorter segments for better control
  • Use flash model for faster generation
  • Batch process multiple files for efficiency
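Batch processing can be as simple as building one command per text and running them in turn. The sketch below only constructs the argument lists, using the flags documented for scripts/tts.py; build_tts_command and build_batch are hypothetical helper names, and running from the skill's root directory is an assumption.

```python
import sys
from typing import Dict, List


def build_tts_command(text: str, output: str, voice: str = "Kore") -> List[str]:
    """Build the argument list for one scripts/tts.py invocation."""
    return [sys.executable, "scripts/tts.py", text,
            "--voice", voice, "--output", output]


def build_batch(items: Dict[str, str], voice: str = "Kore") -> List[List[str]]:
    """Map {output_name: text} to a list of commands, one per file."""
    return [build_tts_command(text, name, voice) for name, text in items.items()]


commands = build_batch(
    {"intro": "Welcome to the show.", "outro": "Thanks for listening."},
    voice="Puck",
)
for command in commands:
    # To execute for real: subprocess.run(command, check=True)
    print(command[1:])
```

Using a distinct output name per item keeps files from overwriting one another, and passing the arguments as a list avoids shell quoting problems with punctuation in the text.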

Quality Tips

  • Test different voices for your content type
  • Use appropriate pacing with punctuation
  • Consider context when selecting voice
  • Listen to output before final use
  • Multi-speaker requires clear speaker labeling

Use Cases by Voice

| Voice | Ideal Use Cases |
|---|---|
| Kore | Announcements, navigation, general info |
| Puck | Podcasts, interviews, casual content |
| Charon | Corporate, news, formal presentations |
| Fenrir | Audiobooks, stories, emotional content |
| Aoede | Accessibility, educational, gentle content |
| Zephyr | Tutorials, explanations, guides |
| Sulafat | Documentaries, factual presentations |

Related Skills

  • gemini-text: Generate scripts and text for TTS
  • gemini-image: Create visuals to accompany audio
  • gemini-batch: Process multiple TTS requests efficiently
  • gemini-files: Upload audio files for processing

Quick Reference

bash
# Basic
python scripts/tts.py "Your text here"
# Custom voice
python scripts/tts.py "Your text" --voice Puck --output audio.wav
# Multi-speaker
python scripts/tts.py "Joe: Hi. Jane: Hello!" --speakers "Joe:Kore,Jane:Puck"
# Streaming
python scripts/tts.py "Long text..." --stream --output long.wav
# Professional
python scripts/tts.py "Corporate announcement" --voice Charon

Reference
