google-tts

Google Cloud Text-to-Speech

Converts text and documents into audio using the Google Cloud Text-to-Speech API. Supports Neural2, WaveNet, Studio, and Standard voices across 40+ languages.

Setup

Provide the API key via the `GOOGLE_TTS_API_KEY` environment variable or in `skills/google-tts/config.json` as `{"api_key": "..."}`. Multi-chunk documents require `ffmpeg`. Optional: `pip install PyPDF2 python-docx` for PDF/DOCX input.
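
The key lookup described above can be sketched as follows. This is a minimal sketch, assuming the environment variable takes precedence over `config.json` (the text does not state an order); `load_api_key` and `ffmpeg_available` are hypothetical helper names, not functions from `google_tts.py`.

```python
import json
import os
import shutil
from pathlib import Path

DEFAULT_CONFIG = "skills/google-tts/config.json"


def load_api_key(config_path=DEFAULT_CONFIG):
    """Return the API key, checking GOOGLE_TTS_API_KEY first, then config.json.

    Assumption: the environment variable wins when both are set.
    """
    key = os.environ.get("GOOGLE_TTS_API_KEY")
    if key:
        return key
    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            key = json.load(fh).get("api_key")
        if key:
            return key
    raise RuntimeError(f"Set GOOGLE_TTS_API_KEY or add an api_key entry to {path}")


def ffmpeg_available():
    """True if an ffmpeg executable is on PATH (needed to join multi-chunk output)."""
    return shutil.which("ffmpeg") is not None
```

Checking for `ffmpeg` up front lets a caller fail early on a long document instead of after the chunks have already been synthesized.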

Commands

List Voices

```bash
python skills/google-tts/scripts/google_tts.py voices --language en-US --type Neural2
python skills/google-tts/scripts/google_tts.py voices --json
```
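
The `voices` command presumably wraps the REST `voices.list` method. The sketch below calls `GET https://texttospeech.googleapis.com/v1/voices` with an API key and filters by voice type locally. Matching the type as a segment of the voice name (`en-US-Neural2-D` contains `Neural2`) is an assumption about how `--type` works, and the helper names are hypothetical. Matching is case-insensitive because WaveNet voice names are spelled `Wavenet` (e.g. `en-US-Wavenet-A`).

```python
import json
import urllib.parse
import urllib.request

VOICES_URL = "https://texttospeech.googleapis.com/v1/voices"


def fetch_voices(api_key, language=None):
    """Call voices.list; languageCode narrows the results server-side."""
    params = {"key": api_key}
    if language:
        params["languageCode"] = language
    url = f"{VOICES_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("voices", [])


def filter_voices(voices, voice_type=None):
    """Keep voices whose name contains -<type>- (case-insensitive)."""
    if not voice_type:
        return list(voices)
    marker = f"-{voice_type.lower()}-"
    return [v for v in voices if marker in v.get("name", "").lower()]


def format_voice(voice):
    """One tab-separated line per voice: name, gender, language codes."""
    langs = ",".join(voice.get("languageCodes", []))
    return f"{voice['name']}\t{voice.get('ssmlGender', '?')}\t{langs}"
```

Usage: `for v in filter_voices(fetch_voices(key, "en-US"), "Neural2"): print(format_voice(v))`.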

Text-to-Speech

```bash
# From text or document (PDF, DOCX, MD, TXT)
python skills/google-tts/scripts/google_tts.py tts --text "Hello world" --output ~/Downloads/hello.mp3
python skills/google-tts/scripts/google_tts.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

# With voice, rate, pitch, encoding options
python skills/google-tts/scripts/google_tts.py tts --file doc.md --voice en-US-Neural2-F --rate 0.9 --encoding MP3 --output ~/Downloads/out.mp3
```
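
Each request ultimately goes to the REST `text:synthesize` method, which returns base64-encoded audio in `audioContent`. The sketch below shows one way the CLI options could map onto the request body. Deriving `languageCode` from the first two segments of the voice name, the range checks (taken from the Reference section), and the helper names are assumptions, not code from `google_tts.py`.

```python
import base64
import json
import urllib.parse
import urllib.request

SYNTH_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"
EXTENSIONS = {"MP3": ".mp3", "LINEAR16": ".wav", "OGG_OPUS": ".ogg"}


def build_request(text, voice, rate=1.0, pitch=0.0, encoding="MP3"):
    """Map --voice/--rate/--pitch/--encoding onto a text:synthesize body."""
    if encoding not in EXTENSIONS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if not 0.25 <= rate <= 4.0:
        raise ValueError("speaking rate must be within 0.25-4.0")
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch must be within -20.0 to 20.0 semitones")
    language_code = "-".join(voice.split("-")[:2])  # "en-US-Neural2-F" -> "en-US"
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice},
        "audioConfig": {
            "audioEncoding": encoding,
            "speakingRate": rate,
            "pitch": pitch,
        },
    }


def synthesize(api_key, body):
    """POST the body and return the decoded audio bytes."""
    url = f"{SYNTH_URL}?{urllib.parse.urlencode({'key': api_key})}"
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return base64.b64decode(json.load(resp)["audioContent"])
```

Writing the result is then `Path(out).write_bytes(synthesize(key, body))`, with the file extension chosen from `EXTENSIONS`.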

Podcast Generation

Takes a JSON script of alternating speaker turns and synthesizes each speaker with a different voice.

```json
[
  {"speaker": "host1", "text": "Welcome to our podcast!"},
  {"speaker": "host2", "text": "Thanks for having me..."}
]
```

```bash
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --output ~/Downloads/podcast.mp3
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --voice1 en-US-Neural2-J --voice2 en-US-Neural2-H --rate 0.9 --output ~/Downloads/podcast.mp3
```
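
One way the `podcast` command could route turns to voices is by order of first appearance: the first speaker gets `--voice1`, the second `--voice2`. The sketch below assumes that rule and reuses the example voices as defaults. It also writes an ffmpeg concat-demuxer list so per-turn MP3 segments can be joined losslessly with `-c copy`. `assign_voices` and `concat_command` are hypothetical names.

```python
import os


def assign_voices(script, voice1="en-US-Neural2-J", voice2="en-US-Neural2-H"):
    """Return (voice, text) pairs; speakers get voices in order of first appearance."""
    voices = [voice1, voice2]
    mapping = {}
    turns = []
    for turn in script:
        speaker = turn["speaker"]
        if speaker not in mapping:
            if len(mapping) == len(voices):
                raise ValueError(f"only two speakers supported, got {speaker!r}")
            mapping[speaker] = voices[len(mapping)]
        turns.append((mapping[speaker], turn["text"]))
    return turns


def concat_command(segment_paths, list_path, output):
    """Write an ffmpeg concat list and return the command that joins the segments."""
    with open(list_path, "w", encoding="utf-8") as fh:
        for path in segment_paths:
            # The concat demuxer quotes paths with '...'; an embedded ' becomes '\''
            escaped = os.path.abspath(path).replace("'", "'\\''")
            fh.write(f"file '{escaped}'\n")
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output]
```

After synthesizing each `(voice, text)` pair to its own file, `subprocess.run(concat_command(paths, "/tmp/list.txt", out), check=True)` would produce the final episode.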

Workflow

Single-Voice Narration

  1. If the user provides a file path, use `--file`. For generated content, first write clean prose to `/tmp/tts_input.md`.
  2. Default voice: `en-US-Neural2-D` (male) or `en-US-Neural2-F` (female). Use Neural2 for the best quality/cost balance.
  3. Generate: `python skills/google-tts/scripts/google_tts.py tts --file /tmp/tts_input.md --output ~/Downloads/recording.mp3`
  4. Report the file location and size. Save output to `~/Downloads/` by default.
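
Steps 1, 3 and 4 can be scripted roughly as below. This is a sketch: `narrate`, `describe_output` and `human_size` are hypothetical helpers, and because `subprocess` does not go through a shell, `~/Downloads` has to be expanded explicitly.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = "skills/google-tts/scripts/google_tts.py"
INPUT_PATH = "/tmp/tts_input.md"


def human_size(num_bytes):
    """Format a byte count as B, KB, MB or GB (1024-based)."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB"):
        size /= 1024
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"


def describe_output(path):
    """Report location and size, e.g. '/home/me/Downloads/recording.mp3 (1.2 MB)'."""
    out = Path(path).expanduser()
    return f"{out} ({human_size(out.stat().st_size)})"


def narrate(text=None, file=None, output="~/Downloads/recording.mp3",
            voice="en-US-Neural2-F"):
    """Write generated text to /tmp/tts_input.md if no file is given, then run tts."""
    if file is None:
        Path(INPUT_PATH).write_text(text, encoding="utf-8")
        file = INPUT_PATH
    out = Path(output).expanduser()
    subprocess.run(
        [sys.executable, SCRIPT, "tts", "--file", str(file),
         "--voice", voice, "--output", str(out)],
        check=True,
    )
    return describe_output(out)
```

`check=True` makes a failed synthesis raise `CalledProcessError` rather than reporting a stale or missing file.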

Podcast from Document

  1. Extract text: `python skills/google-tts/scripts/extract.py /path/to/document.pdf`
  2. Generate a two-host conversation script as JSON:
    • Natural discussion, not verbatim reading. Host 1 leads; Host 2 reacts and analyzes.
    • Include an intro and an outro. Vary turn lengths, and keep each turn under 4000 characters.
  3. Write the script to `/tmp/podcast_script.json`.
  4. Generate: `python skills/google-tts/scripts/google_tts.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3`
  5. Clean up temp files.
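
Before step 4 it can help to check the generated JSON against the constraints in step 2. A minimal sketch: the 4000-character cap and the two-host rule come from the text above, while `validate_script`, `write_script` and `cleanup` are hypothetical names.

```python
import json
import os

MAX_TURN_CHARS = 4000


def validate_script(script):
    """Raise ValueError unless the script is a list of short two-host turns."""
    if not isinstance(script, list) or not script:
        raise ValueError("script must be a non-empty JSON array")
    speakers = set()
    for i, turn in enumerate(script):
        if not isinstance(turn, dict) or not all(k in turn for k in ("speaker", "text")):
            raise ValueError(f"turn {i} needs 'speaker' and 'text'")
        text = turn["text"]
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"turn {i} has empty text")
        if len(text) > MAX_TURN_CHARS:
            raise ValueError(f"turn {i} is {len(text)} chars (max {MAX_TURN_CHARS})")
        speakers.add(turn["speaker"])
    if len(speakers) != 2:
        raise ValueError(f"expected two hosts, found {len(speakers)}")


def write_script(script, path="/tmp/podcast_script.json"):
    """Step 3: validate, then write UTF-8 JSON without escaping non-ASCII text."""
    validate_script(script)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(script, fh, ensure_ascii=False, indent=2)
    return path


def cleanup(*paths):
    """Step 5: remove temp files, ignoring ones that are already gone."""
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
```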

Reference

  • Recommended voice type: Neural2 (~$4/1M chars, high quality)
  • Speaking rate: 0.25-4.0 (0.85-0.95 good for technical content)
  • Pitch: -20.0 to 20.0 semitones
  • Encodings: MP3 (default), LINEAR16 (.wav), OGG_OPUS (.ogg)
  • API limit: 5000 bytes of input per request. The script automatically splits longer input at sentence boundaries.
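
The auto-chunking note can be illustrated with a greedy splitter that packs whole sentences into chunks of at most 5000 UTF-8 bytes. This is an assumed approach, not the script's actual code: the sentence regex only recognizes `.`, `!` and `?` followed by whitespace (so it will not split Chinese text on `。`), and a single sentence longer than the limit is hard-split on a character boundary. Each chunk would then be synthesized separately and the audio joined with ffmpeg.

```python
import re

MAX_BYTES = 5000
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def utf8_len(s):
    return len(s.encode("utf-8"))


def chunk_text(text, max_bytes=MAX_BYTES):
    """Greedily pack sentences into chunks of at most max_bytes UTF-8 bytes."""
    if max_bytes < 4:  # one UTF-8 character can take up to 4 bytes
        raise ValueError("max_bytes must be at least 4")
    chunks, current = [], ""
    for sentence in SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if utf8_len(candidate) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Hard-split an oversized sentence without cutting a multi-byte character.
        while utf8_len(sentence) > max_bytes:
            piece = sentence.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
            chunks.append(piece)
            sentence = sentence[len(piece):]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Measuring bytes rather than characters matters because the API limit is in bytes: 4000 characters of Chinese text, at 3 bytes per character in UTF-8, already exceed it.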