google-tts

Google Cloud Text-to-Speech

Converts text and documents into audio using the Google Cloud Text-to-Speech API. Supports Neural2, WaveNet, Studio, and Standard voice types across 40+ languages.

Setup

Provide the API key via the `GOOGLE_TTS_API_KEY` env var or in `skills/google-tts/config.json` as `{"api_key": "..."}`. Requires `ffmpeg` for multi-chunk documents. Optional: `pip install PyPDF2 python-docx` for PDF/DOCX support.
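The key lookup described above can be sketched as follows. This is a minimal illustration, not the skill's code. `load_api_key` is a hypothetical helper, and it assumes the environment variable takes precedence over `config.json`. The text does not say which source wins.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Config location from the Setup text, relative to the workspace root.
DEFAULT_CONFIG = Path("skills/google-tts/config.json")


def load_api_key(config_path: Path = DEFAULT_CONFIG) -> Optional[str]:
    """Return the API key, or None if neither source provides one.

    Assumption: GOOGLE_TTS_API_KEY wins over config.json; the real
    script may check these in a different order.
    """
    key = os.environ.get("GOOGLE_TTS_API_KEY")
    if key:
        return key
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as fh:
            return json.load(fh).get("api_key")
    return None
```

Returning `None` instead of raising lets the caller print a setup hint that names both options.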

Commands

List Voices

```bash
python skills/google-tts/scripts/google_tts.py voices --language en-US --type Neural2
python skills/google-tts/scripts/google_tts.py voices --json
```
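Listing voices corresponds to the Text-to-Speech REST method `voices.list` (`GET https://texttospeech.googleapis.com/v1/voices`). Its entries carry `name` and `languageCodes`. How `google_tts.py` applies `--language` and `--type` isn't documented. Below is a plausible sketch, assuming the type is the third dash-separated segment of the voice name. `filter_voices` and `voice_type` are hypothetical names, and the sample entries are illustrative.

```python
from typing import Dict, List, Optional


def voice_type(name: str) -> str:
    """Return the type segment of a voice name: 'en-US-Neural2-F' -> 'Neural2'."""
    parts = name.split("-")
    return parts[2] if len(parts) > 2 else ""


def filter_voices(
    voices: List[Dict], language: Optional[str] = None, vtype: Optional[str] = None
) -> List[Dict]:
    """Keep voices matching a language code and/or voice type, sorted by name.

    The type comparison is case-insensitive because WaveNet voice names
    are spelled 'Wavenet' (e.g. 'en-US-Wavenet-A').
    """
    matches = []
    for voice in voices:
        if language and language not in voice.get("languageCodes", []):
            continue
        if vtype and voice_type(voice["name"]).lower() != vtype.lower():
            continue
        matches.append(voice)
    return sorted(matches, key=lambda v: v["name"])


# Illustrative entries shaped like the voices.list response.
sample = [
    {"name": "en-US-Wavenet-A", "languageCodes": ["en-US"]},
    {"name": "en-US-Neural2-F", "languageCodes": ["en-US"]},
    {"name": "en-GB-Neural2-A", "languageCodes": ["en-GB"]},
    {"name": "en-US-Neural2-D", "languageCodes": ["en-US"]},
]
print([v["name"] for v in filter_voices(sample, "en-US", "Neural2")])
```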

Text-to-Speech

```bash
# From text or document (PDF, DOCX, MD, TXT)
python skills/google-tts/scripts/google_tts.py tts --text "Hello world" --output ~/Downloads/hello.mp3
python skills/google-tts/scripts/google_tts.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

# With voice, rate, pitch, encoding options
python skills/google-tts/scripts/google_tts.py tts --file doc.md --voice en-US-Neural2-F --rate 0.9 --encoding MP3 --output ~/Downloads/out.mp3
```
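Each `tts` run ultimately sends a `text:synthesize` request to the Text-to-Speech REST API. The sketch below shows one way the CLI options could map onto that request body. It assumes the language code is derived from the voice name. `build_request` and `synthesize` are hypothetical helpers, and the range checks mirror the limits listed under Reference.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

SYNTH_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request(
    text: str,
    voice: str = "en-US-Neural2-D",  # the Workflow's suggested default voice
    rate: float = 1.0,
    pitch: float = 0.0,
    encoding: str = "MP3",
) -> Dict[str, Any]:
    """Build a text:synthesize request body, validating rate and pitch ranges."""
    if not 0.25 <= rate <= 4.0:
        raise ValueError("speaking rate must be between 0.25 and 4.0")
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch must be between -20.0 and 20.0 semitones")
    # Assumption: the language code is the first two segments of the voice name.
    language = "-".join(voice.split("-")[:2])
    return {
        "input": {"text": text},
        "voice": {"languageCode": language, "name": voice},
        "audioConfig": {"audioEncoding": encoding, "speakingRate": rate, "pitch": pitch},
    }


def synthesize(body: Dict[str, Any], api_key: str) -> bytes:
    """POST the body and return decoded audio (the API returns base64 audioContent)."""
    req = urllib.request.Request(
        f"{SYNTH_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return base64.b64decode(json.load(resp)["audioContent"])
```

For example, `build_request("Hello world", voice="en-US-Neural2-F", rate=0.9)` yields a body whose `voice.languageCode` is `en-US`. Passing that body to `synthesize` with a valid key should return MP3 bytes ready to write to disk.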

Podcast Generation

Takes a JSON script of alternating speaker turns and synthesizes each speaker with a different voice.

```json
[
  {"speaker": "host1", "text": "Welcome to our podcast!"},
  {"speaker": "host2", "text": "Thanks for having me..."}
]
```

```bash
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --output ~/Downloads/podcast.mp3
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --voice1 en-US-Neural2-J --voice2 en-US-Neural2-H --rate 0.9 --output ~/Downloads/podcast.mp3
```
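The speaker-to-voice assignment can be pictured as a simple lookup. This sketch assumes `host1` maps to `--voice1` and `host2` to `--voice2`. `assign_voices` is a hypothetical helper, and its default voices are borrowed from the example command above rather than taken from the script's actual defaults.

```python
from typing import Dict, List, Tuple


def assign_voices(
    turns: List[Dict[str, str]],
    voice1: str = "en-US-Neural2-J",  # from the example command, not a documented default
    voice2: str = "en-US-Neural2-H",
) -> List[Tuple[str, str]]:
    """Return (voice, text) pairs in script order; host1 -> voice1, host2 -> voice2."""
    voices = {"host1": voice1, "host2": voice2}
    plan = []
    for i, turn in enumerate(turns):
        speaker = turn.get("speaker")
        if speaker not in voices:
            raise ValueError(f"turn {i}: unknown speaker {speaker!r}")
        plan.append((voices[speaker], turn["text"]))
    return plan


script = [
    {"speaker": "host1", "text": "Welcome to our podcast!"},
    {"speaker": "host2", "text": "Thanks for having me..."},
]
for voice, text in assign_voices(script):
    print(f"{voice}: {text}")
```

Each pair would then be synthesized separately, and the resulting segments joined in script order.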

Workflow

Single-Voice Narration

If the user provides a file path, use `--file`. For generated content, write clean prose to `/tmp/tts_input.md` first.

Default voice: `en-US-Neural2-D` (male) or `en-US-Neural2-F` (female). Use Neural2 for the best quality/cost balance.

Generate:

```bash
python skills/google-tts/scripts/google_tts.py tts --file /tmp/tts_input.md --output ~/Downloads/recording.mp3
```

Report the file location and size. Save output to `~/Downloads/` by default.
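For the reporting step, a small hypothetical helper could format the saved file's path and size. It uses binary units, so 1 KB = 1024 bytes. This sketches one way to do it; it is not part of the skill's scripts.

```python
from pathlib import Path
from typing import Union


def describe_output(path: Union[str, Path]) -> str:
    """Return a one-line report such as '/home/me/Downloads/recording.mp3 (1.2 MB)'."""
    p = Path(path).expanduser()  # resolves the '~/Downloads/' default
    size = float(p.stat().st_size)
    unit = "B"
    for unit in ("B", "KB", "MB", "GB"):
        # Stop at the first unit where the value drops below 1024 (or at GB).
        if size < 1024 or unit == "GB":
            break
        size /= 1024
    return f"{p} ({size:.1f} {unit})"
```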

Podcast from Document

Extract text:

```bash
python skills/google-tts/scripts/extract.py /path/to/document.pdf
```
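The internals of `extract.py` aren't shown. Below is a minimal sketch of such an extractor, built on the optional PyPDF2 and python-docx packages from Setup. `extract_text` is a hypothetical name. The optional packages are imported lazily so that MD/TXT input works without them.

```python
from pathlib import Path
from typing import Union


def extract_text(path: Union[str, Path]) -> str:
    """Return plain text from a PDF, DOCX, MD, or TXT file."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # optional dependency

        reader = PdfReader(str(p))
        # extract_text() can return None or "" for image-only pages.
        return "\n\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        document = docx.Document(str(p))
        return "\n".join(para.text for para in document.paragraphs)
    if suffix in (".md", ".txt"):
        return p.read_text(encoding="utf-8")
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")
```

Scanned PDFs without a text layer yield little or no text here, since no OCR is involved.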

Generate a two-host conversation script as JSON:
- Natural discussion, not verbatim reading. Host 1 leads, Host 2 reacts/analyzes.
- Include an intro and outro. Vary turn lengths. Keep each turn under 4000 chars.

Write the script to `/tmp/podcast_script.json`.
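Before generating audio, it can help to check the script against the rules above. This is a hypothetical sketch (`validate_script` and `write_script` are not part of the skill). It assumes only `host1` and `host2` are valid speaker names, as in the JSON example under Podcast Generation.

```python
import json
from pathlib import Path
from typing import Any, List

MAX_TURN_CHARS = 4000  # per-turn guideline from the list above
SPEAKERS = ("host1", "host2")


def validate_script(turns: Any) -> List[str]:
    """Return human-readable problems with a podcast script; [] means it looks valid."""
    if not isinstance(turns, list) or not turns:
        return ["script must be a non-empty JSON array"]
    problems = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: expected an object")
            continue
        if turn.get("speaker") not in SPEAKERS:
            problems.append(f"turn {i}: unknown speaker {turn.get('speaker')!r}")
        text = turn.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"turn {i}: missing or empty text")
        elif len(text) > MAX_TURN_CHARS:
            problems.append(f"turn {i}: {len(text)} chars exceeds {MAX_TURN_CHARS}")
    return problems


def write_script(turns: Any, path: Path = Path("/tmp/podcast_script.json")) -> None:
    """Validate, then write the script as UTF-8 JSON."""
    problems = validate_script(turns)
    if problems:
        raise ValueError("; ".join(problems))
    path.write_text(json.dumps(turns, ensure_ascii=False, indent=2), encoding="utf-8")
```

This check counts characters, not bytes. For plain English, 4000 characters stay well under the API's 5000-byte request limit. Multi-byte text can exceed that limit at the same character count.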

Generate:

```bash
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3
```
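Podcast runs and multi-chunk `tts` runs both produce several audio segments that must be joined into one file. That is presumably why Setup lists `ffmpeg` as a requirement. The sketch below uses ffmpeg's concat demuxer with stream copy. The helper names are hypothetical, and the script may merge segments differently.

```python
import os
import subprocess
import tempfile
from pathlib import Path
from typing import Iterable, List


def concat_list(paths: Iterable[str]) -> str:
    """Build the text of an ffmpeg concat-demuxer list file.

    Paths are made absolute and single-quoted; an embedded single quote is
    closed, backslash-escaped, and reopened, as the demuxer expects.
    """
    lines = []
    for p in paths:
        escaped = os.path.abspath(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> List[str]:
    """ffmpeg argv that joins the listed segments without re-encoding (-c copy)."""
    # -safe 0 allows absolute paths in the list file.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


def merge_audio(segments: List[str], output: str) -> None:
    """Write a temporary list file, run ffmpeg, then delete the list file."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
        fh.write(concat_list(segments))
        list_file = fh.name
    try:
        subprocess.run(concat_command(list_file, output), check=True)
    finally:
        Path(list_file).unlink()
```

Stream copy works here only because every segment comes from the same encoding settings. Segments with mixed encodings would need re-encoding, which means dropping `-c copy`.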

Clean up temp files.

Reference

- Recommended voice type: Neural2 (~$4/1M chars, high quality)
- Speaking rate: 0.25-4.0 (0.85-0.95 works well for technical content)
- Pitch: -20.0 to 20.0 semitones
- Encodings: MP3 (default), LINEAR16 (.wav), OGG_OPUS (.ogg)
- API limit: 5000 bytes per request. The script auto-chunks at sentence boundaries.
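The chunking code itself isn't shown. Below is a minimal sketch of byte-limited, sentence-boundary chunking. It assumes a sentence ends with `.`, `!`, or `?` followed by whitespace, so CJK punctuation without spaces is not split. `chunk_text` and `utf8_len` are hypothetical names. The limit is measured in UTF-8 bytes, so multi-byte text fits fewer characters per request.

```python
import re
from typing import List

MAX_BYTES = 5000  # per-request limit from the list above, in UTF-8 bytes

# Assumption: a sentence ends with . ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def utf8_len(s: str) -> int:
    return len(s.encode("utf-8"))


def chunk_text(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Split text into chunks of at most max_bytes UTF-8 bytes.

    Sentences are packed greedily; a single sentence that is too long on its
    own is hard-split at the longest prefix that fits.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must fit at least one UTF-8 character")
    chunks: List[str] = []
    current = ""
    for sentence in SENTENCE_END.split(text.strip()):
        candidate = f"{current} {sentence}" if current else sentence
        if utf8_len(candidate) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        while utf8_len(sentence) > max_bytes:
            cut = max_bytes  # start at max_bytes chars, shrink until the prefix fits
            while utf8_len(sentence[:cut]) > max_bytes:
                cut -= 1
            chunks.append(sentence[:cut])
            sentence = sentence[cut:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Greedy packing keeps the number of API calls low. The hard-split fallback guarantees no chunk ever exceeds the limit, at the cost of an occasional mid-sentence break.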
