google-tts
Google Cloud Text-to-Speech
Converts text and documents into audio using Google Cloud TTS API. Supports Neural2, WaveNet, Studio, and Standard voices across 40+ languages.
Setup
Provide the API key through the `GOOGLE_TTS_API_KEY` environment variable or in `skills/google-tts/config.json` as `{"api_key": "..."}`.
Requires `ffmpeg` for multi-chunk documents. Optional: `pip install PyPDF2 python-docx` for PDF/DOCX support.
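The key lookup described in the setup note could look roughly like the sketch below. It is not taken from `google_tts.py`: the function name `load_api_key` is hypothetical, and checking the environment variable before the config file is an assumed precedence.

```python
import json
import os
from pathlib import Path

# Config location as given in the setup note.
CONFIG_PATH = Path("skills/google-tts/config.json")


def load_api_key(config_path=CONFIG_PATH, env=None):
    """Return the API key from the environment, else from config.json.

    Hypothetical helper; the env-before-file order is an assumption.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_TTS_API_KEY")
    if key:
        return key
    config_path = Path(config_path)
    if config_path.is_file():
        data = json.loads(config_path.read_text(encoding="utf-8"))
        key = data.get("api_key")
        if key:
            return key
    raise RuntimeError("Set GOOGLE_TTS_API_KEY or add api_key to config.json")
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real environment.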
Commands
List Voices
```bash
python skills/google-tts/scripts/google_tts.py voices --language en-US --type Neural2
python skills/google-tts/scripts/google_tts.py voices --json
```
Text-to-Speech
```bash
# From text or document (PDF, DOCX, MD, TXT)
python skills/google-tts/scripts/google_tts.py tts --text "Hello world" --output ~/Downloads/hello.mp3
python skills/google-tts/scripts/google_tts.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

# With voice, rate, pitch, encoding options
python skills/google-tts/scripts/google_tts.py tts --file doc.md --voice en-US-Neural2-F --rate 0.9 --encoding MP3 --output ~/Downloads/out.mp3
```
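For orientation, the voice, rate, pitch, and encoding options plausibly map onto the JSON body of the Cloud Text-to-Speech `text:synthesize` REST call. The sketch below only builds that body and makes no network request. The function name `build_synthesis_request` is hypothetical, and deriving the language code from the first two segments of the voice name is a heuristic assumed here, not something confirmed by the script.

```python
ALLOWED_ENCODINGS = {"MP3", "LINEAR16", "OGG_OPUS"}


def build_synthesis_request(text, voice="en-US-Neural2-F", rate=1.0,
                            pitch=0.0, encoding="MP3"):
    """Build a text:synthesize request body (hypothetical helper)."""
    if not 0.25 <= rate <= 4.0:
        raise ValueError("rate must be between 0.25 and 4.0")
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch must be between -20.0 and 20.0 semitones")
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    # Heuristic: "en-US-Neural2-F" -> "en-US".
    language_code = "-".join(voice.split("-")[:2])
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice},
        "audioConfig": {
            "audioEncoding": encoding,
            "speakingRate": rate,
            "pitch": pitch,
        },
    }
```

Validating the ranges before sending a request gives a clearer error than a rejected API call would.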
Podcast Generation
Takes a JSON script with alternating speakers and synthesizes each speaker with a different voice.
```json
[
  {"speaker": "host1", "text": "Welcome to our podcast!"},
  {"speaker": "host2", "text": "Thanks for having me..."}
]
```
```bash
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --output ~/Downloads/podcast.mp3
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --voice1 en-US-Neural2-J --voice2 en-US-Neural2-H --rate 0.9 --output ~/Downloads/podcast.mp3
```
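One way to picture the per-speaker voice selection is the sketch below. It assumes the first speaker encountered gets `--voice1` and the second gets `--voice2`; the actual script may instead key on the names `host1` and `host2`. The function name `assign_voices` is hypothetical.

```python
def assign_voices(turns, voice1, voice2):
    """Pair each turn's text with a voice, in order of first appearance.

    Hypothetical helper: first speaker seen -> voice1, second -> voice2.
    """
    available = [voice1, voice2]
    voices = {}
    plan = []
    for turn in turns:
        speaker = turn["speaker"]
        if speaker not in voices:
            if len(voices) == len(available):
                raise ValueError(f"more than two speakers: {speaker!r}")
            voices[speaker] = available[len(voices)]
        plan.append((voices[speaker], turn["text"]))
    return plan


script = [
    {"speaker": "host1", "text": "Welcome to our podcast!"},
    {"speaker": "host2", "text": "Thanks for having me..."},
]
plan = assign_voices(script, "en-US-Neural2-J", "en-US-Neural2-H")
```

Each `(voice, text)` pair in `plan` would then be synthesized separately and the resulting audio segments concatenated.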
Workflow
Single-Voice Narration
- If the user provides a file path, use `--file`. For generated content, write clean prose to `/tmp/tts_input.md` first.
- Default voice: `en-US-Neural2-D` (male) or `en-US-Neural2-F` (female). Use Neural2 for the best quality/cost balance.
- Generate: `python skills/google-tts/scripts/google_tts.py tts --file /tmp/tts_input.md --output ~/Downloads/recording.mp3`
- Report file location and size. Default output to `~/Downloads/`.
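The final reporting step can be done with a few lines of standard-library Python. This is a minimal sketch; `format_size` and `describe_output` are hypothetical names, and the one-decimal, 1024-based formatting is just one reasonable choice.

```python
from pathlib import Path


def format_size(num_bytes):
    """Format a byte count with one decimal place (1024-based units)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"
        size /= 1024


def describe_output(path):
    """Return 'absolute/path (size)' for a generated audio file."""
    p = Path(path).expanduser()
    return f"{p} ({format_size(p.stat().st_size)})"
```

For example, `describe_output("~/Downloads/recording.mp3")` expands the home directory and appends the size of the file on disk.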
Podcast from Document
- Extract text: `python skills/google-tts/scripts/extract.py /path/to/document.pdf`
- Generate a two-host conversation script as JSON:
  - Natural discussion, not verbatim reading. Host 1 leads, Host 2 reacts/analyzes.
  - Include intro and outro. Vary turn lengths. Keep turns under 4000 chars.
- Write script to `/tmp/podcast_script.json`
- Generate: `python skills/google-tts/scripts/google_tts.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3`
- Clean up temp files.
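Before running the `podcast` command, it may help to check the generated script against the constraints listed in this workflow. The sketch below is an illustration, not part of the skill: `validate_script` is a hypothetical name, and the checks cover only the script shape shown earlier (a JSON array of `speaker`/`text` objects) plus the 4000-character turn limit.

```python
import json

MAX_TURN_CHARS = 4000  # "Keep turns under 4000 chars"


def validate_script(turns):
    """Return a list of problems; an empty list means the script looks fine."""
    if not isinstance(turns, list) or not turns:
        return ["script must be a non-empty JSON array"]
    problems = []
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not an object")
            continue
        speaker = turn.get("speaker")
        text = turn.get("text")
        if not isinstance(speaker, str) or not speaker:
            problems.append(f"turn {i}: missing speaker")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"turn {i}: missing text")
        elif len(text) >= MAX_TURN_CHARS:
            problems.append(f"turn {i}: {len(text)} chars, limit is under {MAX_TURN_CHARS}")
    return problems


sample = json.loads('[{"speaker": "host1", "text": "Welcome to our podcast!"}]')
issues = validate_script(sample)
```

Collecting every problem rather than stopping at the first makes it easier to fix a generated script in one pass.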
Reference
- Recommended voice type: Neural2 (~$4/1M chars, high quality)
- Speaking rate: 0.25-4.0 (0.85-0.95 good for technical content)
- Pitch: -20.0 to 20.0 semitones
- Encodings: MP3 (default), LINEAR16 (.wav), OGG_OPUS (.ogg)
- API limit: 5000 bytes/request. Script auto-chunks at sentence boundaries.
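The auto-chunking mentioned in the last item can be pictured with the sketch below. It is not the script's actual implementation: `chunk_text` is a hypothetical name, the sentence splitter is a simple regular expression, and a sentence that exceeds the limit on its own is split at word boundaries as a fallback.

```python
import re

MAX_BYTES = 5000  # per-request limit noted above


def chunk_text(text, max_bytes=MAX_BYTES):
    """Greedily pack sentences into chunks of at most max_bytes UTF-8 bytes."""

    def fits(s):
        return len(s.encode("utf-8")) <= max_bytes

    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if fits(candidate):
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if fits(sentence):
            current = sentence
            continue
        # Fallback: an oversized sentence is split at word boundaries.
        # A single word longer than max_bytes is left as-is.
        for word in sentence.split():
            candidate = f"{current} {word}" if current else word
            if fits(candidate):
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks
```

The limit is measured in bytes rather than characters, so the sketch encodes each candidate as UTF-8 before comparing; this matters for non-ASCII text.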