tts
Convert any text into speech audio. Supports two backends (Kokoro local, Noiz cloud), two modes (simple or timeline-accurate), and per-segment voice control.
Triggers
- text to speech / tts / read aloud / generate audio
- voice clone / subtitle dubbing / srt to audio
- epub to audio / markdown to audio / kokoro
Simple Mode: text to audio
```bash
# Kokoro (auto-detected when installed)
bash skills/tts/scripts/tts.sh speak -t "Hello world" -v af_sarah -o hello.wav
bash skills/tts/scripts/tts.sh speak -f article.txt -v zf_xiaoni --lang cmn -o out.mp3 --format mp3

# Noiz (auto-detected when NOIZ_API_KEY is set, or force with --backend noiz)
# If --voice-id is omitted, the script prints 5 available built-in voices and exits.
# Pick one from the output and re-run with --voice-id <id>.
bash skills/tts/scripts/tts.sh speak -f input.txt --voice-id voice_abc --auto-emotion --emo '{"Joy":0.5}' -o out.wav

# Voice cloning (Noiz only; no voice-id needed, uses reference audio)
bash skills/tts/scripts/tts.sh speak -t "Hello" --ref-audio ./ref.wav -o clone.wav
```
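The backend notes above can be made explicit in a small wrapper, so a script does not depend on auto-detection. This is only a sketch: `pick_backend` is a hypothetical helper, not part of `tts.sh`, and the precedence it applies (explicit choice, then `NOIZ_API_KEY`, then Kokoro) is an assumption, since the actual detection order is not documented here.

```shell
# Hypothetical helper: decide which value to pass to --backend.
# Assumed precedence (not confirmed by tts.sh): explicit argument,
# then NOIZ_API_KEY, then the local Kokoro backend.
pick_backend() {
  requested=${1:-}
  if [ -n "$requested" ]; then
    echo "$requested"
  elif [ -n "${NOIZ_API_KEY:-}" ]; then
    echo "noiz"
  else
    echo "kokoro"
  fi
}

# Usage (commented out so the sketch runs without the skill installed):
# bash skills/tts/scripts/tts.sh speak -t "Hello world" -v af_sarah \
#   --backend "$(pick_backend kokoro)" -o hello.wav
```

Passing the backend explicitly keeps a run reproducible on machines where both backends happen to be available.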
Timeline Mode: SRT to time-aligned audio
For precise per-segment timing (dubbing, subtitles, video narration).
Step 1: Get or create an SRT
If the user doesn't have one, generate from text:
```bash
bash skills/tts/scripts/tts.sh to-srt -i article.txt -o article.srt
bash skills/tts/scripts/tts.sh to-srt -i article.txt -o article.srt --cps 15 --gap 500
```
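To get a feel for what `--cps 15 --gap 500` might produce, the arithmetic can be sketched by hand. This assumes `--cps` means characters per second and `--gap` is a pause between cues in milliseconds; neither meaning is confirmed here, and both helper functions are hypothetical illustrations rather than part of `tts.sh`.

```shell
# Assumption: --cps = characters per second, --gap = pause between cues in ms.
# These helpers are hypothetical and only illustrate the arithmetic.

# estimate_ms TEXT CPS: estimated speaking time in milliseconds (integer division).
# ${#text} counts characters in bash; some other shells count bytes instead.
estimate_ms() {
  text=$1
  cps=$2
  echo $(( ${#text} * 1000 / cps ))
}

# next_start_ms START_MS DURATION_MS GAP_MS: start time of the following cue.
next_start_ms() {
  echo $(( $1 + $2 + $3 ))
}

dur=$(estimate_ms "Hello world" 15)   # 11 characters at 15 cps
echo "duration: ${dur} ms, next cue starts at $(next_start_ms 0 "$dur" 500) ms"
```

Under these assumptions, a lower `--cps` stretches each cue and a larger `--gap` spaces cues further apart.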
Step 2: Create a voice map
A JSON file controlling default and per-segment voice settings. `segments` keys support a single index (`"3"`) or a range (`"5-8"`).

Kokoro voice map:
```json
{
  "default": { "voice": "zf_xiaoni", "lang": "cmn" },
  "segments": {
    "1": { "voice": "zm_yunxi" },
    "5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
  }
}
```

Noiz voice map (adds `emo` and `reference_audio` support):
```json
{
  "default": { "voice_id": "voice_123", "target_lang": "zh" },
  "segments": {
    "1": { "voice_id": "voice_host", "emo": { "Joy": 0.6 } },
    "2-4": { "reference_audio": "./refs/guest.wav" }
  }
}
```

See `examples/` for full samples.
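The two key forms under `segments` can be illustrated with a small matcher. This is a sketch under assumptions: `segment_key_matches` is a hypothetical helper, not the renderer's actual lookup, and it treats ranges as inclusive at both ends, which the document does not state explicitly.

```shell
# Hypothetical helper, not part of tts.sh: does a "segments" key cover a cue index?
# Keys are either a single index ("3") or a range ("5-8", assumed inclusive).
segment_key_matches() {
  idx=$1
  key=$2
  case $key in
    *-*)
      lo=${key%%-*}   # text before the dash
      hi=${key#*-}    # text after the dash
      [ "$idx" -ge "$lo" ] && [ "$idx" -le "$hi" ]
      ;;
    *)
      [ "$idx" -eq "$key" ]
      ;;
  esac
}

for i in 4 5 8 9; do
  if segment_key_matches "$i" "5-8"; then
    echo "segment $i: covered by key 5-8"
  else
    echo "segment $i: not covered by key 5-8"
  fi
done
```

Segments that no key covers would presumably use the `default` entry of the voice map.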
Step 3: Render
```bash
bash skills/tts/scripts/tts.sh render --srt input.srt --voice-map vm.json -o output.wav
bash skills/tts/scripts/tts.sh render --srt input.srt --voice-map vm.json --backend noiz --auto-emotion -o output.wav
```
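Steps 1 and 3 can be chained into one helper when the SRT is generated from plain text. The sketch below uses only the `to-srt` and `render` flags shown in this document. The function name `text_to_timeline` and the `TTS_CMD` override are hypothetical conveniences added here so the flow can be dry-run without the skill installed.

```shell
# TTS_CMD is a hypothetical override used for dry runs; by default it points
# at the script path used throughout this document.
TTS_CMD=${TTS_CMD:-"bash skills/tts/scripts/tts.sh"}

# text_to_timeline TEXT_FILE VOICE_MAP OUT_AUDIO
# Step 1: text -> SRT, then Step 3: SRT + voice map -> audio.
text_to_timeline() {
  txt=$1
  voice_map=$2
  out=$3
  srt="${txt%.*}.srt"   # article.txt -> article.srt
  # $TTS_CMD is intentionally unquoted so it splits into command + arguments.
  $TTS_CMD to-srt -i "$txt" -o "$srt" || return 1
  $TTS_CMD render --srt "$srt" --voice-map "$voice_map" -o "$out"
}

# Dry run: with TTS_CMD=echo the two command lines are printed, not executed.
# TTS_CMD=echo; text_to_timeline article.txt vm.json output.wav
```

Stopping when `to-srt` fails avoids rendering from a stale or missing SRT file.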
When to Choose Which
| Need | Recommended |
|---|---|
| Just read text aloud, no fuss | Kokoro (default) |
| EPUB/PDF audiobook with chapters | Kokoro (native support) |
| Voice blending | Kokoro |
| Voice cloning from reference audio | Noiz |
| Emotion control | Noiz |
| Exact server-side duration per segment | Noiz |
When the user needs emotion control + voice cloning + precise duration together, Noiz is the only backend that supports all three.
Requirements
- `ffmpeg` in PATH (timeline mode)
- Noiz: get your API key at developers.noiz.ai, then run `bash skills/tts/scripts/tts.sh config --set-api-key YOUR_KEY`
- Kokoro: if already installed, pass `--backend kokoro` to use the local backend

For backend details and full argument reference, see reference.md.
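A quick preflight check can catch a missing `ffmpeg` before a timeline render starts. This is a minimal sketch: `have_cmd` and `preflight` are hypothetical helpers, not part of `tts.sh`, and the check covers only the `ffmpeg` requirement listed above.

```shell
# Hypothetical preflight check, not part of tts.sh.

# have_cmd NAME: succeeds when NAME resolves to a command in the current shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# preflight: report missing requirements for timeline mode on stderr.
preflight() {
  status=0
  if ! have_cmd ffmpeg; then
    echo "ffmpeg not found in PATH (needed for timeline mode)" >&2
    status=1
  fi
  return $status
}

preflight || echo "install ffmpeg before running timeline mode"
```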