text-to-speech
Text-to-Speech Skill
Converts text to speech, with support for podcast script parsing, emotion tag processing, and optional voice-changer post-processing.
Usage Instructions
When a user asks to convert text to speech, use the following commands:

```bash
# Basic usage
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py <text-file>

# Specify the output file
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt -o output.mp3

# Specify the voice
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt -v zh-CN-XiaoxiaoNeural

# Enable post-processing (voice-changer)
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt --post-process

# List all available voices
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py --list-voices
```
Core Features
1. Script Parsing
Automatically identifies and removes annotations and markup from podcast scripts:
- Timestamps: (00:00)
- BGM notes: [BGM渐入:...] (background music fade-in)
- Stage directions: (主播声音:...) (host voice), (停顿 1秒) (pause for 1 second)
- Emotion tags: (语速放慢,加重语气) (slow down, add emphasis)
- Markdown markup: **文本** (bold text)
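
The removal rules above can be sketched with a few regular expressions. This is a minimal illustration, not the skill's actual implementation: the patterns and the function name `clean_script` are assumptions based only on the tag formats listed above.

```python
import re

# Hypothetical patterns, one per annotation type listed above. The skill's real
# regular expressions are not shown in this document.
TIMESTAMP = re.compile(r"[((]\d{1,2}:\d{2}(?::\d{2})?[))]")
BGM_NOTE = re.compile(r"\[BGM[^\]]*\]")
PAREN_NOTE = re.compile(r"[((][^()()]*[))]")  # stage directions and emotion tags
BOLD = re.compile(r"\*\*(.+?)\*\*")


def clean_script(text: str) -> str:
    """Return only the text that should be read aloud."""
    text = TIMESTAMP.sub("", text)
    text = BGM_NOTE.sub("", text)
    text = PAREN_NOTE.sub("", text)
    text = BOLD.sub(r"\1", text)  # keep the words, drop the asterisks
    # Collapse leftover spaces and drop lines that became empty.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

One caveat of this sketch: it removes every parenthesized span, so parentheses that belong to the spoken text would be dropped too. A stricter rule would match only known prefixes such as 主播声音 or 停顿.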
2. Multi-Voice Support
Supports 18+ Chinese voices, including:
- Male voices: YunyangNeural (news broadcasting), YunxiNeural (young and energetic), YunjianNeural (mature and steady)
- Female voices: XiaoxiaoNeural (gentle and friendly), XiaoyiNeural (lively and cheerful), XiaoyanNeural (news broadcasting)
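
To see which voices are available, the voice list from the edge-tts package can be filtered by locale and gender. The sketch below is an illustration only: `pick_voices` and `fetch_voices` are hypothetical helper names, and it assumes each voice record carries the `ShortName`, `Locale`, and `Gender` keys returned by `edge_tts.list_voices()`.

```python
def pick_voices(voices, locale="zh-CN", gender=None):
    """Filter edge-tts voice records by locale and, optionally, gender."""
    return sorted(
        v["ShortName"]
        for v in voices
        if v["Locale"] == locale and (gender is None or v["Gender"] == gender)
    )


async def fetch_voices():
    """Download the voice list. Needs network access and `pip install edge-tts`."""
    import edge_tts  # imported lazily so pick_voices works without the package

    return await edge_tts.list_voices()
```

With network access, `pick_voices(asyncio.run(fetch_voices()), gender="Female")` would return the short names of the female zh-CN voices.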
3. Speech Parameter Adjustment
- Speech rate: --rate "+20%" or --rate "-10%"
- Pitch: --pitch "+5Hz" or --pitch "-3Hz"
- Volume: --volume "+20%" or --volume "-10%"
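
As a rough sketch of how these options might reach the TTS engine, the snippet below validates the three strings and passes them to `edge_tts.Communicate`, which accepts `rate`, `pitch`, and `volume` keyword arguments. The helper names `check_prosody` and `synthesize` are hypothetical, and the accepted formats are inferred from the examples above rather than taken from the skill's source.

```python
import re

# Assumed formats, inferred from the examples above: a signed integer
# followed by % (rate, volume) or Hz (pitch).
PERCENT = re.compile(r"[+-]\d+%")
HERTZ = re.compile(r"[+-]\d+Hz")


def check_prosody(rate="+0%", pitch="+0Hz", volume="+0%"):
    """Validate the three option strings and return them as keyword arguments."""
    for name, value, pattern in (
        ("rate", rate, PERCENT),
        ("pitch", pitch, HERTZ),
        ("volume", volume, PERCENT),
    ):
        if not pattern.fullmatch(value):
            raise ValueError(f"invalid {name}: {value!r}")
    return {"rate": rate, "pitch": pitch, "volume": volume}


async def synthesize(text, voice, out_path, **prosody):
    """Generate audio with edge-tts. Needs network access."""
    import edge_tts  # lazy import: only needed when audio is actually generated

    communicate = edge_tts.Communicate(text, voice, **check_prosody(**prosody))
    await communicate.save(out_path)
```

Validating before the network call means a malformed value such as "20%" (missing sign) fails immediately with a clear message.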
4. Post-Processing Integration
Optionally integrates with the voice-changer skill for voice transformation.
Configuration File
The configuration file is located at:
~/.claude/skills/text-to-speech/config/tts_config.json
Main Configuration Items
```json
{
  "edge_tts": {
    "voice": "zh-CN-YunyangNeural",
    "rate": "+0%",
    "pitch": "+0Hz",
    "volume": "+0%"
  },
  "script_parsing": {
    "enabled": true,
    "remove_timestamps": true,
    "remove_bgm_notes": true,
    "remove_stage_directions": true,
    "remove_markdown": true
  },
  "emotion_processing": {
    "enabled": true,
    "use_ssml": true
  },
  "output": {
    "format": "mp3",
    "default_output_dir": "same_as_input",
    "filename_suffix": "_tts"
  },
  "post_processing": {
    "enabled": false,
    "voice_changer": {
      "enabled": false,
      "voice_type": "female_1",
      "pitch_shift": 0
    }
  }
}
```
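
A configuration file like this is commonly read by merging it over built-in defaults, so that a partial file still yields a complete configuration. The sketch below shows one way to do that. `deep_merge` and `load_config` are hypothetical names, and whether the skill merges defaults at all is an assumption.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".claude/skills/text-to-speech/config/tts_config.json"


def deep_merge(base, override):
    """Return a copy of base with override applied, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path=CONFIG_PATH, defaults=None):
    """Load the JSON config, falling back to defaults when the file is missing."""
    defaults = defaults or {}
    path = Path(path)
    if not path.exists():
        return dict(defaults)
    with path.open(encoding="utf-8") as fh:
        return deep_merge(defaults, json.load(fh))
```

With this approach, a file containing only `{"edge_tts": {"rate": "+10%"}}` changes the speech rate while the default voice and every other section stay in place.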
Workflow
Input Text/File
↓
Script Parsing (remove annotations and markup)
↓
Emotion Tag Processing (optional)
↓
Edge TTS Speech Synthesis
↓
Post-Processing (voice-changer, optional)
↓
Output MP3 File
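
The stages in this diagram can be read as a simple function pipeline in which the two optional stages are skipped when not configured. The following is a structural sketch only: `run_pipeline` and its parameters are hypothetical, and each stage is passed in as a callable rather than tied to the skill's real functions.

```python
def run_pipeline(text, parse, synthesize, emotion=None, post_process=None):
    """Run the stages in diagram order; optional stages may be None."""
    spoken = parse(text)  # script parsing
    if emotion is not None:
        spoken = emotion(spoken)  # emotion tag processing (optional)
    audio_path = synthesize(spoken)  # speech synthesis, returns the output path
    if post_process is not None:
        audio_path = post_process(audio_path)  # voice-changer (optional)
    return audio_path
```

Passing the stages as arguments keeps the control flow testable without network access, since the synthesis step can be replaced by a stub.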
Dependency Installation
```bash
# Install Edge TTS
pip install edge-tts

# If post-processing is needed, ensure the voice-changer skill is installed
```
Usage Examples
Example 1: Convert Podcast Script

```bash
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py podcast_script.txt
```

The script will automatically:
- Remove timestamps and BGM notes
- Remove stage directions
- Keep only the text that should be read aloud
- Generate podcast_script_tts.mp3
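
The output name in this example is consistent with the "filename_suffix": "_tts" and "default_output_dir": "same_as_input" settings in the configuration file. A minimal sketch of that naming rule follows; `default_output_path` is a hypothetical helper, not a function confirmed to exist in the script.

```python
from pathlib import Path


def default_output_path(input_path, suffix="_tts", fmt="mp3"):
    """Place the output next to the input, as <stem><suffix>.<fmt>."""
    src = Path(input_path)
    return src.with_name(f"{src.stem}{suffix}.{fmt}")
```

For instance, `default_output_path("podcast_script.txt")` yields `podcast_script_tts.mp3` in the same directory as the input.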
Example 2: Use a Female Voice and Adjust Speech Rate

```bash
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt \
  -v zh-CN-XiaoxiaoNeural \
  --rate "+10%"
```
Example 3: Enable Post-Processing

```bash
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt \
  --post-process
```

This first generates the speech, then calls voice-changer for voice transformation.
Notes
- Network requirement: Edge TTS requires an internet connection
- Text length: no more than 10,000 characters per conversion is recommended
- Script format: supports plain text and annotated podcast scripts
- Post-processing: requires the voice-changer skill to be installed first
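
For inputs longer than the recommended 10,000 characters, one option is to split the text at sentence boundaries and convert each piece separately. The sketch below illustrates that idea. `split_text` is a hypothetical helper, and the document does not say whether the script performs any chunking itself.

```python
import re

# Split after Chinese or ASCII sentence-ending punctuation, or after a newline.
SENTENCE_END = re.compile(r"(?<=[。!?!?\n])")


def split_text(text, limit=10_000):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    chunks, current = [], ""
    for sentence in SENTENCE_END.split(text):
        # A single sentence longer than the limit is cut at fixed positions.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the original text, so no characters are lost between conversions.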
Technical Implementation
- TTS engine: Microsoft Edge TTS (free, high quality)
- Script parsing: regular expression matching
- Audio format: MP3 (default)
- Post-processing: optional integration with voice-changer
Performance Reference
- 1,000 characters: approximately 10-20 seconds
- 5,000 characters: approximately 30-60 seconds
- Network speed has a significant impact on these times
License
MIT