Text-to-Speech Skill

Converts text to speech, with support for podcast script parsing, emotion tag processing, and voice-changer post-processing.

Usage Instructions

When a user asks to convert text to speech, use the following commands:

```bash
# Basic usage
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py <text-file>

# Specify the output file
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt -o output.mp3

# Specify the voice
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt -v zh-CN-XiaoxiaoNeural

# Enable post-processing (voice-changer)
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt --post-process

# List all available voices
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py --list-voices
```
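You may prefer to drive the script from Python instead of typing the commands. The minimal sketch below assembles the argument list from the flags documented in this skill. The helper names `build_tts_command` and `run_tts` are hypothetical. The sketch passes values as `--rate=-10%`, with an `=`, which assumes the script parses options with argparse: argparse reads a separate `-10%` token as a new option rather than as a value.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Script location as given in the usage commands.
SCRIPT = Path("~/.claude/skills/text-to-speech/scripts/text_to_speech.py").expanduser()


def build_tts_command(
    text_file: str,
    output: Optional[str] = None,
    voice: Optional[str] = None,
    rate: Optional[str] = None,
    pitch: Optional[str] = None,
    volume: Optional[str] = None,
    post_process: bool = False,
) -> List[str]:
    """Build an argv list for text_to_speech.py from the documented flags (hypothetical helper)."""
    cmd = ["python3", str(SCRIPT), text_file]
    if output:
        cmd += ["-o", output]
    if voice:
        cmd += ["-v", voice]
    # Attach values with "=" so a leading "-" (e.g. "-10%") is not read as a new option.
    for flag, value in (("--rate", rate), ("--pitch", pitch), ("--volume", volume)):
        if value:
            cmd.append(f"{flag}={value}")
    if post_process:
        cmd.append("--post-process")
    return cmd


def run_tts(**kwargs) -> None:
    """Run the script; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_tts_command(**kwargs), check=True)
```

For example, `run_tts(text_file="script.txt", voice="zh-CN-XiaoxiaoNeural", rate="+10%")` performs the same conversion as the female-voice example in Usage Examples.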

Core Features

1. Script Parsing

Automatically detects and removes annotations and markup from podcast scripts:
  • Timestamps: (00:00)
  • BGM notes: [BGM渐入:...] ("BGM fade-in: ...")
  • Stage directions: (主播声音:...) ("host voice: ..."), (停顿 1秒) ("pause 1 second")
  • Emotion tags: (语速放慢,加重语气) ("slow down, add emphasis")
  • Markdown markup: **文本** ("**text**")
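The skill says it parses scripts with regular expressions, but it does not publish its patterns. The following is one hypothetical way to strip the markup types listed above. The pattern names and `strip_script_markup` are illustrative. Accepting both ASCII and full-width (Chinese) brackets is an assumption. Keeping the text inside `**...**` is also an assumption, on the grounds that bold text is meant to be read. Removing every parenthesized span is deliberately blunt: genuine parenthetical remarks in the spoken text would be dropped too.

```python
import re

# Hypothetical patterns: the skill's real regexes are not published here.
# ASCII and full-width (Chinese) brackets and colons are both accepted.
TIMESTAMP = re.compile(r"[(（]\d{1,2}[:：]\d{2}(?:[:：]\d{2})?[)）]")  # (00:00)
BGM_NOTE = re.compile(r"[\[【]BGM[^\]】]*[\]】]")                     # [BGM渐入:...]
PAREN_NOTE = re.compile(r"[(（][^()（）]*[)）]")  # stage directions and emotion tags
BOLD = re.compile(r"\*\*(.+?)\*\*")                # **文本** -> 文本


def strip_script_markup(text: str) -> str:
    """Return only the text meant to be read aloud."""
    text = TIMESTAMP.sub("", text)
    text = BGM_NOTE.sub("", text)
    text = PAREN_NOTE.sub("", text)
    text = BOLD.sub(r"\1", text)
    # Collapse whitespace runs inside each line and drop lines left empty.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```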

2. Multi-Voice Support

Supports 18+ Chinese voices, including:
  • Male voices: YunyangNeural (news broadcasting), YunxiNeural (young and energetic), YunjianNeural (mature and steady)
  • Female voices: XiaoxiaoNeural (gentle and friendly), XiaoyiNeural (lively and cheerful), XiaoyanNeural (news broadcasting)

Pass the full voice name, including the zh-CN- prefix, to -v (for example, zh-CN-YunxiNeural).
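The edge-tts package can list every Chinese voice, not just the six above. Its async `edge_tts.list_voices()` call returns one dict per voice, with keys such as `ShortName`, `Gender`, and `Locale`. The sketch below groups those voices by gender. `group_by_gender` and `print_chinese_voices` are hypothetical helpers, and this is not necessarily how the skill implements `--list-voices`.

```python
from typing import Dict, List


def group_by_gender(voices: List[dict], locale_prefix: str = "zh-CN") -> Dict[str, List[str]]:
    """Group voice ShortNames by Gender, keeping only locales that start with locale_prefix."""
    groups: Dict[str, List[str]] = {}
    for voice in voices:
        if voice.get("Locale", "").startswith(locale_prefix):
            groups.setdefault(voice.get("Gender", "Unknown"), []).append(voice["ShortName"])
    return {gender: sorted(names) for gender, names in groups.items()}


async def print_chinese_voices() -> None:
    """Fetch the live voice list from the Edge TTS service and print it by gender."""
    import edge_tts  # requires `pip install edge-tts` and network access

    voices = await edge_tts.list_voices()
    for gender, names in group_by_gender(voices).items():
        print(f"{gender}: {', '.join(names)}")
```

Run it with `asyncio.run(print_chinese_voices())`. It needs network access.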

3. Speech Parameter Adjustment

  • Speech rate: --rate "+20%" or --rate "-10%"
  • Pitch: --pitch "+5Hz" or --pitch "-3Hz"
  • Volume: --volume "+20%" or --volume "-10%"
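These values map onto the `rate`, `pitch`, and `volume` keyword arguments of `edge_tts.Communicate`. The sketch below checks the signed-percent and signed-Hz formats shown above before calling edge-tts. `check_prosody` and `synthesize` are hypothetical names. The validation patterns are an assumption modeled on the examples, not taken from the skill's source.

```python
import re
from typing import Dict

# Assumed formats, modeled on the examples: a signed integer plus a unit.
PROSODY_FORMATS = {
    "rate": re.compile(r"^[+-]\d+%$"),
    "volume": re.compile(r"^[+-]\d+%$"),
    "pitch": re.compile(r"^[+-]\d+Hz$"),
}


def check_prosody(**params: str) -> Dict[str, str]:
    """Return params unchanged, or raise ValueError on an unknown name or a bad format."""
    for name, value in params.items():
        pattern = PROSODY_FORMATS.get(name)
        if pattern is None:
            raise ValueError(f"unknown parameter: {name}")
        if not pattern.match(value):
            raise ValueError(f"invalid {name} value: {value!r}")
    return params


async def synthesize(text: str, out_path: str, voice: str = "zh-CN-YunyangNeural",
                     rate: str = "+0%", pitch: str = "+0Hz", volume: str = "+0%") -> None:
    """Synthesize text with edge-tts and save the audio to out_path."""
    import edge_tts  # requires `pip install edge-tts` and network access

    check_prosody(rate=rate, pitch=pitch, volume=volume)
    communicate = edge_tts.Communicate(text, voice, rate=rate, pitch=pitch, volume=volume)
    await communicate.save(out_path)
```

For example: `asyncio.run(synthesize("大家好", "out.mp3", rate="+20%", pitch="-3Hz"))`.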

4. Post-Processing Integration

Optionally integrates with the voice-changer skill for voice transformation.

Configuration File

The configuration file is located at:
~/.claude/skills/text-to-speech/config/tts_config.json

Main Configuration Items

```json
{
  "edge_tts": {
    "voice": "zh-CN-YunyangNeural",
    "rate": "+0%",
    "pitch": "+0Hz",
    "volume": "+0%"
  },
  "script_parsing": {
    "enabled": true,
    "remove_timestamps": true,
    "remove_bgm_notes": true,
    "remove_stage_directions": true,
    "remove_markdown": true
  },
  "emotion_processing": {
    "enabled": true,
    "use_ssml": true
  },
  "output": {
    "format": "mp3",
    "default_output_dir": "same_as_input",
    "filename_suffix": "_tts"
  },
  "post_processing": {
    "enabled": false,
    "voice_changer": {
      "enabled": false,
      "voice_type": "female_1",
      "pitch_shift": 0
    }
  }
}
```
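Here is a minimal sketch of how these settings could be loaded and combined with command-line options. It assumes that CLI flags override the file and that `same_as_input` means "write next to the input file". The helper names are hypothetical. The `_tts` suffix handling matches the `podcast_script_tts.mp3` output described in the usage examples.

```python
import json
from pathlib import Path
from typing import Optional

CONFIG_PATH = Path("~/.claude/skills/text-to-speech/config/tts_config.json").expanduser()


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Load tts_config.json, or return {} if the file does not exist."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def merge_voice_settings(config: dict, **overrides: Optional[str]) -> dict:
    """Start from the edge_tts section and apply only the CLI values that were given."""
    settings = dict(config.get("edge_tts", {}))
    settings.update({key: value for key, value in overrides.items() if value is not None})
    return settings


def resolve_output_path(input_file: str, config: dict, output: Optional[str] = None) -> Path:
    """An explicit -o path wins; otherwise build <stem><filename_suffix>.<format>."""
    if output:
        return Path(output)
    out_cfg = config.get("output", {})
    src = Path(input_file)
    out_dir = out_cfg.get("default_output_dir", "same_as_input")
    directory = src.parent if out_dir == "same_as_input" else Path(out_dir).expanduser()
    suffix = out_cfg.get("filename_suffix", "_tts")
    fmt = out_cfg.get("format", "mp3")
    return directory / f"{src.stem}{suffix}.{fmt}"
```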

Workflow

  1. Input text or file
  2. Script parsing (remove annotations and markup)
  3. Emotion tag processing (optional)
  4. Edge TTS speech synthesis
  5. Post-processing (voice-changer, optional)
  6. Output MP3 file
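The workflow stages form a simple linear pipeline. This sketch shows only their order and which stages are optional, with each stage passed in as a function. `run_pipeline` and its parameters are hypothetical and do not describe the skill's actual code structure.

```python
from typing import Callable, Optional


def run_pipeline(
    text: str,
    parse: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    emotion: Optional[Callable[[str], str]] = None,
    post_process: Optional[Callable[[bytes], bytes]] = None,
) -> bytes:
    """Apply the workflow stages in order; optional stages are skipped when None."""
    cleaned = parse(text)            # script parsing
    if emotion is not None:
        cleaned = emotion(cleaned)   # emotion tag processing (optional)
    audio = synthesize(cleaned)      # Edge TTS speech synthesis
    if post_process is not None:
        audio = post_process(audio)  # voice-changer post-processing (optional)
    return audio                     # caller writes these bytes to the MP3 file
```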

Dependency Installation

```bash
# Install Edge TTS
pip install edge-tts

# If post-processing is needed, make sure the voice-changer skill is installed
```
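A quick check like the one below can report missing dependencies before a run. `check_dependencies` is hypothetical. It assumes the voice-changer skill is installed as a `voice-changer` directory next to this skill under `~/.claude/skills`, so adjust that path if your installation differs.

```python
import importlib.util
from pathlib import Path
from typing import List


def check_dependencies(need_post_process: bool = False,
                       skills_dir: Path = Path("~/.claude/skills").expanduser()) -> List[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems = []
    if importlib.util.find_spec("edge_tts") is None:
        problems.append("edge-tts is not installed: run `pip install edge-tts`")
    # Assumption: the voice-changer skill lives at <skills_dir>/voice-changer.
    if need_post_process and not (skills_dir / "voice-changer").is_dir():
        problems.append(f"voice-changer skill not found under {skills_dir}")
    return problems
```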

Usage Examples

Example 1: Convert a Podcast Script

```bash
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py podcast_script.txt
```

The script automatically:
  • Removes timestamps and BGM notes
  • Removes stage directions
  • Keeps only the text meant to be read aloud
  • Generates podcast_script_tts.mp3

Example 2: Use a Female Voice and Adjust the Speech Rate

```bash
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt \
  -v zh-CN-XiaoxiaoNeural \
  --rate "+10%"
```

Example 3: Enable Post-Processing

```bash
python3 ~/.claude/skills/text-to-speech/scripts/text_to_speech.py script.txt \
  --post-process
```

This first generates the speech, then calls voice-changer to transform the voice.

Notes

  1. Network requirement: Edge TTS requires an internet connection.
  2. Text length: keep each conversion to no more than about 10,000 characters.
  3. Script format: both plain text and annotated podcast scripts are supported.
  4. Post-processing: the voice-changer skill must be installed first.
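For texts above the recommended length, one option is to split the input at sentence boundaries and convert each chunk separately. The helper below (`split_text`) is a hypothetical sketch, not part of the skill. It counts characters, keeps each Chinese or ASCII sentence terminator attached to its sentence, and hard-splits any single sentence that exceeds the limit.

```python
import re
from typing import List

# A sentence = a run of non-terminators plus one terminator, or a final unterminated run.
SENTENCE = re.compile(r"[^。！？!?.\n]*[。！？!?.\n]|[^。！？!?.\n]+$")


def split_text(text: str, max_chars: int = 10_000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    chunks: List[str] = []
    current = ""
    for sentence in SENTENCE.findall(text):
        # Hard-split a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```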

Technical Implementation

  • TTS engine: Microsoft Edge TTS (free, high quality)
  • Script parsing: regular-expression matching
  • Audio format: MP3 (default)
  • Post-processing: optional voice-changer integration

Performance Reference

  • 1,000 characters of text: about 10-20 seconds
  • 5,000 characters of text: about 30-60 seconds
  • Actual times depend heavily on network speed

License

MIT