audio-reply
Audio Reply Skill
Generate spoken audio responses using MLX Audio TTS (chatterbox-turbo model).
Trigger Phrases
- "read it to me [URL]" - Fetch content from URL and read it aloud
- "talk to me [topic/question]" - Generate a conversational response as audio
- "speak", "say it", "voice reply" - Convert your response to audio
How to Use
Mode 1: Read URL Content
User: read it to me https://example.com/article
- Fetch the URL content using WebFetch
- Extract readable text (strip HTML, focus on main content)
- Generate audio using TTS
- Play the audio and delete the file afterward
Mode 2: Conversational Audio Response
User: talk to me about the weather today
- Generate a natural, conversational response
- Keep it concise (TTS works best with shorter segments)
- Convert to audio, play it, then delete the file
Implementation
TTS Command
bash
uv run mlx_audio.tts.generate \
--model mlx-community/chatterbox-turbo-fp16 \
--text "Your text here" \
--play \
--file_prefix /tmp/audio_reply
Key Parameters
- --model mlx-community/chatterbox-turbo-fp16 - Fast, natural voice
- --play - Auto-play the generated audio
- --file_prefix - Save to temp location for cleanup
- --exaggeration 0.3 - Optional: add expressiveness (0.0-1.0)
- --speed 1.0 - Adjust speech rate if needed
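As a rough sketch of how the optional flags combine with the required ones, the command can be wrapped in a small shell function. The function name `tts_say` and the `TTS_CMD` override (handy for dry runs) are assumptions made for illustration; only the flags listed above are used.

```bash
# Hypothetical wrapper around the TTS command shown above.
# Usage: tts_say TEXT FILE_PREFIX [EXAGGERATION] [SPEED]
# TTS_CMD is an assumed override hook, e.g. TTS_CMD=echo for a dry run.
tts_say() {
  text=$1
  prefix=$2
  exaggeration=${3:-}
  speed=${4:-}
  # Required flags first.
  set -- --model mlx-community/chatterbox-turbo-fp16 \
    --text "$text" --play --file_prefix "$prefix"
  # Append optional flags only when a value was supplied.
  if [ -n "$exaggeration" ]; then set -- "$@" --exaggeration "$exaggeration"; fi
  if [ -n "$speed" ]; then set -- "$@" --speed "$speed"; fi
  ${TTS_CMD:-uv run mlx_audio.tts.generate} "$@"
}
```

Calling `TTS_CMD=echo; tts_say "Hello there" /tmp/audio_reply 0.3` prints the assembled arguments instead of generating audio, which is a cheap way to check the flags before a real run.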
Text Preparation Guidelines
For "read it to me" mode:
- Fetch URL with WebFetch tool
- Extract main content, strip navigation/ads/boilerplate
- Summarize if very long (>500 words) - keep key points
- Add natural pauses with periods and commas
For "talk to me" mode:
- Write conversationally, as if speaking
- Use contractions (I'm, you're, it's)
- Add filler words sparingly for naturalness ([chuckle], um, anyway)
- Keep responses under 200 words for best quality
- Avoid technical jargon unless explaining it
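The word budgets above (500 words before summarizing, 200 words for a conversational reply) can be checked mechanically before calling TTS. This is a minimal sketch; the helper name `within_word_limit` is hypothetical, and it counts whitespace-separated words with `wc -w`, which is only an approximation of how long the speech will be.

```bash
# Hypothetical helper: succeed if TEXT has at most LIMIT words.
# Usage: within_word_limit TEXT LIMIT
within_word_limit() {
  # tr strips the padding some wc implementations add around the count.
  words=$(printf '%s' "$1" | wc -w | tr -d ' ')
  [ "$words" -le "$2" ]
}
```

For example, `within_word_limit "$reply" 200 || echo "Reply is long; consider shortening it"` flags a conversational reply that exceeds the suggested length.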
Audio Generation & Cleanup (IMPORTANT)
Always delete the audio file after playing - it's already in the chat history.
bash
# Generate with unique filename and play
OUTPUT_FILE="/tmp/audio_reply_$(date +%s)"
uv run mlx_audio.tts.generate \
--model mlx-community/chatterbox-turbo-fp16 \
--text "Your response text" \
--play \
--file_prefix "$OUTPUT_FILE"
# ALWAYS clean up after playing
rm -f "${OUTPUT_FILE}"*.wav 2>/dev/null
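One way to make sure cleanup runs even when generation or playback fails is to wrap both steps in a function that records the exit status before deleting. This is a sketch under assumptions: the name `speak_and_cleanup` and the `TTS_CMD` override are hypothetical, and it assumes (as the `rm` pattern above implies) that generated files start with the `--file_prefix` value and end in `.wav`. The process ID is appended to the timestamp so two replies in the same second do not collide.

```bash
# Hypothetical wrapper: speak TEXT, then always remove the generated audio.
# TTS_CMD is an assumed override hook for testing without the real model.
speak_and_cleanup() {
  output_file="/tmp/audio_reply_$(date +%s)_$$"
  status=0
  ${TTS_CMD:-uv run mlx_audio.tts.generate} \
    --model mlx-community/chatterbox-turbo-fp16 \
    --text "$1" \
    --play \
    --file_prefix "$output_file" || status=$?
  # Clean up regardless of whether the command above succeeded.
  rm -f "${output_file}"*.wav 2>/dev/null
  return "$status"
}
```

The function returns the TTS command's exit status, so a caller can still detect a failure and fall back to text.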
Error Handling
If TTS fails:
- Check if model is downloaded (first run downloads ~500MB)
- Ensure uv is installed and in PATH
- Fall back to text response with apology
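The checks above could be folded into a single function along these lines. It is a sketch, not a definitive implementation: the name `speak_or_text` and the wording of the apology are assumptions, and it does not distinguish a missing model from other TTS failures.

```bash
# Hypothetical helper: try to speak TEXT; on any failure, print it with an apology.
speak_or_text() {
  ok=1
  # command -v is a POSIX way to test whether uv is reachable via PATH.
  if command -v uv >/dev/null 2>&1; then
    prefix="/tmp/audio_reply_$(date +%s)"
    uv run mlx_audio.tts.generate \
      --model mlx-community/chatterbox-turbo-fp16 \
      --text "$1" --play --file_prefix "$prefix" && ok=0
    rm -f "${prefix}"*.wav 2>/dev/null
  fi
  if [ "$ok" -ne 0 ]; then
    printf 'Sorry, I could not generate audio. Here is the reply as text:\n%s\n' "$1"
  fi
}
```

When `uv` is missing or the command exits with an error, the text is printed so the user still gets the response.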
Example Workflows
Example 1: Read URL
User: read it to me https://blog.example.com/new-feature
Assistant actions:
1. WebFetch the URL
2. Extract article content
3. Generate TTS:
uv run mlx_audio.tts.generate \
--model mlx-community/chatterbox-turbo-fp16 \
--text "Here's what I found... [article summary]" \
--play --file_prefix /tmp/audio_reply_1706123456
4. Delete: rm -f /tmp/audio_reply_1706123456*.wav
5. Confirm: "Done reading the article to you."
Example 2: Talk to Me
User: talk to me about what you can help with
Assistant actions:
1. Generate conversational response text
2. Generate TTS:
uv run mlx_audio.tts.generate \
--model mlx-community/chatterbox-turbo-fp16 \
--text "Hey! So I can help you with all kinds of things..." \
--play --file_prefix /tmp/audio_reply_1706123789
3. Delete: rm -f /tmp/audio_reply_1706123789*.wav
4. (No text output needed - audio IS the response)
Notes
- First run may take longer as the model downloads (~500MB)
- Audio quality is best for English; other languages may vary
- For long content, consider chunking into multiple audio segments
- The --play flag uses system audio - ensure volume is up
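For the chunking note above, one possible approach is to group whole sentences into segments that stay under a word budget. The sketch below uses awk; the name `chunk_text` is hypothetical, sentence ends are detected naively by a trailing `.`, `!`, or `?`, and a single sentence longer than the budget is emitted as its own oversized chunk.

```bash
# Hypothetical helper: split TEXT into chunks of at most MAX words,
# breaking only at sentence ends. Prints one chunk per line.
# Usage: chunk_text TEXT MAX
chunk_text() {
  printf '%s\n' "$1" | awk -v max="$2" '
    function flush_sentence() {
      # Start a new chunk if this sentence would push the current one over budget.
      if (cn > 0 && cn + sn > max) { print chunk; chunk = ""; cn = 0 }
      chunk = (cn == 0) ? sent : (chunk " " sent)
      cn += sn; sent = ""; sn = 0
    }
    {
      for (i = 1; i <= NF; i++) {
        sent = (sn == 0) ? $i : (sent " " $i)
        sn++
        if ($i ~ /[.!?]$/) flush_sentence()
      }
    }
    END {
      if (sn > 0) flush_sentence()
      if (cn > 0) print chunk
    }'
}
```

Each output line can then be passed to the TTS command in turn, for example with `chunk_text "$article" 200 | while IFS= read -r chunk; do ...; done`.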