audio-reply

Audio Reply Skill

Generate spoken audio responses using MLX Audio TTS (chatterbox-turbo model).

Trigger Phrases

"read it to me [URL]" - Fetch content from URL and read it aloud
"talk to me [topic/question]" - Generate a conversational response as audio
"speak", "say it", "voice reply" - Convert your response to audio

How to Use

Mode 1: Read URL Content

User: read it to me https://example.com/article

1. Fetch the URL content using WebFetch
2. Extract readable text (strip HTML, focus on main content)
3. Generate audio using TTS
4. Play the audio and delete the file afterward
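
The text-extraction step can be approximated in plain shell. This is only a rough sketch: `strip_html` is a hypothetical helper name, it assumes each tag opens and closes on the same line, and it does not drop `<script>` or `<style>` contents. WebFetch may already return cleaned text, in which case this step is unnecessary.

```bash
# Crude HTML-to-text filter (hypothetical helper, not part of MLX Audio).
# Reads HTML on stdin and writes tag-free, whitespace-normalized text to stdout.
strip_html() {
  sed -e 's/<[^>]*>//g' \
      -e 's/[[:space:]][[:space:]]*/ /g' \
      -e 's/^ //' \
      -e 's/ $//' \
      -e '/^$/d'
}
```

Usage: `printf '<p>Hello <b>world</b></p>\n' | strip_html` leaves only the readable words.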

Mode 2: Conversational Audio Response

User: talk to me about the weather today

1. Generate a natural, conversational response
2. Keep it concise (TTS works best with shorter segments)
3. Convert to audio, play it, then delete the file
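
One way to route a request to the right mode is a simple prefix match on the trigger phrases. The sketch below is illustrative only: `classify_request` and the labels it prints (`read_url`, `talk`, `speak`, `none`) are hypothetical names, and the matching is case-sensitive.

```bash
# Map a user message to a mode label based on the trigger phrases.
# Function name and labels are assumptions made for this example.
classify_request() {
  case "$1" in
    "read it to me "*)                echo "read_url" ;;  # Mode 1: read URL content
    "talk to me "*)                   echo "talk" ;;      # Mode 2: conversational reply
    speak*|"say it"*|"voice reply"*)  echo "speak" ;;     # convert the reply to audio
    *)                                echo "none" ;;      # no audio trigger found
  esac
}
```

For example, `classify_request "talk to me about the weather today"` selects the conversational mode.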

Implementation

TTS Command

```bash
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your text here" \
  --play \
  --file_prefix /tmp/audio_reply
```

Key Parameters

- `--model mlx-community/chatterbox-turbo-fp16` - Fast, natural voice
- `--play` - Auto-play the generated audio
- `--file_prefix` - Save to temp location for cleanup
- `--exaggeration 0.3` - Optional: add expressiveness (0.0-1.0)
- `--speed 1.0` - Adjust speech rate if needed
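
The optional flags can be appended only when they are wanted. The wrapper below is a sketch under assumptions: `tts_generate` is a hypothetical function name, and `EXAGGERATION`, `SPEED`, and `UV_BIN` are hypothetical environment variables introduced here (`UV_BIN` exists only so the command can be swapped out for a dry run).

```bash
# Build the argument list from the parameters above and run the TTS command.
# $1 = text to speak, $2 = output file prefix.
tts_generate() {
  local text="$1" prefix="$2"
  # Required arguments, taken from the TTS command above.
  set -- run mlx_audio.tts.generate \
    --model mlx-community/chatterbox-turbo-fp16 \
    --text "$text" \
    --play \
    --file_prefix "$prefix"
  # Optional arguments are added only when the variable is set and non-empty.
  if [ -n "${EXAGGERATION:-}" ]; then
    set -- "$@" --exaggeration "$EXAGGERATION"
  fi
  if [ -n "${SPEED:-}" ]; then
    set -- "$@" --speed "$SPEED"
  fi
  "${UV_BIN:-uv}" "$@"
}
```

Running `UV_BIN=echo SPEED=1.2 tts_generate "hi" /tmp/audio_reply` prints the assembled command instead of executing it, which is a cheap way to check the flags.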

Text Preparation Guidelines

For "read it to me" mode:

- Fetch URL with WebFetch tool
- Extract main content, strip navigation/ads/boilerplate
- Summarize if very long (>500 words) - keep key points
- Add natural pauses with periods and commas

For "talk to me" mode:

- Write conversationally, as if speaking
- Use contractions (I'm, you're, it's)
- Add filler words sparingly for naturalness ([chuckle], um, anyway)
- Keep responses under 200 words for best quality
- Avoid technical jargon unless explaining it
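
The two length thresholds above (500 words for summarizing, 200 words for conversational replies) are easy to check before calling TTS. The helper names below are hypothetical, and counting whitespace-separated tokens with `wc -w` is only an approximation of a word count.

```bash
# Count whitespace-separated words in a string.
word_count() {
  printf '%s\n' "$1" | wc -w | tr -d '[:space:]'
}

# True when fetched content is long enough that it should be summarized (>500 words).
needs_summary() {
  [ "$(word_count "$1")" -gt 500 ]
}

# True when a conversational reply stays under the 200-word guideline.
within_talk_limit() {
  [ "$(word_count "$1")" -lt 200 ]
}
```

A caller might use `if needs_summary "$article"; then ...; fi` to decide whether to condense the text first.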

Audio Generation & Cleanup (IMPORTANT)

Always delete the audio file after playing - it's already in the chat history.

```bash
# Generate with a unique filename and play
OUTPUT_FILE="/tmp/audio_reply_$(date +%s)"
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your response text" \
  --play \
  --file_prefix "$OUTPUT_FILE"

# ALWAYS clean up after playing
rm -f "${OUTPUT_FILE}"*.wav 2>/dev/null
```
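
If generation is interrupted or fails, an explicit `rm` at the end may never run. One possible hardening, sketched here under assumptions, is an `EXIT` trap inside a subshell so the file is removed however the call ends. `speak_and_cleanup` is a hypothetical function name and `UV_BIN` a hypothetical override variable; neither comes from MLX Audio.

```bash
# Generate, play, and always delete the audio file.
# The function body is a subshell, so the EXIT trap fires when the call
# finishes, whether generation succeeded, failed, or was interrupted.
speak_and_cleanup() (
  prefix="/tmp/audio_reply_$(date +%s)_$$"
  trap 'rm -f "${prefix}"*.wav 2>/dev/null' EXIT
  "${UV_BIN:-uv}" run mlx_audio.tts.generate \
    --model mlx-community/chatterbox-turbo-fp16 \
    --text "$1" \
    --play \
    --file_prefix "$prefix"
)
```

Appending the shell's process ID to the timestamp is an extra guard against two replies generated within the same second sharing a prefix.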

Error Handling

If TTS fails:

1. Check if model is downloaded (first run downloads ~500MB)
2. Ensure `uv` is installed and in PATH
3. Fall back to text response with apology
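
These checks could be wired together roughly as follows. This is a sketch, not the skill's actual implementation: `speak_or_fallback` is a hypothetical function name, `UV_BIN` is a hypothetical override variable, and the apology wording is a placeholder.

```bash
# Try to speak the text; if uv is missing or TTS fails, print the text instead.
speak_or_fallback() {
  local text="$1" uv_bin="${UV_BIN:-uv}"
  local prefix="/tmp/audio_reply_$(date +%s)"

  # Is uv installed and in PATH?
  if ! command -v "$uv_bin" >/dev/null 2>&1; then
    printf 'Sorry, I could not find uv in PATH, so here is the reply as text:\n%s\n' "$text"
    return 0
  fi

  # Attempt generation; on any failure fall back to text with an apology.
  if ! "$uv_bin" run mlx_audio.tts.generate \
      --model mlx-community/chatterbox-turbo-fp16 \
      --text "$text" --play --file_prefix "$prefix"; then
    printf 'Sorry, audio generation failed, so here is the reply as text:\n%s\n' "$text"
  fi

  # Clean up whatever was written, whether or not playback happened.
  rm -f "${prefix}"*.wav 2>/dev/null
}
```

The first run may look slow rather than failed while the model downloads, so a caller may want to wait before treating it as an error.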

Example Workflows

Example 1: Read URL

User: read it to me https://blog.example.com/new-feature

Assistant actions:
1. WebFetch the URL
2. Extract article content
3. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Here's what I found... [article summary]" \
     --play --file_prefix /tmp/audio_reply_1706123456
4. Delete: rm -f /tmp/audio_reply_1706123456*.wav
5. Confirm: "Done reading the article to you."

Example 2: Talk to Me

User: talk to me about what you can help with

Assistant actions:
1. Generate conversational response text
2. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Hey! So I can help you with all kinds of things..." \
     --play --file_prefix /tmp/audio_reply_1706123789
3. Delete: rm -f /tmp/audio_reply_1706123789*.wav
4. (No text output needed - audio IS the response)

Notes

- First run may take longer as the model downloads (~500MB)
- Audio quality is best for English; other languages may vary
- For long content, consider chunking into multiple audio segments
- The `--play` flag uses system audio - ensure volume is up
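
Chunking long content could look like the sketch below, which groups words into segments and prefers to break at sentence ends. `chunk_text` is a hypothetical helper, and the default of 200 words per segment is an assumption borrowed from the conversational-mode guideline, not a limit documented for the model.

```bash
# Split text into segments of at most MAX words, one segment per output line.
# $1 = text, $2 = max words per segment (default 200, an assumed value).
chunk_text() {
  printf '%s\n' "$1" | awk -v max="${2:-200}" '
    {
      for (i = 1; i <= NF; i++) {
        chunk = (count ? chunk " " : "") $i
        count++
        # Flush when the segment is full, or at a sentence end once it is half full.
        if (count >= max || (count >= max / 2 && $i ~ /[.!?]$/)) {
          print chunk
          chunk = ""
          count = 0
        }
      }
    }
    END { if (count) print chunk }'
}
```

Each output line can then be passed to a separate TTS call, with the audio file deleted after each one.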