diagnostic-issue-resolver
Diagnostic Issue Resolver
Diagnose and fix common TTS + Telegram bot issues through systematic symptom collection, automated diagnostics, and targeted fixes.
Platform: macOS (Apple Silicon)
When to Use This Skill
- TTS audio is not playing or sounds wrong
- Telegram bot is not responding to messages
- Kokoro engine errors or timeouts
- Lock file appears stuck
- Audio plays twice (race condition)
- MPS acceleration is not working
- Queue appears full or backed up
Requirements
- Access to `~/.claude/automation/claude-telegram-sync/` (bot source)
- Access to `~/.local/share/kokoro/` (Kokoro engine)
- Access to `~/.local/share/tts-telegram-sync/logs/` (centralized logs)
Known Issue Table
| Issue | Likely Cause | Diagnostic | Fix |
|---|---|---|---|
| No audio output | Stale TTS lock | | |
| Bot not responding | Process crashed | | Restart: |
| Kokoro timeout | First-run model load | Check | Wait for download, or re-run |
| Queue full | Rapid-fire notifications | Check queue depth in audit log | Increase the corresponding setting in mise.toml |
| Lock stuck forever | Heartbeat process died | | If lock stale >30s AND no audio process, rm lock |
| No MPS acceleration | Wrong Python/torch | | Reinstall torch via |
| Double audio playback | Lock race condition | Check for multiple afplay processes | Kill all: |
Workflow Phases
Phase 1: Symptom Collection
Use AskUserQuestion to understand what the user is experiencing. Key questions:
- What happened? (no audio, wrong audio, bot silent, error message)
- When did it start? (after upgrade, suddenly, always)
- What were you doing? (clipboard read, Telegram notification, manual TTS)
Phase 2: Automated Diagnostics
Based on symptoms, run the relevant subset of these checks:
```bash
# Lock state
ls -la /tmp/kokoro-tts.lock 2>/dev/null && stat -f "%Sm" /tmp/kokoro-tts.lock || echo "No lock file"

# Audio processes
pgrep -la afplay; pgrep -la say

# Bot process (-f matches the pattern against the full command line, not just the process name)
pgrep -fl 'bun.*src/main.ts'

# Kokoro health
~/.local/share/kokoro/.venv/bin/python -c "import kokoro; import torch; print(f'kokoro OK, MPS: {torch.backends.mps.is_available()}')"

# Recent errors in audit log
tail -20 ~/.local/share/tts-telegram-sync/logs/audit/*.ndjson 2>/dev/null | grep -i error

# Recent bot console output
tail -50 /private/tmp/telegram-bot.log 2>/dev/null | grep -i -E '(error|fail|timeout)'
```
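The audit-log check above pipes several files through a single `grep`, so a matching line no longer shows which file it came from. A minimal sketch of a per-file variant follows. The function name `recent_errors` and its arguments are hypothetical (not part of the skill), and it assumes only that the audit logs are line-oriented `.ndjson` text files.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): print the lines containing
# "error" among the last N lines of each .ndjson file in a directory,
# prefixed with the file name.
#   $1 = log directory
#   $2 = number of trailing lines to inspect per file (default 20)
recent_errors() {
  local dir="$1" lines="${2:-20}" file line
  for file in "$dir"/*.ndjson; do
    # Skip the unexpanded glob when the directory has no .ndjson files.
    [ -f "$file" ] || continue
    tail -n "$lines" "$file" | grep -i error | while IFS= read -r line; do
      printf '%s: %s\n' "${file##*/}" "$line"
    done
  done
  return 0
}
```

For the audit directory used in this skill, the call would be `recent_errors ~/.local/share/tts-telegram-sync/logs/audit 20`.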
Phase 3: Root Cause Analysis
Map diagnostic output to the Known Issue Table above. Common patterns:
- Lock file exists + mtime > 30s ago + no afplay = stale lock
- No bot PID found = bot crashed
- `torch.backends.mps.is_available()` returns False = MPS broken
- Multiple afplay PIDs = race condition
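The lock and audio-process patterns above reduce to a few comparisons. The sketch below encodes them as a pure function, so the decision can be checked without touching the real lock. The function name `classify_tts_state`, its two arguments, and the output labels are hypothetical; the 30-second threshold comes from the Known Issue Table.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): classify TTS state from
# values gathered during Phase 2.
#   $1 = lock age in seconds, or "none" when no lock file exists
#   $2 = number of running afplay processes
classify_tts_state() {
  local lock_age="$1" afplay_count="$2"
  if [ "$afplay_count" -gt 1 ]; then
    echo "race-condition"   # multiple afplay PIDs
  elif [ "$lock_age" != "none" ] && [ "$lock_age" -gt 30 ] && [ "$afplay_count" -eq 0 ]; then
    echo "stale-lock"       # lock older than 30s and nothing playing
  else
    echo "no-match"         # neither pattern applies; check the other rows
  fi
}
```

For example, `classify_tts_state 45 0` reports a stale lock, while `classify_tts_state 45 1` does not, because audio is still playing and the lock may be legitimately held.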
Phase 4: Fix Application
Apply the targeted fix from the Known Issue Table. Always use the least disruptive fix first.
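For a stuck lock, the least disruptive fix is to remove the lock only when both conditions from the Known Issue Table hold: the lock is older than 30 seconds and no audio process is running. A minimal sketch follows, assuming the lock path `/tmp/kokoro-tts.lock` from the Phase 2 checks. The function names and the overridable arguments are hypothetical and exist only to make the guard testable.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the skill.

# Print the age of a file in seconds. Tries GNU stat syntax first,
# then falls back to the BSD/macOS syntax.
file_age_seconds() {
  local mtime
  mtime=$(stat -c %Y "$1" 2>/dev/null) || mtime=$(stat -f %m "$1" 2>/dev/null) || return 1
  echo $(( $(date +%s) - mtime ))
}

# Remove the lock only if it is older than max_age seconds AND no audio
# player process is running. Returns 0 if removed, 1 if left in place.
#   $1 = lock path (default /tmp/kokoro-tts.lock)
#   $2 = max age in seconds (default 30)
#   $3 = player process name (default afplay)
remove_lock_if_stale() {
  local lock="${1:-/tmp/kokoro-tts.lock}" max_age="${2:-30}" player="${3:-afplay}" age
  [ -e "$lock" ] || { echo "no lock file"; return 1; }
  age=$(file_age_seconds "$lock") || { echo "cannot read lock mtime"; return 1; }
  if [ "$age" -le "$max_age" ]; then
    echo "lock is fresh (${age}s), leaving it"
    return 1
  fi
  if pgrep -x "$player" >/dev/null 2>&1; then
    echo "$player is running, leaving lock"
    return 1
  fi
  rm -f "$lock" && echo "removed stale lock (${age}s old)"
}
```

Calling `remove_lock_if_stale` with no arguments applies the defaults. Because it refuses to act while audio is playing, it is safe to run before trying anything more invasive such as restarting the bot.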
Phase 5: Verification
After applying the fix, verify the issue is resolved:
```bash
# Quick TTS test
~/.local/share/kokoro/.venv/bin/python ~/.local/share/kokoro/tts_generate.py \
  --text "Diagnostic test complete" --voice af_heart --lang en-us --speed 1.0 \
  --output /tmp/kokoro-tts-diag-test.wav && afplay /tmp/kokoro-tts-diag-test.wav && echo "OK"

# Full health check
~/eon/cc-skills/plugins/tts-telegram-sync/scripts/kokoro-install.sh --health
```

---
TodoWrite Task Templates
1. [Symptoms] Collect symptoms via AskUserQuestion
2. [Triage] Map symptoms to likely causes
3. [Lock] Check TTS lock state (mtime, PID, stale detection)
4. [Process] Check bot process and audio processes
5. [Kokoro] Verify Kokoro venv and MPS availability
6. [Logs] Check recent audit logs for errors
7. [Fix] Apply targeted fix for identified root cause
8. [Verify] Run health check to confirm resolution
Post-Change Checklist
- Root cause identified and documented
- Fix applied successfully
- Health check passes
- Test audio plays correctly
- No stale locks or orphan processes remain
Troubleshooting
This skill IS the troubleshooting skill. If the standard diagnostics do not identify the issue:
- Check the full bot console log: `cat /private/tmp/telegram-bot.log`
- Check all NDJSON audit logs: `ls -lt ~/.local/share/tts-telegram-sync/logs/audit/`
- Check system audio: `afplay /System/Library/Sounds/Tink.aiff` (if this fails, it is a macOS audio issue, not TTS)
- Run a manual Kokoro generation outside the bot to isolate the problem
- If all else fails, do a full teardown and reinstall using `clean-component-removal` then `full-stack-bootstrap`
Reference Documentation
- Common Issues -- Expanded diagnostic procedures for each known issue
- Lock Debugging -- Deep dive into the two-layer lock mechanism
- Evolution Log -- Change history for this skill