asr
When to Use
- User wants to transcribe an audio file to text
- User provides an audio file path and asks for transcription
- User says "转录", "识别", "transcribe", "语音转文字"
When NOT to Use
- User wants to synthesize speech from text (use `/tts`)
- User wants to create a podcast or explainer (use `/podcast` or `/explainer`)
Purpose
Transcribe audio files to text using `coli asr`, which runs fully offline via local
speech recognition models. No API key required. Supports Chinese, English, Japanese,
Korean, and Cantonese (sensevoice model) or English-only (whisper model).
Run `coli asr --help` for current CLI options and supported flags.
Hard Constraints
- No shell scripts. Use direct commands only.
- Always read config following `shared/config-pattern.md` before any interaction
- Follow `shared/common-patterns.md` for interaction patterns
- Never ask more than one question at a time
Interaction Flow
Step 0: Prerequisites Check
Before config setup, silently check the environment:
```bash
# `which` prints the resolved path on success, so discard its output to keep only yes/no.
COLI_OK=$(which coli >/dev/null 2>&1 && echo yes || echo no)
FFMPEG_OK=$(which ffmpeg >/dev/null 2>&1 && echo yes || echo no)
# Models count as present when the models directory contains a sherpa entry.
MODELS_DIR="$HOME/.coli/models"
MODELS_OK=$([ -d "$MODELS_DIR" ] && ls "$MODELS_DIR" | grep -q sherpa && echo yes || echo no)
```
| Issue | Action |
|---|---|
| `coli` not found | Block. Tell user to run … |
| `ffmpeg` not found | Warn (WAV files still work). Suggest … |
| Models not downloaded | Inform user: first transcription will auto-download models (~60MB) to … |

If `coli` is missing, stop here and do not proceed.
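The table above reduces to a small decision rule. The following Python sketch is only an illustration: the function names and message strings are hypothetical, and the install commands are left out because the source does not preserve them.

```python
from typing import List, Tuple


def prerequisite_actions(coli_ok: bool, ffmpeg_ok: bool, models_ok: bool) -> List[Tuple[str, str]]:
    """Map the three environment checks to (level, message) pairs.

    Levels mirror the table: "block" stops the flow, "warn" and "info" do not.
    """
    if not coli_ok:
        # coli is mandatory: block immediately and report nothing else.
        return [("block", "coli not found")]
    actions = []
    if not ffmpeg_ok:
        actions.append(("warn", "ffmpeg not found (WAV files still work)"))
    if not models_ok:
        actions.append(("info", "first transcription will auto-download models (~60MB)"))
    return actions


def may_proceed(actions: List[Tuple[str, str]]) -> bool:
    """The flow continues unless some action is a block."""
    return all(level != "block" for level, _ in actions)
```

Returning early on a missing `coli` matches the rule that the flow stops there; the other two findings are advisory and can be reported together.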
Step 0: Config Setup
Follow `shared/config-pattern.md` Step 0.
Initial defaults:
```bash
# Current directory:
mkdir -p ".listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > ".listenhub/asr/config.json"
CONFIG_PATH=".listenhub/asr/config.json"

# Global:
mkdir -p "$HOME/.listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > "$HOME/.listenhub/asr/config.json"
CONFIG_PATH="$HOME/.listenhub/asr/config.json"
```
Config summary display:
```
当前配置 (asr):
模型:sensevoice / whisper-tiny.en
润色:开启 / 关闭
```
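To illustrate how the two config locations and the defaults fit together, here is a Python sketch. The lookup order (current directory first, then global) and the helper names are assumptions; `shared/config-pattern.md` remains the authority on the actual procedure.

```python
import json
from pathlib import Path

DEFAULTS = {"model": "sensevoice", "polish": True}
RELATIVE_PATH = Path(".listenhub") / "asr" / "config.json"


def load_config(cwd: Path, home: Path) -> dict:
    """Return the first config found, falling back to the initial defaults.

    Assumption: the current-directory file wins over the global one.
    """
    for base in (cwd, home):
        candidate = base / RELATIVE_PATH
        if candidate.is_file():
            # Merge over defaults so a partial file still yields both keys.
            return {**DEFAULTS, **json.loads(candidate.read_text(encoding="utf-8"))}
    return dict(DEFAULTS)


def format_summary(config: dict) -> str:
    """Render the config summary with the labels used in the display above."""
    polish = "开启" if config["polish"] else "关闭"
    return f"当前配置 (asr):\n模型:{config['model']}\n润色:{polish}"
```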
Setup Flow (first run or reconfigure)
Ask in order:
- model: "默认使用哪个语音识别模型?"
  - "sensevoice(推荐)": 支持中英日韩粤,可检测语言、情绪、音频事件
  - "whisper-tiny.en": 仅英文
- polish: "转录后由 AI 润色文本?(修正标点、去语气词、提升可读性)"
  - "是(推荐)" → `polish: true`
  - "否,保留原始转录" → `polish: false`

Save all answers at once after collecting them.
Step 1: Get Audio File
If the user hasn't provided a file path, ask:
"请提供要转录的音频文件路径。"
Verify the file exists before proceeding.
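A minimal Python sketch of the existence check. The helper name `check_audio_file` is hypothetical, and the rule that non-WAV input depends on ffmpeg is inferred from the prerequisites note that WAV files still work without it.

```python
from pathlib import Path
from typing import Tuple


def check_audio_file(path: str, ffmpeg_ok: bool) -> Tuple[bool, str]:
    """Return (ok, note). ok is False when the flow should stop and ask again."""
    audio = Path(path).expanduser()
    if not audio.is_file():
        return False, f"file not found: {audio}"
    if audio.suffix.lower() != ".wav" and not ffmpeg_ok:
        # Assumption: formats other than WAV are decoded through ffmpeg.
        return True, "ffmpeg missing; non-WAV input may fail"
    return True, ""
```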
Step 2: Confirm
```
准备转录:
文件:{filename}
模型:{model}
润色:{是 / 否}
继续?
```
Step 3: Transcribe
Run `coli asr` with JSON output (to get metadata):
```bash
coli asr -j --model {model} "{file}"
```
On first run, `coli` will automatically download the required model. This may take a
moment; inform the user if models haven't been downloaded yet.
Parse the JSON result to extract `text`, `lang`, `emotion`, `event`, `duration`.
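The command and the parsing step can be sketched in Python as follows. The sketch assumes the five fields are top-level keys of a single JSON object on stdout; the real output shape should be confirmed with `coli asr --help` or a trial run. The helper names are illustrative only.

```python
import json
import subprocess
from typing import Any, Dict, List

FIELDS = ("text", "lang", "emotion", "event", "duration")


def build_command(model: str, file: str) -> List[str]:
    """Argument list equivalent to: coli asr -j --model {model} "{file}"."""
    return ["coli", "asr", "-j", "--model", model, file]


def parse_result(stdout: str) -> Dict[str, Any]:
    """Pick the five metadata fields; absent ones come back as None."""
    data = json.loads(stdout)
    return {name: data.get(name) for name in FIELDS}


def transcribe(model: str, file: str) -> Dict[str, Any]:
    """Run coli and parse its JSON output (requires coli on PATH)."""
    done = subprocess.run(
        build_command(model, file), capture_output=True, text=True, check=True
    )
    return parse_result(done.stdout)
```

Passing an argument list instead of a shell string sidesteps quoting problems when the file path contains spaces.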
Step 4: Polish (if enabled)
If `polish` is `true`, take the raw `text` from the transcription result and rewrite
it to fix punctuation, remove filler words, and improve readability. Preserve the
original meaning and speaker intent. Do not summarize or paraphrase.
Step 5: Present Result
Display the transcript directly in the conversation:
```
转录完成
{transcript text}
─────────────────
语言:{lang} · 情绪:{emotion} · 时长:{duration}s
```
If polished, show the polished version with a note that it was AI-refined. Offer to
show the raw original on request.
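As an illustration, the result block can be rendered from the parsed fields like this. The `format_result` helper, the rule width, and the wording of the AI-refinement note are hypothetical; the source only requires that such a note be shown.

```python
# Hypothetical wording for the required "AI-refined" note.
POLISH_NOTE = "(AI 润色版本,可按需查看原始转录)"


def format_result(text: str, lang: str, emotion: str, duration: float, polished: bool) -> str:
    """Fill the display template; append the note only for polished text."""
    lines = [
        "转录完成",
        text,
        "─" * 17,
        f"语言:{lang} · 情绪:{emotion} · 时长:{duration}s",
    ]
    if polished:
        lines.append(POLISH_NOTE)
    return "\n".join(lines)
```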
Step 6: Export as Markdown (optional)
After presenting the result, ask:
Question: "保存为 Markdown 文件到当前目录?"
Options:
- "是": save to current directory
- "否": done

If yes, write `{audio-filename}-transcript.md` to the current working directory
(where the user is running Claude Code). The file should contain the transcript text
(polished version if polish was enabled), with a front-matter header:
```markdown
---
source: {original audio filename}
date: {YYYY-MM-DD}
model: {model used}
duration: {duration}s
lang: {detected language}
---
{transcript text}
```
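One way to assemble the export, sketched in Python. It assumes `{audio-filename}` means the file name without its extension (so `meeting.m4a` would become `meeting-transcript.md`); the source does not say whether the extension is kept. Function names are illustrative.

```python
import datetime
from pathlib import Path


def transcript_path(audio_file: str, out_dir: Path) -> Path:
    """Assumed naming: strip the audio extension, then add -transcript.md."""
    return out_dir / f"{Path(audio_file).stem}-transcript.md"


def render_markdown(audio_file: str, model: str, duration: float, lang: str,
                    text: str, date: datetime.date) -> str:
    """Front-matter header followed by the transcript text."""
    header = [
        "---",
        f"source: {Path(audio_file).name}",
        f"date: {date.isoformat()}",
        f"model: {model}",
        f"duration: {duration}s",
        f"lang: {lang}",
        "---",
    ]
    return "\n".join(header) + "\n" + text + "\n"


def export(audio_file: str, out_dir: Path, **fields) -> Path:
    """Write the file into out_dir (the current working directory in the flow)."""
    target = transcript_path(audio_file, out_dir)
    target.write_text(render_markdown(audio_file, **fields), encoding="utf-8")
    return target
```

The `text` argument should be the polished version when polish was enabled, matching the rule above.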
Composability
- Invoked by: future skills that need to transcribe recorded audio
- Invokes: nothing
Examples
"帮我转录这个文件 meeting.m4a"
- Check prerequisites
- Read config
- Confirm: meeting.m4a, sensevoice, polish on
- Run `coli asr -j --model sensevoice "meeting.m4a"`
- Polish the raw text
- Display inline
"transcribe interview.wav, no polish"
- Check prerequisites
- Read config
- Override polish to false for this session
- Run `coli asr -j --model sensevoice "interview.wav"`
- Display raw transcript inline