asr

When to Use

  • User wants to transcribe an audio file to text
  • User provides an audio file path and asks for transcription
  • User says "转录", "识别", "transcribe", "语音转文字"

When NOT to Use

  • User wants to synthesize speech from text (use /tts)
  • User wants to create a podcast or explainer (use /podcast or /explainer)

Purpose

Transcribe audio files to text using coli asr, which runs fully offline via local speech recognition models. No API key required. Supports Chinese, English, Japanese, Korean, and Cantonese (sensevoice model) or English-only (whisper model).
Run coli asr --help for current CLI options and supported flags.

Hard Constraints

  • No shell scripts. Use direct commands only.
  • Always read config following shared/config-pattern.md before any interaction
  • Follow shared/common-patterns.md for interaction patterns
  • Never ask more than one question at a time
<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding. After all parameters are collected, summarize and ask the user to confirm before running any transcription. </HARD-GATE>

Interaction Flow

Step 0: Prerequisites Check

Before config setup, silently check the environment:
bash
# Discard the printed path so each variable holds exactly "yes" or "no"
COLI_OK=$(command -v coli >/dev/null 2>&1 && echo yes || echo no)
FFMPEG_OK=$(command -v ffmpeg >/dev/null 2>&1 && echo yes || echo no)
MODELS_DIR="$HOME/.coli/models"
MODELS_OK=$([ -d "$MODELS_DIR" ] && ls "$MODELS_DIR" | grep -q sherpa && echo yes || echo no)
| Issue | Action |
| --- | --- |
| coli not found | Block. Tell user to run npm install -g @marswave/coli first |
| ffmpeg not found | Warn (WAV files still work). Suggest brew install ffmpeg / sudo apt install ffmpeg |
| Models not downloaded | Inform user: first transcription will auto-download models (~60MB) to ~/.coli/models/ |

If coli is missing, stop here and do not proceed.

Step 0: Config Setup

Follow shared/config-pattern.md Step 0.
Initial defaults:
bash
# Current directory:
mkdir -p ".listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > ".listenhub/asr/config.json"
CONFIG_PATH=".listenhub/asr/config.json"

# Global:
mkdir -p "$HOME/.listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > "$HOME/.listenhub/asr/config.json"
CONFIG_PATH="$HOME/.listenhub/asr/config.json"

Config summary display:
当前配置 (asr):
  模型:sensevoice / whisper-tiny.en
  润色:开启 / 关闭
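As a rough illustration of how the two config locations above could be resolved, the sketch below prints the first config file that exists. The helper name find_asr_config is hypothetical, and the lookup order (project-local before global) is an assumption, since shared/config-pattern.md is not reproduced here and may define a different rule.

```bash
# Hypothetical helper: print the path of the first ASR config file that exists.
# Assumption: the project-local file takes precedence over the global one.
find_asr_config() {
  local candidate
  for candidate in ".listenhub/asr/config.json" "$HOME/.listenhub/asr/config.json"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1  # no config found: the setup flow would run first
}
```

A caller might use it as CONFIG_PATH=$(find_asr_config) and fall back to the setup flow when the function returns a non-zero status.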

Setup Flow (first run or reconfigure)

Ask in order:
  1. model: "默认使用哪个语音识别模型?"
    • "sensevoice(推荐)" — 支持中英日韩粤,可检测语言、情绪、音频事件
    • "whisper-tiny.en" — 仅英文
  2. polish: "转录后由 AI 润色文本?(修正标点、去语气词、提升可读性)"
    • "是(推荐)" → polish: true
    • "否,保留原始转录" → polish: false
Save all answers at once after collecting them.
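Saving both answers in one step could look like the sketch below. The helper name save_asr_config and its argument order are hypothetical; the JSON shape and the allowed values come from the defaults and options listed above.

```bash
# Hypothetical helper: write both setup answers to the config file in one step.
# Usage: save_asr_config <config_path> <model> <polish>
save_asr_config() {
  local config_path="$1" model="$2" polish="$3"
  # Reject anything outside the options offered in the two questions.
  case "$model" in sensevoice|whisper-tiny.en) ;; *) return 1 ;; esac
  case "$polish" in true|false) ;; *) return 1 ;; esac
  mkdir -p "$(dirname "$config_path")"
  printf '{"model":"%s","polish":%s}\n' "$model" "$polish" > "$config_path"
}
```

For example, save_asr_config ".listenhub/asr/config.json" sensevoice true would reproduce the initial defaults for the current directory.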

Step 1: Get Audio File

If the user hasn't provided a file path, ask:
"请提供要转录的音频文件路径。"
Verify the file exists before proceeding.
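The existence check can be a single direct test. The sketch below wraps it in a hypothetical helper, audio_file_ok, that also rejects directories and unreadable files; the readability check is an addition beyond what the step strictly requires.

```bash
# Hypothetical helper: succeed only if the path is an existing, readable regular file.
audio_file_ok() {
  [ -f "$1" ] && [ -r "$1" ]
}
```

A call such as audio_file_ok "meeting.m4a" returns a non-zero status when the file is missing, at which point the user should be asked for the path again.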

Step 2: Confirm

准备转录:

  文件:{filename}
  模型:{model}
  润色:{是 / 否}

继续?

Step 3: Transcribe

Run coli asr with JSON output (to get metadata):
bash
coli asr -j --model {model} "{file}"
On first run, coli will automatically download the required model. This may take a moment, so inform the user if models haven't been downloaded yet.
Parse the JSON result to extract text, lang, emotion, event, duration.

Step 4: Polish (if enabled)

If polish is true, take the raw text from the transcription result and rewrite it to fix punctuation, remove filler words, and improve readability. Preserve the original meaning and speaker intent. Do not summarize or paraphrase.

Step 5: Present Result

Display the transcript directly in the conversation:
转录完成

{transcript text}

─────────────────
语言:{lang} · 情绪:{emotion} · 时长:{duration}s
If polished, show the polished version with a note that it was AI-refined. Offer to show the raw original on request.

Step 6: Export as Markdown (optional)

After presenting the result, ask:
Question: "保存为 Markdown 文件到当前目录?"
Options:
  - "是" — save to current directory
  - "否" — done
If yes, write {audio-filename}-transcript.md to the current working directory (where the user is running Claude Code). The file should contain the transcript text (polished version if polish was enabled), with a front-matter header:
markdown
---
source: {original audio filename}
date: {YYYY-MM-DD}
model: {model used}
duration: {duration}s
lang: {detected language}
---

{transcript text}
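One way the export could be assembled is sketched below. The helper name write_transcript_md and its argument order are hypothetical, and stripping the audio extension when deriving {audio-filename} is an assumption; the front-matter fields follow the template above.

```bash
# Hypothetical helper: write <audio-name>-transcript.md into the current directory.
# Usage: write_transcript_md <audio_path> <model> <duration> <lang> <text>
write_transcript_md() {
  local audio_path="$1" model="$2" duration="$3" lang="$4" text="$5"
  local audio_name base out
  audio_name=$(basename "$audio_path")
  base="${audio_name%.*}"        # assumption: drop the audio extension
  out="${base}-transcript.md"
  {
    printf '%s\n' '---'
    printf 'source: %s\n' "$audio_name"
    printf 'date: %s\n' "$(date +%Y-%m-%d)"
    printf 'model: %s\n' "$model"
    printf 'duration: %ss\n' "$duration"
    printf 'lang: %s\n' "$lang"
    printf '%s\n\n' '---'
    printf '%s\n' "$text"
  } > "$out"
  printf '%s\n' "$out"           # report the path that was written
}
```

The function prints the output filename so the caller can tell the user where the transcript was saved.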

Composability

  • Invoked by: future skills that need to transcribe recorded audio
  • Invokes: nothing

Examples

"帮我转录这个文件 meeting.m4a"
  1. Check prerequisites
  2. Read config
  3. Confirm: meeting.m4a, sensevoice, polish on
  4. Run coli asr -j --model sensevoice "meeting.m4a"
  5. Polish the raw text
  6. Display inline
"transcribe interview.wav, no polish"
  1. Check prerequisites
  2. Read config
  3. Override polish to false for this session
  4. Run coli asr -j --model sensevoice "interview.wav"
  5. Display raw transcript inline