asr


When to Use


- User wants to transcribe an audio file to text
- User provides an audio file path and asks for transcription
- User says "转录" (transcribe), "识别" (recognize), "transcribe", or "语音转文字" (speech to text)


When NOT to Use


- User wants to synthesize speech from text (use `/tts`)
- User wants to create a podcast or explainer (use `/podcast` or `/explainer`)


Purpose


Transcribe audio files to text using `coli asr`, which runs fully offline via local speech recognition models. No API key is required. The sensevoice model supports Chinese, English, Japanese, Korean, and Cantonese; the whisper model (whisper-tiny.en) supports English only.

Run `coli asr --help` for current CLI options and supported flags.


Hard Constraints


- No shell scripts. Use direct commands only.
- Always read config following `shared/config-pattern.md` before any interaction
- Follow `shared/common-patterns.md` for interaction patterns
- Never ask more than one question at a time

<HARD-GATE> Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding. After all parameters are collected, summarize and ask the user to confirm before running any transcription. </HARD-GATE>


Interaction Flow


Step 0: Prerequisites Check


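Before config setup, silently check the environment:

```bash
# Discard stdout too, so each variable holds only "yes" or "no" (not the binary's path)
COLI_OK=$(which coli >/dev/null 2>&1 && echo yes || echo no)
FFMPEG_OK=$(which ffmpeg >/dev/null 2>&1 && echo yes || echo no)
# Models count as present if ~/.coli/models has an entry whose name contains "sherpa"
MODELS_DIR="$HOME/.coli/models"
MODELS_OK=$([ -d "$MODELS_DIR" ] && ls "$MODELS_DIR" | grep -q sherpa && echo yes || echo no)
```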

| Issue | Action |
|---|---|
| `coli` not found | Block. Tell user to run `npm install -g @marswave/coli` first |
| `ffmpeg` not found | Warn (WAV files still work). Suggest `brew install ffmpeg` / `sudo apt install ffmpeg` |
| Models not downloaded | Inform user: first transcription will auto-download models (~60MB) to `~/.coli/models/` |

If `coli` is missing, stop here and do not proceed.


Step 0: Config Setup


Follow `shared/config-pattern.md` Step 0.

Initial defaults:


```bash
# Current directory:
mkdir -p ".listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > ".listenhub/asr/config.json"
CONFIG_PATH=".listenhub/asr/config.json"

# Global:
mkdir -p "$HOME/.listenhub/asr"
echo '{"model":"sensevoice","polish":true}' > "$HOME/.listenhub/asr/config.json"
CONFIG_PATH="$HOME/.listenhub/asr/config.json"
```


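Config summary display (shown to the user in Chinese: "Current config (asr): Model: sensevoice or whisper-tiny.en; Polish: on or off"):

```
当前配置 (asr)：
  模型：sensevoice / whisper-tiny.en
  润色：开启 / 关闭
```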
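A minimal sketch of reading the saved config and rendering the summary, assuming the file is the compact single-line JSON written by the default commands (a pretty-printed file would need a real JSON parser such as `jq`). If no config exists yet, it falls back to the initial defaults. `MODEL`, `POLISH`, and `POLISH_LABEL` are hypothetical variable names, not part of `coli`.

```bash
# Assumes CONFIG_PATH was set by one of the default commands;
# falls back to the project-level path if it is unset.
CONFIG_PATH="${CONFIG_PATH:-.listenhub/asr/config.json}"

# Extract values from compact JSON like {"model":"sensevoice","polish":true}.
# A missing file yields empty strings (the error is discarded, never fatal).
MODEL=$(sed -nE 's/.*"model":"([^"]*)".*/\1/p' "$CONFIG_PATH" 2>/dev/null || true)
POLISH=$(sed -nE 's/.*"polish":(true|false).*/\1/p' "$CONFIG_PATH" 2>/dev/null || true)

# Fall back to the initial defaults when a value is missing.
MODEL="${MODEL:-sensevoice}"
POLISH="${POLISH:-true}"

# Render the summary with the same Chinese UI labels.
POLISH_LABEL=$([ "$POLISH" = true ] && echo 开启 || echo 关闭)
printf '当前配置 (asr)：\n  模型：%s\n  润色：%s\n' "$MODEL" "$POLISH_LABEL"
```

The `sed` patterns are deliberately narrow: they only match the exact shape the default commands write, which keeps the sketch dependency-free at the cost of robustness.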


Setup Flow (first run or reconfigure)


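Ask in order:

model: "默认使用哪个语音识别模型？" ("Which speech recognition model should be the default?")
- "sensevoice（推荐）": 支持中英日韩粤，可检测语言、情绪、音频事件 (recommended; supports Chinese, English, Japanese, Korean, and Cantonese, and can detect language, emotion, and audio events)
- "whisper-tiny.en": 仅英文 (English only)

polish: "转录后由 AI 润色文本？（修正标点、去语气词、提升可读性）" ("Polish the transcript with AI afterwards? Fixes punctuation, removes filler words, improves readability")
- "是（推荐）" ("Yes, recommended") → `polish: true`
- "否，保留原始转录" ("No, keep the raw transcript") → `polish: false`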

Save all answers at once after collecting them.
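A minimal sketch of saving both answers in one write, using the same compact JSON shape as the initial defaults. Writing once, after both questions are answered, presumably avoids leaving a half-updated config if setup is abandoned midway. It assumes `CONFIG_PATH` was set as in the initial defaults; `MODEL` and `POLISH` are hypothetical variables holding the user's two answers.

```bash
# Hypothetical answers collected from the two setup questions.
MODEL="whisper-tiny.en"
POLISH=false

# Assumes CONFIG_PATH was set as in the initial defaults;
# falls back to the project-level path if it is unset.
CONFIG_PATH="${CONFIG_PATH:-.listenhub/asr/config.json}"
mkdir -p "$(dirname "$CONFIG_PATH")"

# Write both answers in a single step, in the same compact format as the defaults.
printf '{"model":"%s","polish":%s}\n' "$MODEL" "$POLISH" > "$CONFIG_PATH"
```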


Step 1: Get Audio File


If the user hasn't provided a file path, ask:

"请提供要转录的音频文件路径。" ("Please provide the path of the audio file to transcribe.")

Verify the file exists before proceeding.
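A minimal sketch of the existence check in the same yes/no style as Step 0. It also flags non-WAV input, on the assumption (drawn from the Step 0 table, where WAV files still work without `ffmpeg`) that other formats need `ffmpeg`. `AUDIO_FILE`, `FILE_OK`, `EXT`, and `NEEDS_FFMPEG` are hypothetical names.

```bash
# Hypothetical path supplied by the user.
AUDIO_FILE="meeting.m4a"

# "yes" only if the path is an existing, readable regular file.
FILE_OK=$([ -f "$AUDIO_FILE" ] && [ -r "$AUDIO_FILE" ] && echo yes || echo no)

# Lowercased extension; assumes anything other than .wav needs ffmpeg.
EXT=$(printf '%s' "${AUDIO_FILE##*.}" | tr '[:upper:]' '[:lower:]')
NEEDS_FFMPEG=$([ "$EXT" = wav ] && echo no || echo yes)
```

If `FILE_OK` is `no`, ask for the path again. If `NEEDS_FFMPEG` is `yes` while `FFMPEG_OK` from Step 0 is `no`, it may be worth repeating the `ffmpeg` install suggestion before the confirmation step.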


Step 2: Confirm


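Summarize the parameters and ask the user to confirm (Chinese UI text: "Ready to transcribe: File, Model, Polish: yes or no. Continue?"):

```
准备转录：

  文件：{filename}
  模型：{model}
  润色：{是 / 否}

继续？
```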


Step 3: Transcribe


Run `coli asr` with JSON output (to get metadata):

```bash
coli asr -j --model {model} "{file}"
```

On first run, `coli` will automatically download the required model. This may take a moment, so inform the user if models haven't been downloaded yet.

Parse the JSON result to extract `text`, `lang`, `emotion`, `event`, and `duration`.


Step 4: Polish (if enabled)


If `polish` is `true`, take the raw `text` from the transcription result and rewrite it to fix punctuation, remove filler words, and improve readability. Preserve the original meaning and speaker intent. Do not summarize or paraphrase.


Step 5: Present Result


Display the transcript directly in the conversation ("转录完成" means "Transcription complete"; the footer shows language, emotion, and duration):

```
转录完成

{transcript text}

─────────────────
语言：{lang} · 情绪：{emotion} · 时长：{duration}s
```

If polished, show the polished version with a note that it was AI-refined. Offer to show the raw original on request.


Step 6: Export as Markdown (optional)


After presenting the result, ask ("Save as a Markdown file in the current directory?"):

```
Question: "保存为 Markdown 文件到当前目录？"
Options:
  - "是" (yes): save to current directory
  - "否" (no): done
```

If yes, write `{audio-filename}-transcript.md` to the current working directory (where the user is running Claude Code). The file should contain the transcript text (the polished version if polish was enabled), with a front-matter header:

```markdown
---
source: {original audio filename}
date: {YYYY-MM-DD}
model: {model used}
duration: {duration}s
lang: {detected language}
---

{transcript text}
```
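A minimal sketch of writing the export with a heredoc. It assumes `{audio-filename}` means the audio file's name with its extension stripped (so `meeting.m4a` becomes `meeting-transcript.md`); the spec does not pin this down. All variable values are hypothetical stand-ins for the `coli` JSON fields and the (optionally polished) transcript.

```bash
# Hypothetical values; in practice these come from the coli JSON result and Step 4.
AUDIO_FILE="meeting.m4a"
MODEL="sensevoice"
DURATION="12.5"
DETECTED_LANG="zh"
TRANSCRIPT="Example transcript text."

# Assumption: {audio-filename} is the base name with its extension stripped.
BASE_NAME=$(basename "$AUDIO_FILE")
OUT_FILE="${BASE_NAME%.*}-transcript.md"

# Unquoted EOF so the variables and $(date) expand; date is YYYY-MM-DD.
cat > "$OUT_FILE" <<EOF
---
source: $BASE_NAME
date: $(date +%Y-%m-%d)
model: $MODEL
duration: ${DURATION}s
lang: $DETECTED_LANG
---

$TRANSCRIPT
EOF
```

Because the transcript is inserted through a variable, any `$` or backticks inside the transcript text are written literally rather than re-expanded by the shell.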


Composability


Invoked by: future skills that need to transcribe recorded audio
Invokes: nothing


Examples


"帮我转录这个文件 meeting.m4a" ("Transcribe this file for me: meeting.m4a")

1. Check prerequisites
2. Read config
3. Confirm: meeting.m4a, sensevoice, polish on
4. Run `coli asr -j --model sensevoice "meeting.m4a"`
5. Polish the raw text
6. Display inline

"transcribe interview.wav, no polish"

1. Check prerequisites
2. Read config
3. Override polish to false for this session
4. Run `coli asr -j --model sensevoice "interview.wav"`
5. Display raw transcript inline
