funasr-transcribe

FunASR Speech-to-Text

This skill provides a local speech recognition service that converts audio and video files into structured Markdown documents.

Feature Overview

- Supports multiple audio and video formats (mp4, mov, mp3, wav, m4a, flac, etc.)
- Generates timestamps automatically
- Supports speaker diarization
- Outputs Markdown for easy reading and editing

Usage Workflow

First-time Use: Install Dependencies and Download Models

Run the installation script to set up the environment:

```bash
python scripts/setup.py
```

The installation script automatically:

- Checks the Python version (requires >= 3.8)
- Installs the dependency packages (FastAPI, Uvicorn, FunASR, PyTorch)
- Downloads the ASR models to `~/.cache/modelscope/hub/models/`

Verify the installation:

```bash
python scripts/setup.py --verify
```
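A short stand-alone script can approximate the checks listed above. This can help you diagnose an environment before running the installer.

This is a minimal sketch, not the logic of `scripts/setup.py`. It makes three assumptions:

- The four packages use the import names `fastapi`, `uvicorn`, `funasr`, and `torch`.
- The models are cached in the default ModelScope directory.
- The helper names (`python_ok`, `missing_modules`, `preflight`) are hypothetical.

```python
import importlib.util
import sys
from pathlib import Path

# Assumed import names for FastAPI, Uvicorn, FunASR, and PyTorch.
REQUIRED_MODULES = ("fastapi", "uvicorn", "funasr", "torch")
# Default ModelScope cache directory mentioned above.
MODEL_CACHE = Path.home() / ".cache" / "modelscope" / "hub" / "models"


def python_ok(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight():
    """Collect a small report (a sketch, not the real --verify output)."""
    return {
        "python_ok": python_ok(),
        "missing_modules": missing_modules(),
        "model_cache_exists": MODEL_CACHE.is_dir(),
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

`importlib.util.find_spec` locates a top-level module without importing it. The check therefore stays fast even when PyTorch is installed.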

Start Transcription Service

```bash
python scripts/server.py
```

The service runs on `http://127.0.0.1:8765` by default.

Smart Features:

- Auto-start: loads the models automatically on the first request
- Idle shutdown: by default, shuts down automatically after 10 minutes of inactivity to save resources
- Configurable timeout: use the `--idle-timeout` parameter to set a custom idle timeout (in seconds)

Service Lifecycle:

- Enters an idle-monitoring state after startup
- Loads the models and runs the transcription when a request arrives
- Resets the idle timer on every request
- Shuts down automatically once no request has arrived for 10 minutes (or the configured `--idle-timeout`)
- Restarts on the next request
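The lifecycle above amounts to an idle watchdog:

- Every request refreshes a timestamp.
- A periodic check shuts the service down once the idle time reaches the timeout.

The sketch below shows only that bookkeeping, under stated assumptions. `IdleTracker` is a hypothetical class, not the code in `scripts/server.py`. Its `uptime` and `idle_time` simply mirror the fields of the health response. The clock is injectable, so the logic can be exercised without waiting ten minutes.

```python
import time


class IdleTracker:
    """Bookkeeping for an idle-shutdown watchdog (sketch, not scripts/server.py)."""

    def __init__(self, idle_timeout=600.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds; 600 s is the 10-minute default
        self._clock = clock  # injectable so tests can use a fake clock
        self.started_at = clock()
        self.last_activity = self.started_at

    def touch(self):
        """Call on every request: resets the idle timer."""
        self.last_activity = self._clock()

    def idle_time(self):
        """Seconds since the last request (or since startup)."""
        return self._clock() - self.last_activity

    def uptime(self):
        """Seconds since the tracker was created."""
        return self._clock() - self.started_at

    def should_shut_down(self):
        """True once the service has been idle for at least idle_timeout seconds."""
        return self.idle_time() >= self.idle_timeout
```

In a FastAPI server, each request handler would call `touch()`. A background task would poll `should_shut_down()` every few seconds and stop the process. How `scripts/server.py` actually wires this up is not documented here.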

Important Notes:

- ⚠️ Do not shut down the service manually. Leave it running after a transcription finishes; it shuts itself down after 10 minutes of inactivity.
- Keeping it running lets you transcribe multiple files in a row without restarting the service each time.
- To shut down the service immediately, press `Ctrl+C`; otherwise, wait for the 10-minute idle timeout.

Example: set a 30-minute idle timeout (1800 seconds):

```bash
python scripts/server.py --idle-timeout 1800
```
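If you start the service from another script, you may want to wait until it answers before sending work. Below is a minimal standard-library sketch. It polls the `/health` endpoint described under "Call via HTTP API" and treats `"status": "ok"` as ready.

Note the following assumptions and caveats:

- The helper names, retry count, and delay are illustrative assumptions.
- Models load on the first request, so a healthy response does not mean the models are already loaded.

```python
import json
import time
import urllib.request

HEALTH_URL = "http://127.0.0.1:8765/health"


def fetch_health(url=HEALTH_URL, timeout=2.0):
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(fetch=fetch_health, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll until the service reports status "ok"; return the payload, or None on give-up."""
    for _ in range(attempts):
        try:
            payload = fetch()
            if payload.get("status") == "ok":
                return payload
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts (URLError subclasses it);
            # ValueError covers a body that is not valid JSON.
            pass
        sleep(delay)
    return None
```

For example, call `wait_for_health()` right after launching `python scripts/server.py` in the background. A `None` result means the service never reported ready within about 30 seconds.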

Execute Transcription

Use the client script to transcribe files:

```bash
# Transcribe a single file
python scripts/transcribe.py /path/to/audio.mp3

# Specify the output path
python scripts/transcribe.py /path/to/video.mp4 -o transcript.md

# Enable speaker diarization
python scripts/transcribe.py /path/to/meeting.m4a --diarize

# Batch-transcribe a directory
python scripts/transcribe.py /path/to/media_folder/
```
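Batch mode takes a directory. The sketch below shows one way to pick the media files out of a folder, using the extensions from the feature list.

It rests on two unknowns. This text does not say whether `scripts/transcribe.py` recurses into subdirectories or accepts other extensions. The non-recursive filter and the `collect_media` name are therefore assumptions.

```python
from pathlib import Path

# Extensions taken from the feature list above; the client may accept more ("etc.").
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mp3", ".wav", ".m4a", ".flac"}


def collect_media(folder, extensions=MEDIA_EXTENSIONS):
    """Return media files directly inside `folder`, sorted by path.

    Non-recursive by assumption: subdirectories are ignored, and the
    extension match is case-insensitive.
    """
    root = Path(folder)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
```

`collect_media("/path/to/media_folder/")` lists the files a batch run would plausibly pick up. This makes it handy for a dry run before starting a long job.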

AI Summary (Claude Code Environment)

After transcription, you can generate an AI summary that makes full use of Claude Code's native AI capabilities.

Workflow:

1. After transcription, the script automatically prepares a summary prompt.
2. The prompt is sent to Claude, which generates a structured summary.
3. Paste the JSON result returned by Claude back into the script.
4. The script automatically injects the summary into the Markdown file.

Usage:

```bash
# Transcribe a single file (you will be prompted whether to generate a summary)
python scripts/transcribe.py /path/to/audio.mp3

# Enable speaker diarization and generate a summary
python scripts/transcribe.py /path/to/meeting.m4a --diarize --summary
```


**Summary Content Structure:**

- **Full Text Summary**: 400+ characters, covering the background, issues, and key facts
- **Speaker Summary**: each speaker's viewpoints, attitudes, and contributions
- **Key Points**: 6-10 core points
- **Keywords**: 5-8 key terms

**Prompt Features:**

- Optimized specifically for colloquial Chinese conversations
- Preserves speaker context and dialogue flow
- Produces structured JSON output for easy parsing and formatting

For detailed documentation, see [references/api-reference.md](references/api-reference.md).
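The last two workflow steps paste Claude's JSON back and inject it into the Markdown file. Together they amount to parsing, validating, and rendering.

The sketch below checks the documented section counts and renders the four sections. The JSON key names, function names, and Markdown headings are hypothetical. The actual schema is defined by the skill's prompt.

```python
import json

# Hypothetical key names; the real prompt defines its own JSON schema.
REQUIRED_KEYS = ("full_summary", "speaker_summaries", "key_points", "keywords")


def parse_summary(raw):
    """Parse the pasted JSON and check it against the documented structure."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not 6 <= len(data["key_points"]) <= 10:
        raise ValueError("expected 6-10 key points")
    if not 5 <= len(data["keywords"]) <= 8:
        raise ValueError("expected 5-8 keywords")
    return data


def summary_to_markdown(data):
    """Render the four summary sections as Markdown text (hypothetical layout)."""
    lines = ["## Full Text Summary", "", data["full_summary"], "", "## Speaker Summary", ""]
    for speaker, text in data["speaker_summaries"].items():
        lines.append(f"- **{speaker}**: {text}")
    lines += ["", "## Key Points", ""]
    lines += [f"{i}. {point}" for i, point in enumerate(data["key_points"], start=1)]
    lines += ["", "## Keywords", "", ", ".join(data["keywords"])]
    return "\n".join(lines) + "\n"
```

Validating before injecting means a truncated or malformed paste fails loudly instead of corrupting the transcript file.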

Call via HTTP API

Check the service status:

```bash
curl http://127.0.0.1:8765/health
```

Call the API directly with curl:

```bash
curl -X POST http://127.0.0.1:8765/transcribe \
  -H "Content-Type: application/json" \
  -d '{"file_path": "/path/to/audio.mp3"}'
```
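The same request can be sent from Python with only the standard library. This is a minimal sketch with these assumptions:

- `build_transcribe_request` and `transcribe` are hypothetical helpers.
- The one-hour timeout is an arbitrary allowance for long recordings.
- The response is returned as decoded JSON without assuming its schema.

The service receives only a path, not the file contents. `file_path` must therefore be valid on the machine running the service.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765"


def build_transcribe_request(file_path, base_url=BASE_URL):
    """Build the same POST request as the curl example above."""
    body = json.dumps({"file_path": file_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(file_path, base_url=BASE_URL, timeout=3600):
    """Send the request and return the decoded JSON response.

    The response fields are documented in references/api-reference.md,
    not assumed here.
    """
    req = build_transcribe_request(file_path, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect or log exactly what is posted.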

API Documentation (Swagger UI):

FastAPI automatically generates interactive API documentation, available at http://127.0.0.1:8765/docs.

On this page you can:

- View all API endpoints
- Try out the APIs in the browser (no curl required)
- View request and response formats
- View detailed parameter descriptions

Response Example (Health Check):

```json
{
  "status": "ok",
  "service": "FunASR Transcribe",
  "uptime": 300,
  "idle_time": 120
}
```

Response fields:

- `uptime`: how long the service has been running (seconds)
- `idle_time`: how long the service has been idle so far (seconds)
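The two counters let you estimate how long the service will stay up without new requests. The remaining time is the idle timeout minus `idle_time`. With the default 600-second timeout and the example response above, that is 600 - 120 = 480 seconds.

This is a small sketch. The function name is hypothetical. The timeout must be passed in, because the example response does not include it.

```python
import json


def seconds_until_idle_shutdown(health, idle_timeout=600):
    """Estimate how long the service keeps running without new requests.

    `idle_timeout` must match the value given to --idle-timeout (600 s by default).
    """
    if health.get("status") != "ok":
        raise ValueError(f"unexpected status: {health.get('status')!r}")
    return max(0, idle_timeout - health["idle_time"])


example = json.loads(
    '{"status": "ok", "service": "FunASR Transcribe", "uptime": 300, "idle_time": 120}'
)
print(seconds_until_idle_shutdown(example))  # 480
```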

Complete API Documentation

For the detailed API reference, see [references/api-reference.md](references/api-reference.md). It includes:

- Complete specifications for all API endpoints
- Detailed explanations of request and response formats
- Parameter descriptions and examples
- Complete curl command examples

Scripts

| Script | Purpose |
| --- | --- |
| `scripts/setup.py` | One-click installation of dependencies and model download |
| `scripts/server.py` | Starts the HTTP API service |
| `scripts/transcribe.py` | Command-line client |

Configuration Files

| File | Description |
| --- | --- |
| `assets/models.json` | ASR model configuration list |
| `assets/requirements.txt` | Python dependency list |

Output Format

Transcription results are saved as Markdown files containing:

- Title: the file name (without a transcription timestamp)
- Transcription content: each entry is a `SpeakerN HH:MM:SS` line followed by the content on a new line
- AI summary (optional): includes the full text summary, speaker summary, key points, and keywords

Example format:

```markdown
Transcription: filename.mp4

Transcription Content

Speaker1 00:00:01
This is the content of the first sentence.

Speaker2 00:00:05
This is the content of the second sentence.
```
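To reproduce the layout above from your own segment data, a timestamp formatter and a renderer are enough.

This sketch assumes segments arrive as `(speaker_number, start_seconds, text)` tuples with offsets in seconds. If your source reports milliseconds, convert to seconds first. The tuple shape, the function names, and the plain-line title are assumptions, not the client's actual data model.

```python
def format_timestamp(seconds):
    """Format a non-negative offset in seconds as HH:MM:SS (fractions truncated)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(filename, segments, speaker_label="Speaker"):
    """Render (speaker_number, start_seconds, text) tuples in the layout above."""
    lines = [f"Transcription: {filename}", "", "Transcription Content", ""]
    for speaker, start, text in segments:
        lines.append(f"{speaker_label}{speaker} {format_timestamp(start)}")  # header line
        lines.append(text)  # content on its own line
        lines.append("")  # blank line between entries
    return "\n".join(lines)
```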

Model Information

Models are stored in the ModelScope default cache directory, `~/.cache/modelscope/hub/models/`:

- ASR main model (Paraformer): 867 MB
- VAD model: 4 MB
- Punctuation model: 283 MB
- Speaker diarization model: 28 MB
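Together the four models need about 867 + 4 + 283 + 28 = 1182 MB (roughly 1.2 GB) of disk space.

The sketch below compares that figure with what is actually in the cache directory. Keep two caveats in mind:

- The cache can also hold unrelated ModelScope models, so the measured total may be larger.
- The sizes are approximate, since the listed figures may use decimal rather than binary megabytes.

The function names are hypothetical.

```python
from pathlib import Path

# Sizes listed above, in MB.
LISTED_MODEL_SIZES_MB = {"asr_paraformer": 867, "vad": 4, "punctuation": 283, "diarization": 28}
MODEL_CACHE = Path.home() / ".cache" / "modelscope" / "hub" / "models"


def expected_total_mb(sizes=LISTED_MODEL_SIZES_MB):
    """Sum of the listed model sizes."""
    return sum(sizes.values())


def directory_size_mb(path=MODEL_CACHE):
    """Sum the sizes of all regular files under `path` in MiB (0.0 if it is missing)."""
    root = Path(path)
    if not root.is_dir():
        return 0.0
    total = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total / (1024 * 1024)


if __name__ == "__main__":
    print(f"expected ~{expected_total_mb()} MB, found {directory_size_mb():.0f} MB")
```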

Troubleshooting

If the service fails to start, run the verification command to check the installation status:

```bash
python scripts/setup.py --verify
```

Re-download the models:

```bash
python scripts/setup.py --skip-deps
```