funasr-transcribe

FunASR Speech-to-Text

This skill provides a local speech recognition service that converts audio or video files into structured Markdown documents.

Feature Overview

  • Supports common audio and video formats (mp4, mov, mp3, wav, m4a, flac, and others)
  • Generates timestamps automatically
  • Supports speaker diarization
  • Outputs Markdown for easy reading and editing

Usage Workflow

First-time Use: Install Dependencies and Download Models

Run the installation script to configure the environment:

```bash
python scripts/setup.py
```

The installation script automatically:
  1. Checks the Python version (3.8 or later is required)
  2. Installs the dependency packages (FastAPI, Uvicorn, FunASR, PyTorch)
  3. Downloads the ASR models to `~/.cache/modelscope/hub/models/`

Verify the installation status:

```bash
python scripts/setup.py --verify
```
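
The two prerequisites named above (Python 3.8 or later, and models under `~/.cache/modelscope/hub/models/`) can also be checked by hand. The following is a minimal sketch, not the contents of `scripts/setup.py`; the function names are hypothetical and only the version number and cache path come from this document.

```python
import sys
from pathlib import Path

# Values taken from this document; everything else is illustrative.
MIN_PYTHON = (3, 8)
MODEL_CACHE = Path.home() / ".cache" / "modelscope" / "hub" / "models"


def python_is_supported(version=None):
    """Return True if the (major, minor) version meets the stated minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= MIN_PYTHON


def cached_model_dirs(cache_dir=MODEL_CACHE):
    """List sub-directories of the model cache, or [] if the cache is missing."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Cached model directories:", cached_model_dirs())
```

An empty list from `cached_model_dirs()` suggests the models have not been downloaded yet; `python scripts/setup.py --verify` remains the authoritative check.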

Start Transcription Service

```bash
python scripts/server.py
```

By default, the service runs at http://127.0.0.1:8765.

Smart Features:
  • Auto-start: models are loaded automatically on the first request
  • Idle shutdown: by default, the service shuts down after 10 minutes of inactivity to save resources
  • Configurable timeout: use the `--idle-timeout` parameter to set the idle timeout in seconds

Service Lifecycle:
  1. After startup, the service enters an idle-monitoring state
  2. When a request arrives, the service loads the models and runs the transcription
  3. Every request resets the idle timer
  4. After 10 consecutive minutes without requests, the service shuts down
  5. The service restarts on the next request

Important Notes:
  • ⚠️ Do not stop the service manually after a transcription finishes. Leave it running; it shuts down on its own after 10 minutes of inactivity
  • This lets you transcribe several files in a row without restarting the service each time
  • If you do need to stop the service immediately, press `Ctrl+C`; otherwise wait for the 10-minute idle timeout

Example: set a 30-minute idle timeout

```bash
python scripts/server.py --idle-timeout 1800
```
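
The idle-shutdown behaviour described in the lifecycle (each request resets a timer, and the service stops once the timer reaches the timeout) can be sketched as a small class. This is an illustration of the idea only: `IdleTimer` and its methods are hypothetical names, and the actual logic in `scripts/server.py` may differ.

```python
import time


class IdleTimer:
    """Tracks time since the last request and reports when the timeout is reached."""

    def __init__(self, timeout_seconds=600, clock=time.monotonic):
        # 600 seconds matches the 10-minute default described in this document.
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_activity = clock()

    def touch(self):
        """Reset the timer; intended to be called at the start of every request."""
        self._last_activity = self._clock()

    def idle_time(self):
        """Seconds elapsed since the last call to touch() (or since creation)."""
        return self._clock() - self._last_activity

    def expired(self):
        """True once the idle time has reached the configured timeout."""
        return self.idle_time() >= self.timeout_seconds
```

A server built this way would poll `expired()` from a background task and exit when it returns `True`. Passing the clock in as a parameter keeps the class easy to test without waiting for real time to pass.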

Execute Transcription

Use the client script to transcribe files:

```bash
# Transcribe a single file
python scripts/transcribe.py /path/to/audio.mp3

# Specify output path
python scripts/transcribe.py /path/to/video.mp4 -o transcript.md

# Enable speaker diarization
python scripts/transcribe.py /path/to/meeting.m4a --diarize

# Batch transcribe a directory
python scripts/transcribe.py /path/to/media_folder/
```
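
Batch mode has to decide which files in a directory count as media. The sketch below shows one way to do that using the extensions listed in the feature overview. It is an assumption about the behaviour, not the code in `scripts/transcribe.py`: the real client may accept more formats or recurse into sub-directories, and `find_media_files` is a hypothetical name.

```python
from pathlib import Path

# Extensions named in the feature overview; the real client may accept more.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mp3", ".wav", ".m4a", ".flac"}


def find_media_files(folder):
    """Return media files directly inside `folder` (no recursion), sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )
```

Matching on the lower-cased suffix means files such as `CLIP.MP4` are picked up as well.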

AI Intelligent Summary (Claude Code Environment)

After transcription, you can generate an AI summary that uses Claude Code's native AI capabilities.

Workflow:
  1. After transcription, the script prepares the summary prompt
  2. The prompt is sent to Claude to generate a structured summary
  3. The JSON result returned by Claude is pasted back into the script
  4. The script injects the summary into the Markdown file

Usage:

```bash
# Transcribe a single file (you will be asked whether to generate a summary)
python scripts/transcribe.py /path/to/audio.mp3

# Enable speaker diarization and generate a summary
python scripts/transcribe.py /path/to/meeting.m4a --diarize --summary
```

**Summary Content Structure:**

- **Full Text Summary** - 400+ characters, covering background, issues, and key facts
- **Speaker Summary** - Each speaker's viewpoints, attitudes, and contributions
- **Key Content** - 6-10 core points
- **Keywords** - 5-8 key terms

**Prompt Features:**

- Optimized specifically for colloquial Chinese conversations
- Retains speaker context and dialogue flow
- Structured JSON output for easy parsing and formatting

For detailed documentation, see: <references/api-reference.md>
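
The last workflow step, injecting a JSON summary into the Markdown file, might look roughly like the sketch below. The JSON key names (`full_summary`, `speakers`, `key_points`, `keywords`), the section headings, and the choice to append the summary at the end of the transcript are all assumptions made for illustration; the skill's actual schema is described in its reference documentation.

```python
import json


def render_summary(summary_json):
    """Turn a summary JSON string into a Markdown section.

    The key names used here are hypothetical, not the skill's actual schema.
    """
    data = json.loads(summary_json)
    lines = ["## AI Summary", "", "### Full Text Summary", "", data["full_summary"], ""]
    lines += ["### Speaker Summary", ""]
    lines += [f"- **{name}**: {text}" for name, text in data["speakers"].items()]
    lines += ["", "### Key Content", ""]
    lines += [f"- {point}" for point in data["key_points"]]
    lines += ["", "### Keywords", "", ", ".join(data["keywords"]), ""]
    return "\n".join(lines)


def inject_summary(markdown_text, summary_json):
    """Append the rendered summary after the existing transcript text."""
    return markdown_text.rstrip("\n") + "\n\n" + render_summary(summary_json)
```

Parsing with `json.loads` first means a malformed paste fails with an error before the transcript file is modified.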

Call via HTTP API

Check the service status:

```bash
curl http://127.0.0.1:8765/health
```

Call the API directly with curl:

```bash
curl -X POST http://127.0.0.1:8765/transcribe \
  -H "Content-Type: application/json" \
  -d '{"file_path": "/path/to/audio.mp3"}'
```

API Documentation (Swagger UI):

FastAPI automatically generates interactive API documentation at http://127.0.0.1:8765/docs. On this page, you can:
  • View all API endpoints
  • Test the API in the browser (no curl required)
  • View request and response formats
  • View detailed parameter descriptions

Response Example (Health Check):

```json
{
  "status": "ok",
  "service": "FunASR Transcribe",
  "uptime": 300,
  "idle_time": 120
}
```

Response fields:
  • `uptime`: how long the service has been running, in seconds
  • `idle_time`: current idle time, in seconds
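
The same calls can be made from Python with only the standard library. The sketch below mirrors the curl examples: the URL, the `file_path` field, and the `idle_time` field come from this document, while the helper names, the assumption that `/transcribe` returns JSON, and the 600-second default timeout used in the estimate are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765"  # default address given in this document


def build_transcribe_request(file_path, base_url=BASE_URL):
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"file_path": file_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def seconds_until_shutdown(health, idle_timeout=600):
    """Estimate the time left before idle shutdown from a /health response."""
    return max(0, idle_timeout - health["idle_time"])


def send(request, timeout=600):
    """Send a request and decode a JSON response (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_transcribe_request("/path/to/audio.mp3"))` issues the request. For the health example shown above (`idle_time` of 120), `seconds_until_shutdown` gives 480 seconds under the default 10-minute timeout.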

Complete API Documentation

For the detailed API reference, see: <references/api-reference.md>

It includes:
  • Complete specifications for all API endpoints
  • Detailed explanations of request and response formats
  • Parameter descriptions and examples
  • Complete curl command examples

Script Description

| Script | Purpose |
| --- | --- |
| `scripts/setup.py` | One-step installation of dependencies and model download |
| `scripts/server.py` | Starts the HTTP API service |
| `scripts/transcribe.py` | Command-line client |

Configuration Files

| File | Description |
| --- | --- |
| `assets/models.json` | ASR model configuration list |
| `assets/requirements.txt` | Python dependency list |

Output Format

Transcription results are saved as a Markdown file containing:
  1. Title - the file name (without a transcription timestamp)
  2. Transcription content - each entry is `SpeakerN HH:MM:SS`, followed by the content on a new line
  3. AI summary (optional) - the full text summary, speaker summaries, key content, and keywords

Example Format:

```markdown
# Transcription: filename.mp4

## Transcription Content

Speaker1 00:00:01
This is the content of the first sentence.

Speaker2 00:00:05
This is the content of the second sentence.
```
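
To make the entry format concrete, here is a minimal sketch that renders a list of segments in the layout described above: `SpeakerN HH:MM:SS`, then the content on its own line. The function names, the tuple layout of a segment, and the heading levels are assumptions for illustration, not the skill's actual writer.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(filename, segments):
    """Render (speaker_number, start_seconds, text) tuples as Markdown."""
    lines = [f"# Transcription: {filename}", "", "## Transcription Content", ""]
    for speaker, start, text in segments:
        # Speaker label and timestamp on one line, content on the next.
        lines += [f"Speaker{speaker} {format_timestamp(start)}", text, ""]
    return "\n".join(lines)
```

Calling `render_transcript("filename.mp4", [(1, 1, "First."), (2, 5, "Second.")])` produces two entries stamped `00:00:01` and `00:00:05`.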

Model Information

Models are stored in the default ModelScope cache directory, `~/.cache/modelscope/hub/models/`:
  • ASR main model (Paraformer) - 867 MB
  • VAD model - 4 MB
  • Punctuation model - 283 MB
  • Speaker diarization model - 28 MB
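
The four models listed above add up to roughly 1.2 GB (867 + 4 + 283 + 28 = 1182 MB), so it can be worth checking free disk space before the first download. The sketch below does that arithmetic and compares it with the free space reported by the operating system. The dictionary keys and function names are hypothetical, and the sizes are the approximate figures from this document.

```python
import shutil
from pathlib import Path

# Approximate sizes in MB, as listed in this document.
MODEL_SIZES_MB = {
    "asr_paraformer": 867,
    "vad": 4,
    "punctuation": 283,
    "speaker_diarization": 28,
}


def total_model_size_mb(sizes=MODEL_SIZES_MB):
    """Sum the listed model sizes in MB."""
    return sum(sizes.values())


def has_enough_space(path=None, required_mb=None):
    """Return True if the filesystem holding `path` has room for the models."""
    if path is None:
        path = Path.home()
    if required_mb is None:
        required_mb = total_model_size_mb()
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mb >= required_mb
```

The check ignores temporary files created during download, so treat the total as a lower bound.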

Troubleshooting

If the service fails to start, run the verification command to check the installation status:

```bash
python scripts/setup.py --verify
```

Re-download the models:

```bash
python scripts/setup.py --skip-deps
```