funasr-transcribe
# FunASR Speech-to-Text

This skill provides a local speech recognition service that converts audio and video files into structured Markdown documents.
## Feature Overview

- Supports many audio and video formats (mp4, mov, mp3, wav, m4a, flac, etc.)
- Generates timestamps automatically
- Supports speaker diarization
- Outputs Markdown for easy reading and editing
## Usage Workflow

### First-time Use: Install Dependencies and Download Models
Run the installation script to configure the environment:

```bash
python scripts/setup.py
```

The installation script automatically:
- Checks the Python version (requires >= 3.8)
- Installs the dependency packages (FastAPI, Uvicorn, FunASR, PyTorch)
- Downloads the ASR models to `~/.cache/modelscope/hub/models/`

Verify the installation:

```bash
python scripts/setup.py --verify
```
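`--verify` is the supported way to check the installation. The snippet below is a rough, independent sketch of the same kind of check and is not the script's actual logic. It tests the Python version requirement and whether the four dependencies listed above can be located. The helper names `python_ok` and `find_missing_modules` are hypothetical.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple

# Import names of the dependencies installed by setup.py
# (FastAPI, Uvicorn, FunASR, PyTorch).
REQUIRED_MODULES = ["fastapi", "uvicorn", "funasr", "torch"]


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """True if the running interpreter meets the documented minimum version."""
    return sys.version_info[:2] >= minimum


def find_missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    missing = find_missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If any module shows up as missing, rerunning `python scripts/setup.py` is the documented fix.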
### Start Transcription Service

```bash
python scripts/server.py
```

By default, the service runs at `http://127.0.0.1:8765`.

**Smart features:**
- Auto-start: models are loaded automatically on the first request
- Idle shutdown: by default, the service shuts down after 10 minutes of inactivity to save resources
- Configurable timeout: use the `--idle-timeout` parameter to set a custom idle timeout (in seconds)

**Service lifecycle:**
- After startup, the service enters an idle-monitoring state
- When a request arrives, it loads the models (if needed) and runs the transcription
- Every request resets the idle timer
- If no request arrives within the idle timeout (10 minutes by default), the service shuts down automatically
- It starts again on the next request

**Important notes:**
- ⚠️ **Do not shut the service down manually.** Leave it running after a transcription finishes; it shuts down on its own after 10 minutes of inactivity.
- This lets you transcribe several files in a row without restarting the service each time.
- To stop the service immediately, press `Ctrl+C`; otherwise, wait for the 10-minute idle timeout.

Example: set a custom 30-minute idle timeout:

```bash
python scripts/server.py --idle-timeout 1800
```
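The idle-shutdown behavior described above comes down to two parts. Every request refreshes a timestamp, and a periodic check compares the elapsed time against the timeout. Here is a minimal sketch of that bookkeeping, assuming a monotonic clock. `IdleMonitor` is a hypothetical name, and this is not the code in `scripts/server.py`. The clock is injectable so the logic can be tested without waiting ten minutes.

```python
import threading
import time
from typing import Callable


class IdleMonitor:
    """Hypothetical idle-shutdown bookkeeping; not the actual scripts/server.py code."""

    def __init__(self, idle_timeout: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout  # seconds; 600 = the documented 10-minute default
        self._clock = clock
        self._lock = threading.Lock()  # requests may be handled on different threads
        self._started = clock()
        self._last_activity = self._started

    def touch(self) -> None:
        """Call at the start of every request to reset the idle timer."""
        with self._lock:
            self._last_activity = self._clock()

    def uptime(self) -> float:
        return self._clock() - self._started

    def idle_time(self) -> float:
        with self._lock:
            return self._clock() - self._last_activity

    def should_shut_down(self) -> bool:
        """A background loop would poll this and stop the server once it returns True."""
        return self.idle_time() >= self.idle_timeout
```

In this sketch, starting the server with `--idle-timeout 1800` would correspond to `IdleMonitor(idle_timeout=1800)`.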
### Execute Transcription

Use the client script to transcribe files:

```bash
# Transcribe a single file
python scripts/transcribe.py /path/to/audio.mp3

# Specify the output path
python scripts/transcribe.py /path/to/video.mp4 -o transcript.md

# Enable speaker diarization
python scripts/transcribe.py /path/to/meeting.m4a --diarize

# Batch-transcribe a directory
python scripts/transcribe.py /path/to/media_folder/
```
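When given a directory, the client transcribes the media files inside it. The sketch below shows one way to select those files. It assumes a non-recursive scan and an extension list taken from the formats named in the feature overview. The real client may accept more formats, as the "etc." there suggests. `collect_media_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumed extension set, taken from the formats named in the feature overview.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mp3", ".wav", ".m4a", ".flac"}


def collect_media_files(target: str) -> List[Path]:
    """Return the media files a batch run would pick up (non-recursive, sorted)."""
    path = Path(target)
    if path.is_file():
        return [path] if path.suffix.lower() in SUPPORTED_EXTENSIONS else []
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Lower-casing the suffix means files such as `talk.MP3` are still matched.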
### AI Summary (Claude Code Environment)

After transcription, you can generate an AI summary that draws on Claude Code's native AI capabilities.

**Workflow:**
- After transcription, the script automatically prepares a summary prompt
- You send the prompt to Claude, which generates a structured summary
- You paste the JSON returned by Claude back into the script
- The script injects the summary into the Markdown file

**Usage:**

```bash
# Transcribe a single file (you will be asked whether to generate a summary)
python scripts/transcribe.py /path/to/audio.mp3

# Enable speaker diarization and generate a summary
python scripts/transcribe.py /path/to/meeting.m4a --diarize --summary
```

**Summary content structure:**
- **Full text summary** - at least 400 characters, covering background, issues, and key facts
- **Speaker summary** - each speaker's viewpoints, attitudes, and contributions
- **Key content** - 6-10 core points
- **Keywords** - 5-8 key terms

**Prompt features:**
- Optimized for colloquial Chinese conversations
- Preserves speaker context and conversational flow
- Produces structured JSON output that is easy to parse and format
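The summary arrives as JSON that you paste back into the script. The sketch below shows one way such JSON could be rendered into Markdown sections that follow the structure above and then appended to a transcript. Two details are hypothetical: the key names (`full_summary`, `speaker_summaries`, `key_points`, `keywords`) and the choice to append at the end of the file. The real schema is defined by the prompt the script generates.

```python
import json
from typing import Dict, List


def render_summary(summary_json: str) -> str:
    """Render Claude's JSON reply as Markdown sections.

    Hypothetical schema: {"full_summary": str, "speaker_summaries": {name: str},
    "key_points": [str], "keywords": [str]}.
    """
    data = json.loads(summary_json)
    lines: List[str] = ["## AI Summary", "", "### Full Text Summary", "", data["full_summary"], ""]
    lines += ["### Speaker Summary", ""]
    speakers: Dict[str, str] = data.get("speaker_summaries", {})
    lines += [f"- **{name}**: {text}" for name, text in speakers.items()]
    lines += ["", "### Key Content", ""]
    lines += [f"{i}. {point}" for i, point in enumerate(data["key_points"], start=1)]
    lines += ["", "### Keywords", "", ", ".join(data["keywords"])]
    return "\n".join(lines) + "\n"


def inject_summary(markdown: str, summary_json: str) -> str:
    """Append the rendered summary to the end of the transcript (placement assumed)."""
    return markdown.rstrip("\n") + "\n\n" + render_summary(summary_json)
```

Parsing with `json.loads` first means a malformed paste fails loudly, before anything is written to the Markdown file.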
For detailed documentation, see <references/api-reference.md>.
### Call via HTTP API

Check the service status:

```bash
curl http://127.0.0.1:8765/health
```

Call the API directly with curl:

```bash
curl -X POST http://127.0.0.1:8765/transcribe \
  -H "Content-Type: application/json" \
  -d '{"file_path": "/path/to/audio.mp3"}'
```
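For scripted use without curl, here is a minimal standard-library sketch of the same `POST /transcribe` call. It assumes the default address and a JSON response body. This section does not describe the response fields, so `transcribe` simply returns the decoded JSON. Judging from the curl example, the request sends a path rather than the file contents, so the path presumably must be readable by the server process. The one-hour timeout is an arbitrary assumption for long recordings, and both function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765"  # default address from this document


def build_transcribe_request(file_path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /transcribe request as the curl example."""
    body = json.dumps({"file_path": file_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(file_path: str, timeout: float = 3600.0) -> dict:
    """Send the request and decode the JSON reply (schema: see /docs)."""
    req = build_transcribe_request(file_path)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the request easy to inspect or test without a running server.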
**API documentation (Swagger UI):** FastAPI automatically generates interactive API documentation at http://127.0.0.1:8765/docs. On this page you can:
- View all API endpoints
- Test the API in the browser (no curl required)
- View request and response formats
- Read detailed parameter descriptions

**Response example (health check):**

```json
{
  "status": "ok",
  "service": "FunASR Transcribe",
  "uptime": 300,
  "idle_time": 120
}
```

Response fields:
- `uptime`: how long the service has been running (seconds)
- `idle_time`: current idle time (seconds)
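A client can use the `/health` response to check whether the service is up. It can also estimate how long remains before the idle shutdown, assuming `idle_time` counts seconds since the last request. The sketch uses the default 10-minute timeout; pass a different `idle_timeout` if the server was started with `--idle-timeout`. `fetch_health` and `seconds_until_idle_shutdown` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

DEFAULT_IDLE_TIMEOUT = 600.0  # seconds; documented default (override if --idle-timeout was used)


def fetch_health(base_url: str = "http://127.0.0.1:8765",
                 timeout: float = 2.0) -> Optional[Dict[str, Any]]:
    """Return the decoded /health JSON, or None if the service is not reachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except OSError:  # URLError, refused connections, and socket timeouts are OSError subclasses
        return None


def seconds_until_idle_shutdown(health: Dict[str, Any],
                                idle_timeout: float = DEFAULT_IDLE_TIMEOUT) -> float:
    """Estimate time left before idle shutdown, assuming idle_time counts from the last request."""
    return max(0.0, idle_timeout - health["idle_time"])
```

For the example response above, `seconds_until_idle_shutdown` returns 480.0 with the default timeout.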
### Complete API Documentation

For the detailed API reference, see <references/api-reference.md>. It includes:
- Complete specifications for all API endpoints
- Detailed explanations of request and response formats
- Parameter descriptions and examples
- Complete curl command examples
## Scripts

| Script | Purpose |
|---|---|
| `scripts/setup.py` | One-click installation of dependencies and model download |
| `scripts/server.py` | Starts the HTTP API service |
| `scripts/transcribe.py` | Command-line client |
## Configuration Files

| File | Description |
|---|---|
| | ASR model configuration list |
| | Python dependency list |
## Output Format

Transcription results are saved as Markdown files containing:
- **Title** - the file name (without a transcription timestamp)
- **Transcription content** - each segment is formatted as `SpeakerN HH:MM:SS`, followed by the text on a new line
- **AI summary** (optional) - includes the full text summary, speaker summary, key content, and keywords

Example:

```markdown
# Transcription: filename.mp4

## Transcription Content

Speaker1 00:00:01
This is the content of the first sentence.

Speaker2 00:00:05
This is the content of the second sentence.
```
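The transcript layout can be produced from timed segments roughly as follows. This is a sketch under stated assumptions: the `Segment` type, its 1-based `speaker` number, start times in seconds, and the heading levels are all hypothetical. Fractional seconds are simply truncated when formatting `HH:MM:SS`.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    speaker: int          # 1-based speaker number (hypothetical field)
    start_seconds: float  # segment start time in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, truncating fractional seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(file_name: str, segments: List[Segment]) -> str:
    """Build the title, section heading, and one header line plus text per segment."""
    lines = [f"# Transcription: {file_name}", "", "## Transcription Content", ""]
    for seg in segments:
        lines.append(f"Speaker{seg.speaker} {format_timestamp(seg.start_seconds)}")
        lines.append(seg.text)
        lines.append("")
    return "\n".join(lines)
```

Each segment gets its own header line, even when the same speaker continues. The sketch does not merge consecutive segments from one speaker.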
## Model Information

Models are stored in the ModelScope default cache directory, `~/.cache/modelscope/hub/models/`:
- ASR main model (Paraformer) - 867 MB
- VAD model - 4 MB
- Punctuation model - 283 MB
- Speaker diarization model - 28 MB
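The four models listed above total roughly 867 + 4 + 283 + 28 = 1182 MB. The small sketch below sums file sizes under the cache directory to show how much space it actually occupies, for example to spot an incomplete download. It assumes the default cache location shown above and reports mebibytes. The listed sizes may use a slightly different unit, and the cache can also hold other ModelScope models, so treat any comparison as approximate. `directory_size_mb` is a hypothetical helper.

```python
from pathlib import Path

# Default ModelScope cache location quoted in this document.
MODEL_CACHE = Path.home() / ".cache" / "modelscope" / "hub" / "models"


def directory_size_mb(path: Path) -> float:
    """Sum the sizes of all files under `path`, in MiB (0.0 if it does not exist)."""
    if not path.exists():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    print(f"{MODEL_CACHE}: {directory_size_mb(MODEL_CACHE):.0f} MiB")
```

A total far below about 1.1 GiB suggests that re-downloading the models (see Troubleshooting) is worth trying.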
## Troubleshooting

If the service fails to start, run the verification command to check the installation:

```bash
python scripts/setup.py --verify
```

To re-download the models without reinstalling dependencies:

```bash
python scripts/setup.py --skip-deps
```