funasr-transcribe
Use a local FunASR service to transcribe audio or video files into timestamped Markdown files. Common formats such as mp4, mov, mp3, wav, and m4a are supported. Use this skill when users need speech-to-text conversion, meeting minutes, video subtitles, or podcast transcription.
Source: cat-xierluo/legal-skills
NPX Install
bash
npx skill4agent add cat-xierluo/legal-skills funasr-transcribe
FunASR Speech-to-Text
This skill provides a local speech recognition service that converts audio or video files into structured Markdown documents.
Feature Overview
- Supports multiple audio and video formats (mp4, mov, mp3, wav, m4a, flac, etc.)
- Automatically generates timestamps
- Supports speaker diarization
- Outputs in Markdown format for easy reading and editing
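As a rough illustration of the format list above, the following sketch filters file paths by extension. The set contains only the extensions named in this document; the "etc." in the list means the real service may accept more, and the helper names are hypothetical, not part of the skill's scripts.

```python
from pathlib import Path

# Extensions named in the feature list. The service may accept others,
# so treat this set as an assumption rather than the full list.
KNOWN_MEDIA_EXTENSIONS = {".mp4", ".mov", ".mp3", ".wav", ".m4a", ".flac"}


def is_known_media(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in KNOWN_MEDIA_EXTENSIONS


def filter_media(paths):
    """Keep only the paths whose extension is a known media format."""
    return [p for p in paths if is_known_media(p)]
```

A check like this can be useful before handing a whole folder to the client script, so that unrelated files are skipped early.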
Usage Workflow
First-time Use: Install Dependencies and Download Models
Run the installation script to complete environment configuration:
bash
python scripts/setup.py
The installation script will automatically:
- Check Python version (requires >= 3.8)
- Install dependency packages (FastAPI, Uvicorn, FunASR, PyTorch)
- Download ASR models to ~/.cache/modelscope/hub/models/
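The Python version requirement can be checked by hand before running the installer. This is a minimal sketch of such a check, not the logic that scripts/setup.py actually uses; the function name is hypothetical.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the list above


def python_is_supported(version_info=None) -> bool:
    """Return True when the interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) against the required minimum.
    return tuple(version_info[:2]) >= MIN_PYTHON
```

Calling `python_is_supported()` with no argument tests the running interpreter; passing a tuple such as `(3, 7, 9)` lets you test a specific version.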
Verify installation status:
bash
python scripts/setup.py --verify
Start Transcription Service
bash
python scripts/server.py
By default, the service runs on http://127.0.0.1:8765
Smart Features:
- Auto-start: Automatically loads models on first request
- Idle Shutdown: Automatically shuts down after 10 minutes of inactivity by default to save resources
- Configurable Timeout: Use the --idle-timeout parameter to customize the idle timeout (in seconds)
Service Lifecycle:
- Enters idle monitoring state after startup
- Automatically loads models and executes transcription when receiving a request
- Resets the idle timer for each request
- Automatically shuts down if no requests are received for 10 consecutive minutes
- Restarts on next request
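The idle-shutdown behaviour in this lifecycle can be sketched with a small timer object. This illustrates the general technique under stated assumptions and is not the code in scripts/server.py: the class and method names are hypothetical, and the only value taken from this document is the 600-second (10-minute) default.

```python
import time


class IdleTimer:
    """Tracks time since the last request (hypothetical helper)."""

    def __init__(self, timeout_seconds: float = 600.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the timer can be tested
        self._last_activity = clock()

    def touch(self) -> None:
        """Call on every incoming request to reset the idle timer."""
        self._last_activity = self._clock()

    def idle_time(self) -> float:
        """Seconds elapsed since the last request."""
        return self._clock() - self._last_activity

    def should_shut_down(self) -> bool:
        """True once the idle time reaches the configured timeout."""
        return self.idle_time() >= self.timeout_seconds
```

A server built this way would call `touch()` in each request handler and poll `should_shut_down()` from a background task. Using a monotonic clock keeps the timer from being affected by system clock adjustments.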
Important Notes:
- ⚠️ Do not shut down the service manually. Leave it running after a transcription completes; it shuts down on its own after 10 minutes of inactivity
- This allows multiple files to be transcribed in succession without restarting the service each time
- To shut down the service immediately, press Ctrl+C; otherwise, wait for the 10-minute idle timeout
Example: Set a 30-minute idle timeout
bash
python scripts/server.py --idle-timeout 1800
Execute Transcription
Use the client script to transcribe files:
bash
# Transcribe a single file
python scripts/transcribe.py /path/to/audio.mp3
# Specify output path
python scripts/transcribe.py /path/to/video.mp4 -o transcript.md
# Enable speaker diarization
python scripts/transcribe.py /path/to/meeting.m4a --diarize
# Batch transcribe a directory
python scripts/transcribe.py /path/to/media_folder/
AI Intelligent Summary (Claude Code Environment)
After transcription, you can generate an AI summary using Claude Code's native AI capabilities.
Workflow:
- After transcription, the script will automatically prepare summary prompts
- Send the prompts to Claude AI to generate structured summaries
- Paste the JSON result returned by Claude back into the script
- Automatically inject the summary into the Markdown file
Usage:
bash
# Transcribe a single file (will automatically prompt whether to generate a summary)
python scripts/transcribe.py /path/to/audio.mp3
# Enable speaker diarization and generate summary
python scripts/transcribe.py /path/to/meeting.m4a --diarize --summary
Summary Content Structure:
- Full Text Summary - Over 400 words, including background, issues, and key facts
- Speaker Summary - Each speaker's viewpoints, attitudes, and contributions
- Key Content - 6-10 core points
- Keywords - 5-8 key terms
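To make the four-part structure concrete, here is a minimal sketch that turns a summary JSON string into a Markdown section. The JSON key names (`full_summary`, `speaker_summaries`, `key_points`, `keywords`) and the heading layout are assumptions made for illustration; this document does not specify the actual schema the script expects.

```python
import json


def render_summary(summary_json: str) -> str:
    """Turn a summary JSON string into a Markdown section.

    All key names used here are hypothetical; adjust them to the real schema.
    """
    data = json.loads(summary_json)
    lines = ["## AI Summary", "", "### Full Text Summary", data["full_summary"], ""]
    lines.append("### Speaker Summary")
    for speaker, text in data["speaker_summaries"].items():
        lines.append(f"- {speaker}: {text}")
    lines += ["", "### Key Content"]
    lines += [f"- {point}" for point in data["key_points"]]
    lines += ["", "### Keywords", ", ".join(data["keywords"])]
    return "\n".join(lines)
```

The returned string could then be appended to the transcript file, which is one plausible way to inject a summary into the Markdown output.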
Prompt Features:
- Optimized specifically for Chinese colloquial conversations
- Retains speaker context and dialogue flow
- Structured JSON output for easy parsing and formatting
For detailed documentation, please refer to: <references/api-reference.md>
Call via HTTP API
Check Service Status:
bash
curl http://127.0.0.1:8765/health
Call the API directly using curl:
bash
curl -X POST http://127.0.0.1:8765/transcribe \
-H "Content-Type: application/json" \
-d '{"file_path": "/path/to/audio.mp3"}'
API Documentation (Swagger UI):
FastAPI automatically generates interactive API documentation at http://127.0.0.1:8765/docs
On this page, you can:
- View all API endpoints
- Test APIs online (no curl required)
- View request/response formats
- View detailed parameter descriptions
Response Example (Health Check):
json
{
"status": "ok",
"service": "FunASR Transcribe",
"uptime": 300,
"idle_time": 120
}
Return field descriptions:
- uptime: Service running time (in seconds)
- idle_time: Current idle time (in seconds)
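The health check and the /transcribe call shown with curl above can also be issued from Python. The sketch below uses only the standard library and mirrors the URL, JSON body, and header from the curl examples. The function names are hypothetical. The shape of the /transcribe response is not described here, so the request is only built and not parsed.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765"  # default address given in this document


def build_transcribe_request(file_path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"file_path": file_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def service_is_healthy(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if /health answers with status "ok"; False if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status") == "ok"
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers invalid JSON.
        return False
```

To send the request, pass the result of `build_transcribe_request(...)` to `urllib.request.urlopen`. Transcription can take a while, so a generous timeout is advisable.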
Complete API Documentation
For detailed API reference documentation, please refer to: <references/api-reference.md>
Including:
- Complete specifications for all API endpoints
- Detailed explanations of request/response formats
- Parameter descriptions and examples
- Complete curl command examples
Script Description
| Script | Purpose |
|---|---|
| scripts/setup.py | One-click installation of dependencies and model download |
| scripts/server.py | Start HTTP API service |
| scripts/transcribe.py | Command-line client |
Configuration Files
| File | Description |
|---|---|
| | ASR model configuration list |
| | Python dependency list |
Output Format
Transcription results are saved as Markdown files, including:
- Title - File name (without transcription timestamp)
- Transcription Content - Format: SpeakerN HH:MM:SS followed by the content on a new line
- AI Summary (Optional) - Includes full text summary, speaker summary, key content, and keywords
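Because each segment starts with a SpeakerN HH:MM:SS header line, a transcript can be read back into structured data with a short parser. This is a sketch under assumptions: it expects speaker labels to be present (output without diarization may differ), it stops collecting text at any Markdown heading, and the function name is hypothetical.

```python
import re

# Header line such as "Speaker1 00:00:01", per the format described above.
HEADER = re.compile(r"^(Speaker\d+) (\d{2}:\d{2}:\d{2})$")


def parse_transcript(markdown: str):
    """Return a list of (speaker, timestamp, text) tuples."""
    segments = []
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            current = None  # a heading ends the current segment
            continue
        match = HEADER.match(line)
        if match:
            current = [match.group(1), match.group(2), []]
            segments.append(current)
        elif current is not None and line:
            current[2].append(line)
    return [(spk, ts, " ".join(text)) for spk, ts, text in segments]
```

The tuples are convenient for follow-up work such as building subtitles or grouping text by speaker.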
Example Format:
markdown
# Transcription: filename.mp4
## Transcription Content
Speaker1 00:00:01
This is the content of the first sentence.
Speaker2 00:00:05
This is the content of the second sentence.
Model Information
Models are stored in the ModelScope default cache directory: ~/.cache/modelscope/hub/models/
- ASR Main Model (Paraformer) - 867MB
- VAD Model - 4MB
- Punctuation Model - 283MB
- Speaker Diarization Model - 28MB
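Added together, the four models come to roughly 1.2 GB (867 + 4 + 283 + 28 = 1182 MB). The sketch below sums the listed sizes and checks free disk space before a download. The 200 MB safety margin and the function names are assumptions, and actual on-disk sizes may differ slightly from the figures listed.

```python
import shutil

# Sizes in MB, copied from the list above.
MODEL_SIZES_MB = {
    "ASR Main Model (Paraformer)": 867,
    "VAD Model": 4,
    "Punctuation Model": 283,
    "Speaker Diarization Model": 28,
}


def total_model_size_mb() -> int:
    """Sum of the listed model sizes in MB."""
    return sum(MODEL_SIZES_MB.values())


def has_room_for_models(path: str = ".", margin_mb: int = 200) -> bool:
    """Check free space at `path`; the default margin is an arbitrary assumption."""
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    return free_mb >= total_model_size_mb() + margin_mb
```

Pass the home directory, or another existing directory on the same volume as the ModelScope cache, as `path` to check the disk the models will land on.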
Troubleshooting
If the service fails to start, run the verification command to check the installation status:
bash
python scripts/setup.py --verify
Re-download models:
bash
python scripts/setup.py --skip-deps