funasr-transcribe


Use local FunASR service to transcribe audio or video files into timestamped Markdown files, supporting common formats such as mp4, mov, mp3, wav, m4a, etc. This skill should be used when users need speech-to-text conversion, meeting minutes, video subtitles, or podcast transcription.

NPX Install

```bash
npx skill4agent add cat-xierluo/legal-skills funasr-transcribe
```

SKILL.md Content (Chinese)


FunASR Speech-to-Text

This skill provides local speech recognition service to convert audio or video files into structured Markdown documents.

Feature Overview

  • Supports multiple audio and video formats (mp4, mov, mp3, wav, m4a, flac, etc.)
  • Automatically generates timestamps
  • Supports speaker diarization
  • Outputs in Markdown format for easy reading and editing

Usage Workflow

First-time Use: Install Dependencies and Download Models

Run the installation script to complete environment configuration:
```bash
python scripts/setup.py
```
The installation script will automatically:
  1. Check Python version (requires >= 3.8)
  2. Install dependency packages (FastAPI, Uvicorn, FunASR, PyTorch)
  3. Download ASR models to `~/.cache/modelscope/hub/models/`
Verify installation status:

```bash
python scripts/setup.py --verify
```
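What `--verify` checks is internal to `setup.py`; as a rough stand-in, you can confirm the key packages from the dependency list above are importable. This is a sketch, not the script's actual logic:

```python
import importlib.util

# Import names for the dependencies listed above (FastAPI, Uvicorn,
# FunASR, PyTorch); note the import names differ from some pip names.
REQUIRED = ["fastapi", "uvicorn", "funasr", "torch"]

def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies present" if not missing else f"Missing: {missing}")
```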

Start Transcription Service

```bash
python scripts/server.py
```

The service runs on `http://127.0.0.1:8765` by default.
Smart Features:
  • Auto-start: Automatically loads models on first request
  • Idle Shutdown: Automatically shuts down after 10 minutes of inactivity by default to save resources
  • Configurable Timeout: Use the `--idle-timeout` parameter to customize the idle timeout (in seconds)
Service Lifecycle:
  1. Enters idle monitoring state after startup
  2. Automatically loads models and executes transcription when receiving a request
  3. Resets the idle timer for each request
  4. Automatically shuts down if no requests are received for 10 consecutive minutes
  5. Restarts on next request
Important Notes:
  • ⚠️ Do not shut down the service manually - leave it running after a transcription finishes; it will shut itself down after 10 minutes of inactivity
  • This allows continuous transcription of multiple files without restarting the service repeatedly
  • To shut down the service immediately, press `Ctrl+C`, or simply wait for the 10-minute idle timeout
Example: customize a 30-minute idle timeout:

```bash
python scripts/server.py --idle-timeout 1800
```

Execute Transcription

Use the client script to transcribe files:
```bash
# Transcribe a single file
python scripts/transcribe.py /path/to/audio.mp3

# Specify output path
python scripts/transcribe.py /path/to/video.mp4 -o transcript.md

# Enable speaker diarization
python scripts/transcribe.py /path/to/meeting.m4a --diarize

# Batch transcribe a directory
python scripts/transcribe.py /path/to/media_folder/
```
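If you would rather call the service from your own code than use `transcribe.py`, a minimal stdlib client might look like this. It targets the same `POST /transcribe` endpoint and `file_path` field that the curl example in the HTTP API section uses; everything else is a sketch:

```python
import json
import urllib.request

SERVICE_URL = "http://127.0.0.1:8765"

def build_request(file_path: str) -> urllib.request.Request:
    """Build the POST /transcribe request the service expects."""
    payload = json.dumps({"file_path": file_path}).encode("utf-8")
    return urllib.request.Request(
        f"{SERVICE_URL}/transcribe",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def transcribe(file_path: str) -> dict:
    """Send a local media path to the running service, return its JSON reply."""
    with urllib.request.urlopen(build_request(file_path)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```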

AI Intelligent Summary (Claude Code Environment)

After transcription, you can generate an AI summary using Claude Code's native AI capabilities.
Workflow:
  1. After transcription, the script will automatically prepare summary prompts
  2. Send the prompts to Claude AI to generate structured summaries
  3. Paste the JSON result returned by Claude back into the script
  4. Automatically inject the summary into the Markdown file
Usage:
```bash
# Transcribe a single file (will automatically prompt whether to generate a summary)
python scripts/transcribe.py /path/to/audio.mp3

# Enable speaker diarization and generate summary
python scripts/transcribe.py /path/to/meeting.m4a --diarize --summary
```
Summary Content Structure:
  • Full Text Summary - Over 400 words, including background, issues, and key facts
  • Speaker Summary - Each speaker's viewpoints, attitudes, and contributions
  • Key Content - 6-10 core points
  • Keywords - 5-8 key terms
Prompt Features:
  • Optimized specifically for Chinese colloquial conversations
  • Retains speaker context and dialogue flow
  • Structured JSON output for easy parsing and formatting
For detailed documentation, please refer to: <references/api-reference.md>
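Step 4 of the workflow (injecting the summary into the Markdown file) amounts to rendering Claude's JSON into the sections listed above. The field names here (`full_summary`, `speakers`, `key_points`, `keywords`) are illustrative assumptions, not the script's actual schema:

```python
def summary_to_markdown(summary: dict) -> str:
    """Render a summary JSON into the sections listed above.

    Field names are hypothetical; match them to the actual prompt output.
    """
    lines = ["## AI Summary", "", summary["full_summary"], ""]
    if summary.get("speakers"):
        lines += ["### Speaker Summary", ""]
        for name, view in summary["speakers"].items():
            lines.append(f"- **{name}**: {view}")
        lines.append("")
    lines += ["### Key Content", ""]
    lines += [f"- {point}" for point in summary["key_points"]]
    lines += ["", "### Keywords", "", ", ".join(summary["keywords"])]
    return "\n".join(lines)
```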

Call via HTTP API

Check Service Status:
```bash
curl http://127.0.0.1:8765/health
```
Call the API directly using curl:
```bash
curl -X POST http://127.0.0.1:8765/transcribe \
  -H "Content-Type: application/json" \
  -d '{"file_path": "/path/to/audio.mp3"}'
```
API Documentation (Swagger UI):
FastAPI automatically generates interactive API documentation at http://127.0.0.1:8765/docs
On this page, you can:
  • View all API endpoints
  • Test APIs online (no curl required)
  • View request/response formats
  • View detailed parameter descriptions
Response Example (Health Check):
```json
{
  "status": "ok",
  "service": "FunASR Transcribe",
  "uptime": 300,
  "idle_time": 120
}
```
Return field descriptions:
  • `uptime`: service running time (in seconds)
  • `idle_time`: current idle time (in seconds)
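Given the `idle_time` field and the default 10-minute (600-second) idle timeout, a client can estimate how long the service will stay up without further requests. A small helper, assuming the default timeout:

```python
def seconds_until_shutdown(health: dict, idle_timeout: int = 600) -> int:
    """Estimate seconds remaining before idle shutdown from a /health reply."""
    return max(0, idle_timeout - int(health.get("idle_time", 0)))
```

For the example response above (`idle_time` of 120), this reports 480 seconds of headroom.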

Complete API Documentation

For detailed API reference documentation, please refer to: <references/api-reference.md>
Including:
  • Complete specifications for all API endpoints
  • Detailed explanations of request/response formats
  • Parameter descriptions and examples
  • Complete curl command examples

Script Description

| Script | Purpose |
| --- | --- |
| `scripts/setup.py` | One-click installation of dependencies and model download |
| `scripts/server.py` | Start the HTTP API service |
| `scripts/transcribe.py` | Command-line client |

Configuration Files

| File | Description |
| --- | --- |
| `assets/models.json` | ASR model configuration list |
| `assets/requirements.txt` | Python dependency list |

Output Format

Transcription results are saved as Markdown files, including:
  1. Title - File name (without transcription timestamp)
  2. Transcription Content - Format: `SpeakerN HH:MM:SS` followed by the content on a new line
  3. AI Summary (Optional) - Includes full text summary, speaker summary, key content, and keywords
Example Format:
```markdown
# Transcription: filename.mp4

## Transcription Content

Speaker1 00:00:01
This is the content of the first sentence.

Speaker2 00:00:05
This is the content of the second sentence.
```
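Because the format is line-regular (a `SpeakerN HH:MM:SS` header line, then the content on the next line), downstream tools can parse a transcript back into segments. A small sketch:

```python
import re

# Matches the header lines in the example format above.
SPEAKER_LINE = re.compile(r"^(Speaker\d+) (\d{2}:\d{2}:\d{2})$")

def parse_transcript(markdown: str):
    """Return (speaker, timestamp, text) tuples from a transcript file."""
    segments = []
    speaker = stamp = None
    for line in markdown.splitlines():
        m = SPEAKER_LINE.match(line.strip())
        if m:
            speaker, stamp = m.group(1), m.group(2)
        elif line.strip() and speaker:
            # First non-empty line after a header is that segment's content.
            segments.append((speaker, stamp, line.strip()))
            speaker = stamp = None
    return segments
```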

Model Information

Models are stored in the ModelScope default cache directory `~/.cache/modelscope/hub/models/`:
  • ASR Main Model (Paraformer) - 867MB
  • VAD Model - 4MB
  • Punctuation Model - 283MB
  • Speaker Diarization Model - 28MB

Troubleshooting

If the service fails to start, run the verification command to check the installation status:
```bash
python scripts/setup.py --verify
```
Re-download models:
```bash
python scripts/setup.py --skip-deps
```