# FunASR Speech-to-Text
This skill provides a local speech-recognition service that converts audio and video files into structured Markdown documents.
## Feature Overview
- Supports multiple audio and video formats (mp4, mov, mp3, wav, m4a, flac, etc.)
- Automatically generates timestamps
- Supports speaker diarization
- Outputs in Markdown format for easy reading and editing
## Usage Workflow

### First-time Use: Install Dependencies and Download Models
Run the installation script, `scripts/setup.py`, to complete the environment configuration. The script will automatically:
- Check Python version (requires >= 3.8)
- Install dependency packages (FastAPI, Uvicorn, FunASR, PyTorch)
- Download ASR models to `~/.cache/modelscope/hub/models/`
Verify installation status:

```bash
python scripts/setup.py --verify
```
### Start Transcription Service

Start the service with `python scripts/server.py`; it listens on `http://127.0.0.1:8765` by default.
Smart Features:
- Auto-start: Automatically loads models on first request
- Idle Shutdown: Automatically shuts down after 10 minutes of inactivity by default to save resources
- Configurable Timeout: use the `--idle-timeout` parameter to customize the idle timeout (in seconds)
Service Lifecycle:
- Enters idle monitoring state after startup
- Automatically loads models and executes transcription when receiving a request
- Resets the idle timer for each request
- Automatically shuts down if no requests are received for 10 consecutive minutes
- Restarts on next request
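The idle-shutdown lifecycle above can be sketched with a resettable timer. This is a minimal illustration, not the actual server code; the class and callback names are invented for the sketch, and the 600-second default matches the documented 10-minute timeout:

```python
import threading


class IdleShutdown:
    """Shut a service down after `timeout` seconds without requests (illustrative sketch)."""

    def __init__(self, timeout=600, on_shutdown=lambda: None):
        self.timeout = timeout
        self.on_shutdown = on_shutdown
        self._timer = None

    def start(self):
        """Enter idle monitoring: arm the shutdown timer."""
        self._arm()

    def touch(self):
        """Call on every request: cancel and re-arm the idle timer."""
        if self._timer is not None:
            self._timer.cancel()
        self._arm()

    def _arm(self):
        self._timer = threading.Timer(self.timeout, self.on_shutdown)
        self._timer.daemon = True
        self._timer.start()
```

In a real server, `touch()` would be called from a request middleware so that every transcription request resets the countdown.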
Important Notes:
- ⚠️ Do not manually shut down the service - leave it running after a transcription completes; it will shut down automatically after 10 minutes of inactivity
- This allows continuous transcription of multiple files without restarting the service repeatedly
- To shut down the service immediately, press `Ctrl+C` in the server terminal, or wait for the 10-minute idle timeout
Example: customize a 30-minute idle timeout:

```bash
python scripts/server.py --idle-timeout 1800
```
### Execute Transcription
Use the client script to transcribe files:
```bash
# Transcribe a single file
python scripts/transcribe.py /path/to/audio.mp3

# Specify output path
python scripts/transcribe.py /path/to/video.mp4 -o transcript.md

# Enable speaker diarization
python scripts/transcribe.py /path/to/meeting.m4a --diarize

# Batch transcribe a directory
python scripts/transcribe.py /path/to/media_folder/
```
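When given a directory, batch transcription presumably selects the supported media files inside it. A minimal sketch of that selection step, with the extension list taken from the Feature Overview (the helper name is illustrative, not part of the actual client):

```python
from pathlib import Path

# Extensions listed in the Feature Overview; "etc." suggests the real list is longer.
MEDIA_EXTS = {".mp4", ".mov", ".mp3", ".wav", ".m4a", ".flac"}

def find_media_files(folder):
    """Return supported media files under `folder`, sorted for a stable order."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in MEDIA_EXTS
    )
```

Each returned path could then be submitted to the transcription service one at a time, so a single running server handles the whole batch.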
### AI Intelligent Summary (Claude Code Environment)
After transcription completes, you can generate an AI summary using Claude Code's native AI capabilities.
Workflow:
- After transcription, the script will automatically prepare summary prompts
- Send the prompts to Claude AI to generate structured summaries
- Paste the JSON result returned by Claude back into the script
- Automatically inject the summary into the Markdown file
Usage:
```bash
# Transcribe a single file (will automatically prompt whether to generate a summary)
python scripts/transcribe.py /path/to/audio.mp3

# Enable speaker diarization and generate summary
python scripts/transcribe.py /path/to/meeting.m4a --diarize --summary
```
Summary Content Structure:
- Full Text Summary - Over 400 words, including background, issues, and key facts
- Speaker Summary - Each speaker's viewpoints, attitudes, and contributions
- Key Content - 6-10 core points
- Keywords - 5-8 key terms
Prompt Features:
- Optimized specifically for Chinese colloquial conversations
- Retains speaker context and dialogue flow
- Structured JSON output for easy parsing and formatting
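A sketch of what handling the pasted JSON reply might look like: validate the expected top-level fields, then render them as a Markdown section. The field names here are assumptions derived from the structure listed above, not the script's actual schema:

```python
import json

# Assumed field names, mirroring the documented summary structure.
EXPECTED_KEYS = {"full_text_summary", "speaker_summary", "key_points", "keywords"}

def render_summary(raw_json):
    """Parse Claude's JSON reply and render an '## AI Summary' Markdown section."""
    data = json.loads(raw_json)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"summary JSON missing fields: {sorted(missing)}")
    lines = ["## AI Summary", "", data["full_text_summary"], "", "### Key Content"]
    lines += [f"- {point}" for point in data["key_points"]]
    lines += ["", "**Keywords:** " + ", ".join(data["keywords"])]
    return "\n".join(lines)
```

Validating before injecting means a truncated or malformed paste fails loudly instead of writing a half-empty summary into the transcript.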
For detailed documentation, please refer to: <references/api-reference.md>
## Call via HTTP API
Check Service Status:
```bash
curl http://127.0.0.1:8765/health
```
Call the API directly using curl:
```bash
curl -X POST http://127.0.0.1:8765/transcribe \
  -H "Content-Type: application/json" \
  -d '{"file_path": "/path/to/audio.mp3"}'
```
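The same call can be made from Python with only the standard library. The endpoint and payload below match the curl example; the function names are illustrative, not part of the shipped client:

```python
import json
import urllib.request

def build_request(file_path, base_url="http://127.0.0.1:8765"):
    """Build the POST /transcribe request with a JSON body."""
    body = json.dumps({"file_path": file_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def transcribe(file_path, base_url="http://127.0.0.1:8765"):
    """Send the request to a running service and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(file_path, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`transcribe()` requires the service to be running; recall that the first request also triggers model loading, so it may take noticeably longer than later ones.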
API Documentation (Swagger UI):

FastAPI automatically generates interactive API documentation; visit `http://127.0.0.1:8765/docs`.
On this page, you can:
- View all API endpoints
- Test APIs online (no curl required)
- View request/response formats
- View detailed parameter descriptions
Response Example (Health Check):
```json
{
  "status": "ok",
  "service": "FunASR Transcribe",
  "uptime": 300,
  "idle_time": 120
}
```
Return field descriptions:
- `uptime`: service running time (in seconds)
- `idle_time`: current idle time (in seconds)
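Given those fields, a client can estimate how long until the idle shutdown fires. A small sketch assuming the default 600-second timeout (the example response does not report the configured timeout, so it must be supplied by the caller):

```python
import json

def seconds_until_shutdown(health_json, idle_timeout=600):
    """Estimate remaining idle time before auto-shutdown from a /health reply."""
    health = json.loads(health_json)
    return max(0, idle_timeout - health["idle_time"])
```

With the example response above (`idle_time` of 120 and the default 10-minute timeout), the service would shut down in roughly 480 seconds unless another request arrives.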
## Complete API Documentation
For detailed API reference documentation, please refer to: <references/api-reference.md>
It includes:
- Complete specifications for all API endpoints
- Detailed explanations of request/response formats
- Parameter descriptions and examples
- Complete curl command examples
## Script Description

| Script | Purpose |
|---|---|
| `scripts/setup.py` | One-click installation of dependencies and model download |
| `scripts/server.py` | Start the HTTP API service |
| `scripts/transcribe.py` | Command-line client |
## Configuration Files

| File | Description |
|---|---|
|  | ASR model configuration list |
|  | Python dependency list |
## Output Format

Transcription results are saved as Markdown files, including:
- Title - the file name (without a transcription timestamp)
- Transcription Content - each segment as a speaker label and timestamp, followed by the transcribed text on a new line
- AI Summary (optional) - full text summary, speaker summary, key content, and keywords
Example Format:

```markdown
# Transcription: filename.mp4

## Transcription Content

Speaker1 00:00:01
This is the content of the first sentence.

Speaker2 00:00:05
This is the content of the second sentence.
```
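The layout above can be produced from timed segments with a short formatter. This is a sketch, not the script's actual implementation; the `(speaker, start_seconds, text)` tuple shape is an assumption made for the example:

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS, matching the example layout."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def render_transcript(title, segments):
    """Render Markdown from `segments`: (speaker, start_seconds, text) tuples."""
    lines = [f"# Transcription: {title}", "", "## Transcription Content", ""]
    for speaker, start, text in segments:
        lines.append(f"{speaker} {format_timestamp(start)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```
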
## Model Information

Models are stored in the ModelScope default cache directory `~/.cache/modelscope/hub/models/`:
- ASR Main Model (Paraformer) - 867MB
- VAD Model - 4MB
- Punctuation Model - 283MB
- Speaker Diarization Model - 28MB
## Troubleshooting

If the service fails to start, run the verification command to check the installation status:

```bash
python scripts/setup.py --verify
```
Re-download models (`--skip-deps` skips dependency installation):

```bash
python scripts/setup.py --skip-deps
```