whisper

Whisper Audio Transcription Skill

Transcribe audio files to text using OpenAI Whisper.

Capabilities

- Transcribes audio files (MP3, WAV, M4A, FLAC, OGG, etc.) to text
- Supports 90+ languages, with automatic language detection
- Optionally generates timestamps
- Offers multiple model sizes (tiny/base/small/medium/large)
- Writes output as plain text or JSON

Usage

Basic Transcription

```bash
python3 scripts/transcribe.py <audio_file> <output_file>
```
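
For orientation, the basic command most likely reduces to a few calls into the openai-whisper Python API. The sketch below is an assumption about what `scripts/transcribe.py` does, not its actual source. `transcribe_file` and its `load_model` parameter are hypothetical names; the latter exists only so the function can be exercised without downloading a model.

```python
from pathlib import Path
from typing import Callable, Optional


def transcribe_file(audio_path: str, output_path: str, model_name: str = "base",
                    load_model: Optional[Callable] = None) -> str:
    """Transcribe audio_path and write the plain text to output_path."""
    if load_model is None:
        # Imported lazily so this module loads even where openai-whisper is absent.
        import whisper
        load_model = whisper.load_model

    model = load_model(model_name)         # fetches the weights on first use
    result = model.transcribe(audio_path)  # dict with "text", "segments", "language"
    text = result["text"].strip()
    Path(output_path).write_text(text + "\n", encoding="utf-8")
    return text
```

Calling `transcribe_file("audio.mp3", "transcript.txt")` would then mirror the command above, with `base` as the model.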

With Options

```bash
# Specify model size (default: base)
python3 scripts/transcribe.py audio.mp3 transcript.txt --model medium

# Specify language (improves accuracy)
python3 scripts/transcribe.py audio.mp3 transcript.txt --language zh

# Include timestamps
python3 scripts/transcribe.py audio.mp3 transcript.txt --timestamps

# JSON output with metadata
python3 scripts/transcribe.py audio.mp3 output.json --format json
```
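
The timestamp and JSON options imply some post-processing of Whisper's result. The following is a minimal sketch of one way to do it, assuming the dictionary that `model.transcribe` returns (keys `text`, `language`, and `segments`, where each segment has `start`, `end`, and `text`). `format_timestamp` and `render_result` are hypothetical helpers, and the output layout is an illustration rather than the script's documented format. It uses segment-level timestamps, which Whisper returns by default; word-level timing would additionally require `word_timestamps=True`.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def render_result(result: dict, timestamps: bool = False, fmt: str = "text") -> str:
    """Render a Whisper result dict as plain text or JSON."""
    segments = result.get("segments", [])
    if fmt == "json":
        payload = {"language": result.get("language"), "text": result["text"].strip()}
        if timestamps:
            payload["segments"] = [
                {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
                for s in segments
            ]
        # ensure_ascii=False keeps non-Latin transcripts readable in the file.
        return json.dumps(payload, ensure_ascii=False, indent=2)
    if timestamps:
        return "\n".join(
            f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] "
            f"{s['text'].strip()}"
            for s in segments
        )
    return result["text"].strip()
```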

Parameters

- `audio_file` (required): Path to the input audio file
- `output_file` (required): Path to the output text or JSON file
- `--model`: Whisper model size (tiny/base/small/medium/large, default: base)
- `--language`: Language code (e.g., en, zh, es, fr; auto for detection)
- `--timestamps`: Include word-level timestamps in output
- `--format`: Output format (text/json, default: text)
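
The parameter list maps naturally onto an `argparse` definition. The sketch below is one plausible way to declare it, not the script's actual code. `build_parser` and `whisper_language` are hypothetical names, and the `auto` default for `--language` is an assumption, since no default is stated for that flag.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line interface described in the parameter list."""
    parser = argparse.ArgumentParser(description="Transcribe audio to text with Whisper.")
    parser.add_argument("audio_file", help="Path to the input audio file")
    parser.add_argument("output_file", help="Path to the output text or JSON file")
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size")
    # Assumed default: the source lists "auto" as a value but names no default.
    parser.add_argument("--language", default="auto",
                        help="Language code such as en or zh, or auto for detection")
    parser.add_argument("--timestamps", action="store_true",
                        help="Include timestamps in the output")
    parser.add_argument("--format", default="text", choices=["text", "json"],
                        help="Output format")
    return parser


def whisper_language(code: str) -> Optional[str]:
    """Map 'auto' to None, which is how Whisper's transcribe() requests detection."""
    return None if code == "auto" else code
```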

Model Sizes

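| Model  | Parameters | Relative speed (large = 1x) | Accuracy  | Memory |
|--------|------------|-----------------------------|-----------|--------|
| tiny   | 39M        | ~32x                        | Good      | ~1GB   |
| base   | 74M        | ~16x                        | Better    | ~1GB   |
| small  | 244M       | ~6x                         | Great     | ~2GB   |
| medium | 769M       | ~2x                         | Excellent | ~5GB   |
| large  | 1.5B       | 1x                          | Best      | ~10GB  |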
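
When choosing a value for `--model`, the memory column is usually the binding constraint. As a rough illustration, the sketch below picks the largest model that fits a given memory budget, using the approximate figures from the table above. `MODEL_MEMORY_GB` and `pick_model` are hypothetical names and not part of the skill.

```python
# (model name, approximate memory in GB), ordered from smallest to largest.
MODEL_MEMORY_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def pick_model(available_gb: float) -> str:
    """Return the largest model whose approximate memory need fits available_gb."""
    fitting = [name for name, need_gb in MODEL_MEMORY_GB if need_gb <= available_gb]
    if not fitting:
        raise ValueError(f"No Whisper model fits in {available_gb} GB")
    return fitting[-1]
```

The figures are approximate, so leaving some headroom below the real limit is prudent.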

Supported Audio Formats

MP3, WAV, M4A, FLAC, OGG, AAC, WMA, and more (via FFmpeg)

Dependencies

- Python 3.8+
- openai-whisper
- ffmpeg

Installation

```bash
pip install openai-whisper
sudo apt-get install ffmpeg  # Ubuntu/Debian
```
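
After installing, a quick preflight check can confirm that all three dependencies are visible to Python. This is a minimal sketch using only the standard library. `check_dependencies` is a hypothetical helper, and it relies on the fact that the `openai-whisper` package installs a module named `whisper`.

```python
import importlib.util
import shutil
import sys


def check_dependencies() -> dict:
    """Report whether each listed dependency appears to be available."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        # find_spec locates the module without importing it.
        "openai-whisper": importlib.util.find_spec("whisper") is not None,
        # Whisper shells out to ffmpeg, so it must be on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```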
