Whisper Audio Transcription Skill
Transcribe audio files to text using OpenAI Whisper.
Capabilities
- Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, etc.) to text
- Support for 90+ languages with auto-detection
- Optional timestamp generation
- Multiple model sizes (tiny/base/small/medium/large)
- Output in plain text or JSON format
Usage
Basic Transcription
```bash
python3 scripts/transcribe.py <audio_file> <output_file>
```
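To transcribe a whole folder, the basic command can be wrapped in a loop. This is a minimal sketch that assumes the script takes one input/output pair per call, as shown above, and that writing each transcript next to its source as `<name>.txt` is acceptable. The function name `transcribe_dir`, the extension list, and the `DRY_RUN` switch (print commands instead of running them) are illustrative assumptions, not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run scripts/transcribe.py once per audio file in a
# directory, writing <name>.txt next to each input. Extra arguments
# (e.g. --model small) are passed through unchanged.
# Set DRY_RUN=1 to print the commands instead of running them.
transcribe_dir() {
  local dir="$1" f out saved
  [[ -d "$dir" ]] || { echo "usage: transcribe_dir <dir> [options...]" >&2; return 1; }
  shift
  saved=$(shopt -p nullglob nocaseglob)   # remember current glob settings
  shopt -s nullglob nocaseglob            # drop unmatched patterns; match .MP3 too
  for f in "$dir"/*.{mp3,wav,m4a,flac,ogg}; do
    out="${f%.*}.txt"                     # talk.mp3 -> talk.txt
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      printf '%s\n' "python3 scripts/transcribe.py $f $out${*:+ $*}"
    else
      python3 scripts/transcribe.py "$f" "$out" "$@" || echo "failed: $f" >&2
    fi
  done
  eval "$saved"                           # restore previous glob settings
}
```

For example, `transcribe_dir recordings --model small --language en` processes every matching file in `recordings/` with the same options; running it once with `DRY_RUN=1` first shows exactly which files would be picked up.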
With Options
```bash
# Specify model size (default: base)
python3 scripts/transcribe.py audio.mp3 transcript.txt --model medium

# Specify language (improves accuracy)
python3 scripts/transcribe.py audio.mp3 transcript.txt --language zh

# Include timestamps
python3 scripts/transcribe.py audio.mp3 transcript.txt --timestamps

# JSON output with metadata
python3 scripts/transcribe.py audio.mp3 output.json --format json
```
Parameters
- `audio_file` (required): Path to the input audio file
- `output_file` (required): Path to the output text/JSON file
- `--model`: Whisper model size (tiny/base/small/medium/large; default: base)
- `--language`: Language code (e.g., en, zh, es, fr), or `auto` for automatic detection
- `--timestamps`: Include word-level timestamps in the output
- `--format`: Output format (text/json; default: text)
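Because loading a model (especially medium or large) takes time, it can help to reject bad option values before calling the script. The sketch below assumes the accepted values are exactly those listed above, that a language code is a two- or three-letter lowercase code (Whisper's own codes include e.g. `en`, `zh`, `haw`) or `auto`, and that a `.json` output path should go with `--format json`. The function `check_opts` is hypothetical; it does not reflect how `scripts/transcribe.py` validates its arguments.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the options documented under "Parameters".
# Usage: check_opts <model> <language> <format> <output_file>
# Returns 0 if the combination looks valid, 1 otherwise (reason on stderr).
check_opts() {
  local model="$1" lang="$2" format="$3" output="$4"
  case "$model" in
    tiny|base|small|medium|large) ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  # Assumption: a 2-3 letter lowercase code (en, zh, haw, ...) or "auto"
  if [[ "$lang" != auto && ! "$lang" =~ ^[a-z]{2,3}$ ]]; then
    echo "unexpected language code: $lang" >&2; return 1
  fi
  case "$format" in
    text|json) ;;
    *) echo "unknown format: $format" >&2; return 1 ;;
  esac
  # Assumption: keep the output file extension consistent with --format
  if [[ "$format" == json && "$output" != *.json ]] || [[ "$format" == text && "$output" == *.json ]]; then
    echo "output '$output' does not match --format $format" >&2; return 1
  fi
  return 0
}
```

A typical guard would be `check_opts medium zh text transcript.txt && python3 scripts/transcribe.py audio.mp3 transcript.txt --model medium --language zh`.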
Model Sizes
| Model | Parameters | Relative speed (vs. large) | Accuracy | Approx. memory |
|---|---|---|---|---|
| tiny | 39M | ~32x | Good | ~1GB |
| base | 74M | ~16x | Better | ~1GB |
| small | 244M | ~6x | Great | ~2GB |
| medium | 769M | ~2x | Excellent | ~5GB |
| large | 1.5B | 1x | Best | ~10GB |
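The memory column can drive model choice: take the largest model whose approximate footprint fits the memory available. This sketch hard-codes the rounded figures from the table, which are approximate, so treat the result as a starting point rather than a guarantee. `pick_model` is a hypothetical helper and expects a whole number of GB.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the largest model whose approximate memory
# requirement (rounded GB figures from the table above) fits the budget.
pick_model() {
  local budget_gb="$1" best="" entry name need
  [[ "$budget_gb" =~ ^[0-9]+$ ]] || { echo "budget must be whole GB" >&2; return 1; }
  # Ordered smallest to largest, so each fitting model replaces the previous pick.
  for entry in tiny:1 base:1 small:2 medium:5 large:10; do
    name="${entry%%:*}"
    need="${entry##*:}"
    (( budget_gb >= need )) && best="$name"
  done
  [[ -n "$best" ]] || { echo "no model fits in ${budget_gb}GB" >&2; return 1; }
  printf '%s\n' "$best"
}
```

For instance, `python3 scripts/transcribe.py audio.mp3 transcript.txt --model "$(pick_model 6)"` would select `medium`. Since tiny and base share the same ~1GB figure, a 1GB budget picks base, the more accurate of the two.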
Supported Audio Formats
MP3, WAV, M4A, FLAC, OGG, AAC, WMA, and other formats FFmpeg can decode (Whisper loads audio through FFmpeg)
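Since decoding goes through FFmpeg, one way to check whether a file is usable before starting a long transcription is to ask `ffprobe` (normally installed alongside `ffmpeg`) whether it finds an audio stream. This is a sketch; the function name `has_audio_stream` is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed (status 0) only if ffprobe can open
# the file and finds at least one audio stream in it.
has_audio_stream() {
  local file="$1" streams
  if ! command -v ffprobe >/dev/null 2>&1; then
    echo "ffprobe not found; install FFmpeg first" >&2
    return 2
  fi
  [[ -r "$file" ]] || { echo "cannot read: $file" >&2; return 1; }
  # -select_streams a: audio streams only; print one codec_type line per stream
  streams=$(ffprobe -v error -select_streams a -show_entries stream=codec_type \
    -of csv=p=0 -i "$file") || return 1
  [[ -n "$streams" ]]
}
```

For example, `has_audio_stream talk.wma && python3 scripts/transcribe.py talk.wma talk.txt` skips the model load entirely when the file cannot be decoded or contains no audio.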
Dependencies
- Python 3.8+
- openai-whisper
- ffmpeg
Installation
```bash
pip install openai-whisper
sudo apt-get install ffmpeg  # Ubuntu/Debian
```
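After installing, a short check can confirm that the three dependencies are in place. This sketch assumes the interpreter is invoked as `python3`, as in the usage examples; `version_ge` and `check_deps` are illustrative names. It imports `whisper`, which is the module name the `openai-whisper` package provides.

```bash
#!/usr/bin/env bash
# Hypothetical dependency check for the requirements listed under "Dependencies".

# version_ge A B: succeed if dotted version A >= B (numeric, field by field).
version_ge() {
  local -a a b
  local i x y
  IFS=. read -r -a a <<< "$1"
  IFS=. read -r -a b <<< "$2"
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}   # missing fields count as 0, so 3.8 == 3.8.0
    y=${b[i]:-0}
    (( 10#$x > 10#$y )) && return 0
    (( 10#$x < 10#$y )) && return 1
  done
  return 0
}

# check_deps: report every missing requirement; return 1 if any is missing.
check_deps() {
  local status=0 pyver
  if pyver=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])' 2>/dev/null); then
    version_ge "$pyver" 3.8 || { echo "Python $pyver found, 3.8+ required" >&2; status=1; }
  else
    echo "python3 not found" >&2; status=1
  fi
  # The openai-whisper package installs the importable module "whisper"
  python3 -c 'import whisper' 2>/dev/null || { echo "openai-whisper missing: pip install openai-whisper" >&2; status=1; }
  command -v ffmpeg >/dev/null 2>&1 || { echo "ffmpeg not on PATH" >&2; status=1; }
  return "$status"
}
```

Source the snippet and run `check_deps && echo ready`. It reports every missing piece instead of stopping at the first one, and compares versions numerically, so Python 3.10 correctly counts as newer than 3.8 (a plain string comparison would get this wrong).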