whisper

Whisper Audio Transcription Skill

Transcribe audio files to text using OpenAI Whisper.

Capabilities

  • Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, etc.) to text
  • Support for 90+ languages with auto-detection
  • Optional timestamp generation
  • Multiple model sizes (tiny/base/small/medium/large)
  • Output in plain text or JSON format

Usage

Basic Transcription

```bash
python3 scripts/transcribe.py <audio_file> <output_file>
```
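
Internally, a script like this typically loads a Whisper model once and calls its `transcribe` method. The sketch below is an assumption about how `scripts/transcribe.py` might do that, not its actual source. With openai-whisper installed, the model would come from `whisper.load_model("base")`, whose `transcribe()` returns a dict with a `"text"` key. The helper name `transcribe_to_file` is hypothetical, and the model is passed in as a parameter so the function can be exercised without downloading weights.

```python
from pathlib import Path
from typing import Any


def transcribe_to_file(model: Any, audio_file: str, output_file: str) -> str:
    """Transcribe audio_file with a Whisper-style model and save plain text.

    Hypothetical helper: `model` is any object with a transcribe(path) method
    returning a dict that contains "text", such as the object returned by
    whisper.load_model("base") in openai-whisper.
    """
    result = model.transcribe(audio_file)
    # Strip surrounding whitespace; Whisper's text usually has a leading space.
    text = result["text"].strip()
    Path(output_file).write_text(text + "\n", encoding="utf-8")
    return text
```

Usage with the real library would look like `transcribe_to_file(whisper.load_model("base"), "audio.mp3", "transcript.txt")`. Injecting the model keeps the expensive load out of the helper, so several files can share one loaded model.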

With Options

```bash
# Specify model size (default: base)
python3 scripts/transcribe.py audio.mp3 transcript.txt --model medium

# Specify language (improves accuracy)
python3 scripts/transcribe.py audio.mp3 transcript.txt --language zh

# Include timestamps
python3 scripts/transcribe.py audio.mp3 transcript.txt --timestamps

# JSON output with metadata
python3 scripts/transcribe.py audio.mp3 output.json --format json
```
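
The timestamp and JSON options decide how Whisper's result gets written out. The sketch below assumes the script receives the dict returned by openai-whisper's `transcribe()` (keys `text`, `language`, and `segments`, with each segment carrying `start`, `end`, and `text`) and renders it. `render_output` and `format_timestamp` are hypothetical names, and the real script's output layout is not documented here. This sketch prints segment-level times; word-level times would come from the per-segment `words` list Whisper adds when called with `word_timestamps=True`.

```python
import json
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_output(result: Dict[str, Any], fmt: str = "text", timestamps: bool = False) -> str:
    """Render a Whisper-style result dict as plain text, timestamped text, or JSON."""
    segments = result.get("segments", [])
    if fmt == "json":
        # JSON keeps metadata (detected language) alongside per-segment times.
        payload = {
            "language": result.get("language"),
            "text": result["text"].strip(),
            "segments": [
                {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
                for s in segments
            ],
        }
        return json.dumps(payload, ensure_ascii=False, indent=2)
    if timestamps:
        # One line per segment: [start --> end] text
        lines: List[str] = [
            f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] {s['text'].strip()}"
            for s in segments
        ]
        return "\n".join(lines)
    return result["text"].strip()
```

`ensure_ascii=False` keeps non-Latin transcripts (for example `--language zh`) readable in the JSON file instead of escaping every character.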

Parameters

  • audio_file (required): Path to the input audio file
  • output_file (required): Path to the output text or JSON file
  • --model: Whisper model size (tiny/base/small/medium/large; default: base)
  • --language: Language code (e.g., en, zh, es, fr), or auto for automatic detection
  • --timestamps: Include word-level timestamps in the output
  • --format: Output format (text/json; default: text)
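
The parameters above map naturally onto Python's `argparse`. The sketch below is a hypothetical reconstruction of the command-line interface, not the script's real parser. It assumes `--language` defaults to `auto`, restricts `--model` to the five sizes listed in this document, and assumes the options are forwarded to openai-whisper's `transcribe()` as `language=` and `word_timestamps=`. `build_parser` and `transcribe_kwargs` are illustrative names.

```python
import argparse
from typing import Any, Dict

# Only the sizes this document lists; Whisper itself offers more variants.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented parameters."""
    parser = argparse.ArgumentParser(description="Transcribe audio with OpenAI Whisper.")
    parser.add_argument("audio_file", help="Path to the input audio file")
    parser.add_argument("output_file", help="Path to the output text or JSON file")
    parser.add_argument("--model", choices=MODEL_SIZES, default="base")
    parser.add_argument("--language", default="auto",
                        help="Language code such as en or zh, or 'auto' to detect")
    parser.add_argument("--timestamps", action="store_true",
                        help="Include word-level timestamps")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    return parser


def transcribe_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Translate CLI options into keyword arguments for model.transcribe()."""
    kwargs: Dict[str, Any] = {}
    if args.language != "auto":
        # Omitting language lets Whisper detect it from the first 30 seconds.
        kwargs["language"] = args.language
    if args.timestamps:
        kwargs["word_timestamps"] = True
    return kwargs
```

Leaving `language` out for `auto` (rather than passing the string `"auto"`) matters: Whisper expects a real language code or no value at all.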

Model Sizes

| Model | Parameters | Relative speed (vs. large) | Accuracy | Required VRAM |
|-------|------------|----------------------------|----------|---------------|
| tiny | 39M | ~32x | Good | ~1 GB |
| base | 74M | ~16x | Better | ~1 GB |
| small | 244M | ~6x | Great | ~2 GB |
| medium | 769M | ~2x | Excellent | ~5 GB |
| large | 1.5B | 1x | Best | ~10 GB |
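
One practical use of the table is choosing the most accurate model that fits the available GPU memory. The helper below is a hypothetical heuristic built only from the approximate VRAM figures in the table; real memory use varies with hardware, precision, and audio length, so treat the result as a starting point.

```python
from typing import List, Optional, Tuple

# Approximate VRAM needs in GB from the model-size table, smallest model first.
MODEL_VRAM_GB: List[Tuple[str, float]] = [
    ("tiny", 1.0),
    ("base", 1.0),
    ("small", 2.0),
    ("medium", 5.0),
    ("large", 10.0),
]


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the largest model whose approximate VRAM need fits, or None."""
    best: Optional[str] = None
    for name, need in MODEL_VRAM_GB:
        if need <= vram_gb:
            best = name  # later entries are larger and more accurate
    return best
```

For example, a 4 GB card lands on `small`, while anything under 1 GB returns `None`, suggesting CPU inference instead.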

Supported Audio Formats

MP3, WAV, M4A, FLAC, OGG, AAC, WMA, and more. Whisper decodes input through FFmpeg, so any format FFmpeg can read should work.
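
Format support comes from FFmpeg: openai-whisper runs the `ffmpeg` command-line tool to resample any input to 16 kHz mono 16-bit PCM, then scales the samples to floats. The sketch below mirrors that approach using only the standard library. It is an illustration, not Whisper's own code (which uses NumPy), and the helper names are hypothetical.

```python
import array
import subprocess
import sys
from typing import List

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def ffmpeg_decode_cmd(path: str, sr: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg command that writes mono signed 16-bit little-endian PCM to stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", str(sr),
        "-",
    ]


def pcm16_to_floats(raw: bytes) -> List[float]:
    """Convert little-endian int16 PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # s16le data must be swapped on big-endian hosts
    return [s / 32768.0 for s in samples]


def load_audio(path: str) -> List[float]:
    """Decode any FFmpeg-readable file; requires ffmpeg on PATH."""
    out = subprocess.run(ffmpeg_decode_cmd(path), capture_output=True, check=True).stdout
    return pcm16_to_floats(out)
```

Because decoding is delegated to FFmpeg, adding a new input format needs no change to the transcription code, only an FFmpeg build that supports it.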

Dependencies

  • Python 3.8+
  • openai-whisper
  • ffmpeg

Installation

```bash
pip install openai-whisper
sudo apt-get install ffmpeg  # Ubuntu/Debian
```
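
After installing, a quick check can confirm that each dependency is reachable before running a long transcription. The sketch below is a hypothetical helper, not part of the skill. Note that it only verifies that some module named `whisper` is importable; an unrelated package with the same module name would also pass.

```python
import importlib.util
import shutil
import sys
from typing import Dict, List


def check_dependencies() -> Dict[str, bool]:
    """Report whether each listed dependency appears to be available."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "openai-whisper": importlib.util.find_spec("whisper") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,  # must be on PATH
    }


def missing(report: Dict[str, bool]) -> List[str]:
    """Names of dependencies that failed the check."""
    return [name for name, ok in report.items() if not ok]
```

Running `print(missing(check_dependencies()))` prints an empty list when everything is in place.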