ez-stt - Local Speech-to-Text


Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:
  • Parakeet (default): Best accuracy for English, correctly captures names and filler words
  • Whisper: Fastest inference, supports 99 languages
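To illustrate what int8 quantization trades away, here is a minimal sketch of a generic symmetric per-tensor scheme: each float is mapped to an 8-bit code plus one shared scale. This is an illustrative technique only. ONNX Runtime's actual quantization scheme, and how ez-stt applies it, may differ. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization (illustrative, not ONNX Runtime's code).

    Maps the largest magnitude to +/-127 and scales everything else linearly.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into the signed 8-bit range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats; per-value error is at most scale / 2."""
    return [c * scale for c in codes]


weights = [0.82, -1.27, 0.004, 0.5]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(codes, scale, restored)
```

Each code fits in one byte instead of four for float32, which is where the memory and speed gains come from; small values such as 0.004 can round to zero, which is the accuracy cost that --no-int8 avoids.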
Requires ffmpeg to be installed.
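A minimal preflight sketch for the ffmpeg requirement, assuming ffmpeg is used to decode inputs such as audio.ogg. The helper names and the 16 kHz mono 16-bit PCM target are assumptions chosen for illustration; stt.py may decode audio differently.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def build_ffmpeg_cmd(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that decodes src to mono 16-bit PCM WAV.

    ASSUMPTION: 16 kHz mono is a typical ASR input format; not taken from stt.py.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file, e.g. audio.ogg
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # encode as 16-bit little-endian PCM
        dst,
    ]


def to_wav(src: str, dst: str) -> Path:
    """Hypothetical helper: fail early if ffmpeg is missing, then convert."""
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True, capture_output=True)
    return Path(dst)
```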

Usage


```bash
# Default: Parakeet v2 (best English accuracy)
scripts/stt.py audio.ogg

# Explicit backend selection
scripts/stt.py audio.ogg -b whisper
scripts/stt.py audio.ogg -b parakeet -m v3

# Quiet mode (suppress progress)
scripts/stt.py audio.ogg --quiet
```
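To call the CLI from other Python code, one option is a thin subprocess wrapper. This sketch assumes scripts/stt.py is executable, as in the commands shown for this section, and that it prints the transcript to stdout; stt_argv and transcribe are hypothetical helper names.

```python
import subprocess
from typing import List, Optional


def stt_argv(
    audio: str,
    backend: Optional[str] = None,
    model: Optional[str] = None,
    quiet: bool = True,
    int8: bool = True,
) -> List[str]:
    """Build the command line from the documented flags only."""
    argv = ["scripts/stt.py", audio]
    if backend is not None:
        argv += ["-b", backend]
    if model is not None:
        argv += ["-m", model]
    if not int8:
        argv.append("--no-int8")
    if quiet:
        argv.append("--quiet")
    return argv


def transcribe(audio: str, **kwargs) -> str:
    """Run stt.py and return its stdout.

    ASSUMPTION: the transcript is written to stdout.
    """
    result = subprocess.run(
        stt_argv(audio, **kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For example, transcribe("audio.ogg", backend="parakeet", model="v3") would run the multilingual Parakeet model in quiet mode.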

Options


  • -b/--backend: Backend to use, parakeet (default) or whisper
  • -m/--model: Model variant (see below)
  • --no-int8: Disable int8 quantization
  • -q/--quiet: Suppress progress output
  • --room-id: Matrix room ID for direct messages
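The option list maps naturally onto argparse. A hypothetical sketch of how these flags could be declared; stt.py's real parser is not shown in this README and may differ, and build_parser is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented stt.py options."""
    p = argparse.ArgumentParser(prog="stt.py", description="Local speech-to-text")
    p.add_argument("audio", help="input audio file, e.g. audio.ogg")
    p.add_argument("-b", "--backend", choices=["parakeet", "whisper"],
                   default="parakeet", help="backend to use")
    # None means "use the backend's default model" (v2 or base).
    p.add_argument("-m", "--model", default=None, help="model variant")
    # store_false: int8 is on by default and --no-int8 turns it off.
    p.add_argument("--no-int8", dest="int8", action="store_false",
                   help="disable int8 quantization")
    p.add_argument("-q", "--quiet", action="store_true", help="suppress progress")
    p.add_argument("--room-id", default=None,
                   help="Matrix room ID for direct messages")
    return p
```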

Models


Parakeet (default backend)


| Model | Description |
|---|---|
| v2 (default) | English only, best accuracy |
| v3 | Multilingual |

Whisper


| Model | Description |
|---|---|
| tiny | Fastest, lower accuracy |
| base (default) | Good balance of speed and accuracy |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |
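The two model tables imply a per-backend default. A minimal sketch of how a caller might resolve and validate the -m value, assuming an unknown variant should be rejected; DEFAULT_MODEL, VALID_MODELS and resolve_model are hypothetical names, not part of ez-stt.

```python
from typing import Optional

# Defaults and variants as listed in the Parakeet and Whisper tables.
DEFAULT_MODEL = {"parakeet": "v2", "whisper": "base"}
VALID_MODELS = {
    "parakeet": {"v2", "v3"},
    "whisper": {"tiny", "base", "small", "large-v3-turbo"},
}


def resolve_model(backend: str, model: Optional[str] = None) -> str:
    """Return the model to load, falling back to the backend's default."""
    if backend not in DEFAULT_MODEL:
        raise ValueError(f"unknown backend: {backend!r}")
    if model is None:
        return DEFAULT_MODEL[backend]
    if model not in VALID_MODELS[backend]:
        choices = ", ".join(sorted(VALID_MODELS[backend]))
        raise ValueError(f"{model!r} is not a {backend} model (choose from: {choices})")
    return model
```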

Benchmark (24s audio)


| Backend/Model | Time | RTF (real-time factor) | Notes |
|---|---|---|---|
| Whisper Base int8 | 0.43 s | 0.018x | Fastest |
| Parakeet v2 int8 | 0.60 s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63 s | 0.026x | Multilingual |
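The RTF column is processing time divided by audio duration, so values below 1 mean faster than real time. The sketch below recomputes that column from the benchmark timings for the 24 s clip; real_time_factor is an illustrative helper, not part of ez-stt.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


# Timings taken from the benchmark table (24 s of audio).
AUDIO_SECONDS = 24.0
BENCHMARK = {
    "Whisper Base int8": 0.43,
    "Parakeet v2 int8": 0.60,
    "Parakeet v3 int8": 0.63,
}

for name, secs in BENCHMARK.items():
    rtf = real_time_factor(secs, AUDIO_SECONDS)
    # 1 / RTF is how many seconds of audio are processed per second of compute.
    print(f"{name}: RTF {rtf:.3f}, about {1 / rtf:.0f}x faster than real time")
```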

OpenClaw


See OPENCLAW.md for OpenClaw-specific setup and openclaw.json configuration.