Local speech-to-text with selectable backends: Parakeet (best accuracy) or Whisper (fastest, multilingual).
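Install:

```bash
npx skill4agent add araa47/ez-voice ez-stt
```

Requires `ffmpeg`.

```bash
# Default: Parakeet v2 (best English accuracy)
scripts/stt.py audio.ogg

# Explicit backend selection
scripts/stt.py audio.ogg -b whisper
scripts/stt.py audio.ogg -b parakeet -m v3

# Quiet mode (suppress progress)
scripts/stt.py audio.ogg --quiet
```

Options:

- `-b/--backend`: `parakeet` (default) or `whisper`
- `-m/--model`: model name for the selected backend (see the tables below)
- `--no-int8`: disable int8 quantization
- `-q/--quiet`: suppress progress output
- `--room-id`

Parakeet models:

| Model | Description |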
|---|---|
| v2 (default) | English only, best accuracy |
| v3 | Multilingual |

Whisper models:

| Model | Description |
|---|---|
| tiny | Fastest, lower accuracy |
| base (default) | Good balance |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |
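
The flags above can be driven from Python. The sketch below uses two hypothetical helpers (`build_stt_argv` and `run_stt` are not part of ez-stt) that assemble an argument list for `scripts/stt.py` and reject model names not listed in the tables for the chosen backend. It assumes the script is run from the skill directory and, in `run_stt`, that the transcript is written to stdout. `--room-id` is left out because its purpose is not described here.

```python
import shlex
import subprocess

# Model names per backend, copied from the tables above.
# Defaults when -m is omitted: Parakeet "v2", Whisper "base".
MODELS = {
    "parakeet": {"v2", "v3"},
    "whisper": {"tiny", "base", "small", "large-v3-turbo"},
}


def build_stt_argv(audio, backend="parakeet", model=None, int8=True, quiet=False):
    """Hypothetical helper: build an argv list for scripts/stt.py."""
    if backend not in MODELS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {sorted(MODELS)}")
    if model is not None and model not in MODELS[backend]:
        raise ValueError(f"model {model!r} is not listed for backend {backend!r}")
    argv = ["scripts/stt.py", audio, "-b", backend]
    if model is not None:
        argv += ["-m", model]
    if not int8:
        argv.append("--no-int8")  # opt out of int8 quantization
    if quiet:
        argv.append("--quiet")  # suppress progress output
    return argv


def run_stt(argv):
    """Hypothetical helper: run the CLI and return its stdout.

    Assumes the transcript is printed to stdout; raises CalledProcessError
    if the script exits non-zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    argv = build_stt_argv("audio.ogg", backend="whisper", model="small", quiet=True)
    # Only prints the command line; call run_stt(argv) to actually execute it.
    print(shlex.join(argv))
```

The helper always passes `-b` explicitly, which behaves the same as the default invocation when the backend is Parakeet, and validating the model name up front gives a clear error before the script loads any model.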

Benchmark (int8 quantization):

| Backend/Model | Processing time | RTF (real-time factor, lower is faster) | Notes |
|---|---|---|---|
| Whisper Base int8 | 0.43s | 0.018x | Fastest |
| Parakeet v2 int8 | 0.60s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63s | 0.026x | Multilingual |
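
RTF (real-time factor) is processing time divided by audio duration, so values below 1.0 mean faster than real time. A minimal sketch of that arithmetic follows; the function names are illustrative, and extrapolating a single benchmark RTF to other clip lengths or hardware is an assumption, not a measured result.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Rough wall-clock estimate, assuming RTF stays constant across clip lengths."""
    return audio_seconds * rtf


def speedup(rtf: float) -> float:
    """Seconds of audio transcribed per second of compute (the inverse of RTF)."""
    return 1.0 / rtf


if __name__ == "__main__":
    # RTF values taken from the benchmark table above.
    rows = [("Whisper Base int8", 0.018), ("Parakeet v2 int8", 0.025), ("Parakeet v3 int8", 0.026)]
    minutes = 10
    for name, rtf in rows:
        est = estimated_processing_seconds(minutes * 60, rtf)
        print(f"{name}: ~{speedup(rtf):.0f}x real time, ~{est:.1f}s for {minutes} min of audio")
```

At Parakeet v2's RTF of 0.025, for example, ten minutes (600 s) of audio would take roughly 600 × 0.025 = 15 seconds, about 40× faster than real time, provided the benchmark RTF holds on your hardware.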
openclaw.json