ElevenLabs speech-to-text with Scribe models and forced alignment via the inference.sh CLI.

Models: Scribe v1/v2 (98%+ accuracy, 90+ languages).
Capabilities: transcription, speaker diarization, audio event tagging, word-level timestamps, forced alignment, subtitle generation.
Use for: meeting transcription, subtitles, podcast transcripts, lip-sync timing, karaoke.
Triggers: elevenlabs stt, elevenlabs transcription, scribe, elevenlabs speech to text, forced alignment, word alignment, subtitle timing, diarization, speaker identification, audio event detection, eleven labs transcribe
npx skill4agent add tool-belt/skills elevenlabs-stt
Requires the inference.sh CLI (infsh); see its install instructions, then log in:
infsh login
# Transcribe audio
infsh app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'

| Model | ID | Best For |
|---|---|---|
| Scribe v2 | scribe_v2 | Latest, highest accuracy (default) |
| Scribe v1 | | Stable, proven |
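
Hand-written JSON inside single quotes is easy to break, so it can help to build the `--input` payload programmatically. The following is a minimal sketch, assuming only the input fields that appear in this document's examples (`audio`, `model`, `diarize`, `tag_audio_events`, `language_code`). The helper name `build_stt_command` is hypothetical and not part of the inference.sh tooling; it only assembles the command and does not run it.

```python
import json
import shlex


def build_stt_command(audio_url, model=None, diarize=False,
                      tag_audio_events=False, language_code=None):
    """Return the argv list for an `infsh app run elevenlabs/stt` call.

    Hypothetical helper: field names are taken from the examples in this
    document, not from a verified schema.
    """
    payload = {"audio": audio_url}
    # Include optional fields only when set, mirroring the CLI examples.
    if model is not None:
        payload["model"] = model
    if diarize:
        payload["diarize"] = True
    if tag_audio_events:
        payload["tag_audio_events"] = True
    if language_code is not None:
        payload["language_code"] = language_code
    return ["infsh", "app", "run", "elevenlabs/stt",
            "--input", json.dumps(payload)]


if __name__ == "__main__":
    argv = build_stt_command("https://conference.mp3", model="scribe_v2",
                             diarize=True, language_code="eng")
    # shlex.join yields a correctly quoted line that can be pasted into a shell.
    print(shlex.join(argv))
```

To execute the command rather than print it, the same list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting entirely.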

# Basic transcription
infsh app run elevenlabs/stt --input '{"audio": "https://meeting-recording.mp3"}'

# With speaker diarization
infsh app run elevenlabs/stt --input '{
"audio": "https://meeting.mp3",
"diarize": true
}'

# With audio event tagging
infsh app run elevenlabs/stt --input '{
"audio": "https://podcast.mp3",
"tag_audio_events": true
}'

# With an explicit language code
infsh app run elevenlabs/stt --input '{
"audio": "https://spanish-audio.mp3",
"language_code": "spa"
}'

# All options combined
infsh app run elevenlabs/stt --input '{
"audio": "https://conference.mp3",
"model": "scribe_v2",
"diarize": true,
"tag_audio_events": true,
"language_code": "eng"
}'

# Forced alignment: match known text to the audio
infsh app run elevenlabs/forced-alignment --input '{
"audio": "https://narration.mp3",
"text": "This is the exact text spoken in the audio file."
}'

Example output:

{
"words": [
{"text": "This", "start": 0.0, "end": 0.3},
{"text": "is", "start": 0.35, "end": 0.5},
{"text": "the", "start": 0.55, "end": 0.65}
],
"text": "This is the exact text spoken in the audio file."
}

# 1. Transcribe video audio
infsh app run elevenlabs/stt --input '{
"audio": "https://video.mp4",
"diarize": true
}' > transcript.json
# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{
"video_url": "https://video.mp4",
"captions": "<transcript-from-step-1>"
}'

# ElevenLabs TTS (reverse direction)
npx skills add inference-sh/skills@elevenlabs-tts
# ElevenLabs dubbing (translate audio)
npx skills add inference-sh/skills@elevenlabs-dubbing
# Other STT models (Whisper)
npx skills add inference-sh/skills@speech-to-text
# Full platform skill (all 250+ apps)
npx skills add inference-sh/skills@infsh-cli

infsh app list --category audio
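
The forced-alignment output shown earlier (a `words` array with `text`, `start`, and `end` in seconds) maps naturally onto subtitle cues. The following is a rough sketch of turning that structure into SRT text, assuming the output has exactly the shape of the example in this document. The function names and the `max_words` grouping rule are illustrative choices, not part of any ElevenLabs or inference.sh API.

```python
import json


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group aligned words into cues of at most `max_words` and render SRT."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # A cue spans from the first word's start to the last word's end.
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["text"] for w in chunk)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text}\n"
        )
    # Cues are separated by a blank line, as SRT expects.
    return "\n".join(cues)


if __name__ == "__main__":
    # Sample data copied from the example output in this document.
    alignment = json.loads('''{"words": [
        {"text": "This", "start": 0.0, "end": 0.3},
        {"text": "is", "start": 0.35, "end": 0.5},
        {"text": "the", "start": 0.55, "end": 0.65}
    ]}''')
    print(words_to_srt(alignment["words"]))
```

Grouping by a fixed word count keeps the sketch short; splitting on pauses between words or on punctuation would usually give more readable captions.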