elevenlabs-stt
ElevenLabs Speech-to-Text
High-accuracy transcription with Scribe models via inference.sh CLI.

Quick Start
Requires the inference.sh CLI (`belt`); see the install instructions.
bash
belt login

# Transcribe audio
belt app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'
Available Models
| Model | ID | Best For |
|---|---|---|
| Scribe v2 | `scribe_v2` | Latest, highest accuracy (default) |
| Scribe v1 | | Stable, proven |
- 98%+ transcription accuracy
- 90+ languages with auto-detection
Examples
Basic Transcription
bash
belt app run elevenlabs/stt --input '{"audio": "https://meeting-recording.mp3"}'
With Speaker Identification
bash
belt app run elevenlabs/stt --input '{
  "audio": "https://meeting.mp3",
  "diarize": true
}'
Audio Event Tagging
Detect laughter, applause, music, and other non-speech events:
bash
belt app run elevenlabs/stt --input '{
  "audio": "https://podcast.mp3",
  "tag_audio_events": true
}'
Specify Language
bash
belt app run elevenlabs/stt --input '{
  "audio": "https://spanish-audio.mp3",
  "language_code": "spa"
}'
Full Options
bash
belt app run elevenlabs/stt --input '{
  "audio": "https://conference.mp3",
  "model": "scribe_v2",
  "diarize": true,
  "tag_audio_events": true,
  "language_code": "eng"
}'
Forced Alignment
Get precise word-level and character-level timestamps by aligning known text to audio. Useful for subtitles, lip-sync, and karaoke.
bash
belt app run elevenlabs/forced-alignment --input '{
  "audio": "https://narration.mp3",
  "text": "This is the exact text spoken in the audio file."
}'
Output Format
json
{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}
Forced Alignment Use Cases
- Subtitles: Precise timing for video captions
- Lip-sync: Align audio to animated characters
- Karaoke: Word-by-word timing for lyrics
- Accessibility: Synchronized transcripts
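For the subtitles use case, the `words` array shown under Output Format can be grouped into SRT cues. The following is a minimal sketch, not part of the skill itself: the helper names `format_timestamp` and `words_to_srt` and the `max_words` grouping rule are illustrative assumptions, and the input is taken to have exactly the `text`/`start`/`end` shape shown above.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group word timings into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_timestamp(chunk[0]["start"])
        end = format_timestamp(chunk[-1]["end"])
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Sample data copied from the Output Format section.
alignment = {
    "words": [
        {"text": "This", "start": 0.0, "end": 0.3},
        {"text": "is", "start": 0.35, "end": 0.5},
        {"text": "the", "start": 0.55, "end": 0.65},
    ]
}
print(words_to_srt(alignment["words"], max_words=2))
```

Grouping by a fixed word count keeps the sketch short; a real caption pipeline would more likely split on pauses or punctuation.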
Workflow: Video Subtitles
bash
# 1. Transcribe video audio
belt app run elevenlabs/stt --input '{
  "audio": "https://video.mp4",
  "diarize": true
}' > transcript.json

# 2. Use transcript for captions
belt app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": "<transcript-from-step-1>"
}'
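The hand-off between the two steps can be scripted. This sketch rests on assumptions the source does not confirm: that `transcript.json` holds a JSON object with a `text` field (as in the forced-alignment output), and that `captions` accepts that plain text. The helper `build_caption_command` is hypothetical; it only builds the command string and does not run it.

```python
import json
import shlex


def build_caption_command(transcript: dict, video_url: str) -> str:
    """Return a shell-ready `belt app run` command for step 2.

    Assumption: the transcript's "text" field is what `captions` expects.
    """
    payload = {"video_url": video_url, "captions": transcript["text"]}
    # shlex.quote keeps apostrophes in the transcript from breaking the shell quoting.
    return "belt app run infsh/caption-videos --input " + shlex.quote(json.dumps(payload))


# In practice the transcript would be loaded from step 1's output file:
#   with open("transcript.json") as f:
#       transcript = json.load(f)
transcript = {"text": "It's a demo."}
print(build_caption_command(transcript, "https://video.mp4"))
```

Building the payload with `json.dumps` rather than string concatenation avoids escaping mistakes when the transcript contains quotes or newlines.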
Supported Languages
90+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Dutch, Swedish, and many more. Leave `language_code` empty for automatic detection.
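The options used in the examples (`model`, `diarize`, `tag_audio_events`, `language_code`) can be assembled programmatically. A minimal sketch follows; the function name `build_stt_input` is hypothetical, and it assumes that omitting `language_code` from the payload behaves the same as leaving it empty.

```python
import json


def build_stt_input(audio, model=None, diarize=False,
                    tag_audio_events=False, language_code=None):
    """Build the JSON string for `--input`, omitting options left at their defaults."""
    payload = {"audio": audio}
    if model is not None:
        payload["model"] = model
    if diarize:
        payload["diarize"] = True
    if tag_audio_events:
        payload["tag_audio_events"] = True
    if language_code:  # None or "" leaves language detection automatic (assumed)
        payload["language_code"] = language_code
    return json.dumps(payload)


print(build_stt_input("https://spanish-audio.mp3", language_code="spa"))
print(build_stt_input("https://meeting.mp3", diarize=True))
```

The resulting string can be passed as the argument to `belt app run elevenlabs/stt --input`.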
Use Cases
- Meetings: Transcribe recordings with speaker identification
- Podcasts: Generate transcripts with audio event tags
- Subtitles: Create timed captions for videos
- Research: Interview transcription with diarization
- Accessibility: Make audio content searchable and accessible
- Lip-sync: Forced alignment for animation timing
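For the searchable-audio use case, word timings can act as a simple index into a recording. The sketch below assumes the word list has the `text`/`start`/`end` shape shown under Output Format, which the source documents only for forced alignment; `find_word_times` is an illustrative helper, not part of the CLI.

```python
def find_word_times(words, query):
    """Return (start, end) pairs for each word matching query, ignoring case and edge punctuation."""
    target = query.casefold()
    hits = []
    for w in words:
        token = w["text"].strip(".,!?;:\"'").casefold()
        if token == target:
            hits.append((w["start"], w["end"]))
    return hits


# Hypothetical sample data in the documented word shape.
sample_words = [
    {"text": "The", "start": 0.0, "end": 0.2},
    {"text": "meeting", "start": 0.25, "end": 0.7},
    {"text": "ended,", "start": 0.75, "end": 1.1},
    {"text": "the", "start": 1.2, "end": 1.3},
    {"text": "end.", "start": 1.35, "end": 1.6},
]
print(find_word_times(sample_words, "the"))
```

Each returned pair can be used to seek a player to the moment a term was spoken.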
Related Skills
bash
# ElevenLabs TTS (reverse direction)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs dubbing (translate audio)
npx skills add inference-sh/skills@elevenlabs-dubbing

# Other STT models (Whisper)
npx skills add inference-sh/skills@speech-to-text

# Full platform skill (all 250+ apps)
npx skills add inference-sh/skills@infsh-cli
Browse all audio apps: `belt app list --category audio`