elevenlabs-stt
ElevenLabs Speech-to-Text
High-accuracy transcription with Scribe models via inference.sh CLI.

Quick Start
Requires the inference.sh CLI (`infsh`). See the install instructions.

```bash
infsh login

# Transcribe audio
infsh app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'
```

Available Models
| Model | ID | Best For |
|---|---|---|
| Scribe v2 | `scribe_v2` | Latest, highest accuracy (default) |
| Scribe v1 | | Stable, proven |
- 98%+ transcription accuracy
- 90+ languages with auto-detection
Examples
Basic Transcription
```bash
infsh app run elevenlabs/stt --input '{"audio": "https://meeting-recording.mp3"}'
```

With Speaker Identification
```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://meeting.mp3",
  "diarize": true
}'
```

Audio Event Tagging
Detect laughter, applause, music, and other non-speech events:
```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://podcast.mp3",
  "tag_audio_events": true
}'
```

Specify Language
```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://spanish-audio.mp3",
  "language_code": "spa"
}'
```

Full Options
```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://conference.mp3",
  "model": "scribe_v2",
  "diarize": true,
  "tag_audio_events": true,
  "language_code": "eng"
}'
```

Forced Alignment
Get precise word-level and character-level timestamps by aligning known text to audio. Useful for subtitles, lip-sync, and karaoke.
```bash
infsh app run elevenlabs/forced-alignment --input '{
  "audio": "https://narration.mp3",
  "text": "This is the exact text spoken in the audio file."
}'
```

Output Format
```json
{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}
```

Forced Alignment Use Cases
- Subtitles: Precise timing for video captions
- Lip-sync: Align audio to animated characters
- Karaoke: Word-by-word timing for lyrics
- Accessibility: Synchronized transcripts
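
For the subtitle and karaoke cases, the word-level timestamps can be grouped into SRT cues. The following is a minimal sketch, assuming the alignment result has exactly the shape shown under Output Format (a `words` list with `text`, `start`, and `end` in seconds). The helper names `format_timestamp` and `words_to_srt` and the default of seven words per cue are illustrative choices, not part of the inference.sh CLI.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group word timings into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_timestamp(chunk[0]["start"])  # first word opens the cue
        end = format_timestamp(chunk[-1]["end"])     # last word closes it
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Sample data copied from the Output Format section (assumed schema).
sample = json.loads("""
{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}
""")

print(words_to_srt(sample["words"]))
```

Grouping by a fixed word count keeps the sketch short. A production captioner would more likely split on punctuation or pauses between words.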
Workflow: Video Subtitles
```bash
# 1. Transcribe video audio
infsh app run elevenlabs/stt --input '{
  "audio": "https://video.mp4",
  "diarize": true
}' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{
  "video_url": "https://video.mp4",
  "captions": "<transcript-from-step-1>"
}'
```

Supported Languages
90+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Dutch, Swedish, and many more. Leave `language_code` empty for automatic detection.

Use Cases
- Meetings: Transcribe recordings with speaker identification
- Podcasts: Generate transcripts with audio event tags
- Subtitles: Create timed captions for videos
- Research: Interview transcription with diarization
- Accessibility: Make audio content searchable and accessible
- Lip-sync: Forced alignment for animation timing
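
When these use cases are scripted, the `--input` payload can be assembled in code rather than written by hand. The following is a minimal sketch that uses only the option names appearing in this document (`audio`, `model`, `diarize`, `tag_audio_events`, `language_code`). The helper `build_stt_command` is hypothetical, and it only builds the argument list without calling the CLI.

```python
import json
import shlex


def build_stt_command(audio, model=None, diarize=False,
                      tag_audio_events=False, language_code=None):
    """Return the argv list for an `infsh app run elevenlabs/stt` call.

    Options left at their defaults are omitted from the payload, so the
    app's own defaults (for example automatic language detection) apply.
    """
    payload = {"audio": audio}
    if model:
        payload["model"] = model
    if diarize:
        payload["diarize"] = True
    if tag_audio_events:
        payload["tag_audio_events"] = True
    if language_code:
        payload["language_code"] = language_code
    return ["infsh", "app", "run", "elevenlabs/stt",
            "--input", json.dumps(payload)]


# Meeting use case: transcription with speaker identification.
cmd = build_stt_command("https://meeting.mp3", diarize=True)
print(shlex.join(cmd))  # shell-quoted form, suitable for copy and paste
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with the JSON payload.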
Related Skills
```bash
# ElevenLabs TTS (reverse direction)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs dubbing (translate audio)
npx skills add inference-sh/skills@elevenlabs-dubbing

# Other STT models (Whisper)
npx skills add inference-sh/skills@speech-to-text

# Full platform skill (all 250+ apps)
npx skills add inference-sh/skills@infsh-cli
```

Browse all audio apps: `infsh app list --category audio`