elevenlabs-stt

ElevenLabs Speech-to-Text

High-accuracy transcription with Scribe models via inference.sh CLI.

Quick Start

Requires the inference.sh CLI (`infsh`). See the install instructions.
bash
infsh login

Transcribe audio

infsh app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'

Available Models

| Model | ID | Best For |
|-------|----|----------|
| Scribe v2 | `scribe_v2` | Latest, highest accuracy (default) |
| Scribe v1 | `scribe_v1` | Stable, proven |

  • 98%+ transcription accuracy
  • 90+ languages with auto-detection

Examples

Basic Transcription

bash
infsh app run elevenlabs/stt --input '{"audio": "https://meeting-recording.mp3"}'

With Speaker Identification

bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://meeting.mp3",
  "diarize": true
}'

Audio Event Tagging

Detect laughter, applause, music, and other non-speech events:
bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://podcast.mp3",
  "tag_audio_events": true
}'

Specify Language

bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://spanish-audio.mp3",
  "language_code": "spa"
}'

Full Options

bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://conference.mp3",
  "model": "scribe_v2",
  "diarize": true,
  "tag_audio_events": true,
  "language_code": "eng"
}'
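
When the options above come from shell variables, hand-editing the JSON string gets error-prone. The following is a minimal sketch of a helper that assembles the payload; `build_stt_input` is a hypothetical name, not part of the `infsh` CLI. It assumes the values contain no double quotes or backslashes, and it assumes that omitting `language_code` behaves the same as leaving it empty.

```bash
# build_stt_input AUDIO_URL [LANGUAGE_CODE] [DIARIZE]
# Hypothetical helper: prints a JSON payload suitable for --input.
# No JSON escaping is done, so inputs must not contain " or \.
build_stt_input() {
  audio=$1
  lang=${2:-}
  diarize=${3:-false}

  # Start the object with the fields that are always present.
  payload=$(printf '{"audio": "%s", "diarize": %s' "$audio" "$diarize")

  # Add language_code only when one was given (assumed to trigger auto-detection otherwise).
  if [ -n "$lang" ]; then
    payload=$(printf '%s, "language_code": "%s"' "$payload" "$lang")
  fi

  printf '%s}\n' "$payload"
}

# Usage (requires infsh to be installed and logged in):
# infsh app run elevenlabs/stt --input "$(build_stt_input https://conference.mp3 eng true)"
```

For anything beyond simple URLs and codes, a JSON-aware tool is a safer way to build the payload.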

Forced Alignment

Get precise word-level and character-level timestamps by aligning known text to audio. Useful for subtitles, lip-sync, and karaoke.
bash
infsh app run elevenlabs/forced-alignment --input '{
  "audio": "https://narration.mp3",
  "text": "This is the exact text spoken in the audio file."
}'

Output Format

json
{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}
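
As an illustration of how the `words` array above could feed a subtitle file, here is a rough sketch that turns word timings into SRT cues. `words_to_srt` is a hypothetical helper; it reads tab-separated `start`, `end`, `text` lines rather than JSON, on the assumption that a tool such as `jq` extracts those fields first. It emits one cue per word, which suits karaoke-style timing; real captions would usually group words into phrases.

```bash
# words_to_srt: reads "start<TAB>end<TAB>text" lines on stdin,
# prints one numbered SRT cue per line.
words_to_srt() {
  awk -F '\t' '
    # Convert seconds (float) to the SRT timestamp form HH:MM:SS,mmm.
    function ts(t,    ms, h, m, s) {
      ms = int(t * 1000 + 0.5)              # round to whole milliseconds
      h = int(ms / 3600000); ms -= h * 3600000
      m = int(ms / 60000);   ms -= m * 60000
      s = int(ms / 1000);    ms -= s * 1000
      return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    { printf "%d\n%s --> %s\n%s\n\n", NR, ts($1), ts($2), $3 }
  '
}

# Usage, assuming jq is installed and alignment.json holds the output shown above:
# jq -r '.words[] | "\(.start)\t\(.end)\t\(.text)"' alignment.json | words_to_srt > words.srt
```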

Forced Alignment Use Cases

  • Subtitles: Precise timing for video captions
  • Lip-sync: Align audio to animated characters
  • Karaoke: Word-by-word timing for lyrics
  • Accessibility: Synchronized transcripts

Workflow: Video Subtitles

1. Transcribe video audio

infsh app run elevenlabs/stt --input '{ "audio": "https://video.mp4", "diarize": true }' > transcript.json

2. Use transcript for captions

infsh app run infsh/caption-videos --input '{ "video_url": "https://video.mp4", "captions": "<transcript-from-step-1>" }'

Supported Languages

90+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Dutch, Swedish, and many more. Leave `language_code` empty for automatic detection.

Use Cases

  • Meetings: Transcribe recordings with speaker identification
  • Podcasts: Generate transcripts with audio event tags
  • Subtitles: Create timed captions for videos
  • Research: Interview transcription with diarization
  • Accessibility: Make audio content searchable and accessible
  • Lip-sync: Forced alignment for animation timing

Related Skills

ElevenLabs TTS (reverse direction)

npx skills add inference-sh/skills@elevenlabs-tts

ElevenLabs dubbing (translate audio)

npx skills add inference-sh/skills@elevenlabs-dubbing

Other STT models (Whisper)

npx skills add inference-sh/skills@speech-to-text

Full platform skill (all 250+ apps)

npx skills add inference-sh/skills@infsh-cli

Browse all audio apps: `infsh app list --category audio`