elevenlabs-stt

ElevenLabs Speech-to-Text

High-accuracy transcription with Scribe models via the inference.sh CLI.

Quick Start

Requires the inference.sh CLI (`belt`). Install instructions

bash

belt login

Transcribe audio

belt app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'

Available Models

Model	ID	Best For
Scribe v2	`scribe_v2`	Latest, highest accuracy (default)
Scribe v1	`scribe_v1`	Stable, proven

98%+ transcription accuracy
90+ languages with auto-detection

Examples

Basic Transcription

bash

belt app run elevenlabs/stt --input '{"audio": "https://meeting-recording.mp3"}'

With Speaker Identification

bash

belt app run elevenlabs/stt --input '{
  "audio": "https://meeting.mp3",
  "diarize": true
}'

Audio Event Tagging

Detect laughter, applause, music, and other non-speech events:

bash

belt app run elevenlabs/stt --input '{
  "audio": "https://podcast.mp3",
  "tag_audio_events": true
}'

Specify Language

bash

belt app run elevenlabs/stt --input '{
  "audio": "https://spanish-audio.mp3",
  "language_code": "spa"
}'

Full Options

bash

belt app run elevenlabs/stt --input '{
  "audio": "https://conference.mp3",
  "model": "scribe_v2",
  "diarize": true,
  "tag_audio_events": true,
  "language_code": "eng"
}'
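
When the options vary from run to run, it can help to build the `--input` JSON in one place. The helper below is a minimal sketch, not part of the `belt` CLI: `build_stt_input` is a hypothetical name, it only uses the field names shown in the examples above, and it assumes the values contain no quotes or backslashes (it does no JSON escaping).

```bash
# Hypothetical helper: prints the --input JSON for elevenlabs/stt.
# Usage: build_stt_input AUDIO_URL [DIARIZE] [TAG_EVENTS] [LANGUAGE_CODE] [MODEL]
# Assumption: arguments contain no characters that need JSON escaping.
build_stt_input() (
  audio=$1
  diarize=${2:-false}
  tag_events=${3:-false}
  language=${4:-}
  model=${5:-}

  json="{\"audio\": \"$audio\""
  if [ "$diarize" = "true" ]; then
    json="$json, \"diarize\": true"
  fi
  if [ "$tag_events" = "true" ]; then
    json="$json, \"tag_audio_events\": true"
  fi
  # An empty language leaves language_code out, so the language is auto-detected.
  if [ -n "$language" ]; then
    json="$json, \"language_code\": \"$language\""
  fi
  # An empty model leaves the field out, so the default model is used.
  if [ -n "$model" ]; then
    json="$json, \"model\": \"$model\""
  fi
  printf '%s\n' "$json}"
)
```

A call could then look like `belt app run elevenlabs/stt --input "$(build_stt_input "https://meeting.mp3" true false eng)"`.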

Forced Alignment

Get precise word-level and character-level timestamps by aligning known text to audio. Useful for subtitles, lip-sync, and karaoke.

bash

belt app run elevenlabs/forced-alignment --input '{
  "audio": "https://narration.mp3",
  "text": "This is the exact text spoken in the audio file."
}'

Output Format

json

{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}
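
For subtitle work, the `start` and `end` values (seconds) need to become SRT timestamps of the form `HH:MM:SS,mmm`. The sketch below shows one way to do that with `awk`. `words_to_srt` is a hypothetical helper that reads tab-separated `start`, `end`, `text` lines on stdin and emits one cue per word; real subtitles would usually group words into phrases first.

```bash
# Hypothetical helper: turns "start<TAB>end<TAB>text" lines into SRT cues.
# One cue is emitted per input line, numbered from 1.
words_to_srt() {
  awk -F '\t' '
    # Format seconds as HH:MM:SS,mmm, rounding to whole milliseconds.
    function ts(s,    ms) {
      ms = int(s * 1000 + 0.5)
      return sprintf("%02d:%02d:%02d,%03d",
                     int(ms / 3600000),
                     int((ms % 3600000) / 60000),
                     int((ms % 60000) / 1000),
                     ms % 1000)
    }
    { printf "%d\n%s --> %s\n%s\n\n", NR, ts($1), ts($2), $3 }
  '
}
```

Assuming `jq` is installed and the JSON above was saved to a file (here called `alignment.json` purely for illustration), `jq -r '.words[] | [.start, .end, .text] | @tsv' alignment.json | words_to_srt` would feed the word list into the helper.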

Forced Alignment Use Cases

Subtitles: Precise timing for video captions
Lip-sync: Align audio to animated characters
Karaoke: Word-by-word timing for lyrics
Accessibility: Synchronized transcripts

Workflow: Video Subtitles

bash

1. Transcribe video audio

belt app run elevenlabs/stt --input '{ "audio": "https://video.mp4", "diarize": true }' > transcript.json

2. Use transcript for captions

belt app run infsh/caption-videos --input '{ "video_url": "https://video.mp4", "captions": "<transcript-from-step-1>" }'

Supported Languages

90+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Dutch, Swedish, and many more. Leave `language_code` empty for automatic detection.

Use Cases

Meetings: Transcribe recordings with speaker identification
Podcasts: Generate transcripts with audio event tags
Subtitles: Create timed captions for videos
Research: Interview transcription with diarization
Accessibility: Make audio content searchable and accessible
Lip-sync: Forced alignment for animation timing

Related Skills

ElevenLabs TTS (reverse direction)

npx skills add inference-sh/skills@elevenlabs-tts

ElevenLabs dubbing (translate audio)

npx skills add inference-sh/skills@elevenlabs-dubbing

Other STT models (Whisper)

npx skills add inference-sh/skills@speech-to-text

Full platform skill (all 250+ apps)

npx skills add inference-sh/skills@infsh-cli


Browse all audio apps: `belt app list --category audio`
