elevenlabs-stt

ElevenLabs Speech-to-Text

High-accuracy transcription with Scribe models via the inference.sh CLI.

Quick Start

Requires the inference.sh CLI (`infsh`); see its install instructions.

```bash
infsh login

# Transcribe audio
infsh app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'
```

Available Models

| Model | ID | Best For |
|-------|----|----------|
| Scribe v2 | `scribe_v2` | Latest, highest accuracy (default) |
| Scribe v1 | `scribe_v1` | Stable, proven |

- 98%+ transcription accuracy
- 90+ languages with auto-detection

Examples

Basic Transcription

```bash
infsh app run elevenlabs/stt --input '{"audio": "https://meeting-recording.mp3"}'
```
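Several recordings can be transcribed in one pass by wrapping the same command in a loop that saves each result to its own file. This is a minimal sketch under two assumptions: that `infsh app run` writes its result to stdout (as the `> transcript.json` redirect in the subtitle workflow implies), and that each audio URL ends in a file name. `transcribe_all` and the `INFSH` override variable are hypothetical names; the override exists only so the loop can be dry-run with `INFSH=echo`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: transcribes each audio URL and saves the output as <name>.json.
# INFSH defaults to the real CLI (assumption); set INFSH=echo for a dry run.
transcribe_all() {
  local url name
  for url in "$@"; do
    name=${url##*/}     # keep only the part after the last slash
    name=${name%.*}     # drop the file extension
    "${INFSH:-infsh}" app run elevenlabs/stt \
      --input "{\"audio\": \"$url\"}" > "$name.json" || return 1
  done
}
```

For example, `transcribe_all https://meeting-recording.mp3 https://podcast.mp3` would write `meeting-recording.json` and `podcast.json` to the current directory.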

With Speaker Identification

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://meeting.mp3",
  "diarize": true
}'
```

Audio Event Tagging

Detect laughter, applause, music, and other non-speech events:

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://podcast.mp3",
  "tag_audio_events": true
}'
```

Specify Language

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://spanish-audio.mp3",
  "language_code": "spa"
}'
```

Full Options

```bash
infsh app run elevenlabs/stt --input '{
  "audio": "https://conference.mp3",
  "model": "scribe_v2",
  "diarize": true,
  "tag_audio_events": true,
  "language_code": "eng"
}'
```
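When the options come from a script rather than being typed by hand, a small builder keeps the payload consistent with the fields used in these examples. In this sketch `build_stt_input` is a hypothetical helper: it accepts only the two model IDs from the Available Models table, adds `diarize` and `tag_audio_events` only when they are switched on, and leaves out `language_code` when no language is given, relying on the automatic detection described under Supported Languages. Values are inserted without JSON escaping, so it assumes plain URLs and language codes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds the --input JSON for elevenlabs/stt.
# Usage: build_stt_input AUDIO_URL [MODEL] [DIARIZE] [TAG_EVENTS] [LANGUAGE_CODE]
build_stt_input() {
  local audio=$1 model=${2:-scribe_v2} diarize=${3:-false}
  local tag=${4:-false} lang=${5:-}
  # Only the model IDs listed in the Available Models table are accepted.
  case "$model" in
    scribe_v2|scribe_v1) ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  local json="{\"audio\": \"$audio\", \"model\": \"$model\""
  # Optional features are added only when switched on.
  [ "$diarize" = true ] && json="$json, \"diarize\": true"
  [ "$tag" = true ] && json="$json, \"tag_audio_events\": true"
  # An empty language omits language_code, leaving detection automatic.
  [ -n "$lang" ] && json="$json, \"language_code\": \"$lang\""
  printf '%s}\n' "$json"
}
```

For instance, `infsh app run elevenlabs/stt --input "$(build_stt_input https://conference.mp3 scribe_v2 true true eng)"` sends the same payload as the Full Options example.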

Forced Alignment

Get precise word-level and character-level timestamps by aligning known text to audio. Useful for subtitles, lip-sync, and karaoke.

```bash
infsh app run elevenlabs/forced-alignment --input '{
  "audio": "https://narration.mp3",
  "text": "This is the exact text spoken in the audio file."
}'
```
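Because `text` must be the exact script, it will often contain double quotes, backslashes, or line breaks that break a JSON string typed inside single quotes. A minimal pure-Bash sketch, with hypothetical helper names `json_escape` and `build_alignment_input`, escapes those common cases. It does not handle other control characters, so a JSON-aware tool such as `jq` is the sturdier choice when it is installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escapes a string for use inside a JSON string literal.
# Covers backslash, double quote, newline, carriage return and tab only.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslashes first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quotes
  s=${s//$'\n'/\\n}    # newlines
  s=${s//$'\r'/\\r}    # carriage returns
  s=${s//$'\t'/\\t}    # tabs
  printf '%s' "$s"
}

# Hypothetical helper: builds the --input JSON for elevenlabs/forced-alignment
# from an audio URL and the path of a file holding the exact spoken text.
build_alignment_input() {
  local audio=$1 text
  text=$(cat "$2") || return 1   # command substitution drops trailing newlines
  printf '{"audio": "%s", "text": "%s"}\n' \
    "$(json_escape "$audio")" "$(json_escape "$text")"
}
```

Usage would look like `infsh app run elevenlabs/forced-alignment --input "$(build_alignment_input https://narration.mp3 script.txt)"`, where `script.txt` is a hypothetical local file containing the narration text.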

Output Format

```json
{
  "words": [
    {"text": "This", "start": 0.0, "end": 0.3},
    {"text": "is", "start": 0.35, "end": 0.5},
    {"text": "the", "start": 0.55, "end": 0.65}
  ],
  "text": "This is the exact text spoken in the audio file."
}
```
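To turn the `words` array into a subtitle file, each cue can take the start of its first word and the end of its last word. The sketch below assumes the words have first been flattened into tab-separated `text start end` lines, for example with `jq -r '.words[] | [.text, .start, .end] | @tsv'`, which in turn assumes the CLI prints the JSON above to stdout. `words_to_srt` is a hypothetical helper and its default of 7 words per cue is an arbitrary choice; passing 1 gives word-by-word timing for karaoke-style captions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "text<TAB>start<TAB>end" lines (times in seconds)
# on stdin and prints SRT cues holding up to MAX_WORDS words each (default 7).
words_to_srt() {
  awk -v max="${1:-7}" '
    BEGIN { FS = "\t" }
    # Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm.
    function ts(t,    ms) {
      ms = int(t * 1000 + 0.5)
      return sprintf("%02d:%02d:%02d,%03d", int(ms / 3600000),
                     int(ms / 60000) % 60, int(ms / 1000) % 60, ms % 1000)
    }
    # Print the buffered words as one numbered cue, then empty the buffer.
    function flush() {
      if (n == 0) return
      printf "%d\n%s --> %s\n%s\n\n", ++cue, ts(start), ts(stop), line
      n = 0
    }
    # Skip lines that do not have all three fields.
    NF >= 3 {
      if (n == 0) { start = $2; line = $1 } else { line = line " " $1 }
      stop = $3
      if (++n >= max) flush()
    }
    END { flush() }
  '
}
```

A full pipeline might then read `... | jq -r '.words[] | [.text, .start, .end] | @tsv' | words_to_srt > narration.srt`.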

Forced Alignment Use Cases

- Subtitles: Precise timing for video captions
- Lip-sync: Align audio to animated characters
- Karaoke: Word-by-word timing for lyrics
- Accessibility: Synchronized transcripts

Workflow: Video Subtitles

```bash
# 1. Transcribe video audio
infsh app run elevenlabs/stt --input '{ "audio": "https://video.mp4", "diarize": true }' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{ "video_url": "https://video.mp4", "captions": "<transcript-from-step-1>" }'
```

Supported Languages

90+ languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Turkish, Dutch, Swedish, and many more. Leave `language_code` empty for automatic detection.

Use Cases

- Meetings: Transcribe recordings with speaker identification
- Podcasts: Generate transcripts with audio event tags
- Subtitles: Create timed captions for videos
- Research: Interview transcription with diarization
- Accessibility: Make audio content searchable and accessible
- Lip-sync: Forced alignment for animation timing

Related Skills

```bash
# ElevenLabs TTS (reverse direction)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs dubbing (translate audio)
npx skills add inference-sh/skills@elevenlabs-dubbing

# Other STT models (Whisper)
npx skills add inference-sh/skills@speech-to-text

# Full platform skill (all 250+ apps)
npx skills add inference-sh/skills@infsh-cli
```

Browse all audio apps: `infsh app list --category audio`
