speech-to-text

Speech-to-Text

Transcribe audio to text via inference.sh CLI.

Quick Start

```bash
# Install the CLI and log in
curl -fsSL https://cli.inference.sh | sh && infsh login

# Transcribe a remote audio file (replace the placeholder URL)
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'
```

Available Models

| Model | App ID | Best For |
|---|---|---|
| Fast Whisper V3 | `infsh/fast-whisper-large-v3` | Fast transcription |
| Whisper V3 Large | `infsh/whisper-v3-large` | Highest accuracy |

Examples

Basic Transcription

```bash
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'
```

With Timestamps

```bash
# Save a sample input file for the app
infsh app sample infsh/fast-whisper-large-v3 --save input.json

# Edit input.json so it contains:
# {
#   "audio_url": "https://podcast.mp3",
#   "timestamps": true
# }

infsh app run infsh/fast-whisper-large-v3 --input input.json
```
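
If you would rather not hand-edit the sampled file, a minimal sketch is to write `input.json` with a shell heredoc. It assumes `audio_url` and `timestamps` are the only fields you need to set and that the app's other inputs are optional, because the file written here replaces whatever `infsh app sample` saved. The URL is the same placeholder used above.

```bash
#!/usr/bin/env bash
# Sketch: write input.json directly instead of editing the sampled file.
# Assumption: audio_url and timestamps are the only fields you need to set;
# this overwrites anything `infsh app sample` saved earlier.
audio_url="https://podcast.mp3"   # placeholder URL from the example above

# Unquoted EOF so ${audio_url} is expanded into the file
cat > input.json <<EOF
{
  "audio_url": "${audio_url}",
  "timestamps": true
}
EOF

# Next step (requires the CLI and a login):
#   infsh app run infsh/fast-whisper-large-v3 --input input.json
```

Only the final `infsh app run` step needs the CLI; generating the file is plain shell.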

Translation (to English)

```bash
# "task": "translate" makes Whisper output English text from non-English speech
infsh app run infsh/whisper-v3-large --input '{
  "audio_url": "https://french-audio.mp3",
  "task": "translate"
}'
```
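
Typing multi-line JSON at the prompt is error-prone. One option, sketched here, is a small shell function that prints the `--input` payload. The function name `whisper_input` is hypothetical, and it does no JSON escaping, so it assumes the URL contains no double quotes or backslashes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an --input payload for the Whisper apps.
# Usage: whisper_input <audio_url> [task]
# The "task" key is only emitted when a task is given (e.g. "translate").
# No JSON escaping is done: the URL must not contain " or \ characters.
whisper_input() {
  local url="$1"
  local task="${2:-}"
  if [ -n "$task" ]; then
    printf '{"audio_url": "%s", "task": "%s"}\n' "$url" "$task"
  else
    printf '{"audio_url": "%s"}\n' "$url"
  fi
}

# Example (requires the CLI and a login):
#   infsh app run infsh/whisper-v3-large \
#     --input "$(whisper_input https://french-audio.mp3 translate)"
```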

From Video

```bash
# Extract audio from video first
infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json

# Transcribe the extracted audio
# (replace <audio-url> with the audio file URL returned in audio.json)
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'
```
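
The two steps above can be chained in one shell function. This is a sketch under a loud assumption: the examples here do not say which field of `audio.json` holds the extracted audio, so the hypothetical `first_url` helper just takes the first `https://` URL in that file. Inspect the extractor's real output before relying on it (it may, for instance, echo the input video URL first). Both function names are illustrative, not part of the CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first https:// URL found in a file.
# Assumption: the first URL in the extractor output is the extracted audio.
first_url() {
  grep -o 'https://[^"]*' "$1" | head -n 1
}

# Hypothetical wrapper that chains extraction and transcription.
# Usage: transcribe_video <video_url>   (requires the CLI and a login)
transcribe_video() {
  local audio_url
  infsh app run infsh/video-audio-extractor \
    --input "{\"video_url\": \"$1\"}" > audio.json || return 1
  audio_url="$(first_url audio.json)"
  if [ -z "$audio_url" ]; then
    echo "no audio URL found in audio.json" >&2
    return 1
  fi
  infsh app run infsh/fast-whisper-large-v3 \
    --input "{\"audio_url\": \"${audio_url}\"}"
}
```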

Workflow: Video Subtitles

```bash
# 1. Transcribe video audio
infsh app run infsh/fast-whisper-large-v3 --input '{ "audio_url": "https://video.mp4", "timestamps": true }' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{ "video_url": "https://video.mp4", "captions": "<transcript-from-step-1>" }'
```

Supported Languages

Whisper supports 99+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more.

Use Cases

- Meetings: Transcribe meeting recordings
- Podcasts: Generate transcripts
- Subtitles: Create captions for videos
- Voice Notes: Convert to searchable text
- Interviews: Transcribe interviews for research
- Accessibility: Make audio content accessible

Output Format

Returns JSON with:

- `text`: Full transcription
- `segments`: Timestamped segments (when requested with `"timestamps": true`)
- `language`: Detected language
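
When building subtitle files, each timestamped segment maps naturally onto an SRT cue: a sequence number, a `start --> end` line with `HH:MM:SS,mmm` timestamps, the text, and a blank line. The sketch below assumes each entry in `segments` provides start and end times in seconds plus its text; the actual key names are not documented here, so pulling those values out of the JSON is left to you. `srt_time` and `srt_cue` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: format seconds (e.g. 3.5) as an SRT timestamp HH:MM:SS,mmm.
srt_time() {
  awk -v s="$1" 'BEGIN {
    ms = int(s * 1000 + 0.5)            # round to whole milliseconds
    h = int(ms / 3600000)
    m = int((ms % 3600000) / 60000)
    sec = int((ms % 60000) / 1000)
    printf "%02d:%02d:%02d,%03d\n", h, m, sec, ms % 1000
  }'
}

# Hypothetical helper: print one SRT cue followed by a blank line.
# Usage: srt_cue <index> <start_seconds> <end_seconds> <text>
# Assumption: start/end come from a segment's timestamps, in seconds.
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' "$1" "$(srt_time "$2")" "$(srt_time "$3")" "$4"
}
```

For example, `srt_cue 1 0 2.5 Hello` prints cue 1 running from `00:00:00,000` to `00:00:02,500`.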

Related Skills

```bash
# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (reverse direction)
npx skills add inference-sh/skills@text-to-speech

# Video generation (add captions)
npx skills add inference-sh/skills@ai-video-generation

# AI avatars (lipsync with transcripts)
npx skills add inference-sh/skills@ai-avatar-video
```

Browse all audio apps: `infsh app list --category audio`

Documentation

- Running Apps - How to run apps via CLI
- Audio Transcription Example - Complete transcription guide
- Apps Overview - Understanding the app ecosystem
