speech-to-text

Speech-to-Text

Transcribe audio to text with the inference.sh CLI (`infsh`).

Quick Start

bash
# Install the CLI and log in
curl -fsSL https://cli.inference.sh | sh && infsh login

# Transcribe an audio file by URL
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'

Available Models

| Model | App ID | Best For |
|-------|--------|----------|
| Fast Whisper V3 | `infsh/fast-whisper-large-v3` | Fast transcription |
| Whisper V3 Large | `infsh/whisper-v3-large` | Highest accuracy |

Examples

Basic Transcription

bash
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://meeting.mp3"}'
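
To transcribe several recordings, the same command can be wrapped in a loop. The sketch below is an illustration only: `transcribe_all` and the `INFSH` variable are hypothetical names, not part of the CLI. `INFSH` defaults to the real `infsh` binary and can be set to `echo` for a dry run that only prints the commands.

```bash
# Hypothetical helper: run the transcription app once per audio URL.
# INFSH is an assumption of this sketch (not a CLI feature);
# set INFSH=echo to print the commands instead of running them.
INFSH="${INFSH:-infsh}"

transcribe_all() {
  # Each argument is one audio URL (assumed to contain no double quotes).
  for url in "$@"; do
    "$INFSH" app run infsh/fast-whisper-large-v3 \
      --input "{\"audio_url\": \"$url\"}"
  done
}

# Example (commented out so that sourcing this file runs nothing):
# transcribe_all "https://meeting-1.mp3" "https://meeting-2.mp3"
```

Running with `INFSH=echo` first is a cheap way to check the generated JSON before any request is sent.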

With Timestamps

bash
infsh app sample infsh/fast-whisper-large-v3 --save input.json

# Edit input.json so that it contains:
# {
#   "audio_url": "https://podcast.mp3",
#   "timestamps": true
# }

infsh app run infsh/fast-whisper-large-v3 --input input.json
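
As an alternative to editing the sampled file by hand, the input file can be written directly from the shell. This is a minimal sketch: `write_input` is a hypothetical helper, and it only uses the two fields shown above (`audio_url` and `timestamps`).

```bash
# Hypothetical helper: write a timestamped-transcription input file.
# $1 = audio URL, $2 = output file (defaults to input.json)
write_input() {
  cat > "${2:-input.json}" <<EOF
{
  "audio_url": "$1",
  "timestamps": true
}
EOF
}

# Example (commented out so that sourcing this file runs nothing):
# write_input "https://podcast.mp3"
# infsh app run infsh/fast-whisper-large-v3 --input input.json
```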

Translation (to English)

bash
infsh app run infsh/whisper-v3-large --input '{
  "audio_url": "https://french-audio.mp3",
  "task": "translate"
}'

From Video

bash
# Extract audio from video first
infsh app run infsh/video-audio-extractor --input '{"video_url": "https://video.mp4"}' > audio.json

# Transcribe the extracted audio
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "<audio-url>"}'
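
The `<audio-url>` placeholder has to be filled in from the extractor's output. A minimal sketch follows, assuming `audio.json` holds plain JSON with the URL in a string field whose key and value sit on one line. The helper `json_string_field` and the field name `audio_url` in the commented example are assumptions, so inspect the real output first. A JSON-aware tool such as `jq` would be more robust than this `sed` pattern.

```bash
# Hypothetical helper: print the value of a string field from a JSON file.
# Limits: key and value must be on the same line, no escaped quotes in the
# value; only the first matching line is printed.
# $1 = file, $2 = field name
json_string_field() {
  sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Example (the field name is an assumption about the extractor's output):
# audio_url="$(json_string_field audio.json audio_url)"
# infsh app run infsh/fast-whisper-large-v3 --input "{\"audio_url\": \"$audio_url\"}"
```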

Workflow: Video Subtitles

bash
# 1. Transcribe video audio
infsh app run infsh/fast-whisper-large-v3 --input '{ "audio_url": "https://video.mp4", "timestamps": true }' > transcript.json

# 2. Use transcript for captions
infsh app run infsh/caption-videos --input '{ "video_url": "https://video.mp4", "captions": "<transcript-from-step-1>" }'

Supported Languages

Whisper supports 99+ languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Russian, and many more.

Use Cases

  • Meetings: Transcribe recordings
  • Podcasts: Generate transcripts
  • Subtitles: Create captions for videos
  • Voice Notes: Convert to searchable text
  • Interviews: Transcription for research
  • Accessibility: Make audio content accessible

Output Format

Returns JSON with:
  • `text`: Full transcription
  • `segments`: Timestamped segments (if requested)
  • `language`: Detected language
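
If the timestamped segments are turned into subtitles, each segment boundary needs a subtitle-style timestamp. The sketch below assumes the boundaries are expressed in seconds (the unit is not stated here) and that the target is the SubRip `HH:MM:SS,mmm` form. `srt_time` is a hypothetical helper, not part of the CLI.

```bash
# Hypothetical helper: convert seconds (e.g. 75.5) to an SRT timestamp.
# $1 = time in seconds, possibly fractional
srt_time() {
  awk -v t="$1" 'BEGIN {
    ms = int(t * 1000 + 0.5)                 # round to whole milliseconds
    h  = int(ms / 3600000); ms -= h * 3600000
    m  = int(ms / 60000);   ms -= m * 60000
    s  = int(ms / 1000);    ms -= s * 1000
    printf "%02d:%02d:%02d,%03d\n", h, m, s, ms
  }'
}

# Example (commented out so that sourcing this file runs nothing):
# srt_time 75.5
```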

Related Skills

bash
# Full platform skill (all 150+ apps)
npx skills add inference-sh/skills@inference-sh

# Text-to-speech (reverse direction)
npx skills add inference-sh/skills@text-to-speech

# Video generation (add captions)
npx skills add inference-sh/skills@ai-video-generation

# AI avatars (lipsync with transcripts)
npx skills add inference-sh/skills@ai-avatar-video

Browse all audio apps: `infsh app list --category audio`

Documentation
