whisper-transcription

Whisper Transcription

Transcribe any audio or video to text using OpenAI's open-source Whisper speech-recognition model, the same technology behind ChatGPT's voice features.

When to Use This Skill

  • Podcast repurposing - Convert episodes into blog posts, show notes, and social snippets
  • Video subtitles - Generate SRT/VTT files for YouTube and social media
  • Interview extraction - Pull quotes and insights from recorded calls
  • Content audit - Make audio/video libraries searchable
  • Translation - Transcribe foreign-language content and translate it into English

What Claude Does vs What You Decide

| Claude Does | You Decide |
|---|---|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |

Dependencies

```bash
pip install openai-whisper torch ffmpeg-python click

# Also requires ffmpeg installed on your system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
```
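
Before the first run, it can help to confirm that the packages above are importable and that the `ffmpeg` binary is on your `PATH`. The sketch below is a hypothetical preflight check, not part of `scripts/main.py`; `missing_dependencies` is an illustrative name, and the module names are the import names these pip packages install (`openai-whisper` installs `whisper`, `ffmpeg-python` installs `ffmpeg`).

```python
import importlib.util
import shutil

# Import names for the pip packages above (openai-whisper -> "whisper",
# ffmpeg-python -> "ffmpeg").
PYTHON_MODULES = ("whisper", "torch", "ffmpeg", "click")
# Executables that must be reachable on PATH.
SYSTEM_BINARIES = ("ffmpeg",)


def missing_dependencies(modules=PYTHON_MODULES, binaries=SYSTEM_BINARIES):
    """Return the modules and binaries that cannot be found (an empty list means ready)."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    missing += [name for name in binaries if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else f"Missing: {', '.join(problems)}")
```

An empty list means the transcription commands below should at least be able to start.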

Commands

Transcribe Single File

```bash
python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
```
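
For readers curious what a `transcribe` call boils down to, here is a minimal sketch using Whisper's Python API (`whisper.load_model` and `model.transcribe`, whose result dict holds the full text under `"text"`). It is not the code in `scripts/main.py`: `default_output_path` and `transcribe_to_text` are hypothetical names, and the naming rule (reuse the input's stem when `--output` is omitted, so `episode-42.mp3` becomes `episode-42.txt`) is an assumption inferred from the examples in this document.

```python
from pathlib import Path


def default_output_path(input_file, fmt="txt", output=None):
    """An explicit --output path wins; otherwise reuse the input name with the format's extension."""
    if output is not None:
        return Path(output)
    return Path(input_file).with_suffix("." + fmt)


def transcribe_to_text(input_file, model_name="medium", output=None):
    """Transcribe one file to plain text and return the path that was written."""
    import whisper  # openai-whisper; imported here so the helper above works without it

    model = whisper.load_model(model_name)
    # Whisper decodes the input through ffmpeg, so video files work as input too.
    result = model.transcribe(str(input_file))
    out_path = default_output_path(input_file, "txt", output)
    out_path.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return out_path
```

Loading the model is the slow step, so a real CLI would load it once per invocation rather than per file.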

Batch Transcription

```bash
python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
```
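
Batch mode presumably walks the folder and transcribes each media file. The sketch below only plans that work: it pairs each file with an output path, writing next to the input when no output folder is given (as Example 3's output suggests). `plan_batch` and the `MEDIA_EXTENSIONS` set are hypothetical; the real script may accept other extensions or recurse into subfolders.

```python
from pathlib import Path

# Hypothetical extension list; the real script may support a different set.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".mp4", ".mov", ".mkv", ".webm"}


def plan_batch(input_dir, output_dir=None, fmt="txt"):
    """Return (source, destination) pairs for every media file directly inside input_dir."""
    source_dir = Path(input_dir)
    target_dir = Path(output_dir) if output_dir is not None else source_dir
    jobs = []
    for path in sorted(source_dir.iterdir()):
        # Match extensions case-insensitively so "clip.MP4" is picked up too.
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            jobs.append((path, target_dir / f"{path.stem}.{fmt}"))
    return jobs
```

Separating planning from transcription makes it easy to print a dry run before committing to hours of processing.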

Transcribe + Translate

```bash
python scripts/main.py translate foreign-audio.mp3 --to en
```
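
Whisper's built-in translation (`task="translate"`) only produces English, so `--to en` is the one target Whisper can serve on its own; any other target language would need a separate translation step. The sketch below shows how such a flag might map onto `model.transcribe` keyword arguments. `translation_options` is a hypothetical helper, and rejecting non-English targets is an assumed design choice.

```python
def translation_options(target_language="en", source_language=None):
    """Build keyword arguments for Whisper's model.transcribe() in translate mode."""
    if target_language != "en":
        # Whisper's translate task always outputs English.
        raise ValueError(f"Whisper can only translate into English, not {target_language!r}")
    options = {"task": "translate"}
    if source_language is not None:
        # Naming the spoken language (e.g. "de", "ja") skips automatic detection.
        options["language"] = source_language
    return options
```

With a loaded model this would be used as `model.transcribe("foreign-audio.mp3", **translation_options())`.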

Extract Timestamps

```bash
python scripts/main.py timestamps podcast.mp3 --format json
```
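
Whisper's transcription result includes a `"segments"` list in which each entry carries `"start"` and `"end"` times in seconds plus its `"text"`. The exact JSON schema the `timestamps` command writes is not documented here, so the trimmed layout below is an assumption (Whisper's own CLI `json` writer dumps the whole result dictionary instead); `segments_to_json` is a hypothetical name.

```python
import json


def segments_to_json(segments):
    """Serialize each segment's start/end time (seconds) and text as a JSON array."""
    rows = [
        {"start": round(seg["start"], 2), "end": round(seg["end"], 2), "text": seg["text"].strip()}
        for seg in segments
    ]
    # ensure_ascii=False keeps non-English transcripts readable in the file.
    return json.dumps(rows, ensure_ascii=False, indent=2)
```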

Examples

Example 1: Podcast to Blog Post

```bash
# Transcribe a 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour of audio on an M1 Mac
```
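
The exact layout of the timestamped `.txt` output is not specified. One plausible layout, assumed here for illustration, prefixes each Whisper segment with its start time as `[HH:MM:SS]`; `format_timestamp` and `timestamped_text` are hypothetical helpers.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS, dropping fractions of a second."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_text(segments):
    """One line per segment: '[HH:MM:SS] text'."""
    return "\n".join(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}" for seg in segments)
```

Second-level precision is usually enough for locating quotes in a long episode while drafting a blog post.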

Example 2: YouTube Subtitles

```bash
# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo
```
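
An SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the caption text and a blank line. The minimal sketch below turns Whisper-style segments (dicts with `start`, `end` in seconds and `text`) into that format; `srt_timestamp` and `segments_to_srt` are hypothetical names, not the script's actual functions.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before the milliseconds)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Number each segment from 1 and emit cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```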

Example 3: Batch Process Interview Library

```bash
# Transcribe all recordings in the folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)
```
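
When processing a whole library, loading the Whisper model once and reusing it for every file avoids paying the load cost per recording. The sketch below takes the transcription step as a function argument so it stays independent of Whisper; `run_batch` and its `skip_existing` option (resume an interrupted run by skipping files that already have output) are hypothetical, not documented features of `scripts/main.py`.

```python
from pathlib import Path


def run_batch(jobs, transcribe, skip_existing=True):
    """Write transcribe(source) to each destination and return the paths written."""
    written = []
    for source, destination in jobs:
        destination = Path(destination)
        if skip_existing and destination.exists():
            continue  # already transcribed in an earlier run
        destination.parent.mkdir(parents=True, exist_ok=True)
        destination.write_text(transcribe(source), encoding="utf-8")
        written.append(destination)
    return written
```

With Whisper installed, you would load `model = whisper.load_model("small")` once and pass something like `lambda path: model.transcribe(str(path))["text"]` as `transcribe`.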

Model Selection Guide

| Model | Speed | Accuracy | VRAM | Best For |
|---|---|---|---|---|
| tiny | Fastest | ~70% | 1 GB | Quick drafts, short clips |
| base | Fast | ~80% | 1 GB | Social media clips |
| small | Medium | ~85% | 2 GB | Podcasts, interviews |
| medium | Slow | ~90% | 5 GB | Professional transcripts |
| large | Slowest | ~95% | 10 GB | Critical accuracy needs |

Recommendation: Start with `small` for most marketing content. Use `medium` for client deliverables.
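
The recommendation can be combined with the VRAM column of the model guide: start from the suggested model and step down to a smaller one if the GPU cannot hold it. The sketch below encodes that rule; `choose_model` is a hypothetical helper, and the VRAM figures are the approximate values from the table, not measured limits.

```python
# (model, approximate VRAM in GB) from the Model Selection Guide table, smallest first.
MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def choose_model(vram_gb, client_deliverable=False):
    """Start from the recommended model and step down until one fits in vram_gb."""
    preferred = "medium" if client_deliverable else "small"
    names = [name for name, _ in MODELS]
    candidates = MODELS[: names.index(preferred) + 1]
    for name, needed_gb in reversed(candidates):
        if vram_gb >= needed_gb:
            return name
    return "tiny"  # nothing fits the GPU budget; fall back to the smallest model
```

Reach for `large` explicitly only when accuracy matters more than turnaround time.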

Output Formats

| Format | Extension | Use Case |
|---|---|---|
| txt | .txt | Blog posts, analysis |
| srt | .srt | Video subtitles (YouTube) |
| vtt | .vtt | Web video subtitles |
| json | .json | Programmatic access |
| tsv | .tsv | Spreadsheet analysis |
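
VTT and TSV carry the same segment data as SRT in different shapes: WebVTT starts with a `WEBVTT` header and uses a period before the milliseconds, while a TSV file is one tab-separated row per segment that spreadsheets open directly. The sketch below writes both from Whisper-style segments; the function names are hypothetical, and the TSV column layout (start and end in integer milliseconds) is an assumed choice.

```python
def vtt_timestamp(seconds):
    """Format seconds as HH:MM:SS.mmm (WebVTT uses a period before the milliseconds)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments):
    """WEBVTT header, a blank line, then one cue per segment separated by blank lines."""
    cues = [
        f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        for seg in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


def segments_to_tsv(segments):
    """Header row plus start/end in integer milliseconds and the segment text."""
    rows = ["start\tend\ttext"]
    for seg in segments:
        text = seg["text"].strip().replace("\t", " ")  # a stray tab would break the columns
        rows.append(f"{round(seg['start'] * 1000)}\t{round(seg['end'] * 1000)}\t{text}")
    return "\n".join(rows) + "\n"
```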

Performance Tips

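  1. GPU acceleration - Up to roughly 10x faster with a CUDA-capable GPU than on CPU
  2. Audio extraction - The script automatically extracts the audio track from video files
  3. Chunking - Long files are split automatically to keep memory use down
  4. Language detection - The spoken language is detected automatically, or you can set it with `--language`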
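
The audio-extraction and chunking tips can be sketched as two small helpers. Whisper works on 16 kHz mono audio, so extracting to that format up front is a reasonable choice; the chunk length (600 s) and overlap (5 s) below are hypothetical defaults, not values taken from the script. `extract_audio_cmd` and `chunk_bounds` are illustrative names.

```python
def extract_audio_cmd(video_path, wav_path, sample_rate=16000):
    """ffmpeg arguments that drop the video stream (-vn) and write mono (-ac 1) WAV."""
    return [
        "ffmpeg", "-y", "-i", str(video_path),
        "-vn", "-ac", "1", "-ar", str(sample_rate),
        str(wav_path),
    ]


def chunk_bounds(duration_s, chunk_s=600.0, overlap_s=5.0):
    """Split [0, duration_s] into chunk_s-second windows that overlap by overlap_s."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, float(duration_s))
        bounds.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so words at a boundary appear in both chunks.
        start = end - overlap_s
    return bounds
```

The command list can be run with `subprocess.run(extract_audio_cmd("video.mp4", "video.wav"), check=True)`. Whisper itself already decodes in 30-second windows, so splitting beforehand mostly limits how much audio is held in memory at once. For the other tips: `whisper.load_model` uses CUDA automatically when PyTorch detects a GPU, and `model.transcribe(path, language="en")` skips language detection.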

Skill Boundaries

What This Skill Does Well

  • Structuring audio production workflows
  • Providing technical guidance
  • Creating quality checklists
  • Suggesting creative approaches

What This Skill Cannot Do

  • Replace audio engineering expertise
  • Make subjective creative decisions
  • Access or edit audio files directly
  • Guarantee commercial success

Related Skills

  • video-processing - Extract audio from video
  • youtube-downloader - Download videos to transcribe
  • content-repurposer - Turn transcripts into other content formats
  • podcast-production - Create podcasts

Skill Metadata

  • Mode: cyborg

```yaml
category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week
```