whisper-transcription

Whisper Transcription

Transcribe any audio or video to text using OpenAI's Whisper model - the same technology powering ChatGPT voice features.

When to Use This Skill

Podcast repurposing - Convert episodes to blog posts, show notes, social snippets
Video subtitles - Generate SRT/VTT files for YouTube, social media
Interview extraction - Pull quotes and insights from recorded calls
Content audit - Make audio/video libraries searchable
Translation - Transcribe and translate foreign language content

What Claude Does vs What You Decide

| Claude Does | You Decide |
| --- | --- |
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |

Dependencies

```bash
pip install openai-whisper torch ffmpeg-python click

# Also requires ffmpeg installed on system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
```

Commands

Transcribe Single File

```bash
python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
```

Batch Transcription

```bash
python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
```
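
A minimal sketch of how a batch run might pair input files with output files, assuming it scans a single folder for common media extensions and writes one transcript per file. The function name `plan_batch`, the extension list, and the non-recursive scan are illustrative assumptions, not the documented behavior of `scripts/main.py`.

```python
from pathlib import Path

# Assumed set of media extensions; the real script may accept others.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mov", ".mkv"}


def plan_batch(input_dir, output_dir, fmt="txt"):
    """Return sorted (source, target) path pairs for every media file in input_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pairs = []
    for source in sorted(input_dir.iterdir()):
        if source.is_file() and source.suffix.lower() in MEDIA_EXTENSIONS:
            # Keep the base name and swap the extension for the output format.
            pairs.append((source, output_dir / f"{source.stem}.{fmt}"))
    return pairs
```

Listing the planned pairs before transcribing makes it easy to confirm which files will be picked up, and which will be skipped, before committing to a long run.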

Transcribe + Translate

```bash
python scripts/main.py translate foreign-audio.mp3 --to en
```
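
For readers calling the library directly, the sketch below shows one way the same step could look with the `openai-whisper` Python package, whose built-in `translate` task produces English output. The helper names `build_transcribe_options` and `translate_to_english` are hypothetical, and how `scripts/main.py` handles `--to` internally is not documented here.

```python
def build_transcribe_options(translate=False, language=None):
    """Build keyword arguments for whisper's model.transcribe()."""
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # Naming the source language skips automatic language detection.
        options["language"] = language
    return options


def translate_to_english(path, model_name="small", language=None):
    """Transcribe foreign-language audio and return the English text."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **build_transcribe_options(True, language))
    return result["text"]
```

Keeping the option building separate from the model call lets the same options be reused for plain transcription by passing `translate=False`.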

Extract Timestamps

```bash
python scripts/main.py timestamps podcast.mp3 --format json
```

Examples

Example 1: Podcast to Blog Post

```bash
# Transcribe 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour audio on M1 Mac
```

Example 2: YouTube Subtitles

```bash
# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo
```

Example 3: Batch Process Interview Library

```bash
# Transcribe all recordings in folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)
```

Model Selection Guide

| Model | Speed | Accuracy | VRAM | Best For |
| --- | --- | --- | --- | --- |
| `tiny` | Fastest | ~70% | 1GB | Quick drafts, short clips |
| `base` | Fast | ~80% | 1GB | Social media clips |
| `small` | Medium | ~85% | 2GB | Podcasts, interviews |
| `medium` | Slow | ~90% | 5GB | Professional transcripts |
| `large` | Slowest | ~95% | 10GB | Critical accuracy needs |

Recommendation: Start with `small` for most marketing content. Use `medium` for client deliverables.
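
When the limiting factor is GPU memory rather than content type, the VRAM column in the table above can be turned into a simple lookup. This is a sketch only: `largest_model_that_fits` is a hypothetical helper, and the figures are the approximate values from the table, not measurements.

```python
# Approximate VRAM needs in GB, copied from the table above (smallest to largest).
MODEL_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_that_fits(available_vram_gb):
    """Return the largest model whose VRAM need fits, or None if none fit."""
    fitting = [name for name, need in MODEL_VRAM_GB if need <= available_vram_gb]
    return fitting[-1] if fitting else None
```

For example, a GPU with 4GB free would select `small`, since `medium` needs about 5GB.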

Output Formats

| Format | Extension | Use Case |
| --- | --- | --- |
| `txt` | .txt | Blog posts, analysis |
| `srt` | .srt | Video subtitles (YouTube) |
| `vtt` | .vtt | Web video subtitles |
| `json` | .json | Programmatic access |
| `tsv` | .tsv | Spreadsheet analysis |
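
To make the `srt` format concrete, here is a minimal sketch that renders Whisper-style segments (dicts with `start` and `end` in seconds plus `text`) as SRT cues. The function names are illustrative, and this is not necessarily how `scripts/main.py` writes its files.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```

WebVTT cues look similar but start with a `WEBVTT` header line and use a period instead of a comma before the milliseconds.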

Performance Tips

GPU acceleration - 10x faster with CUDA GPU
Audio extraction - Script auto-extracts audio from video
Chunking - Long files auto-split for memory efficiency
Language detection - Automatic, or specify with `--language`
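
The chunking tip can be pictured with a small sketch that computes overlapping time windows over a long recording. The 10-minute chunk length, the 5-second overlap, and the name `chunk_windows` are all assumptions made for illustration; the script's actual splitting strategy is not described here.

```python
def chunk_windows(duration_s, chunk_s=600.0, overlap_s=5.0):
    """Return (start, end) windows in seconds that cover the whole duration."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so words cut at a boundary appear in both chunks.
        start = end - overlap_s
    return windows
```

Each window could then be extracted with ffmpeg and transcribed on its own, with segment timestamps shifted by the window's start time when the pieces are merged.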

Skill Boundaries

What This Skill Does Well

Structuring audio production workflows
Providing technical guidance
Creating quality checklists
Suggesting creative approaches

What This Skill Cannot Do

Replace audio engineering expertise
Make subjective creative decisions
Access or edit audio files directly
Guarantee commercial success

Related Skills

Skill Metadata

Mode: cyborg

```yaml
category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week
```
