whisper-transcription
Whisper Transcription
Transcribe audio or video files to text using OpenAI's Whisper speech-recognition model, the same technology OpenAI has used for ChatGPT's voice features. Any file format that ffmpeg can decode is supported.
When to Use This Skill
- Podcast repurposing - Convert episodes to blog posts, show notes, social snippets
- Video subtitles - Generate SRT/VTT files for YouTube, social media
- Interview extraction - Pull quotes and insights from recorded calls
- Content audit - Make audio/video libraries searchable
- Translation - Transcribe foreign-language content and translate it into English
What Claude Does vs What You Decide
| Claude Does | You Decide |
|---|---|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |
Dependencies
```bash
pip install openai-whisper torch ffmpeg-python click

# Also requires the ffmpeg binary installed on the system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
```
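Before running the script, it helps to confirm that both the Python packages and the system `ffmpeg` binary are visible. The sketch below is a standalone check. `check_dependencies` is a hypothetical helper and is not part of `scripts/main.py`. It relies on the import names of the pip packages: `openai-whisper` imports as `whisper`, and `ffmpeg-python` imports as `ffmpeg`.

```python
import importlib.util
import shutil

# pip package name -> Python import name (assumed mapping for the packages above)
PYTHON_MODULES = {
    "openai-whisper": "whisper",
    "torch": "torch",
    "ffmpeg-python": "ffmpeg",
    "click": "click",
}


def check_dependencies() -> dict:
    """Return a mapping of dependency name -> True if it appears to be installed."""
    status = {}
    for package, module in PYTHON_MODULES.items():
        # find_spec returns None when a top-level module cannot be found
        status[package] = importlib.util.find_spec(module) is not None
    # The ffmpeg-python wrapper still needs the ffmpeg binary on PATH.
    status["ffmpeg (system binary)"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(("OK      " if ok else "MISSING ") + name)
```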
Commands
Transcribe Single File
```bash
python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
```
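The source of `scripts/main.py` is not shown here. The following is a minimal sketch of what a single-file `transcribe` step can look like with the `openai-whisper` Python API (`whisper.load_model` and `model.transcribe`). The helper names and the "same stem, new extension" default output path are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def default_output_path(media_path: str, fmt: str = "txt") -> Path:
    """Hypothetical default: same folder and file stem as the input, new extension."""
    return Path(media_path).with_suffix("." + fmt)


def format_txt(result: dict) -> str:
    """Plain-text transcript from a Whisper result dict (its "text" key)."""
    return result["text"].strip() + "\n"


def transcribe_file(media_path: str, model_name: str = "medium",
                    out_path: Optional[str] = None) -> Path:
    """Transcribe one audio or video file with openai-whisper and save it as text."""
    import whisper  # imported lazily so the helpers above work without Whisper

    model = whisper.load_model(model_name)
    # Whisper decodes its input through ffmpeg, so video files work as well as audio.
    result = model.transcribe(media_path)
    target = Path(out_path) if out_path else default_output_path(media_path)
    target.write_text(format_txt(result), encoding="utf-8")
    return target
```

For example, `transcribe_file("audio.mp3", "medium", "transcript.txt")` roughly mirrors the first command above.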
Batch Transcription
```bash
python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
```
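In batch mode, the main saving is that the model loads once and is reused for every file. The sketch below shows that pattern. `MEDIA_EXTENSIONS` and the helper names are assumptions, not taken from `scripts/main.py`, and the sketch writes `.txt` output only.

```python
from pathlib import Path

# Assumed set of extensions to pick up; adjust to match your recordings.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".mp4", ".mov", ".mkv", ".webm"}


def is_media_file(path: Path) -> bool:
    """True if the file extension (case-insensitive) is in MEDIA_EXTENSIONS."""
    return path.suffix.lower() in MEDIA_EXTENSIONS


def transcript_path(media: Path, output_dir: Path, fmt: str = "txt") -> Path:
    """Output file for one input: same stem, new extension, inside output_dir."""
    return output_dir / (media.stem + "." + fmt)


def run_batch(input_dir: str, output_dir: str, model_name: str = "small") -> None:
    """Transcribe every media file in input_dir to a .txt file in output_dir."""
    import whisper  # lazy import: only needed when actually transcribing

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    model = whisper.load_model(model_name)  # load once, reuse for every file
    for media in sorted(Path(input_dir).iterdir()):
        if media.is_file() and is_media_file(media):
            result = model.transcribe(str(media))
            transcript_path(media, out).write_text(result["text"].strip() + "\n", encoding="utf-8")
```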
Transcribe + Translate
```bash
python scripts/main.py translate foreign-audio.mp3 --to en
```
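Whisper's built-in `"translate"` task always translates *into* English. A `--to` value other than `en` would therefore need a separate translation step. The sketch below shows one way such a flag could map onto keyword arguments for `model.transcribe`. `translate_options` and `translate_file` are hypothetical names.

```python
from typing import Optional


def translate_options(target_language: str, source_language: Optional[str] = None) -> dict:
    """Map a --to value onto keyword arguments for whisper's model.transcribe()."""
    if target_language.lower() not in ("en", "english"):
        # Whisper's "translate" task only produces English output.
        raise ValueError(f"Whisper can only translate into English, not {target_language!r}")
    options = {"task": "translate"}
    if source_language:
        # Optional hint for the spoken language; otherwise Whisper auto-detects it.
        options["language"] = source_language
    return options


def translate_file(media_path: str, model_name: str = "medium") -> str:
    """Return the English translation of a foreign-language recording."""
    import whisper  # lazy import so translate_options works without Whisper

    model = whisper.load_model(model_name)
    return model.transcribe(media_path, **translate_options("en"))["text"]
```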
Extract Timestamps
```bash
python scripts/main.py timestamps podcast.mp3 --format json
```
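A Whisper result dict has a `"segments"` list, and each segment carries `"start"` and `"end"` (in seconds) and `"text"`. The JSON layout below is an assumed shape for illustration. The script's actual schema is not documented here.

```python
import json


def segments_to_json(result: dict, indent: int = 2) -> str:
    """Serialize Whisper segments as a JSON list of {start, end, text} objects."""
    rows = [
        {
            "start": round(seg["start"], 2),  # seconds from the start of the file
            "end": round(seg["end"], 2),
            "text": seg["text"].strip(),
        }
        for seg in result["segments"]
    ]
    # ensure_ascii=False keeps non-English text readable in the output file
    return json.dumps(rows, ensure_ascii=False, indent=indent)
```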
Examples
Example 1: Podcast to Blog Post
```bash
# Transcribe a 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium
# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour of audio on an M1 Mac
```
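The `.txt` output above is described as containing timestamps. One way to produce that layout from Whisper's segment list is to prefix each segment with its start time. The `[HH:MM:SS]` prefix is an assumption, since the exact format written by `scripts/main.py` is not specified.

```python
def format_clock(seconds: float) -> str:
    """Whole-second HH:MM:SS clock time for a segment start."""
    total = int(seconds)  # drop fractional seconds
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def timestamped_transcript(result: dict) -> str:
    """One line per Whisper segment, prefixed with [HH:MM:SS]."""
    return "\n".join(
        f"[{format_clock(seg['start'])}] {seg['text'].strip()}" for seg in result["segments"]
    )
```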
Example 2: YouTube Subtitles
```bash
# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt
# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo
```
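An SRT file is a sequence of numbered cues. Each cue has a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the caption text, and cues are separated by blank lines. A minimal writer for Whisper's segment list could look like this. The helper names are hypothetical, and `openai-whisper` also ships its own writers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT time code, e.g. 3661.5 -> 01:01:01,500."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"  # SRT uses a comma before milliseconds


def segments_to_srt(segments: list) -> str:
    """Build SRT text: index line, time range, caption, blank line between cues."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```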
Example 3: Batch Process Interview Library
```bash
# Transcribe all recordings in a folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt
# Output: ./customer-interviews/*.txt (one per audio file)
```
Model Selection Guide
| Model | Speed | Accuracy | VRAM | Best For |
|---|---|---|---|---|
| `tiny` | Fastest | ~70% | ~1GB | Quick drafts, short clips |
| `base` | Fast | ~80% | ~1GB | Social media clips |
| `small` | Moderate | ~85% | ~2GB | Podcasts, interviews |
| `medium` | Slow | ~90% | ~5GB | Professional transcripts |
| `large` | Slowest | ~95% | ~10GB | Critical accuracy needs |
Recommendation: Start with `small` for most marketing content. Use `medium` for client deliverables.
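If you want to pick the largest model that fits your GPU, the approximate VRAM column of the table can drive a simple lookup. The sketch below does that. `largest_model_for` is a hypothetical helper, and the VRAM figures are rough, so speed and accuracy needs may matter more than the largest model that fits.

```python
# Approximate VRAM needs from the table above, largest model first.
MODEL_VRAM_GB = [("large", 10), ("medium", 5), ("small", 2), ("base", 1), ("tiny", 1)]


def largest_model_for(vram_gb: float) -> str:
    """Return the biggest Whisper model whose approximate VRAM need fits vram_gb."""
    for name, needed in MODEL_VRAM_GB:
        if vram_gb >= needed:
            return name
    return "tiny"  # below ~1GB: fall back to the smallest model (or run on CPU)
```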
Output Formats
| Format | Extension | Use Case |
|---|---|---|
| `txt` | .txt | Blog posts, analysis |
| `srt` | .srt | Video subtitles (YouTube) |
| `vtt` | .vtt | Web video subtitles |
| `json` | .json | Programmatic access |
| `tsv` | .tsv | Spreadsheet analysis |
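WebVTT differs from SRT in two visible ways: the file starts with a `WEBVTT` header, and milliseconds follow a period rather than a comma. For spreadsheets, a tab-separated file with integer-millisecond times avoids locale-dependent decimal separators. These writers are an illustrative sketch with hypothetical names, not the script's own implementation.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT time code, e.g. 2.5 -> 00:00:02.500."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"  # VTT uses a period before milliseconds


def segments_to_vtt(segments: list) -> str:
    """WEBVTT header, then one time range + caption per cue, blank line between cues."""
    cues = [
        f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        for seg in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


def segments_to_tsv(segments: list) -> str:
    """Header row plus start/end in integer milliseconds and the caption text."""
    lines = ["start\tend\ttext"]
    for seg in segments:
        text = seg["text"].strip().replace("\t", " ")  # keep columns aligned
        lines.append(f"{round(seg['start'] * 1000)}\t{round(seg['end'] * 1000)}\t{text}")
    return "\n".join(lines) + "\n"
```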
Performance Tips
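- GPU acceleration - Roughly 10x faster on a CUDA-capable GPU than on CPU (the exact speedup depends on model size and hardware)
- Audio extraction - The script automatically extracts audio from video files
- Chunking - Long files are split automatically for memory efficiency
- Language detection - Automatic, or specify the language with `--language`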
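The tips above can be sketched in a few lines. The device is chosen from `torch.cuda.is_available()`, FP16 is enabled only on GPU, a language hint is passed through, and ffmpeg is called to strip video down to 16 kHz mono audio, which is the sample rate Whisper works at. Whisper already decodes video through ffmpeg by itself, so the explicit extraction step is optional. All helper names here are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def pick_device() -> str:
    """Return "cuda" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def extract_audio_cmd(video_path: str, wav_path: str) -> list:
    """ffmpeg command: drop the video stream, write 16 kHz mono audio."""
    return ["ffmpeg", "-nostdin", "-y", "-i", video_path,
            "-vn", "-ac", "1", "-ar", "16000", wav_path]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run the extraction, failing early if the ffmpeg binary is missing."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(extract_audio_cmd(video_path, wav_path), check=True, capture_output=True)


def transcribe_fast(media_path: str, model_name: str = "medium",
                    language: Optional[str] = None) -> dict:
    """Transcribe on GPU when available; language=None lets Whisper auto-detect."""
    import whisper  # lazy import so the helpers above work without Whisper

    device = pick_device()
    model = whisper.load_model(model_name, device=device)
    # FP16 only helps on GPU; on CPU Whisper would fall back to FP32 anyway.
    return model.transcribe(media_path, fp16=(device == "cuda"), language=language)
```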
Skill Boundaries
What This Skill Does Well
- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches
What This Skill Cannot Do
- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success
Related Skills
- video-processing - Extract audio from video
- youtube-downloader - Download videos to transcribe
- content-repurposer - Transform transcripts to content
- podcast-production - Create podcasts
Skill Metadata
- Mode: cyborg
```yaml
category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week
```