transcribe-audio
Skill: Transcribe Audio
Transcribes video audio using WhisperX and creates clean JSON transcripts with word-level timing data.
When to Use
- Videos need audio transcripts before visual analysis
Critical Requirements
Use WhisperX, NOT standard Whisper. WhisperX preserves the original video timeline, including leading silence, so transcript timestamps match the actual video timestamps. Run WhisperX directly on the video files and do not extract the audio separately; this keeps the timestamps aligned with the video.
Workflow
1. Read Language from Library File
Read the library's library.yaml file to get the language code:

```yaml
# Library metadata
library_name: [library-name]
language: en # Language code stored here
...
```
2. Run WhisperX
```bash
whisperx "/full/path/to/video.mov" \
  --language en \
  --model medium \
  --compute_type float32 \
  --device cpu \
  --output_format json \
  --output_dir libraries/[library-name]/transcripts
```
3. Prepare Audio Transcript
After WhisperX completes, format the JSON using our prepare_audio_script:
```bash
ruby .claude/skills/transcribe-audio/prepare_audio_script.rb \
  libraries/[library-name]/transcripts/video_name.json \
  /full/path/to/original/video_name.mov
```

This script:
- Adds video source path as metadata
- Removes unnecessary fields to reduce file size
- Prettifies JSON
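The contents of prepare_audio_script.rb are not shown here. This Ruby sketch therefore only illustrates the three listed operations under stated assumptions, and it is not the script's actual logic:

- The metadata key `video_path` is a hypothetical name.
- The fields treated as unnecessary are assumptions: the top-level `word_segments` array and each word's `score`.

```ruby
require "json"

# Assumed "unnecessary" fields; the real script may drop different ones.
DROPPED_TOP_LEVEL_KEYS = %w[word_segments].freeze
DROPPED_WORD_KEYS = %w[score].freeze

# Returns a new hash; the parsed WhisperX data is not mutated.
def prepare_transcript(raw, video_path)
  data = raw.reject { |key, _| DROPPED_TOP_LEVEL_KEYS.include?(key) }
  if raw["segments"].is_a?(Array)
    data["segments"] = raw["segments"].map do |segment|
      next segment unless segment["words"].is_a?(Array)
      words = segment["words"].map do |word|
        word.reject { |key, _| DROPPED_WORD_KEYS.include?(key) }
      end
      segment.merge("words" => words)
    end
  end
  # Put the source video first (hypothetical key name) so it is easy to spot.
  { "video_path" => video_path }.merge(data)
end

# Rewrites the transcript in place as pretty-printed JSON.
def prepare_transcript_file!(json_path, video_path)
  raw = JSON.parse(File.read(json_path))
  File.write(json_path, JSON.pretty_generate(prepare_transcript(raw, video_path)) + "\n")
end
```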
4. Return Success Response
After audio preparation completes, return this structured response to the parent agent:
```
✓ [video_filename.mov] transcribed successfully
Audio transcript: libraries/[library-name]/transcripts/video_name.json
Video path: /full/path/to/video_filename.mov
```

DO NOT update library.yaml: the parent agent handles this to avoid race conditions when multiple transcriptions run in parallel.
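A small Ruby sketch can render the response above. It treats the square brackets as placeholder markers, as in `[library-name]`, rather than as literal characters. `success_response` is a hypothetical helper name.

```ruby
# Hypothetical helper: build the success message returned to the parent agent.
def success_response(video_path, transcript_path)
  [
    "✓ #{File.basename(video_path)} transcribed successfully",
    "Audio transcript: #{transcript_path}",
    "Video path: #{video_path}"
  ].join("\n")
end
```

Returning plain text keeps the worker free of any library.yaml writes. The parent can pick both paths out of the last two lines.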
Running in Parallel
This skill is designed to run inside a Task agent for parallel execution:
- Each agent handles ONE video file
- Multiple agents can run simultaneously
- Parent thread updates library.yaml sequentially after each agent completes
- No race conditions on shared YAML file
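Task agents are not Ruby threads, but threads can model the ownership rule above. Each worker handles one video and only returns a result, and the calling thread alone applies the results, one at a time.

`transcribe_all` is a hypothetical name. The library.yaml update is left to the caller's block, because this document does not describe that file's schema beyond `library_name` and `language`.

```ruby
# Fan out one worker per video; apply results sequentially in the caller's thread.
def transcribe_all(video_paths, worker)
  threads = video_paths.map do |path|
    Thread.new(path) { |p| worker.call(p) } # worker must not touch library.yaml
  end
  # Thread#value waits for each worker (re-raising its exception, if any),
  # so results are yielded in input order, from this thread only.
  threads.each { |thread| yield thread.value }
end
```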
Next Step
After audio transcription, use the analyze-video skill to add visual descriptions and create the visual transcript.
Installation
Ensure WhisperX is installed. Use the setup skill to verify dependencies.
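As a quick first check, a Ruby PATH lookup can show whether the command-line tools are visible at all. It does not replace the setup skill's verification. This hypothetical sketch assumes POSIX-style executables, so it does no `.exe` lookup. It also checks ffmpeg, which WhisperX relies on to decode media files.

```ruby
# Hypothetical helper: true if an executable file named `name` is on PATH.
def executable_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Tools this workflow depends on (ffmpeg is used by WhisperX to read media).
def missing_tools(tools = %w[whisperx ffmpeg], path = ENV.fetch("PATH", ""))
  tools.reject { |tool| executable_on_path?(tool, path) }
end
```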