analyze-video
Skill: Analyze Video
Add visual descriptions to audio transcripts by extracting JPG frames with ffmpeg and analyzing them. Never read video files directly - extract frames first.
Prerequisites
Videos must have audio transcripts. Run transcribe-audio skill first if needed.
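This prerequisite can be checked before any frames are extracted. The following is a minimal sketch, assuming the transcript lives under the libraries/[library]/transcripts/ directory used in the workflow; the function name `has_transcript` and its arguments are hypothetical and not part of the skill.

```shell
# Hypothetical helper: succeed only if the audio transcript exists and is non-empty.
# $1 = library directory name
# $2 = transcript base name (for example "video" for video.json)
has_transcript() {
  [ -s "libraries/$1/transcripts/$2.json" ]
}

# Example use (not executed here):
# has_transcript my_library video || echo "Run the transcribe-audio skill first" >&2
```

The `-s` test treats an empty file as missing, so a transcript that failed partway through is also caught.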
Workflow
1. Copy & Clean Audio Transcript
Don't read the audio transcript. Copy it, then prepare the copy with the prepare_visual_script.rb file, which removes word-level timing data and prettifies the JSON for easier editing:
bash
cp libraries/[library]/transcripts/video.json libraries/[library]/transcripts/visual_video.json
ruby .claude/skills/analyze-video/prepare_visual_script.rb libraries/[library]/transcripts/visual_video.json
2. Extract Frames (Binary Search)
Create frame directory:
mkdir -p tmp/frames/[video_name]
Videos ≤30s: Extract one frame at 2s
Videos >30s: Extract start (2s), middle (duration/2), end (duration-2s)
bash
ffmpeg -ss 00:00:02 -i video.mov -vframes 1 -vf "scale=1280:-1" tmp/frames/[video_name]/start.jpg
Subdivide when: The start, middle, and end frames show different subjects, settings, or camera angles
Stop when: The footage no longer changes, or changes only in minor ways
Never sample more frequently than once per 30 seconds
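The sampling rules above can be sketched as two small shell functions. This is only an illustration under stated assumptions: the function names and the `MIN_GAP` variable are hypothetical, durations are whole seconds, and the 30-second limit is read as "adjacent sampled frames must be at least 30 seconds apart" and applied only when subdividing (the initial start, middle, and end frames are always taken).

```shell
# Hypothetical sketch of the sampling rules, using whole seconds.
MIN_GAP=30  # assumed reading: never sample more often than once per 30 seconds

# Print the initial sample timestamps (in seconds) for a given duration.
initial_timestamps() {
  duration=$1
  if [ "$duration" -le 30 ]; then
    echo 2
  else
    echo "2 $((duration / 2)) $((duration - 2))"
  fi
}

# Print the midpoint between two already-sampled timestamps.
# Fails (prints nothing) if the midpoint would sit closer than MIN_GAP
# seconds to either neighbour.
subdivide() {
  lo=$1
  hi=$2
  mid=$(( (lo + hi) / 2 ))
  if [ $((mid - lo)) -ge "$MIN_GAP" ] && [ $((hi - mid)) -ge "$MIN_GAP" ]; then
    echo "$mid"
  else
    return 1
  fi
}

# Example use (not executed here); ffprobe reports a fractional duration,
# so the fraction is dropped before doing integer arithmetic:
# duration=$(ffprobe -v error -show_entries format=duration \
#   -of default=noprint_wrappers=1:nokey=1 video.mov)
# for t in $(initial_timestamps "${duration%.*}"); do
#   ffmpeg -ss "$t" -i video.mov -vframes 1 -vf "scale=1280:-1" "tmp/frames/[video_name]/frame_$t.jpg"
# done
```

Whether to call `subdivide` at all remains a judgment made from the frames themselves: only split an interval when its two endpoint frames look different.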
3. Add Visual Descriptions
Read the visual_video.json file that you created earlier.
Read the JPG frames from tmp/frames/[video_name]/ using the Read tool, then Edit visual_video.json.
Do this incrementally. You don't need to write a program or script; just edit the JSON each time you read new frames.
Dialogue segments - add a visual field:
json
{
"start": 2.917,
"end": 7.586,
"text": "Hey, good afternoon everybody.",
"visual": "Man in red shirt speaking to camera in medium shot. Home office with bookshelf. Natural lighting.",
"words": [...]
}
B-roll segments - insert new entries:
json
{
"start": 35.474,
"end": 56.162,
"text": "",
"visual": "Green bicycle parked in front of building. Urban street with trees.",
"b_roll": true,
"words": []
}
Guidelines:
- Descriptions should be 3 sentences max.
- First segment: detailed (subject, setting, shot type, lighting, camera style)
- Continuing shots: brief if similar to the previous shot; up to 3 sentences if drastically different.
4. Cleanup & Return
bash
rm -rf tmp/frames/[video_name]
Return structured response:
✓ [video_filename.mov] analyzed successfully
Visual transcript: libraries/[library]/transcripts/visual_video.json
Video path: /full/path/to/video_filename.mov
DO NOT update library.yaml - parent agent handles this to avoid race conditions in parallel execution.
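The structured response can be assembled from the library name and the video path. A minimal sketch follows; the function name `report_success` is hypothetical, and it assumes the visual transcript is named visual_ followed by the video's file name without its extension, which the skill text implies but does not state outright.

```shell
# Hypothetical helper: print the structured response for one analyzed video.
# $1 = library directory name
# $2 = full path to the video file
# Assumption: the visual transcript is visual_<video name without extension>.json
report_success() {
  library=$1
  video_path=$2
  filename=$(basename "$video_path")
  stem=${filename%.*}
  printf '✓ %s analyzed successfully\n' "$filename"
  printf 'Visual transcript: libraries/%s/transcripts/visual_%s.json\n' "$library" "$stem"
  printf 'Video path: %s\n' "$video_path"
}
```

The helper only prints; it deliberately does not touch library.yaml, which stays with the parent agent.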