analyze-video

Skill: Analyze Video

Add visual descriptions to audio transcripts by extracting JPG frames with ffmpeg and analyzing them. Never read video files directly - extract frames first.

Prerequisites

Videos must have audio transcripts. Run transcribe-audio skill first if needed.

Workflow

1. Copy & Clean Audio Transcript

Don't read the audio transcript. Copy it, then prepare the copy with the prepare_visual_script.rb file. This removes word-level timing data and pretty-prints the JSON for easier editing:
bash
cp libraries/[library]/transcripts/video.json libraries/[library]/transcripts/visual_video.json
ruby .claude/skills/analyze-video/prepare_visual_script.rb libraries/[library]/transcripts/visual_video.json
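The contents of prepare_visual_script.rb are not shown here, so the following is only a minimal sketch of the two effects described above (dropping word-level timing and pretty-printing). It assumes the timing data lives under "words" keys and that emptying those arrays is an acceptable way to remove it; the function names `strip_word_timing` and `prepare_visual_json` are hypothetical.

```ruby
require "json"

# Assumption: word-level timing is stored under "words" keys.
# This sketch keeps the key and empties its array; the real script may differ.
def strip_word_timing(node)
  case node
  when Hash
    node.to_h { |key, value| [key, key == "words" ? [] : strip_word_timing(value)] }
  when Array
    node.map { |item| strip_word_timing(item) }
  else
    node
  end
end

# Parse raw JSON text, strip the timing data, and return pretty-printed JSON.
def prepare_visual_json(raw_json)
  JSON.pretty_generate(strip_word_timing(JSON.parse(raw_json)))
end
```

Rewriting a file in place would then look like `File.write(path, prepare_visual_json(File.read(path)))`, which is presumably close to what the real script does to the path passed on the command line.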

2. Extract Frames (Binary Search)

Create frame directory:
mkdir -p tmp/frames/[video_name]
Videos ≤30s: Extract one frame at 2s
Videos >30s: Extract start (2s), middle (duration/2), end (duration-2s)
bash
ffmpeg -ss 00:00:02 -i video.mov -vframes 1 -vf "scale=1280:-1" tmp/frames/[video_name]/start.jpg
Subdivide when: The start, middle, and end frames show different subjects, settings, or camera angles
Stop when: The footage no longer seems to change, or changes only slightly
Never sample more frequently than once per 30 seconds
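The sampling rules above can be sketched as a few small helpers. This is an illustration rather than part of the skill: the names (`initial_timestamps`, `subdivide`, `format_timestamp`, `ffmpeg_command`) are hypothetical, and it reads the 30-second limit as a floor on the gap created by a subdivision, while the first pass follows the start/middle/end rule exactly as written. The ffmpeg flags are the ones used in the command above.

```ruby
MIN_SPACING = 30.0 # seconds; never sample more often than this when subdividing
EDGE_OFFSET = 2.0  # seconds in from the start and from the end

# Timestamps for the first pass: one frame for short videos, three otherwise.
def initial_timestamps(duration)
  return [EDGE_OFFSET] if duration <= 30
  [EDGE_OFFSET, duration / 2.0, duration - EDGE_OFFSET]
end

# Midpoint between two sampled timestamps, or nil when the new frame
# would sit less than MIN_SPACING from its neighbors.
def subdivide(left, right)
  mid = (left + right) / 2.0
  (mid - left) < MIN_SPACING ? nil : mid
end

# Seconds -> HH:MM:SS for ffmpeg's -ss option (fractional seconds are dropped).
def format_timestamp(seconds)
  total = seconds.floor
  format("%02d:%02d:%02d", total / 3600, (total % 3600) / 60, total % 60)
end

# Argument list for extracting one scaled JPG frame at the given time.
def ffmpeg_command(video, seconds, out_path)
  ["ffmpeg", "-ss", format_timestamp(seconds), "-i", video,
   "-vframes", "1", "-vf", "scale=1280:-1", out_path]
end
```

Passing the array to `system(*ffmpeg_command(...))` runs ffmpeg without a shell, so file names with spaces need no quoting. Whether to call `subdivide` at all remains a judgment made after looking at the frames on either side.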

3. Add Visual Descriptions

Read the visual video JSON file that you created earlier.
Read the JPG frames from tmp/frames/[video_name]/ using the Read tool, then Edit visual_video.json.
Work incrementally. You don't need to write a program or script for this; just edit the JSON each time you read new frames.
Dialogue segments - add a visual field:
json
{
  "start": 2.917,
  "end": 7.586,
  "text": "Hey, good afternoon everybody.",
  "visual": "Man in red shirt speaking to camera in medium shot. Home office with bookshelf. Natural lighting.",
  "words": [...]
}
B-roll segments - insert new entries:
json
{
  "start": 35.474,
  "end": 56.162,
  "text": "",
  "visual": "Green bicycle parked in front of building. Urban street with trees.",
  "b_roll": true,
  "words": []
}
Guidelines:
  • Descriptions should be 3 sentences max.
  • First segment: detailed (subject, setting, shot type, lighting, camera style)
  • Continuing shots: keep the description brief if the shot is similar; use up to 3 sentences only if it is drastically different.
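The editing itself is done by hand, but an optional sanity check on the finished segments could look like the sketch below. It assumes every segment should end up with a non-empty visual field, and it counts sentences with a rough punctuation heuristic that will miscount text containing decimals or abbreviations. The names `sentence_count` and `visual_problems` are hypothetical.

```ruby
MAX_SENTENCES = 3 # limit from the guidelines above

# Rough sentence count: runs of text ending in ., ! or ? (or end of string).
def sentence_count(text)
  text.scan(/[^.!?]+[.!?]*/).map(&:strip).reject(&:empty?).size
end

# Returns human-readable problems for a list of segment hashes;
# an empty array means every segment passed both checks.
def visual_problems(segments)
  segments.each_with_index.flat_map do |segment, index|
    visual = segment["visual"].to_s
    problems = []
    problems << "segment #{index}: missing visual" if visual.strip.empty?
    if sentence_count(visual) > MAX_SENTENCES
      problems << "segment #{index}: more than #{MAX_SENTENCES} sentences"
    end
    problems
  end
end
```

Both example segments above pass this check, since each description has three sentences or fewer.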

4. Cleanup & Return

bash
rm -rf tmp/frames/[video_name]
Return structured response:
✓ [video_filename.mov] analyzed successfully
  Visual transcript: libraries/[library]/transcripts/visual_video.json
  Video path: /full/path/to/video_filename.mov
DO NOT update library.yaml. The parent agent handles this to avoid race conditions in parallel execution.