analyze-video

Skill: Analyze Video

Add visual descriptions to audio transcripts by extracting JPG frames with ffmpeg and analyzing them. Never read video files directly - extract frames first.

Prerequisites

Videos must already have audio transcripts. Run the transcribe-audio skill first if needed.

Workflow

1. Copy & Clean Audio Transcript

Don't read the audio transcript. Copy it, then run prepare_visual_script.rb on the copy. The script removes word-level timing data and prettifies the JSON for easier editing:

bash

cp libraries/[library]/transcripts/video.json libraries/[library]/transcripts/visual_video.json
ruby .claude/skills/analyze-video/prepare_visual_script.rb libraries/[library]/transcripts/visual_video.json

2. Extract Frames (Binary Search)

Create frame directory:

mkdir -p tmp/frames/[video_name]

Videos ≤30s: Extract one frame at 2s
Videos >30s: Extract start (2s), middle (duration/2), end (duration-2s)

bash

ffmpeg -ss 00:00:02 -i video.mov -vframes 1 -vf "scale=1280:-1" tmp/frames/[video_name]/start.jpg
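The sampling rule above can be sketched in Ruby, the language of the skill's own helper script. This is only an illustration: the helper names (`initial_timestamps`, `ffmpeg_time`, `frame_command`) and the output file naming are assumptions, not part of the skill, and the video duration is assumed to be known already (for example from ffprobe).

```ruby
# Hypothetical helpers; the names are illustrative and not part of the skill.

# Initial sample points in seconds: one frame for short videos,
# start / middle / end for anything longer than 30 s.
def initial_timestamps(duration)
  return [2.0] if duration <= 30
  [2.0, duration / 2.0, duration - 2.0]
end

# Format seconds as HH:MM:SS.mmm, a form accepted by ffmpeg's -ss option.
def ffmpeg_time(seconds)
  total_ms = (seconds * 1000).round
  h, rem = total_ms.divmod(3_600_000)
  m, rem = rem.divmod(60_000)
  s, ms = rem.divmod(1000)
  format("%02d:%02d:%02d.%03d", h, m, s, ms)
end

# Argument list for extracting a single frame, mirroring the command above.
def frame_command(video, seconds, out_path)
  ["ffmpeg", "-ss", ffmpeg_time(seconds), "-i", video,
   "-vframes", "1", "-vf", "scale=1280:-1", out_path]
end

# Print the commands for a 95 s video (output file names are an assumption).
initial_timestamps(95).each do |t|
  puts frame_command("video.mov", t, "tmp/frames/video/#{t.round}s.jpg").join(" ")
end
```

The commands are printed rather than executed. To run one, pass the array to `system(*cmd)` so that no shell quoting is needed.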

Subdivide when: Start, middle and end frames show different subjects, settings or camera angles
Stop when: The footage no longer seems to be changing or only has minor changes
Never sample more frequently than once per 30 seconds
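A minimal sketch of the subdivision rule, under stated assumptions: the `changed` block stands in for the visual judgment made when comparing two frames, and `subdivide` and `MIN_GAP` are hypothetical names. In the skill itself this decision is made by looking at the extracted frames, not by code.

```ruby
# Hypothetical sketch; `changed` replaces the visual comparison of two frames.
MIN_GAP = 30.0 # seconds; never sample more often than this

# Returns the extra timestamps to sample between lo and hi.
def subdivide(lo, hi, &changed)
  mid = (lo + hi) / 2.0
  # Stop if the midpoint would sit closer than MIN_GAP to an existing sample.
  return [] if mid - lo < MIN_GAP
  # Stop if the frames at both ends look alike.
  return [] unless changed.call(lo, hi)
  subdivide(lo, mid, &changed) + [mid] + subdivide(mid, hi, &changed)
end

# Example: a 300 s video whose scene changes once, at 100 s.
scene_at = ->(t) { t < 100 ? :office : :street }
extra = subdivide(2.0, 298.0) { |a, b| scene_at.(a) != scene_at.(b) }
puts extra.inspect
```

Because only the two endpoint frames are compared, a change that reverts within an interval (office, street, office) would go unnoticed. That matches the heuristic as stated here and is a known limit of this kind of sampling.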

3. Add Visual Descriptions

Read the visual_video.json file that you created in step 1.

Read the JPG frames from tmp/frames/[video_name]/ using the Read tool, then Edit visual_video.json.

Work incrementally. You don't need to write a program or script for this; just edit the JSON each time you read new frames.

Dialogue segments - add a visual field:

json

{
  "start": 2.917,
  "end": 7.586,
  "text": "Hey, good afternoon everybody.",
  "visual": "Man in red shirt speaking to camera in medium shot. Home office with bookshelf. Natural lighting.",
  "words": [...]
}

B-roll segments - insert new entries:

json

{
  "start": 35.474,
  "end": 56.162,
  "text": "",
  "visual": "Green bicycle parked in front of building. Urban street with trees.",
  "b_roll": true,
  "words": []
}

Guidelines:

Descriptions should be 3 sentences max.
First segment: detailed (subject, setting, shot type, lighting, camera style)
Continuing shots: brief if similar to the previous shot; up to 3 sentences if drastically different.
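As an optional sanity check after editing, the guidelines can be verified mechanically. The sketch below assumes the segments are available as an array of hashes shaped like the examples above. The top-level layout of visual_video.json is not shown here, and `sentence_count` and `segment_problems` are hypothetical names.

```ruby
require "json"

# Hypothetical checker; assumes segments shaped like the examples above.
MAX_SENTENCES = 3

# Rough sentence count: split after ., ! or ? followed by whitespace.
def sentence_count(text)
  text.split(/(?<=[.!?])\s+/).reject(&:empty?).size
end

# Returns a list of guideline violations for one segment (empty if none).
def segment_problems(segment)
  problems = []
  visual = segment["visual"].to_s
  problems << "missing visual" if visual.strip.empty?
  if sentence_count(visual) > MAX_SENTENCES
    problems << "visual longer than #{MAX_SENTENCES} sentences"
  end
  if segment["b_roll"] && !segment["text"].to_s.empty?
    problems << "b_roll segment should have empty text"
  end
  problems
end

segments = JSON.parse(<<~DATA)
  [
    {"start": 2.917, "end": 7.586, "text": "Hey, good afternoon everybody.",
     "visual": "Man in red shirt speaking to camera in medium shot. Home office with bookshelf. Natural lighting."},
    {"start": 35.474, "end": 56.162, "text": "", "b_roll": true, "words": [],
     "visual": "Green bicycle parked in front of building. Urban street with trees."}
  ]
DATA

segments.each do |seg|
  issues = segment_problems(seg)
  status = issues.empty? ? "ok" : issues.join("; ")
  puts "#{seg["start"]}: #{status}"
end
```

This only reports problems and never modifies the file, so it does not replace the incremental, by-hand editing described above.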

4. Cleanup & Return

bash

rm -rf tmp/frames/[video_name]

Return structured response:

✓ [video_filename.mov] analyzed successfully
  Visual transcript: libraries/[library]/transcripts/visual_video.json
  Video path: /full/path/to/video_filename.mov

DO NOT update library.yaml. The parent agent handles this to avoid race conditions during parallel execution.
