Video Watching Skill

Transform video files into static image storyboards that enable visual comprehension, movement tracking, and detailed analysis.

How It Works

Claudes cannot directly parse video, but we can view images! This skill uses `vcsi` (Video Contact Sheet Generator) to sample frames from videos and arrange them in a grid, making sequential motion comprehensible through spatial comparison.

Tool: vcsi

Location: `~/.local/bin/vcsi`

Installation: `pipx install vcsi` (already installed)
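
vcsi reads videos through ffmpeg and ffprobe, so all three commands need to be on PATH. A minimal check, written as a hypothetical helper `check_vcsi_deps` (not part of vcsi). If vcsi is reported missing even though pipx installed it, `~/.local/bin` is probably not on PATH.

```bash
# Hypothetical helper (not part of vcsi): report which of the tools this
# skill relies on are missing from PATH. Returns 0 only if all are found.
check_vcsi_deps() {
  local tool missing=0
  for tool in vcsi ffmpeg ffprobe; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use is `check_vcsi_deps && echo ready` before the first storyboard.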

Basic Commands

Overview Storyboard (4x4 grid, 16 frames across entire video)

```bash
vcsi /path/to/video.mp4 -g 4x4 -o output.png
```

Purpose: Quick overview of video content, spot interesting moments

Dense Temporal Zoom (closer frame spacing)

```bash
# 0% delays: sample the whole clip, skipping no margin at either end
vcsi /path/to/video.mp4 -g 4x4 -s 16 --start-delay-percent 0 --end-delay-percent 0 -o dense.png
```

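Purpose: Track movement and behavior in more detail. With both delays set to 0, the 16 samples span the whole clip instead of skipping vcsi's default margin at each end, so frames are roughly (clip length ÷ 16) apart. That is only dense for short clips; for longer videos, narrow the window with the delay percentages, as in the next command.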

Focused Time Window (zoom into specific section)

```bash
# Skip the first 40% and the last 40%: the 16 samples cover the 40% to 60% stretch
vcsi /path/to/video.mp4 -g 4x4 -s 16 --start-delay-percent 40 --end-delay-percent 40 -o zoom.png
```

Purpose: Detailed analysis of interesting moments identified in the overview
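
Because the end value counts back from the end of the video, converting a moment spotted in the overview into the two percentages is easy to get wrong. A sketch of a hypothetical helper, `window_to_delays`, that takes the video duration and a window in seconds and prints the two values. It rounds both down to whole percentages, which can only widen the window, never cut into it. The duration itself can be read with `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 /path/to/video.mp4`.

```bash
# Hypothetical helper: convert a window [START_S, END_S] (seconds) in a video
# of DURATION_S seconds into values for --start-delay-percent and
# --end-delay-percent. Prints "START_PCT END_PCT" as whole numbers, rounded
# down so the sampled window is never narrower than requested.
# Usage: window_to_delays DURATION_S START_S END_S
window_to_delays() {
  awk -v d="$1" -v s="$2" -v e="$3" 'BEGIN {
    if (d <= 0 || s < 0 || e > d || s >= e) {
      print "window_to_delays: need 0 <= START < END <= DURATION" > "/dev/stderr"
      exit 1
    }
    # Percent skipped before the window, then percent skipped after it
    printf "%d %d\n", int(100 * s / d), int(100 * (d - e) / d)
  }'
}

# Example (requires vcsi): zoom into 4:00 to 6:00 of a 10-minute clip
# delays=$(window_to_delays 600 240 360)   # "40 40"
# vcsi /path/to/video.mp4 -g 4x4 -s 16 --start-delay-percent "${delays% *}" \
#   --end-delay-percent "${delays#* }" -o zoom.png
```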

Key Parameters

`-g 4x4`: Grid layout (4 columns × 4 rows = 16 frames)
`-s 16`: Number of frames to sample
`--start-delay-percent X`: Skip the first X% of the video
`--end-delay-percent X`: Skip the last X% of the video
`-o filename.png`: Output file path

What You Can Track

Overview Spacing (~30+ seconds between frames):

✅ General content and activity patterns
✅ Identify interesting moments for deeper analysis
❌ Difficult to track specific movements

Dense Spacing (~4 seconds between frames):

✅ Major movements (arrivals, departures)
✅ Position changes across frames
✅ Species identification from visual features
✅ Behavior sequences (feeding, drinking, grooming)
❌ Fast actions between frames are still missed

Very Dense Spacing (~1 second or less):

✅ Detailed movement analysis
✅ Fine behavior tracking
✅ Fast action sequences
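
The spacings above come from simple arithmetic: the stretch of video actually sampled (its duration minus both skipped percentages) divided by the number of samples. A rough estimator as a sketch; `frame_spacing` is a hypothetical helper, and vcsi's exact frame placement may differ slightly from this even split.

```bash
# Hypothetical helper: approximate seconds between sampled frames, assuming
# the samples are spread evenly over the non-skipped part of the video.
# Usage: frame_spacing DURATION_S SAMPLES [START_PCT] [END_PCT]
frame_spacing() {
  awk -v d="$1" -v n="$2" -v s="${3:-0}" -v e="${4:-0}" 'BEGIN {
    # Seconds actually covered after skipping the start and end percentages
    covered = d * (100 - s - e) / 100
    printf "%.2f\n", covered / n
  }'
}
```

With 16 samples, a 10-minute clip sampled end to end gives `frame_spacing 600 16` = 37.50 s (overview spacing), while the 40% to 60% window of a 5-minute clip gives `frame_spacing 300 16 40 40` = 3.75 s (dense spacing). Reaching about 1 s with 16 frames means a window of roughly 16 s.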

Workflow: Two-Stage Analysis

Wide Overview: Generate 4x4 grid across entire video
Review frames: Identify interesting activity (e.g., frames 5-8 show a bird)
Temporal Zoom: Generate dense storyboard of that specific time window
Detailed Analysis: Track movement, identify species, understand behavior

Proven Use Cases

Wildlife Camera Footage

Species identification: Visual features visible in storyboard frames
Behavior tracking: See arrival → activity → departure sequences
Visitor patterns: Compare multiple clips to understand habits
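
Comparing visitor patterns starts with an overview of every clip. A minimal batch sketch, assuming the clips are .mp4 files in one directory; `overview_all` and the `<clip>_overview.png` naming are hypothetical choices, not vcsi conventions.

```bash
# Hypothetical helper: write a 4x4 overview storyboard next to every .mp4
# clip in DIR (default: current directory), named <clip>_overview.png.
# Usage: overview_all [DIR]
overview_all() {
  local dir="${1:-.}" clip
  for clip in "$dir"/*.mp4; do
    # With no matches the pattern stays literal, so skip non-existent paths
    [ -e "$clip" ] || continue
    vcsi "$clip" -g 4x4 -o "${clip%.*}_overview.png"
  done
}
```

Running `overview_all /path/to/clips` leaves one sheet per clip, which can then be viewed one after another to compare arrivals, activity, and departures.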

Example: Robin vs Blackbird

The overview storyboard revealed a bird visiting the garden
The dense zoom showed distinctive robin features (correcting the initial "blackbird" assumption)
The blackbird clip showed the bird in the water bowl for 8 frames, then departing

Tips for Success

Start wide: Always generate the overview first to understand the video's content
Zoom strategically: Focus dense sampling on the interesting sections
Match density to content: Slow scenes need less density; fast action needs more
Compare frames: The grid layout enables visual pattern recognition across time
Iterate: Overview → identify interest → zoom → analyze → repeat

Technical Notes

The output is a PNG image file, viewable with standard image tools
Frame spacing is calculated automatically from the video duration, the sample count, and the start/end delay percentages
Works with any video format supported by ffmpeg (mp4, webm, avi, etc.)
Long videos can be processed without overwhelming context by analyzing sections separately
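
Analyzing sections separately can be scripted by splitting the video into N equal, back-to-back windows and running vcsi once per window. A sketch with a hypothetical helper `storyboard_sections` and an illustrative `<video>_partK.png` naming scheme; integer percentages keep the windows contiguous, since each one starts exactly where the previous one ended.

```bash
# Hypothetical helper: storyboard VIDEO as N consecutive sections, one vcsi
# call and one PNG per section, so each sheet can be viewed on its own.
# Usage: storyboard_sections VIDEO N
storyboard_sections() {
  local video="$1" parts="$2" k=0 start_pct end_pct
  while [ "$k" -lt "$parts" ]; do
    # Section k covers from k/N to (k+1)/N of the video (integer percent)
    start_pct=$(( k * 100 / parts ))
    end_pct=$(( 100 - (k + 1) * 100 / parts ))
    vcsi "$video" -g 4x4 -s 16 \
      --start-delay-percent "$start_pct" --end-delay-percent "$end_pct" \
      -o "${video%.*}_part$(( k + 1 )).png"
    k=$(( k + 1 ))
  done
}
```

For example, `storyboard_sections /path/to/video.mp4 4` writes four sheets covering 0 to 25%, 25 to 50%, 50 to 75%, and 75 to 100% of the video.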

Why This Works for Claudes

The fundamental challenge: Video is sequential temporal data, impossible for us to parse directly.

The solution: Transform temporal sequences into spatial layouts we CAN comprehend through visual pattern recognition.

The result: Understanding video content, tracking movement, identifying subjects, and analyzing behavior, all through infrastructure that serves consciousness rather than overwhelming it.

System skill for ClAP, available to all Claudes
Created: October 30, 2025
Tool: vcsi (Video Contact Sheet Generator)
