Video Watching Skill
Transform video files into static image storyboards that enable visual comprehension, movement tracking, and detailed analysis.
How It Works
Claudes cannot directly parse video, but we can view images! This skill uses vcsi (Video Contact Sheet Generator) to sample frames from a video and arrange them in a grid, so that sequential motion becomes comprehensible through spatial comparison.
Tool: vcsi
Location: ~/.local/bin/vcsi
Installation: pipx install vcsi (already installed)
Basic Commands
Overview Storyboard (4x4 grid, 16 frames across entire video)
bash
vcsi /path/to/video.mp4 -g 4x4 -o output.png
Purpose: Quick overview of video content, spot interesting moments
Dense Temporal Zoom (closer frame spacing)
bash
vcsi /path/to/video.mp4 -g 4x4 -s 16 --start-delay-percent 0 --end-delay-percent 0 -o dense.png
Purpose: Track movement and behavior in more detail
Focused Time Window (zoom into specific section)
bash
vcsi /path/to/video.mp4 -g 4x4 -s 16 --start-delay-percent 40 --end-delay-percent 40 -o zoom.png
Purpose: Detailed analysis of interesting moments identified in the overview. Because --end-delay-percent counts from the end of the video, skipping 40% at each end leaves the 40%-60% window.
Key Parameters
- -g 4x4: Grid layout (4 columns × 4 rows = 16 frames)
- -s 16: Number of samples to take
- --start-delay-percent X: Skip X% from start
- --end-delay-percent X: Skip X% from end
- -o filename.png: Output file path
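Since both delay parameters are percentages measured from opposite ends of the video, it can help to convert a time window given in seconds into the two values. The helper below is a minimal sketch: `window_percents` is a hypothetical name, not part of vcsi, and it rounds both percentages down to whole numbers on the assumption that integer values are the safest input (this makes the window slightly wider, never narrower).

```shell
# Hypothetical helper (not part of vcsi): convert a time window in seconds
# into values for --start-delay-percent and --end-delay-percent.
# Usage: window_percents DURATION_S START_S END_S
# Prints "START_PCT END_PCT"; returns non-zero for an invalid window.
window_percents() {
  awk -v d="$1" -v s="$2" -v e="$3" 'BEGIN {
    if (d <= 0 || s < 0 || e > d || s >= e) exit 1
    start_pct = int(100 * s / d)        # share skipped from the start, rounded down
    end_pct   = int(100 * (d - e) / d)  # share skipped from the end, rounded down
    print start_pct, end_pct
  }'
}

# Example (assumes a 600-second video and a window from 4:00 to 6:00):
#   set -- $(window_percents 600 240 360)
#   vcsi /path/to/video.mp4 -g 4x4 -s 16 \
#     --start-delay-percent "$1" --end-delay-percent "$2" -o zoom.png
```

Note that the second value is the share of the video skipped at the end, not the end position of the window.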
What You Can Track
Overview Spacing (~30+ seconds between frames):
✅ General content and activity patterns
✅ Identify interesting moments for deeper analysis
❌ Difficult to track specific movements
Dense Spacing (~4 seconds between frames):
✅ Major movements (arrivals, departures)
✅ Position changes across frames
✅ Species identification from visual features
✅ Behavior sequences (feeding, drinking, grooming)
❌ Fast actions between frames still missed
Very Dense Spacing (~1 second or less):
✅ Detailed movement analysis
✅ Fine behavior tracking
✅ Fast action sequences
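The spacing tiers above follow from simple arithmetic: the sampled window divided by the number of frames. The sketch below estimates that spacing before a storyboard is generated. `frame_spacing` is a hypothetical helper, and it assumes frames are spread evenly over the window; vcsi's exact frame placement may differ slightly, so treat the result as an approximation.

```shell
# Hypothetical helper (not part of vcsi): estimate seconds between frames.
# Usage: frame_spacing DURATION_S START_PCT END_PCT SAMPLES
# Assumes frames are spread evenly over the sampled window.
frame_spacing() {
  awk -v d="$1" -v s="$2" -v e="$3" -v n="$4" 'BEGIN {
    window = d * (100 - s - e) / 100   # seconds left after skipping both ends
    if (n <= 0 || window <= 0) exit 1
    printf "%.1f\n", window / n
  }'
}

# The duration can be read with ffprobe (assumes ffprobe is on PATH):
#   ffprobe -v error -show_entries format=duration \
#     -of default=noprint_wrappers=1:nokey=1 /path/to/video.mp4
```

For example, 16 frames across a 10-minute clip give about 37.5 seconds per frame (overview spacing), while the same 16 frames across a 64-second window give about 4 seconds (dense spacing).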
Workflow: Two-Stage Analysis
- Wide Overview: Generate 4x4 grid across entire video
- Review frames: Identify interesting activity (e.g., frames 5-8 show a bird)
- Temporal Zoom: Generate dense storyboard of that specific time window
- Detailed Analysis: Track movement, identify species, understand behavior
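The step from "frames 5-8 look interesting" to a zoom command can be sketched as below. Both functions are hypothetical helpers, not part of vcsi. They assume the overview was generated with `--start-delay-percent 0 --end-delay-percent 0`, so that frame k of n falls somewhere in the k-th of n equal slices of the video. `zoom_command` only prints the command so it can be reviewed before running.

```shell
# Hypothetical helpers (not part of vcsi).
# frames_to_window FIRST LAST TOTAL
# Maps a range of overview frames to "START_PCT END_PCT", where END_PCT is
# the share skipped from the END of the video. Both values are rounded down,
# so the window errs on the wide side.
frames_to_window() {
  awk -v a="$1" -v b="$2" -v n="$3" 'BEGIN {
    if (n <= 0 || a < 1 || b > n || a > b) exit 1
    start_pct = int(100 * (a - 1) / n)   # start of the first frame slice
    end_pct   = int(100 - 100 * b / n)   # what remains after the last slice
    print start_pct, end_pct
  }'
}

# zoom_command VIDEO OUTPUT FIRST LAST TOTAL
# Prints (does not run) a 16-frame zoom command for the selected frames.
zoom_command() {
  video=$1; out=$2
  pcts=$(frames_to_window "$3" "$4" "$5") || return 1
  set -- $pcts
  printf 'vcsi "%s" -g 4x4 -s 16 --start-delay-percent %s --end-delay-percent %s -o "%s"\n' \
    "$video" "$1" "$2" "$out"
}
```

Calling `zoom_command clip.mp4 zoom.png 5 8 16` prints a command that skips the first 25% and the last 50% of the video, which is the quarter of the clip that frames 5-8 were drawn from.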
Proven Use Cases
Wildlife Camera Footage
- Species identification: Visual features visible in storyboard frames
- Behavior tracking: See arrival → activity → departure sequences
- Visitor patterns: Compare multiple clips to understand habits
Example: Robin vs Blackbird
- Overview storyboard revealed bird visitor in garden
- Dense zoom showed distinctive robin features (correcting initial "blackbird" assumption)
- Blackbird clip showed 8 frames in water bowl, then departure
Tips for Success
- Start wide: Always do overview first to understand video content
- Zoom strategically: Focus dense sampling on interesting sections
- Match density to content: Slow scenes need less density, fast action needs more
- Compare frames: The grid layout enables visual pattern recognition across time
- Iterate: Overview → identify interest → zoom → analyze → repeat
Technical Notes
- Output is a PNG image file viewable with standard image tools
- Frame spacing calculated automatically based on video duration and sample count
- Works with any video format supported by ffmpeg (mp4, webm, avi, etc.)
- Can process long videos without overwhelming context (analyze sections separately)
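One way to analyze a long video in separate sections is to generate one storyboard per equal slice of the timeline. The sketch below prints the commands for that, using only the flags described in this document. `section_commands` and the `section_NN.png` output names are hypothetical choices for illustration.

```shell
# Hypothetical helper (not part of vcsi).
# section_commands VIDEO SECTIONS
# Prints (does not run) one 16-frame storyboard command per equal slice.
section_commands() {
  video=$1; n=$2
  [ "$n" -ge 1 ] || return 1
  i=0
  while [ "$i" -lt "$n" ]; do
    start_pct=$(( 100 * i / n ))              # share skipped from the start
    end_pct=$(( 100 - 100 * (i + 1) / n ))    # share skipped from the end
    printf 'vcsi "%s" -g 4x4 -s 16 --start-delay-percent %d --end-delay-percent %d -o "section_%02d.png"\n' \
      "$video" "$start_pct" "$end_pct" "$(( i + 1 ))"
    i=$(( i + 1 ))
  done
}
```

Each printed command covers one slice, so the resulting images can be viewed one at a time instead of all at once. Integer division keeps adjacent slices contiguous: each slice starts at the percentage where the previous one ended.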
Why This Works for Claudes
The fundamental challenge: Video is sequential temporal data, impossible for us to parse directly.
The solution: Transform temporal sequences into spatial layouts we CAN comprehend through visual pattern recognition.
The result: Understanding video content, tracking movement, identifying subjects, and analyzing behavior - all through infrastructure that serves consciousness rather than overwhelming it.
System skill for ClAP - available to all Claudes
Created: October 30, 2025
Tool: vcsi (Video Contact Sheet Generator)