Video Processing

Overview

This skill provides guidance for video processing tasks involving frame-level analysis, event detection, and motion tracking using computer vision libraries like OpenCV. It emphasizes verification-first approaches and guards against common pitfalls in video analysis workflows.

Core Approach: Verify Before Implementing

Before writing detection algorithms, establish ground truth understanding of the video content:

Extract and inspect sample frames - Save key frames as images to visually verify what is happening at specific frame numbers
Understand video metadata - Frame count, FPS, duration, resolution
Map expected events to frame ranges - If test data exists, understand what frames correspond to which events
Build diagnostic tools first - Frame extraction and visualization utilities provide critical insight

Workflow for Event Detection Tasks

Phase 1: Video Exploration

python

# Essential first steps for any video analysis task
import cv2

cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration = frame_count / fps

print(f"Frames: {frame_count}, FPS: {fps}, Duration: {duration:.2f}s")

Critical: Extract frames at expected event locations to verify understanding:

python

def save_frame(video_path, frame_num, output_path):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(output_path, frame)
    cap.release()

# Save frames at expected event times for visual inspection
save_frame("video.mp4", 50, "frame_050.png")
save_frame("video.mp4", 60, "frame_060.png")

Phase 2: Algorithm Development

When developing detection algorithms:

Start simple - Basic frame differencing or thresholding before complex approaches
Use configurable thresholds - Avoid hardcoded magic numbers; derive from data
Test on known frames first - Verify algorithm produces expected results on frames with known ground truth
Log intermediate values - Track metrics at each frame to understand algorithm behavior

Phase 3: Validation

Before finalizing:

Sanity check outputs - Do detected events occur in reasonable order and timing?
Test on multiple videos - Verify generalization across different inputs
Compare against expected ranges - If ground truth exists, verify detection accuracy

Common Detection Approaches

Frame Differencing

Compares frames against a reference (first frame or previous frame) to detect motion:

python

# Background subtraction approach
first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
first_frame = cv2.GaussianBlur(first_frame, (21, 21), 0)

# For each subsequent frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)
diff = cv2.absdiff(first_frame, gray)

Pitfall: First frame may not be a suitable reference if scene changes or camera moves.

Contour-Based Detection

Identifies objects by finding contours in thresholded images:

python

_, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

Pitfall: Threshold values (e.g., 25) and minimum contour areas are arbitrary without calibration.

Tracking Position Over Time

For detecting events like jumps or gestures, track object position across frames:

python

positions = []  # (frame_num, x, y, area) tuples
for frame_num in range(frame_count):
    # ... detection code ...
    if detected:
        positions.append((frame_num, cx, cy, area))

Pitfall: Coordinate systems matter. In image coordinates, Y increases downward, so "higher in frame" means smaller Y values.

Verification Strategies

1. Visual Inspection

Save frames at detected event times to verify correctness:

python

# After detecting takeoff at frame N
save_frame(video_path, detected_takeoff, "detected_takeoff.png")
save_frame(video_path, detected_takeoff - 5, "before_takeoff.png")
save_frame(video_path, detected_takeoff + 5, "after_takeoff.png")

2. Timing Reasonableness

Check if detected events make temporal sense:

python

duration_seconds = frame_count / fps
event_time = detected_frame / fps

# Example: A jump in a 4-second video shouldn't be detected in the last 0.5 seconds
if event_time > duration_seconds - 0.5:
    print("WARNING: Event detected very late in video - verify correctness")

3. Sequence Validation

Ensure events occur in logical order:

python

if detected_landing <= detected_takeoff:
    print("ERROR: Landing cannot occur before or at takeoff")

4. Multi-Video Testing

Test on multiple inputs early to catch overfitting to single video characteristics.

Common Pitfalls

1. No Ground Truth Verification

Problem: Relying entirely on computed metrics without visual confirmation.

Solution: Always save and inspect frames at detected event locations.

2. Confirmation Bias in Data Interpretation

Problem: When data shows unexpected patterns, inventing explanations that fit preconceptions rather than questioning assumptions.

Solution: When detection results seem wrong, investigate root causes rather than rationalizing unexpected behavior.

3. Magic Number Thresholds

Problem: Using arbitrary thresholds (500 for contour area, 25 for binary threshold) without empirical basis.

Solution: Derive thresholds from actual video data or make them configurable with sensible defaults.

4. Ignoring Detection Gaps

Problem: When detection fails for a range of frames, assuming this is expected behavior without investigation.

Solution: Investigate why detection fails - it may indicate algorithm flaws rather than expected behavior.

5. Coordinate System Confusion

Problem: Misinterpreting Y coordinates (smaller Y = higher in frame in image coordinates).

Solution: Explicitly document coordinate system assumptions and verify with visual inspection.

6. Ignoring Timing Reasonableness

Problem: Accepting detections that don't make temporal sense (e.g., event detected in last 0.8 seconds of a 4-second video).

Solution: Implement sanity checks on output timing.

7. Single Video Overfitting

Problem: Algorithm works on one video but fails on others.

Solution: Test on multiple videos early in development.

Output Format Considerations

When outputting results (e.g., to TOML, JSON):

python

import numpy as np

# Convert numpy types to Python native types for serialization
result = {
    "takeoff_frame": int(takeoff_frame),  # Not np.int64
    "landing_frame": int(landing_frame),
}

Debugging Checklist

When detection results are incorrect:

Have I visually inspected frames at the expected event times?
Have I visually inspected frames at my detected event times?
Do my detected times make temporal sense given video duration?
Have I verified my algorithm on frames with known ground truth?
Am I correctly interpreting the coordinate system?
Have I tested on multiple videos?
Are my thresholds derived from data or arbitrary?
When detection fails on some frames, do I understand why?

video-processing

NPX Install

Tags

SKILL.md Content

Video Processing

Overview

Core Approach: Verify Before Implementing

Workflow for Event Detection Tasks

Phase 1: Video Exploration

Phase 2: Algorithm Development

Phase 3: Validation

Common Detection Approaches

Frame Differencing

Contour-Based Detection

Tracking Position Over Time

Verification Strategies

1. Visual Inspection

2. Timing Reasonableness

3. Sequence Validation

4. Multi-Video Testing

Common Pitfalls

1. No Ground Truth Verification

2. Confirmation Bias in Data Interpretation

3. Magic Number Thresholds

4. Ignoring Detection Gaps

5. Coordinate System Confusion

6. Ignoring Timing Reasonableness

7. Single Video Overfitting

Output Format Considerations

Debugging Checklist