Guide for video analysis and frame-level event detection tasks using OpenCV and similar libraries. This skill should be used when detecting events in videos (jumps, movements, gestures), extracting frames, analyzing motion patterns, or implementing computer vision algorithms on video data. It provides verification strategies and helps avoid common pitfalls in video processing workflows.
This skill provides guidance for video processing tasks involving frame-level analysis, event detection, and motion tracking using computer vision libraries like OpenCV. It emphasizes verification-first approaches and guards against common pitfalls in video analysis workflows.
Core Approach: Verify Before Implementing
Before writing detection algorithms, establish a ground-truth understanding of the video content:
Extract and inspect sample frames - Save key frames as images to visually verify what is happening at specific frame numbers
Understand video metadata - Frame count, FPS, duration, resolution
Map expected events to frame ranges - If test data exists, understand what frames correspond to which events
Build diagnostic tools first - Frame extraction and visualization utilities provide critical insight
Workflow for Event Detection Tasks
Phase 1: Video Exploration
```python
# Essential first steps for any video analysis task
import cv2

cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
# Some files report an FPS of 0; guard against dividing by zero
duration = frame_count / fps if fps > 0 else 0.0
cap.release()

print(f"Frames: {frame_count}, FPS: {fps}, Duration: {duration:.2f}s")
```
Critical: Extract frames at expected event locations to verify understanding:
```python
import cv2

def save_frame(video_path, frame_num, output_path):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)  # seek to the requested frame index
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(output_path, frame)
    cap.release()

# Save frames at expected event times for visual inspection
save_frame("video.mp4", 50, "frame_050.png")
save_frame("video.mp4", 60, "frame_060.png")
```
Phase 2: Algorithm Development
When developing detection algorithms:
Start simple - Basic frame differencing or thresholding before complex approaches
Use configurable thresholds - Avoid hardcoded magic numbers; derive from data
Test on known frames first - Verify algorithm produces expected results on frames with known ground truth
Log intermediate values - Track metrics at each frame to understand algorithm behavior
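The logging advice above can be sketched as follows. This is a minimal illustration, not a prescribed format: the `write_metrics_csv` helper and the column names (`frame`, `motion_score`, `detected`) are hypothetical, and the sample rows stand in for values a real detection loop would produce.

```python
import csv
import io

FIELDS = ["frame", "motion_score", "detected"]  # hypothetical column names

def write_metrics_csv(rows, stream):
    """Write one row per frame so the metric can be plotted or searched later."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Made-up per-frame values; in practice these come from the detection loop
rows = [
    {"frame": 0, "motion_score": 0.4, "detected": False},
    {"frame": 1, "motion_score": 7.9, "detected": True},
    {"frame": 2, "motion_score": 8.3, "detected": True},
]

buffer = io.StringIO()  # swap for open("metrics.csv", "w", newline="") to write a file
write_metrics_csv(rows, buffer)
print(buffer.getvalue())
```

Keeping the raw per-frame values makes it possible to see where a metric crosses a threshold instead of only seeing the final detected frame.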
Phase 3: Validation
Before finalizing:
Sanity check outputs - Do detected events occur in reasonable order and timing?
Test on multiple videos - Verify generalization across different inputs
Compare against expected ranges - If ground truth exists, verify detection accuracy
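One possible way to automate the comparison against expected ranges is sketched below. The `validate_detections` helper, the event names, and the frame ranges are all hypothetical and only illustrate the shape of such a check.

```python
def validate_detections(detections, expected_ranges):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, frame in detections.items():
        if name not in expected_ranges:
            continue  # no ground truth available for this event
        low, high = expected_ranges[name]
        if not low <= frame <= high:
            problems.append(
                f"{name}: frame {frame} outside expected range [{low}, {high}]"
            )
    return problems

# Made-up values for illustration
detections = {"takeoff": 52, "landing": 90}
expected_ranges = {"takeoff": (50, 55), "landing": (60, 70)}

for problem in validate_detections(detections, expected_ranges):
    print("MISMATCH:", problem)
```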
Common Detection Approaches
Frame Differencing
Frame differencing compares each frame against a reference (the first frame or the previous frame) to detect motion.
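A minimal sketch of frame differencing, assuming grayscale frames held as NumPy `uint8` arrays (for example after `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)`). The `motion_score` name and the default `pixel_threshold` are illustrative placeholders, and the synthetic arrays stand in for real frames. `cv2.absdiff` computes the same per-pixel difference directly on `uint8` images.

```python
import numpy as np

def motion_score(reference, frame, pixel_threshold=25):
    """Fraction of pixels that differ from the reference by more than pixel_threshold."""
    # Cast to a signed type first: subtracting uint8 arrays would wrap around
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    return float(np.mean(diff > pixel_threshold))

# Synthetic 4x4 "frames": a 2x2 bright block appears in the second one
reference = np.zeros((4, 4), dtype=np.uint8)
frame = reference.copy()
frame[1:3, 1:3] = 200

score = motion_score(reference, frame)
print(f"changed fraction: {score:.2f}")
```

The placeholder threshold should be treated as a starting point and tuned against the actual video.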
Pitfall: Coordinate systems matter. In image coordinates, Y increases downward, so "higher in frame" means smaller Y values.
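To make the convention concrete, here is a small sketch with hypothetical helper names. It assumes pixel rows are indexed from 0 at the top of the frame, which is how OpenCV images are laid out.

```python
def height_above_bottom(y_pixel, frame_height):
    """Convert a row index (0 = top row) into pixels above the bottom row."""
    return (frame_height - 1) - y_pixel

def is_higher_in_frame(y_a, y_b):
    """True if point A appears above point B in the image."""
    return y_a < y_b  # smaller Y means closer to the top

# In a 480-pixel-tall frame, a subject at y=100 is above one at y=300
print(is_higher_in_frame(100, 300))
print(height_above_bottom(100, 480), height_above_bottom(300, 480))
```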
Verification Strategies
1. Visual Inspection
Save frames at detected event times to verify correctness:
```python
# After detecting takeoff at frame N
save_frame(video_path, detected_takeoff, "detected_takeoff.png")
save_frame(video_path, detected_takeoff - 5, "before_takeoff.png")
save_frame(video_path, detected_takeoff + 5, "after_takeoff.png")
```
2. Timing Reasonableness
Check if detected events make temporal sense:
```python
duration_seconds = frame_count / fps
event_time = detected_frame / fps

# Example: A jump in a 4-second video shouldn't be detected in the last 0.5 seconds
if event_time > duration_seconds - 0.5:
    print("WARNING: Event detected very late in video - verify correctness")
```
3. Sequence Validation
Ensure events occur in logical order:
```python
if detected_landing <= detected_takeoff:
    print("ERROR: Landing cannot occur before or at takeoff")
```
4. Multi-Video Testing
Test on multiple inputs early to catch overfitting to single video characteristics.
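A small harness along these lines can make multi-video testing routine. It is only a sketch: `run_on_videos` is a hypothetical helper, `fake_detect` is a stand-in for a real detector, and the file names are made up.

```python
def run_on_videos(detect, video_paths):
    """Run detect(path) on every video, collecting results and failures separately."""
    results, failures = {}, {}
    for path in video_paths:
        try:
            results[path] = detect(path)
        except Exception as exc:
            # Keep going so one bad video does not hide problems in the rest
            failures[path] = str(exc)
    return results, failures

def fake_detect(path):
    """Stand-in detector for illustration; a real one would open and analyze the video."""
    if path == "broken.mp4":
        raise ValueError("could not open video")
    return {"takeoff_frame": 52, "landing_frame": 64}

results, failures = run_on_videos(fake_detect, ["a.mp4", "b.mp4", "broken.mp4"])
print("ok:", sorted(results))
print("failed:", failures)
```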
Common Pitfalls
1. No Ground Truth Verification
Problem: Relying entirely on computed metrics without visual confirmation.
Solution: Always save and inspect frames at detected event locations.
2. Confirmation Bias in Data Interpretation
Problem: When data shows unexpected patterns, inventing explanations that fit preconceptions rather than questioning assumptions.
Solution: When detection results seem wrong, investigate root causes rather than rationalizing unexpected behavior.
3. Magic Number Thresholds
Problem: Using arbitrary thresholds (500 for contour area, 25 for binary threshold) without empirical basis.
Solution: Derive thresholds from actual video data or make them configurable with sensible defaults.
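One possible way to derive a threshold from data is to measure the metric on frames known to contain no event and place the threshold a few standard deviations above that baseline. The helper name, the multiplier `k`, and the sample scores below are assumptions for illustration, not values from any particular video.

```python
import statistics

def threshold_from_baseline(baseline_scores, k=3.0):
    """Threshold set k standard deviations above the mean of known-quiet frames."""
    mean = statistics.mean(baseline_scores)
    spread = statistics.pstdev(baseline_scores)
    return mean + k * spread

# Made-up motion scores from frames where nothing is happening
quiet_scores = [0.8, 1.1, 0.9, 1.2, 1.0]
threshold = threshold_from_baseline(quiet_scores)
print(f"threshold: {threshold:.3f}")
```

Exposing `k` as a parameter keeps the threshold configurable while still tying it to measured noise.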
4. Ignoring Detection Gaps
Problem: When detection fails for a range of frames, assuming this is expected behavior without investigation.
Solution: Investigate why detection fails - it may indicate algorithm flaws rather than expected behavior.
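A first step in investigating gaps is to list exactly where they occur. The sketch below assumes a per-frame list of booleans recording whether detection succeeded; `find_gaps` and its `min_length` parameter are hypothetical names.

```python
def find_gaps(detected_flags, min_length=1):
    """Return inclusive (start, end) frame ranges where detection failed."""
    gaps = []
    start = None
    for i, ok in enumerate(detected_flags):
        if not ok and start is None:
            start = i  # a gap begins here
        elif ok and start is not None:
            if i - start >= min_length:
                gaps.append((start, i - 1))
            start = None
    # Close a gap that runs to the end of the video
    if start is not None and len(detected_flags) - start >= min_length:
        gaps.append((start, len(detected_flags) - 1))
    return gaps

flags = [True, False, False, True, False]
print(find_gaps(flags))
```

The resulting frame ranges are natural candidates for saving and inspecting frames visually.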
5. Coordinate System Confusion
Problem: Misinterpreting Y coordinates (smaller Y = higher in frame in image coordinates).
Solution: Explicitly document coordinate system assumptions and verify with visual inspection.
6. Ignoring Timing Reasonableness
Problem: Accepting detections that don't make temporal sense (e.g., event detected in last 0.8 seconds of a 4-second video).
Solution: Implement sanity checks on output timing.
7. Single Video Overfitting
Problem: Algorithm works on one video but fails on others.
Solution: Test on multiple videos early in development.
Output Format Considerations
When outputting results (e.g., to TOML or JSON), convert NumPy types to native Python types first:
```python
import numpy as np

# Convert numpy types to Python native types for serialization
result = {
    "takeoff_frame": int(takeoff_frame),  # Not np.int64
    "landing_frame": int(landing_frame),
}
```
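When results are nested or contain arrays, the same conversion can be applied recursively. The `to_native` helper below is a hypothetical sketch, and the sample values are made up.

```python
import json

import numpy as np

def to_native(value):
    """Recursively convert NumPy scalars and arrays to plain Python types."""
    if isinstance(value, np.generic):
        return value.item()  # np.int64 -> int, np.float64 -> float, ...
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, dict):
        return {key: to_native(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native(item) for item in value]
    return value

raw = {"takeoff_frame": np.int64(52), "scores": np.array([0.5, 1.0])}
serialized = json.dumps(to_native(raw))
print(serialized)
```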
Debugging Checklist
When detection results are incorrect:
Have I visually inspected frames at the expected event times?
Have I visually inspected frames at my detected event times?
Do my detected times make temporal sense given video duration?
Have I verified my algorithm on frames with known ground truth?
Am I correctly interpreting the coordinate system?
Have I tested on multiple videos?
Are my thresholds derived from data or arbitrary?
When detection fails on some frames, do I understand why?