Video Processing

Overview

This skill provides guidance for video processing tasks involving frame-level analysis, event detection, and motion tracking using computer vision libraries like OpenCV. It emphasizes verification-first approaches and guards against common pitfalls in video analysis workflows.

Core Approach: Verify Before Implementing

Before writing detection algorithms, establish ground truth understanding of the video content:
  1. Extract and inspect sample frames - Save key frames as images to visually verify what is happening at specific frame numbers
  2. Understand video metadata - Frame count, FPS, duration, resolution
  3. Map expected events to frame ranges - If test data exists, understand what frames correspond to which events
  4. Build diagnostic tools first - Frame extraction and visualization utilities provide critical insight

Workflow for Event Detection Tasks

Phase 1: Video Exploration

```python
# Essential first steps for any video analysis task
import cv2

cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration = frame_count / fps
print(f"Frames: {frame_count}, FPS: {fps}, Duration: {duration:.2f}s")
cap.release()
```

**Critical**: Extract frames at expected event locations to verify understanding:

```python
def save_frame(video_path, frame_num, output_path):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(output_path, frame)
    cap.release()

# Save frames at expected event times for visual inspection
save_frame("video.mp4", 50, "frame_050.png")
save_frame("video.mp4", 60, "frame_060.png")
```

Phase 2: Algorithm Development

When developing detection algorithms:
  1. Start simple - Basic frame differencing or thresholding before complex approaches
  2. Use configurable thresholds - Avoid hardcoded magic numbers; derive from data
  3. Test on known frames first - Verify algorithm produces expected results on frames with known ground truth
  4. Log intermediate values - Track metrics at each frame to understand algorithm behavior
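These principles can be sketched as a minimal detection loop. The code below scores motion as the mean absolute difference between consecutive grayscale frames, logs the score for every frame, and derives its threshold from the observed scores rather than hardcoding one. It operates on pre-decoded numpy arrays; the frame source, the 2-sigma rule, and the synthetic test data are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def motion_scores(frames):
    """Mean absolute difference between consecutive grayscale frames."""
    scores = []
    prev = None
    for i, frame in enumerate(frames):
        cur = frame.astype(np.int32)
        if prev is not None:
            scores.append((i, float(np.abs(cur - prev).mean())))  # log per-frame metric
        prev = cur
    return scores

def detect_motion(frames, threshold=None):
    """Return frames whose motion score exceeds a (data-derived) threshold."""
    scores = motion_scores(frames)
    values = [s for _, s in scores]
    if threshold is None:
        # Derive the threshold from the data instead of hardcoding a magic number
        threshold = float(np.mean(values) + 2 * np.std(values))
    return [i for i, s in scores if s > threshold], threshold

# Synthetic check: static black frames, a bright patch appears at frame 5
frames = [np.zeros((32, 32), dtype=np.uint8) for _ in range(10)]
for f in frames[5:]:
    f[8:16, 8:16] = 255
events, used_threshold = detect_motion(frames)
```

Because the threshold comes from the score distribution itself, the same loop transfers to videos with different noise levels without retuning.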

Phase 3: Validation

Before finalizing:
  1. Sanity check outputs - Do detected events occur in reasonable order and timing?
  2. Test on multiple videos - Verify generalization across different inputs
  3. Compare against expected ranges - If ground truth exists, verify detection accuracy

Common Detection Approaches

Frame Differencing

Compares frames against a reference (first frame or previous frame) to detect motion:

```python
# Background subtraction approach
first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
first_frame = cv2.GaussianBlur(first_frame, (21, 21), 0)

# For each subsequent frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)
diff = cv2.absdiff(first_frame, gray)
```

**Pitfall**: The first frame may not be a suitable reference if the scene changes or the camera moves.

Contour-Based Detection

Identifies objects by finding contours in thresholded images:

```python
_, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
```

**Pitfall**: Threshold values (e.g., 25) and minimum contour areas are arbitrary without calibration.
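Downstream tracking usually needs a centroid for each detected blob. The usual OpenCV route is `cv2.moments` on the chosen contour; the dependency-free sketch below computes the same centroid directly from a binary mask with numpy, with `min_area` standing in for the minimum-contour-area filter (the value 50 is illustrative).

```python
import numpy as np

def mask_centroid(mask, min_area=50):
    """Centroid (cx, cy) and area of foreground pixels in a binary mask.

    Plays the role of cv2.moments on a detected contour; min_area acts as
    the minimum-contour-area filter (50 is an illustrative value).
    """
    ys, xs = np.nonzero(mask)
    area = xs.size
    if area < min_area:
        return None  # too small to be the tracked object; likely noise
    return float(xs.mean()), float(ys.mean()), area

# A 20x20 blob whose centroid is at (69.5, 29.5)
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:40, 60:80] = 1
cx, cy, area = mask_centroid(mask)
```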

Tracking Position Over Time

For detecting events like jumps or gestures, track object position across frames:

```python
positions = []  # (frame_num, x, y, area) tuples
for frame_num in range(frame_count):
    # ... detection code ...
    if detected:
        positions.append((frame_num, cx, cy, area))
```

**Pitfall**: Coordinate systems matter. In image coordinates, Y increases downward, so "higher in frame" means smaller Y values.
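As a worked example of what such a positions list enables, the hypothetical helper below locates a jump's takeoff, apex, and landing from the Y trajectory. It assumes the clip starts with the subject on the ground and uses an illustrative `rise_px` margin; note that the apex is the *minimum* Y, per the coordinate-system pitfall above.

```python
def find_jump(positions, rise_px=10):
    """Locate takeoff, apex, and landing in tracked (frame_num, x, y, area) tuples.

    Image coordinates: Y increases downward, so the apex is the MINIMUM y.
    rise_px is an illustrative "clearly off the ground" margin in pixels.
    """
    baseline_y = positions[0][2]  # assumes the clip starts with the subject grounded
    airborne = [p for p in positions if baseline_y - p[2] > rise_px]
    if not airborne:
        return None
    apex = min(positions, key=lambda p: p[2])
    return {"takeoff": airborne[0][0], "apex": apex[0], "landing": airborne[-1][0]}

# Synthetic trajectory: grounded at y=200, peaks at y=150 on frame 55
positions = [(f, 100, 200, 400) for f in range(50)]
positions += [(50 + i, 100, 200 - 10 * min(i, 10 - i), 400) for i in range(11)]
positions += [(f, 100, 200, 400) for f in range(61, 80)]
jump = find_jump(positions)
```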

Verification Strategies

1. Visual Inspection

Save frames at detected event times to verify correctness:

```python
# After detecting takeoff at frame N
save_frame(video_path, detected_takeoff, "detected_takeoff.png")
save_frame(video_path, detected_takeoff - 5, "before_takeoff.png")
save_frame(video_path, detected_takeoff + 5, "after_takeoff.png")
```

2. Timing Reasonableness

Check if detected events make temporal sense:

```python
duration_seconds = frame_count / fps
event_time = detected_frame / fps

# Example: a jump in a 4-second video shouldn't be detected in the last 0.5 seconds
if event_time > duration_seconds - 0.5:
    print("WARNING: Event detected very late in video - verify correctness")
```

3. Sequence Validation

Ensure events occur in logical order:

```python
if detected_landing <= detected_takeoff:
    print("ERROR: Landing cannot occur before or at takeoff")
```

4. Multi-Video Testing

Test on multiple inputs early to catch overfitting to single video characteristics.

Common Pitfalls

1. No Ground Truth Verification

Problem: Relying entirely on computed metrics without visual confirmation.
Solution: Always save and inspect frames at detected event locations.

2. Confirmation Bias in Data Interpretation

Problem: When data shows unexpected patterns, inventing explanations that fit preconceptions rather than questioning assumptions.
Solution: When detection results seem wrong, investigate root causes rather than rationalizing unexpected behavior.

3. Magic Number Thresholds

Problem: Using arbitrary thresholds (500 for contour area, 25 for binary threshold) without empirical basis.
Solution: Derive thresholds from actual video data or make them configurable with sensible defaults.
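One way to derive such a threshold: run a calibration pass over background-only frames, then set the binary threshold at a high percentile of the observed difference values, so that most noise falls below it. The 95th percentile and the synthetic noise below are illustrative choices, not recommended constants.

```python
import numpy as np

def derive_threshold(diff_samples, percentile=95):
    """Binary threshold taken from observed frame-difference values,
    chosen so that most background noise falls below it."""
    return float(np.percentile(diff_samples, percentile))

# Calibration stand-in: absdiff values collected over background-only
# frames (here, synthetic uniform noise in 0..9)
rng = np.random.default_rng(0)
noise = rng.integers(0, 10, size=10_000)
threshold = derive_threshold(noise)
```

The percentile itself is still a choice, but it is documented and tied to measured data, unlike a bare `25` in the middle of a `cv2.threshold` call.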

4. Ignoring Detection Gaps

Problem: When detection fails for a range of frames, assuming this is expected behavior without investigation.
Solution: Investigate why detection fails - it may indicate algorithm flaws rather than expected behavior.

5. Coordinate System Confusion

Problem: Misinterpreting Y coordinates (smaller Y = higher in frame in image coordinates).
Solution: Explicitly document coordinate system assumptions and verify with visual inspection.

6. Ignoring Timing Reasonableness

Problem: Accepting detections that don't make temporal sense (e.g., event detected in last 0.8 seconds of a 4-second video).
Solution: Implement sanity checks on output timing.

7. Single Video Overfitting

Problem: Algorithm works on one video but fails on others.
Solution: Test on multiple videos early in development.

Output Format Considerations

When outputting results (e.g., to TOML, JSON):

```python
import numpy as np

# Convert numpy types to Python native types for serialization
result = {
    "takeoff_frame": int(takeoff_frame),  # Not np.int64
    "landing_frame": int(landing_frame),
}
```
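Why the conversion matters: numpy scalar types such as `np.int64` are not JSON serializable, so `json.dumps` raises `TypeError` until values are cast to native Python ints. The frame numbers below are made up for illustration.

```python
import json
import numpy as np

takeoff_frame = np.int64(52)   # typical of values produced by numpy-based pipelines
landing_frame = np.int64(58)

serialization_failed = False
try:
    json.dumps({"takeoff_frame": takeoff_frame})
except TypeError:
    serialization_failed = True  # np.int64 is not JSON serializable

# Converting to native int fixes serialization
result = {
    "takeoff_frame": int(takeoff_frame),
    "landing_frame": int(landing_frame),
}
serialized = json.dumps(result)
```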

Debugging Checklist

When detection results are incorrect:
  1. Have I visually inspected frames at the expected event times?
  2. Have I visually inspected frames at my detected event times?
  3. Do my detected times make temporal sense given video duration?
  4. Have I verified my algorithm on frames with known ground truth?
  5. Am I correctly interpreting the coordinate system?
  6. Have I tested on multiple videos?
  7. Are my thresholds derived from data or arbitrary?
  8. When detection fails on some frames, do I understand why?
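The timing and ordering items on the checklist can be bundled into a small helper. The function below is a hypothetical sketch (the names and the 0.5 s tail window are illustrative); it returns warnings instead of printing, so it can be called from tests.

```python
def sanity_check(takeoff_frame, landing_frame, frame_count, fps, tail_window=0.5):
    """Return warnings for a detected takeoff/landing pair.

    tail_window is the illustrative "suspiciously late" margin in seconds.
    """
    warnings = []
    duration = frame_count / fps
    if landing_frame <= takeoff_frame:
        warnings.append("landing cannot occur before or at takeoff")
    for name, f in (("takeoff", takeoff_frame), ("landing", landing_frame)):
        if f / fps > duration - tail_window:
            warnings.append(f"{name} detected very late in video - verify correctness")
    return warnings

ok = sanity_check(52, 58, frame_count=120, fps=30)   # plausible: passes cleanly
bad = sanity_check(58, 52, frame_count=120, fps=30)  # out of order: one warning
```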