Video Processing

Overview

This skill provides guidance for video processing tasks involving frame-level analysis, event detection, and motion tracking using computer vision libraries like OpenCV. It emphasizes verification-first approaches and guards against common pitfalls in video analysis workflows.

Core Approach: Verify Before Implementing

Before writing detection algorithms, establish ground truth understanding of the video content:
  1. Extract and inspect sample frames - Save key frames as images to visually verify what is happening at specific frame numbers
  2. Understand video metadata - Frame count, FPS, duration, resolution
  3. Map expected events to frame ranges - If test data exists, understand what frames correspond to which events
  4. Build diagnostic tools first - Frame extraction and visualization utilities provide critical insight

Workflow for Event Detection Tasks

Phase 1: Video Exploration

```python
# Essential first steps for any video analysis task
import cv2

cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration = frame_count / fps
print(f"Frames: {frame_count}, FPS: {fps}, Duration: {duration:.2f}s")
cap.release()
```

**Critical**: Extract frames at expected event locations to verify understanding:

```python
def save_frame(video_path, frame_num, output_path):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(output_path, frame)
    cap.release()

# Save frames at expected event times for visual inspection
save_frame("video.mp4", 50, "frame_050.png")
save_frame("video.mp4", 60, "frame_060.png")
```

Phase 2: Algorithm Development

When developing detection algorithms:
  1. Start simple - Basic frame differencing or thresholding before complex approaches
  2. Use configurable thresholds - Avoid hardcoded magic numbers; derive from data
  3. Test on known frames first - Verify algorithm produces expected results on frames with known ground truth
  4. Log intermediate values - Track metrics at each frame to understand algorithm behavior
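These principles can be sketched as a minimal detection loop. The code below scores motion as the mean absolute difference between consecutive grayscale frames, logs the score for every frame, and derives its threshold from the observed scores rather than hardcoding one. It operates on pre-decoded numpy arrays; the frame source, the 2-sigma rule, and the synthetic test data are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def motion_scores(frames):
    """Mean absolute difference between consecutive grayscale frames."""
    scores = []
    prev = None
    for i, frame in enumerate(frames):
        cur = frame.astype(np.int32)
        if prev is not None:
            scores.append((i, float(np.abs(cur - prev).mean())))  # log per-frame metric
        prev = cur
    return scores

def detect_motion(frames, threshold=None):
    """Return frames whose motion score exceeds a (data-derived) threshold."""
    scores = motion_scores(frames)
    values = [s for _, s in scores]
    if threshold is None:
        # Derive the threshold from the data instead of hardcoding a magic number
        threshold = float(np.mean(values) + 2 * np.std(values))
    return [i for i, s in scores if s > threshold], threshold

# Synthetic check: static black frames, a bright patch appears at frame 5
frames = [np.zeros((32, 32), dtype=np.uint8) for _ in range(10)]
for f in frames[5:]:
    f[8:16, 8:16] = 255
events, used_threshold = detect_motion(frames)
```

Because the threshold comes from the score distribution itself, the same loop transfers to videos with different noise levels without retuning.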

Phase 3: Validation

Before finalizing:
  1. Sanity check outputs - Do detected events occur in reasonable order and timing?
  2. Test on multiple videos - Verify generalization across different inputs
  3. Compare against expected ranges - If ground truth exists, verify detection accuracy

Common Detection Approaches

Frame Differencing

Compares frames against a reference (first frame or previous frame) to detect motion:

```python
# Background subtraction approach
first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
first_frame = cv2.GaussianBlur(first_frame, (21, 21), 0)

# For each subsequent frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)
diff = cv2.absdiff(first_frame, gray)
```

**Pitfall**: The first frame may not be a suitable reference if the scene changes or the camera moves.

Contour-Based Detection

Identifies objects by finding contours in thresholded images:

```python
_, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
```

**Pitfall**: Threshold values (e.g., 25) and minimum contour areas are arbitrary without calibration.
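Downstream tracking usually needs a centroid for each detected blob. The usual OpenCV route is `cv2.moments` on the chosen contour; the dependency-free sketch below computes the same centroid directly from a binary mask with numpy, with `min_area` standing in for the minimum-contour-area filter (the value 50 is illustrative).

```python
import numpy as np

def mask_centroid(mask, min_area=50):
    """Centroid (cx, cy) and area of foreground pixels in a binary mask.

    Plays the role of cv2.moments on a detected contour; min_area acts as
    the minimum-contour-area filter (50 is an illustrative value).
    """
    ys, xs = np.nonzero(mask)
    area = xs.size
    if area < min_area:
        return None  # too small to be the tracked object; likely noise
    return float(xs.mean()), float(ys.mean()), area

# A 20x20 blob whose centroid is at (69.5, 29.5)
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:40, 60:80] = 1
cx, cy, area = mask_centroid(mask)
```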

Tracking Position Over Time

For detecting events like jumps or gestures, track object position across frames:

```python
positions = []  # (frame_num, x, y, area) tuples
for frame_num in range(frame_count):
    # ... detection code ...
    if detected:
        positions.append((frame_num, cx, cy, area))
```

**Pitfall**: Coordinate systems matter. In image coordinates, Y increases downward, so "higher in frame" means smaller Y values.
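As a worked example of what such a positions list enables, the hypothetical helper below locates a jump's takeoff, apex, and landing from the Y trajectory. It assumes the clip starts with the subject on the ground and uses an illustrative `rise_px` margin; note that the apex is the *minimum* Y, per the coordinate-system pitfall above.

```python
def find_jump(positions, rise_px=10):
    """Locate takeoff, apex, and landing in tracked (frame_num, x, y, area) tuples.

    Image coordinates: Y increases downward, so the apex is the MINIMUM y.
    rise_px is an illustrative "clearly off the ground" margin in pixels.
    """
    baseline_y = positions[0][2]  # assumes the clip starts with the subject grounded
    airborne = [p for p in positions if baseline_y - p[2] > rise_px]
    if not airborne:
        return None
    apex = min(positions, key=lambda p: p[2])
    return {"takeoff": airborne[0][0], "apex": apex[0], "landing": airborne[-1][0]}

# Synthetic trajectory: grounded at y=200, peaks at y=150 on frame 55
positions = [(f, 100, 200, 400) for f in range(50)]
positions += [(50 + i, 100, 200 - 10 * min(i, 10 - i), 400) for i in range(11)]
positions += [(f, 100, 200, 400) for f in range(61, 80)]
jump = find_jump(positions)
```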

Verification Strategies

1. Visual Inspection

Save frames at detected event times to verify correctness:

```python
# After detecting takeoff at frame N
save_frame(video_path, detected_takeoff, "detected_takeoff.png")
save_frame(video_path, detected_takeoff - 5, "before_takeoff.png")
save_frame(video_path, detected_takeoff + 5, "after_takeoff.png")
```

2. Timing Reasonableness

Check if detected events make temporal sense:

```python
duration_seconds = frame_count / fps
event_time = detected_frame / fps

# Example: a jump in a 4-second video shouldn't be detected in the last 0.5 seconds
if event_time > duration_seconds - 0.5:
    print("WARNING: Event detected very late in video - verify correctness")
```

3. Sequence Validation

Ensure events occur in logical order:

```python
if detected_landing <= detected_takeoff:
    print("ERROR: Landing cannot occur before or at takeoff")
```

4. Multi-Video Testing

Test on multiple inputs early to catch overfitting to single video characteristics.

Common Pitfalls

1. No Ground Truth Verification

Problem: Relying entirely on computed metrics without visual confirmation.
Solution: Always save and inspect frames at detected event locations.

2. Confirmation Bias in Data Interpretation

Problem: When data shows unexpected patterns, inventing explanations that fit preconceptions rather than questioning assumptions.
Solution: When detection results seem wrong, investigate root causes rather than rationalizing unexpected behavior.

3. Magic Number Thresholds

Problem: Using arbitrary thresholds (500 for contour area, 25 for binary threshold) without empirical basis.
Solution: Derive thresholds from actual video data or make them configurable with sensible defaults.
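One way to derive such a threshold: run a calibration pass over background-only frames, then set the binary threshold at a high percentile of the observed difference values, so that most noise falls below it. The 95th percentile and the synthetic noise below are illustrative choices, not recommended constants.

```python
import numpy as np

def derive_threshold(diff_samples, percentile=95):
    """Binary threshold taken from observed frame-difference values,
    chosen so that most background noise falls below it."""
    return float(np.percentile(diff_samples, percentile))

# Calibration stand-in: absdiff values collected over background-only
# frames (here, synthetic uniform noise in 0..9)
rng = np.random.default_rng(0)
noise = rng.integers(0, 10, size=10_000)
threshold = derive_threshold(noise)
```

The percentile itself is still a choice, but it is documented and tied to measured data, unlike a bare `25` in the middle of a `cv2.threshold` call.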

4. Ignoring Detection Gaps

Problem: When detection fails for a range of frames, assuming this is expected behavior without investigation.
Solution: Investigate why detection fails - it may indicate algorithm flaws rather than expected behavior.

5. Coordinate System Confusion

Problem: Misinterpreting Y coordinates (smaller Y = higher in frame in image coordinates).
Solution: Explicitly document coordinate system assumptions and verify with visual inspection.

6. Ignoring Timing Reasonableness

Problem: Accepting detections that don't make temporal sense (e.g., event detected in last 0.8 seconds of a 4-second video).
Solution: Implement sanity checks on output timing.

7. Single Video Overfitting

Problem: Algorithm works on one video but fails on others.
Solution: Test on multiple videos early in development.

Output Format Considerations

When outputting results (e.g., to TOML, JSON):

```python
import numpy as np

# Convert numpy types to Python native types for serialization
result = {
    "takeoff_frame": int(takeoff_frame),  # Not np.int64
    "landing_frame": int(landing_frame),
}
```
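Why the conversion matters: numpy scalar types such as `np.int64` are not JSON serializable, so `json.dumps` raises `TypeError` until values are cast to native Python ints. The frame numbers below are made up for illustration.

```python
import json
import numpy as np

takeoff_frame = np.int64(52)   # typical of values produced by numpy-based pipelines
landing_frame = np.int64(58)

serialization_failed = False
try:
    json.dumps({"takeoff_frame": takeoff_frame})
except TypeError:
    serialization_failed = True  # np.int64 is not JSON serializable

# Converting to native int fixes serialization
result = {
    "takeoff_frame": int(takeoff_frame),
    "landing_frame": int(landing_frame),
}
serialized = json.dumps(result)
```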

Debugging Checklist

When detection results are incorrect:
  1. Have I visually inspected frames at the expected event times?
  2. Have I visually inspected frames at my detected event times?
  3. Do my detected times make temporal sense given video duration?
  4. Have I verified my algorithm on frames with known ground truth?
  5. Am I correctly interpreting the coordinate system?
  6. Have I tested on multiple videos?
  7. Are my thresholds derived from data or arbitrary?
  8. When detection fails on some frames, do I understand why?
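The timing and ordering items on the checklist can be bundled into a small helper. The function below is a hypothetical sketch (the names and the 0.5 s tail window are illustrative); it returns warnings instead of printing, so it can be called from tests.

```python
def sanity_check(takeoff_frame, landing_frame, frame_count, fps, tail_window=0.5):
    """Return warnings for a detected takeoff/landing pair.

    tail_window is the illustrative "suspiciously late" margin in seconds.
    """
    warnings = []
    duration = frame_count / fps
    if landing_frame <= takeoff_frame:
        warnings.append("landing cannot occur before or at takeoff")
    for name, f in (("takeoff", takeoff_frame), ("landing", landing_frame)):
        if f / fps > duration - tail_window:
            warnings.append(f"{name} detected very late in video - verify correctness")
    return warnings

ok = sanity_check(52, 58, frame_count=120, fps=30)   # plausible: passes cleanly
bad = sanity_check(58, 52, frame_count=120, fps=30)  # out of order: one warning
```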