# Video Processing
## Overview
This skill provides guidance for video processing tasks involving frame-level analysis, event detection, and motion tracking using computer vision libraries like OpenCV. It emphasizes verification-first approaches and guards against common pitfalls in video analysis workflows.
## Core Approach: Verify Before Implementing
Before writing detection algorithms, establish ground truth understanding of the video content:
- **Extract and inspect sample frames** - Save key frames as images to visually verify what is happening at specific frame numbers
- **Understand video metadata** - Frame count, FPS, duration, resolution
- **Map expected events to frame ranges** - If test data exists, understand what frames correspond to which events
- **Build diagnostic tools first** - Frame extraction and visualization utilities provide critical insight
## Workflow for Event Detection Tasks
### Phase 1: Video Exploration
```python
# Essential first steps for any video analysis task
import cv2

cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration = frame_count / fps
print(f"Frames: {frame_count}, FPS: {fps}, Duration: {duration:.2f}s")
cap.release()
```
**Critical**: Extract frames at expected event locations to verify understanding:
```python
def save_frame(video_path, frame_num, output_path):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(output_path, frame)
    cap.release()

# Save frames at expected event times for visual inspection
save_frame("video.mp4", 50, "frame_050.png")
save_frame("video.mp4", 60, "frame_060.png")
```
### Phase 2: Algorithm Development
When developing detection algorithms:
- **Start simple** - Basic frame differencing or thresholding before complex approaches
- **Use configurable thresholds** - Avoid hardcoded magic numbers; derive from data
- **Test on known frames first** - Verify the algorithm produces expected results on frames with known ground truth
- **Log intermediate values** - Track metrics at each frame to understand algorithm behavior
### Phase 3: Validation
Before finalizing:
- **Sanity check outputs** - Do detected events occur in reasonable order and timing?
- **Test on multiple videos** - Verify generalization across different inputs
- **Compare against expected ranges** - If ground truth exists, verify detection accuracy
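These checks can be bundled into one helper that runs before results are reported. A sketch only; the takeoff/landing names follow the jump example used elsewhere in this skill, and the 0.5-second margin is an arbitrary default:

```python
def validate_jump_events(takeoff_frame, landing_frame, frame_count, fps,
                         margin_s=0.5):
    """Return a list of warnings; an empty list means the detections
    pass basic sanity checks (range, ordering, timing)."""
    warnings = []
    duration = frame_count / fps
    if not 0 <= takeoff_frame < frame_count:
        warnings.append("takeoff frame outside video")
    if not 0 <= landing_frame < frame_count:
        warnings.append("landing at or before takeoff" if False else
                        "landing frame outside video")
    if landing_frame <= takeoff_frame:
        warnings.append("landing at or before takeoff")
    if landing_frame / fps > duration - margin_s:
        warnings.append("event detected suspiciously late in video")
    return warnings
```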
## Common Detection Approaches
### Frame Differencing
Compares frames against a reference (first frame or previous frame) to detect motion:
python
undefined将帧与参考帧(第一帧或前一帧)进行比较以检测运动:
python
undefinedBackground subtraction approach
Background subtraction approach
first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
first_frame = cv2.GaussianBlur(first_frame, (21, 21), 0)
first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
first_frame = cv2.GaussianBlur(first_frame, (21, 21), 0)
For each subsequent frame
For each subsequent frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)
diff = cv2.absdiff(first_frame, gray)
**Pitfall**: First frame may not be a suitable reference if scene changes or camera moves.gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)
diff = cv2.absdiff(first_frame, gray)
### Contour-Based Detection
Identifies objects by finding contours in thresholded images:
```python
_, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
```

**Pitfall**: Threshold values (e.g., 25) and minimum contour areas are arbitrary without calibration.
### Tracking Position Over Time
For detecting events like jumps or gestures, track object position across frames:
```python
positions = []  # (frame_num, x, y, area) tuples
for frame_num in range(frame_count):
    # ... detection code ...
    if detected:
        positions.append((frame_num, cx, cy, area))
```

**Pitfall**: Coordinate systems matter. In image coordinates, Y increases downward, so "higher in frame" means smaller Y values.
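With positions collected, a jump can be located from the Y trajectory, remembering that smaller Y means higher in the frame. This is a sketch of one possible baseline-plus-threshold approach (both `find_jump` and its `lift_threshold` are illustrative, not a prescribed algorithm); real footage usually needs smoothing of the trajectory first:

```python
def find_jump(positions, lift_threshold=10.0):
    """Find (takeoff_frame, landing_frame) from (frame_num, x, y, area) tuples.

    Takeoff = first frame where the subject rises above the baseline by more
    than lift_threshold pixels (i.e., y decreases, since image Y grows
    downward); landing = first later frame back within the threshold.
    """
    baseline_y = positions[0][2]  # assume the subject starts on the ground
    takeoff = landing = None
    for frame_num, _x, y, _area in positions:
        airborne = (baseline_y - y) > lift_threshold
        if airborne and takeoff is None:
            takeoff = frame_num
        elif not airborne and takeoff is not None and landing is None:
            landing = frame_num
            break
    return takeoff, landing
```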
## Verification Strategies
### 1. Visual Inspection
Save frames at detected event times to verify correctness:
```python
# After detecting takeoff at frame N
save_frame(video_path, detected_takeoff, "detected_takeoff.png")
save_frame(video_path, detected_takeoff - 5, "before_takeoff.png")
save_frame(video_path, detected_takeoff + 5, "after_takeoff.png")
```
### 2. Timing Reasonableness
Check if detected events make temporal sense:
```python
duration_seconds = frame_count / fps
event_time = detected_frame / fps

# Example: a jump in a 4-second video shouldn't be detected in the last 0.5 seconds
if event_time > duration_seconds - 0.5:
    print("WARNING: Event detected very late in video - verify correctness")
```
### 3. Sequence Validation
Ensure events occur in logical order:
```python
if detected_landing <= detected_takeoff:
    print("ERROR: Landing cannot occur before or at takeoff")
```

### 4. Multi-Video Testing

Test on multiple inputs early to catch overfitting to single-video characteristics.
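A lightweight harness makes this habit cheap. The sketch below assumes a hypothetical `detect_events(video_path)` callable returning a dict of frame numbers; only the harness itself is shown:

```python
import glob

def run_on_all(pattern, detect_events):
    """Run a detector over every video matching pattern and summarize results."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        try:
            results[path] = detect_events(path)
        except Exception as exc:  # surface per-video failures instead of stopping
            results[path] = f"FAILED: {exc}"
        print(f"{path}: {results[path]}")
    return results
```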
## Common Pitfalls

### 1. No Ground Truth Verification

**Problem**: Relying entirely on computed metrics without visual confirmation.

**Solution**: Always save and inspect frames at detected event locations.

### 2. Confirmation Bias in Data Interpretation

**Problem**: When data shows unexpected patterns, inventing explanations that fit preconceptions rather than questioning assumptions.

**Solution**: When detection results seem wrong, investigate root causes rather than rationalizing unexpected behavior.

### 3. Magic Number Thresholds

**Problem**: Using arbitrary thresholds (500 for contour area, 25 for binary threshold) without empirical basis.

**Solution**: Derive thresholds from actual video data or make them configurable with sensible defaults.

### 4. Ignoring Detection Gaps

**Problem**: When detection fails for a range of frames, assuming this is expected behavior without investigation.

**Solution**: Investigate why detection fails - it may indicate algorithm flaws rather than expected behavior.

### 5. Coordinate System Confusion

**Problem**: Misinterpreting Y coordinates (smaller Y = higher in frame in image coordinates).

**Solution**: Explicitly document coordinate system assumptions and verify with visual inspection.

### 6. Ignoring Timing Reasonableness

**Problem**: Accepting detections that don't make temporal sense (e.g., an event detected in the last 0.8 seconds of a 4-second video).

**Solution**: Implement sanity checks on output timing.

### 7. Single Video Overfitting

**Problem**: Algorithm works on one video but fails on others.

**Solution**: Test on multiple videos early in development.
## Output Format Considerations
When outputting results (e.g., to TOML, JSON):
```python
import numpy as np

# Convert numpy types to Python native types for serialization
result = {
    "takeoff_frame": int(takeoff_frame),  # not np.int64
    "landing_frame": int(landing_frame),
}
```
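For nested results, a small recursive converter avoids having to remember every cast. A sketch; `np.generic` is the common base class of all NumPy scalar types:

```python
import numpy as np

def to_native(obj):
    """Recursively convert numpy scalars/arrays to JSON-serializable Python types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, dict):
        return {k: to_native(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(v) for v in obj]
    return obj
```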
## Debugging Checklist
When detection results are incorrect:
- Have I visually inspected frames at the expected event times?
- Have I visually inspected frames at my detected event times?
- Do my detected times make temporal sense given video duration?
- Have I verified my algorithm on frames with known ground truth?
- Am I correctly interpreting the coordinate system?
- Have I tested on multiple videos?
- Are my thresholds derived from data or arbitrary?
- When detection fails on some frames, do I understand why?