Computer Vision Pipeline

Expert in building production-ready computer vision systems for object detection, tracking, and video analysis.

When to Use

Use for:
  • Drone footage analysis (archaeological surveys, conservation)
  • Wildlife monitoring and tracking
  • Real-time object detection systems
  • Video preprocessing and analysis
  • Custom model training and inference
  • Multi-object tracking (MOT)
NOT for:
  • Simple image filters (use Pillow/PIL)
  • Photo editing (use Photoshop/GIMP)
  • Face recognition APIs (use AWS Rekognition)
  • Basic OCR (use Tesseract)

Technology Selection

Object Detection Models

| Model | Speed (FPS) | Accuracy (mAP) | Use Case |
|-------|-------------|----------------|----------|
| YOLOv8 | 140 | 53.9% | Real-time detection |
| Detectron2 | 25 | 58.7% | High accuracy, research |
| EfficientDet | 35 | 55.1% | Mobile deployment |
| Faster R-CNN | 10 | 42.0% | Legacy systems |

Timeline:
  • 2015: Faster R-CNN (two-stage detection)
  • 2016: YOLO v1 (one-stage, real-time)
  • 2020: YOLOv5 (PyTorch, production-ready)
  • 2023: YOLOv8 (state-of-the-art)
  • 2024: YOLOv8 is industry standard for real-time

Decision tree:
  • Need real-time (>30 FPS)? → YOLOv8
  • Need highest accuracy? → Detectron2 Mask R-CNN
  • Need mobile deployment? → YOLOv8-nano or EfficientDet
  • Need instance segmentation? → Detectron2 or YOLOv8-seg
  • Need custom objects? → Fine-tune YOLOv8
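
The decision tree above can be sketched as a small lookup function. This is an illustrative helper, not part of any library: the name `choose_detector` and its boolean flags are assumptions introduced here, and the returned strings simply repeat the recommendations listed above. Checks run in the same order as the list, so the first matching requirement wins.

```python
def choose_detector(real_time=False, highest_accuracy=False, mobile=False,
                    instance_segmentation=False, custom_objects=False):
    """Return the decision-tree recommendation for the first matching need.

    Hypothetical helper: flags and return strings mirror the list above.
    """
    if real_time:
        return "YOLOv8"
    if highest_accuracy:
        return "Detectron2 Mask R-CNN"
    if mobile:
        return "YOLOv8-nano or EfficientDet"
    if instance_segmentation:
        return "Detectron2 or YOLOv8-seg"
    if custom_objects:
        return "Fine-tune YOLOv8"
    return "No specific recommendation"


print(choose_detector(mobile=True))
print(choose_detector(instance_segmentation=True, custom_objects=True))
```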

Common Anti-Patterns

Anti-Pattern 1: Not Preprocessing Frames Before Detection

Novice thinking: "Just run detection on raw video frames"

Problem: Poor detection accuracy, wasted GPU cycles.

Wrong approach:
```python
# ❌ No preprocessing - poor results
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Raw frame detection - no normalization, no resizing
    results = model(frame)
    # Poor accuracy, slow inference
```

**Why wrong**:
- Video resolution too high (4K = 8.3 megapixels per frame)
- No normalization (pixel values 0-255 instead of 0-1)
- Aspect ratio not maintained
- GPU memory overflow on high-res frames

**Correct approach**:
```python
# ✅ Proper preprocessing pipeline
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

# Model expects 640x640 input
TARGET_SIZE = 640

def preprocess_frame(frame):
    """Letterbox a BGR frame to TARGET_SIZE x TARGET_SIZE.

    Returns the padded frame, the resize scale, and the (left, top) padding.
    """
    # Resize while maintaining aspect ratio
    h, w = frame.shape[:2]
    scale = TARGET_SIZE / max(h, w)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(frame, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Pad to square
    pad_w = (TARGET_SIZE - new_w) // 2
    pad_h = (TARGET_SIZE - new_h) // 2

    padded = cv2.copyMakeBorder(
        resized,
        pad_h, TARGET_SIZE - new_h - pad_h,
        pad_w, TARGET_SIZE - new_w - pad_w,
        cv2.BORDER_CONSTANT,
        value=(114, 114, 114)  # Gray padding
    )

    # Normalize to 0-1 (if model expects it)
    # normalized = padded.astype(np.float32) / 255.0

    return padded, scale, (pad_w, pad_h)

while True:
    ret, frame = video.read()
    if not ret:
        break

    preprocessed, scale, (pad_w, pad_h) = preprocess_frame(frame)
    results = model(preprocessed)

    # Scale bounding boxes back to original coordinates:
    # remove the padding offset first, then undo the resize
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        x1, x2 = (x1 - pad_w) / scale, (x2 - pad_w) / scale
        y1, y2 = (y1 - pad_h) / scale, (y2 - pad_h) / scale

video.release()
```

**Performance comparison**:
- Raw 4K frames: 5 FPS, 72% mAP
- Preprocessed 640x640: 45 FPS, 89% mAP

**Timeline context**:
- 2015: Manual preprocessing required
- 2020: YOLOv5 added auto-resize
- 2023: YOLOv8 has smart preprocessing but explicit control is better
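
The resize-and-pad arithmetic in this section can be checked without OpenCV. The sketch below is a pure-Python illustration under stated assumptions: `letterbox_geometry` and `unletterbox_box` are names introduced here, not library functions, and the padding is assumed to be centered as in `preprocess_frame`. It computes the scale and padding for a 4K frame and maps a box from the padded 640x640 image back to original pixel coordinates, subtracting the padding before undoing the resize.

```python
TARGET_SIZE = 640

def letterbox_geometry(width, height, target=TARGET_SIZE):
    """Return (scale, new_w, new_h, pad_w, pad_h) for a centered letterbox."""
    scale = target / max(width, height)
    new_w, new_h = int(width * scale), int(height * scale)
    pad_w = (target - new_w) // 2
    pad_h = (target - new_h) // 2
    return scale, new_w, new_h, pad_w, pad_h

def unletterbox_box(box, scale, pad_w, pad_h):
    """Map (x1, y1, x2, y2) from letterboxed to original coordinates."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_w) / scale, (y1 - pad_h) / scale,
            (x2 - pad_w) / scale, (y2 - pad_h) / scale)

# A 4K frame: 3840 x 2160 = 8,294,400 pixels (about 8.3 megapixels)
scale, new_w, new_h, pad_w, pad_h = letterbox_geometry(3840, 2160)
print(scale, new_w, new_h, pad_w, pad_h)

# A box covering the whole resized image maps back to the full frame
full = (pad_w, pad_h, pad_w + new_w, pad_h + new_h)
print(unletterbox_box(full, scale, pad_w, pad_h))
```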

---

Anti-Pattern 2: Processing Every Frame in Video

Novice thinking: "Run detection on every single frame"

Problem: Most frames are redundant, wasting compute.

Wrong approach:
```python
# ❌ Process every frame (30 FPS video = 1800 frames/min)
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Run detection on EVERY frame
    results = model(frame)
    detections.append(results)

# 10-minute video = 18,000 inferences (15 minutes on GPU)
```

**Why wrong**:
- Adjacent frames are nearly identical
- Most compute goes to duplicate work (about 97% at 1 FPS sampling of a 30 FPS video)
- Slow processing time
- Massive storage for results

**Correct approach 1**: Frame sampling
```python
# ✅ Sample every Nth frame
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

SAMPLE_RATE = 30  # Process 1 frame per second (if 30 FPS video)
fps = video.get(cv2.CAP_PROP_FPS) or 30.0  # Fall back to 30 if FPS is unavailable

frame_count = 0
detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    frame_count += 1

    # Only process every 30th frame
    if frame_count % SAMPLE_RATE == 0:
        results = model(frame)
        detections.append({
            'frame': frame_count,
            'timestamp': frame_count / fps,
            'results': results
        })

video.release()

# 10-minute video = 600 inferences (30 seconds on GPU)
```

**Correct approach 2**: Adaptive sampling with scene change detection
```python
# ✅ Only process when scene changes significantly
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('drone_footage.mp4')

def scene_changed(prev_frame, curr_frame, threshold=0.3):
    """Detect scene change using histogram comparison"""
    if prev_frame is None:
        return True

    # Convert to grayscale
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)

    # Calculate histograms
    prev_hist = cv2.calcHist([prev_gray], [0], None, [256], [0, 256])
    curr_hist = cv2.calcHist([curr_gray], [0], None, [256], [0, 256])

    # Compare histograms (1.0 = identical distributions)
    correlation = cv2.compareHist(prev_hist, curr_hist, cv2.HISTCMP_CORREL)

    return correlation < (1 - threshold)

prev_frame = None
detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Only run detection if scene changed
    if scene_changed(prev_frame, frame):
        results = model(frame)
        detections.append(results)

    # Compares against the immediately preceding frame; to catch gradual
    # drift, compare against the last processed frame instead
    prev_frame = frame.copy()

video.release()

# Adapts to video content - static shots skip frames, action scenes process more
```

**Savings**:
- Every frame: 18,000 inferences
- Sample 1 FPS: 600 inferences (97% reduction)
- Adaptive: ~1,200 inferences (93% reduction)
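
The savings figures follow from simple arithmetic on frame counts. A minimal sketch, assuming the 10-minute, 30 FPS clip used in the examples; `inference_count` and `reduction_percent` are hypothetical helper names used only for illustration, and the adaptive figure of 1,200 is taken from the list above rather than computed.

```python
def inference_count(duration_s, fps, sample_rate=1):
    """Number of frames processed when every `sample_rate`-th frame is kept."""
    total_frames = int(duration_s * fps)
    return total_frames // sample_rate

def reduction_percent(baseline, reduced):
    """Percentage of inferences saved relative to the baseline."""
    return 100.0 * (baseline - reduced) / baseline

every_frame = inference_count(600, 30)            # 10 min at 30 FPS
one_per_second = inference_count(600, 30, 30)     # keep every 30th frame

print(every_frame, one_per_second)
print(round(reduction_percent(every_frame, one_per_second)))  # 1 FPS sampling
print(round(reduction_percent(every_frame, 1200)))            # adaptive estimate
```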

---

Anti-Pattern 3: Not Using Batch Inference

Novice thinking: "Process one image at a time"

Problem: GPU sits idle 80% of the time waiting for data.

Wrong approach:
```python
# ❌ Sequential processing - GPU underutilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

# 100 images to process
image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

start = time.time()

for path in image_paths:
    frame = cv2.imread(path)
    results = model(frame)  # Process one at a time
    # GPU utilization: ~20%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 45 seconds
```

**Why wrong**:
- GPU has to wait for CPU to load each image
- No parallelization
- GPU utilization ~20%
- Slow throughput

**Correct approach**:
```python
# ✅ Batch inference - GPU fully utilized
import cv2
from ultralytics import YOLO
import time

model = YOLO('yolov8n.pt')

image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]

BATCH_SIZE = 16  # Process 16 images at once

start = time.time()

for i in range(0, len(image_paths), BATCH_SIZE):
    batch_paths = image_paths[i:i+BATCH_SIZE]

    # Load batch
    frames = [cv2.imread(path) for path in batch_paths]

    # Batch inference (single GPU call)
    results = model(frames)  # Pass list of images
    # GPU utilization: ~85%

elapsed = time.time() - start
print(f"Processed {len(image_paths)} images in {elapsed:.2f}s")
# Output: 8 seconds (5.6x faster!)
```

**Performance comparison**:
| Method | Time (100 images) | GPU Util | Throughput |
|--------|-------------------|----------|------------|
| Sequential | 45s | 20% | 2.2 img/s |
| Batch (16) | 8s | 85% | 12.5 img/s |
| Batch (32) | 6s | 92% | 16.7 img/s |

**Batch size tuning**:
```python
# Find optimal batch size for your GPU
import time
import torch

def find_optimal_batch_size(model, image_size=(640, 640)):
    for batch_size in [1, 2, 4, 8, 16, 32, 64]:
        try:
            # Tensor input is expected as BCHW floats in the 0-1 range
            dummy_input = torch.rand(batch_size, 3, *image_size).cuda()

            torch.cuda.synchronize()  # Finish pending GPU work before timing
            start = time.time()
            with torch.no_grad():
                _ = model(dummy_input)
            torch.cuda.synchronize()  # Wait for inference to complete
            elapsed = time.time() - start

            throughput = batch_size / elapsed
            print(f"Batch {batch_size}: {throughput:.1f} img/s")
        except RuntimeError as e:
            print(f"Batch {batch_size}: OOM (out of memory)")
            break

# Find optimal batch size before production
find_optimal_batch_size(model)
```
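
The batching loop and the throughput column can be sketched without a GPU. The helper below is a framework-free illustration: `batched` and `throughput` are names introduced here, and the timings passed in are simply the figures quoted in the performance comparison, not new measurements.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def throughput(num_images, seconds):
    """Images processed per second."""
    return num_images / seconds

image_paths = [f'frame_{i:04d}.jpg' for i in range(100)]
batches = list(batched(image_paths, 16))

# 100 images in batches of 16: six full batches plus one batch of 4
print(len(batches), len(batches[-1]))

# Throughput for the timings quoted in the comparison table
for label, seconds in [('Sequential', 45), ('Batch (16)', 8), ('Batch (32)', 6)]:
    print(f"{label}: {throughput(100, seconds):.1f} img/s")
```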

---

Anti-Pattern 4: Ignoring Non-Maximum Suppression (NMS) Tuning

Problem: Duplicate detections, missed objects, slow post-processing.

Wrong approach:
```python
# ❌ Use default NMS settings for everything
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Default settings (iou_threshold=0.45, conf_threshold=0.25 as cited here;
# defaults differ between releases, so check your installed version)
results = model('crowded_scene.jpg')

# Result: 50 bounding boxes, 30 are duplicates!
```

**Why wrong**:
- Default IoU=0.45 is too permissive for dense objects
- Default conf=0.25 includes low-quality detections
- No adaptation to use case

**Correct approach**:
```python
# ✅ Tune NMS for your use case
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Sparse objects (dolphins in ocean)
sparse_results = model(
    'ocean_footage.jpg',
    iou=0.5,   # Higher IoU = allow closer boxes
    conf=0.4   # Higher confidence = fewer false positives
)

# Dense objects (crowd, flock of birds)
dense_results = model(
    'crowded_scene.jpg',
    iou=0.3,   # Lower IoU = suppress more duplicates
    conf=0.5   # Higher confidence = filter noise
)

# High precision needed (legal evidence)
precise_results = model(
    'evidence.jpg',
    iou=0.5,
    conf=0.7,    # Very high confidence
    max_det=50   # Limit max detections
)
```

**NMS parameter guide**:
| Use Case | IoU | Conf | Max Det |
|----------|-----|------|---------|
| Sparse objects (wildlife) | 0.5 | 0.4 | 100 |
| Dense objects (crowd) | 0.3 | 0.5 | 300 |
| High precision (evidence) | 0.5 | 0.7 | 50 |
| Real-time (speed priority) | 0.45 | 0.3 | 100 |
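
To see what the `iou` and `conf` settings do, here is a minimal greedy NMS in plain Python. It is a class-agnostic sketch written for illustration, not the implementation any library uses: a box is dropped when its IoU with an already kept, higher-scoring box exceeds the threshold, so lowering the threshold removes more overlapping boxes. The boxes and scores are made-up sample data.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold, conf_threshold=0.0):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep the box only if it does not overlap any kept box too much
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# A near-duplicate pair, a moderately overlapping neighbor, and a distant box
boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (50, 0, 150, 100), (300, 300, 400, 400)]
scores = [0.9, 0.8, 0.7, 0.6]

print(nms(boxes, scores, iou_threshold=0.5))  # drops only the near-duplicate
print(nms(boxes, scores, iou_threshold=0.3))  # also drops the neighbor
```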

---

Anti-Pattern 5: No Tracking Between Frames

Novice thinking: "Run detection on each frame independently"

Problem: Can't count unique objects, track movement, or build trajectories.

Wrong approach:
```python
# ❌ Independent frame detection - no object identity
from ultralytics import YOLO
import cv2

model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

detections = []

while True:
    ret, frame = video.read()
    if not ret:
        break

    results = model(frame)
    detections.append(results)

# Result: Can't tell if frame 10 dolphin is same as frame 20 dolphin
# Can't count unique dolphins
# Can't track trajectories
```

**Why wrong**:
- No object identity across frames
- Can't count unique objects
- Can't analyze movement patterns
- Can't build trajectories

**Correct approach**: Use tracking (ByteTrack)
```python
# ✅ Multi-object tracking with ByteTrack
from ultralytics import YOLO
import cv2

# YOLO with tracking
model = YOLO('yolov8n.pt')
video = cv2.VideoCapture('dolphins.mp4')

# Track objects across frames
tracks = {}
frame_idx = 0

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Run detection + tracking
    results = model.track(
        frame,
        persist=True,     # Maintain IDs across frames
        tracker='bytetrack.yaml'  # ByteTrack algorithm
    )

    # Each detection now has persistent ID
    for box in results[0].boxes:
        if box.id is None:  # Tracker has not assigned an ID to this box
            continue
        track_id = int(box.id[0])  # Unique ID across frames
        x1, y1, x2, y2 = box.xyxy[0].tolist()

        # Store trajectory
        if track_id not in tracks:
            tracks[track_id] = []

        tracks[track_id].append({
            'frame': frame_idx,  # Index of the video frame, not of the track entry
            'bbox': (x1, y1, x2, y2),
            'conf': float(box.conf[0])
        })

    frame_idx += 1

video.release()

# Now we can analyze:
print(f"Unique dolphins detected: {len(tracks)}")

# Trajectory analysis
for track_id, trajectory in tracks.items():
    if len(trajectory) > 30:  # Only long tracks
        print(f"Dolphin {track_id} appeared in {len(trajectory)} frames")
        # Calculate movement, speed, etc.
```

**Tracking benefits**:
- Count unique objects (not just detections per frame)
- Build trajectories and movement patterns
- Analyze behavior over time
- Filter out brief false positives

**Tracking algorithms**:
| Algorithm | Speed | Robustness | Occlusion Handling |
|-----------|-------|------------|---------------------|
| ByteTrack | Fast | Good | Excellent |
| SORT | Very Fast | Fair | Fair |
| DeepSORT | Medium | Excellent | Good |
| BoT-SORT | Medium | Excellent | Excellent |
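
Once tracks are stored as lists of per-frame records, the "calculate movement, speed" step can be sketched with the standard library. This is an illustrative example under stated assumptions: each record carries a `frame` index and a `bbox` tuple as in the tracking example, the function names are introduced here, the 30 FPS default is an assumption, and distances are in pixels rather than real-world units.

```python
import math

def bbox_center(bbox):
    """Center (cx, cy) of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def path_length(trajectory):
    """Total distance in pixels travelled by the box center along a track."""
    centers = [bbox_center(entry['bbox']) for entry in trajectory]
    return sum(math.dist(a, b) for a, b in zip(centers, centers[1:]))

def mean_speed(trajectory, fps=30.0):
    """Average speed in pixels per second between first and last observation."""
    frames = trajectory[-1]['frame'] - trajectory[0]['frame']
    if frames <= 0:
        return 0.0
    return path_length(trajectory) / (frames / fps)

# Synthetic track: a 20x20 box moving 3 px right and 4 px down per frame
track = [
    {'frame': i, 'bbox': (3 * i, 4 * i, 3 * i + 20, 4 * i + 20)}
    for i in range(31)
]

print(path_length(track))  # 30 steps of 5 px each
print(mean_speed(track))   # pixels per second at the assumed 30 FPS
```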

---

Production Checklist

□ Preprocess frames (resize, pad, normalize)
□ Sample frames intelligently (1 FPS or scene change detection)
□ Use batch inference (16-32 images per batch)
□ Tune NMS thresholds for your use case
□ Implement tracking if analyzing video
□ Log inference time and GPU utilization
□ Handle edge cases (empty frames, corrupted video)
□ Save results in structured format (JSON, CSV)
□ Visualize detections for debugging
□ Benchmark on representative data
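
Two checklist items, handling empty results and saving detections in a structured format, can be sketched with the standard library alone. The record layout below (`frame`, `timestamp`, `track_id`, `bbox`, `conf`) is an assumption made for illustration, not a format defined by this skill or by any detection library, and the sample records are made up.

```python
import csv
import io
import json

def detections_to_json(records):
    """Serialize detection records to a JSON string (an empty list is valid)."""
    return json.dumps(records, indent=2)

def detections_to_csv(records):
    """Serialize detection records to CSV text with one row per detection."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(['frame', 'timestamp', 'track_id', 'x1', 'y1', 'x2', 'y2', 'conf'])
    for r in records:
        writer.writerow([r['frame'], r['timestamp'], r['track_id'], *r['bbox'], r['conf']])
    return buffer.getvalue()

records = [
    {'frame': 30, 'timestamp': 1.0, 'track_id': 1,
     'bbox': [10.0, 20.0, 110.0, 220.0], 'conf': 0.91},
    {'frame': 60, 'timestamp': 2.0, 'track_id': 1,
     'bbox': [14.0, 22.0, 114.0, 222.0], 'conf': 0.88},
]

print(detections_to_json(records))
print(detections_to_csv(records))
print(detections_to_csv([]))  # header row only when a clip yields no detections
```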

When to Use vs Avoid

| Scenario | Appropriate? |
|----------|--------------|
| Analyze drone footage for archaeology | ✅ Yes - custom object detection |
| Track wildlife in video | ✅ Yes - detection + tracking |
| Count people in crowd | ✅ Yes - dense object detection |
| Real-time security camera | ✅ Yes - YOLOv8 real-time |
| Filter vacation photos | ❌ No - use photo management apps |
| Face recognition login | ❌ No - use AWS Rekognition API |
| Read license plates | ❌ No - use specialized OCR |

References

  • /references/yolo-guide.md
    - YOLOv8 setup, training, inference patterns
  • /references/video-processing.md
    - Frame extraction, scene detection, optimization
  • /references/tracking-algorithms.md
    - ByteTrack, SORT, DeepSORT comparison

Scripts

  • scripts/video_analyzer.py
    - Extract frames, run detection, generate timeline
  • scripts/model_trainer.py
    - Fine-tune YOLO on custom dataset, export weights

This skill guides: Computer vision | Object detection | Video analysis | YOLO | Tracking | Drone footage | Wildlife monitoring