drone-cv-expert
Drone CV Expert
Expert in robotics, drone systems, and computer vision for autonomous aerial platforms.
Decision Tree: When to Use This Skill
User mentions drones or UAVs?
├─ YES → Is it about inspection/detection of specific things (fire, roof damage, thermal)?
│   ├─ YES → Use drone-inspection-specialist
│   └─ NO → Is it about flight control, navigation, or general CV?
│       ├─ YES → Use THIS SKILL (drone-cv-expert)
│       └─ NO → Is it about GPU rendering/shaders?
│           ├─ YES → Use metal-shader-expert
│           └─ NO → Use THIS SKILL as default drone skill
└─ NO → Is it general object detection without drone context?
    ├─ YES → Use clip-aware-embeddings or other CV skill
    └─ NO → Probably not a drone question
Core Competencies
Flight Control & Navigation
- PID Tuning: Position, velocity, attitude control loops
- SLAM: ORB-SLAM, LSD-SLAM, visual-inertial odometry (VIO)
- Path Planning: A*, RRT, RRT*, Dijkstra, potential fields
- Sensor Fusion: EKF, UKF, complementary filters
- GPS-Denied Navigation: AprilTags, visual odometry, LiDAR SLAM
Computer Vision
- Object Detection: YOLO (v5/v8/v10), EfficientDet, SSD
- Tracking: ByteTrack, DeepSORT, SORT, optical flow
- Edge Deployment: TensorRT, ONNX, OpenVINO optimization
- 3D Vision: Stereo depth, point clouds, structure-from-motion
Hardware Integration
- Flight Controllers: Pixhawk (hardware), ArduPilot and PX4 (autopilot firmware), DJI
- Protocols and SDKs: MAVLink (protocol), DroneKit and MAVSDK (client libraries)
- Edge Compute: Jetson (Nano/Xavier/Orin), Coral TPU
- Sensors: IMU, GPS, barometer, LiDAR, depth cameras
Anti-Patterns to Avoid
1. "Simulation-Only Syndrome"
Wrong: Testing only in Gazebo/AirSim, then deploying directly to a real drone.
Right: Simulation → Bench test → Tethered flight → Controlled environment → Field.
2. "EKF Overkill"
Wrong: Using an Extended Kalman Filter when a complementary filter suffices.
Right: Match filter complexity to requirements:
- Complementary filter: Basic stabilization, attitude only
- EKF: Multi-sensor fusion, GPS+IMU+baro
- UKF: Highly nonlinear systems, aggressive maneuvers
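To illustrate how little machinery the simplest option needs, here is a minimal sketch of a single-axis complementary filter. The function name, the blend factor `alpha=0.98`, and the sample values are assumptions chosen for illustration; a real attitude estimator would handle all three axes and sensor calibration.

```python
def complementary_filter(angle_prev, gyro_rate, accel_angle, dt, alpha=0.98):
    """Blend the integrated gyro rate (accurate short-term) with the
    accelerometer-derived angle (stable long-term). Angles in radians."""
    gyro_angle = angle_prev + gyro_rate * dt  # dead-reckoned angle
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle


# Hypothetical scenario: true pitch is 0.1 rad and the gyro has a
# constant 0.01 rad/s bias. The accelerometer term bounds the drift.
angle = 0.0
for _ in range(500):
    angle = complementary_filter(angle, gyro_rate=0.01, accel_angle=0.1, dt=0.01)
```

With these numbers the estimate settles near 0.105 rad instead of drifting without bound, which is often enough for basic stabilization.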
3. "Max Resolution Assumption"
Wrong: Processing 4K frames at 30fps expecting real-time performance.
Right: Resolution trade-offs by altitude/speed:
| Altitude | Speed | Resolution | FPS | Rationale |
|---|---|---|---|---|
| <30m | Slow | 1920x1080 | 30 | Detail needed |
| 30-100m | Medium | 1280x720 | 30 | Balance |
| >100m | Fast | 640x480 | 60 | Speed priority |
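The table can be encoded as a small lookup. The sketch below is one possible encoding that keys on altitude only (an assumption; the table also considers speed) and includes a helper showing the raw pixel throughput each mode implies. `CaptureMode`, `select_capture_mode`, and `pixel_rate` are hypothetical names.

```python
from typing import NamedTuple


class CaptureMode(NamedTuple):
    width: int
    height: int
    fps: int


def select_capture_mode(altitude_m: float) -> CaptureMode:
    """Pick a capture mode from the altitude bands in the table above."""
    if altitude_m < 30:
        return CaptureMode(1920, 1080, 30)  # detail needed
    if altitude_m <= 100:
        return CaptureMode(1280, 720, 30)   # balance
    return CaptureMode(640, 480, 60)        # speed priority


def pixel_rate(mode: CaptureMode) -> int:
    """Pixels per second the pipeline must process in this mode."""
    return mode.width * mode.height * mode.fps
```

For comparison, 4K at 30 fps is 3840 × 2160 × 30 ≈ 249 million pixels per second, about four times the 1080p30 load and over thirteen times the 640x480 at 60 fps load.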
4. "Single-Thread Processing"
Wrong: Sequential detect → track → control in one loop.
Right: Pipeline parallelism:
Thread 1: Camera capture (async)
Thread 2: Object detection (GPU)
Thread 3: Tracking + state estimation
Thread 4: Control commands
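The four-stage layout above can be sketched with standard-library threads and bounded queues. This is a minimal sketch, not a production design: the `detect`, `track`, and `control` callables are placeholders supplied by the caller, capture runs in the calling thread for brevity, and the blocking `put` applies back-pressure rather than dropping stale frames, which a real-time system would likely prefer.

```python
import queue
import threading

STOP = object()  # sentinel that tells each downstream stage to shut down


def stage(work, inbox, outbox):
    """Apply work(item) to every item from inbox; forward results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            if outbox is not None:
                outbox.put(STOP)  # propagate shutdown downstream
            break
        result = work(item)
        if outbox is not None:
            outbox.put(result)


def run_pipeline(frames, detect, track, control):
    """Run detect -> track -> control in separate threads; return commands."""
    q_frames, q_dets, q_tracks = (queue.Queue(maxsize=2) for _ in range(3))
    commands = []
    threads = [
        threading.Thread(target=stage, args=(detect, q_frames, q_dets)),
        threading.Thread(target=stage, args=(track, q_dets, q_tracks)),
        threading.Thread(
            target=stage,
            args=(lambda t: commands.append(control(t)), q_tracks, None),
        ),
    ]
    for t in threads:
        t.start()
    for frame in frames:  # stands in for the asynchronous capture thread
        q_frames.put(frame)
    q_frames.put(STOP)
    for t in threads:
        t.join()
    return commands
```

Each stage has a single worker and the queues are FIFO, so command order matches frame order. In CPython, threads overlap usefully only when the stages release the GIL, as I/O and most GPU inference calls do; whether that holds for a given detector is something to verify on the target.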
5. "GPS Trust"
Wrong: Assuming GPS is always accurate and available.
Right: Multi-source position estimation:
- GPS: 2-5m accuracy outdoor, unavailable indoor
- Visual odometry: 0.1-1% drift of distance traveled, lighting dependent
- AprilTags: cm-level accuracy where deployed
- IMU: Short-term only, drift accumulates
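One simple way to combine whichever sources are currently available is inverse-variance weighting. The sketch below works on a single axis and assumes independent, roughly Gaussian errors with known standard deviations; the `fuse_position` name and the tuple layout are hypothetical, and a full system would normally run this inside an EKF instead.

```python
from typing import Optional, Sequence, Tuple

# (position along one axis in metres, standard deviation in metres)
Estimate = Tuple[float, float]


def fuse_position(estimates: Sequence[Optional[Estimate]]) -> Optional[Estimate]:
    """Fuse available estimates; None marks a source that has dropped out."""
    valid = [e for e in estimates if e is not None and e[1] > 0]
    if not valid:
        return None  # no source available: caller must trigger a fallback
    weights = [1.0 / (sigma * sigma) for _, sigma in valid]
    total = sum(weights)
    position = sum(w * p for w, (p, _) in zip(weights, valid)) / total
    return position, (1.0 / total) ** 0.5


# Example: GPS at 3 m sigma, visual odometry unavailable, AprilTag at 5 cm.
fused = fuse_position([(10.0, 3.0), None, (12.0, 0.05)])
```

The fused value sits almost entirely on the AprilTag reading because its variance is far smaller, and the function degrades to GPS alone when the tag is out of view.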
6. "One Model Fits All"
Wrong: Using the same YOLO model for all scenarios.
Right: Model selection by constraint (latencies are indicative and vary with hardware, input size, and precision):
| Constraint | Model | Notes |
|---|---|---|
| Latency critical | YOLOv8n | 6ms inference |
| Balanced | YOLOv8s | 15ms, better accuracy |
| Accuracy first | YOLOv8x | 50ms, highest mAP |
| Edge device | YOLOv8n + TensorRT | 3ms on Jetson |
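As a sketch of how the table might drive selection in code, the helper below picks the most accurate variant that fits a latency budget. The latency values are copied from the table as placeholders; in practice they should be replaced with measurements on the target hardware. `MODELS` and `pick_model` are hypothetical names.

```python
from typing import Optional

# Ordered from most to least accurate; latencies (ms) are the table's
# indicative figures, not measurements.
MODELS = [
    ("YOLOv8x", 50.0),
    ("YOLOv8s", 15.0),
    ("YOLOv8n", 6.0),
]


def pick_model(latency_budget_ms: float) -> Optional[str]:
    """Return the most accurate model whose latency fits the budget."""
    for name, latency_ms in MODELS:
        if latency_ms <= latency_budget_ms:
            return name
    return None  # nothing fits: reduce resolution or optimize the model
```

A `None` result is a signal to change something else, for example lowering input resolution or exporting to TensorRT as in the last table row.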
Problem-Solving Framework
1. Constraint Analysis
- Compute: What hardware? (Jetson Nano ≈ 0.5 TFLOPS FP16, Jetson AGX Xavier ≈ 32 TOPS INT8)
- Power: Battery capacity? Flight time impact?
- Latency: Control loop rate? Detection response time?
- Weight: Payload capacity? Center of gravity?
- Environment: Indoor/outdoor? GPS available? Lighting conditions?
2. Algorithm Selection Matrix
| Problem | Classical Approach | Deep Learning | When to Use Each |
|---|---|---|---|
| Feature tracking | KLT optical flow | FlowNet | Classical: Real-time, limited compute. DL: Robust, more compute |
| Object detection | HOG+SVM | YOLO/SSD | Classical: Simple objects, no GPU. DL: Complex, GPU available |
| SLAM | ORB-SLAM | DROID-SLAM | Classical: Mature, debuggable. DL: Better in challenging scenes |
| Path planning | A*, RRT | RL-based | Classical: Known environments. DL: Complex, dynamic |
3. Safety Checklist
- Kill switch tested and accessible
- Geofence configured
- Return-to-home altitude set
- Low battery action defined
- Signal loss action defined
- Propeller guards (if applicable)
- Pre-flight sensor calibration
- Weather conditions checked
Quick Reference Tables
MAVLink Message Types
| Message | Purpose | Frequency |
|---|---|---|
| HEARTBEAT | Connection alive | 1 Hz |
| ATTITUDE | Roll/pitch/yaw | 10-100 Hz |
| LOCAL_POSITION_NED | Position | 10-50 Hz |
| GPS_RAW_INT | Raw GPS | 1-10 Hz |
| SET_POSITION_TARGET_LOCAL_NED | Position/velocity setpoint commands | As needed |
Kalman Filter Tuning
| Matrix | High Values | Low Values |
|---|---|---|
| Q (process noise) | Trust measurements more | Trust model more |
| R (measurement noise) | Trust model more | Trust measurements more |
| P (initial covariance) | Uncertain initial state | Confident initial state |
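The Q and R rows can be checked numerically with a scalar Kalman filter. The sketch below assumes a constant-state model with scalar variances (a deliberate simplification) and iterates the covariance recursion until the gain settles; `steady_state_gain` is a hypothetical helper name.

```python
def steady_state_gain(q, r, p0=1.0, steps=200):
    """Iterate the scalar Kalman covariance recursion and return the gain.

    q: process noise variance, r: measurement noise variance,
    p0: initial state variance (P in the table).
    """
    p = p0
    k = 0.0
    for _ in range(steps):
        p = p + q          # predict: uncertainty grows by process noise
        k = p / (p + r)    # gain: fraction of the innovation applied
        p = (1.0 - k) * p  # update: uncertainty shrinks after measuring
    return k


high_q = steady_state_gain(q=1.0, r=1.0)    # about 0.62
low_q = steady_state_gain(q=0.01, r=1.0)    # about 0.10
low_r = steady_state_gain(q=1.0, r=0.01)    # about 0.99
```

A gain near 1 means the filter follows measurements closely, and a gain near 0 means it follows the model, which matches the table: raising Q or lowering R pushes the gain up.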
Common Coordinate Frames
| Frame | Origin | Axes | Use |
|---|---|---|---|
| NED | Takeoff point | North-East-Down | Navigation |
| ENU | Takeoff point | East-North-Up | ROS standard |
| Body | Drone CG | Forward-Right-Down | Control |
| Camera | Lens center | Right-Down-Forward | Vision |
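Mixing up these frames is a common source of sign errors, so it can help to keep the conversions in one place. The sketch below covers the NED/ENU swap and a camera-to-body mapping that assumes a forward-looking camera with its axes aligned to the airframe and no mounting offset; the function names are hypothetical.

```python
def ned_to_enu(north, east, down):
    """North-East-Down -> East-North-Up: swap the horizontal axes, flip vertical."""
    return (east, north, -down)


def enu_to_ned(east, north, up):
    """East-North-Up -> North-East-Down (the same swap and flip in reverse)."""
    return (north, east, -up)


def camera_to_body(cam_right, cam_down, cam_forward):
    """Camera (Right-Down-Forward) -> body (Forward-Right-Down).

    Assumes a forward-facing camera rigidly aligned with the body axes;
    a tilted or gimballed camera needs a full rotation matrix instead.
    """
    return (cam_forward, cam_right, cam_down)
```

For example, a point 3 m below the takeoff point is `(0, 0, 3)` in NED and `(0, 0, -3)` in ENU.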
Reference Files
Detailed implementations in references/:
- navigation-algorithms.md - SLAM, path planning, localization
- sensor-fusion-ekf.md - Kalman filters, multi-sensor fusion
- object-detection-tracking.md - YOLO, ByteTrack, optical flow
Simulation Tools
| Tool | Strengths | Weaknesses | Best For |
|---|---|---|---|
| Gazebo | ROS integration, physics | Graphics quality | ROS development |
| AirSim | Photorealistic, CV-focused | Windows-centric | Vision algorithms |
| Webots | Multi-robot, accessible | Less drone-specific | Swarm simulations |
| MATLAB/Simulink | Control design | Not real-time | Controller tuning |
Emerging Technologies (2024-2025)
- Event cameras: 1μs temporal resolution, no motion blur
- Neuromorphic computing: Loihi 2 for ultra-low-power inference
- 4D Radar: Velocity + 3D position, works in all weather
- Swarm autonomy: Decentralized coordination, emergent behavior
- Foundation models: SAM (zero-shot segmentation), CLIP (zero-shot recognition)
Integration Points
- drone-inspection-specialist: Domain-specific detection (fire, damage, thermal)
- metal-shader-expert: GPU-accelerated vision processing, custom shaders
- collage-layout-expert: Report generation, visual composition
Key Principle: In drone systems, reliability trumps performance. A 95% accurate system that never crashes is better than a 99% accurate one that fails unpredictably. Always have fallbacks.