OpenCV - Computer Vision and Image Processing
OpenCV (Open Source Computer Vision Library) is the de facto standard library for computer vision tasks. It provides 2500+ optimized algorithms for real-time image and video processing, from basic operations like reading images to advanced tasks like face recognition and 3D reconstruction.
- Reading, writing, and displaying images and videos from files or cameras.
- Image preprocessing (resizing, cropping, rotating, color conversion).
- Edge detection (Canny, Sobel) and contour finding.
- Feature detection and matching (SIFT, ORB, AKAZE).
- Object detection (Haar Cascades, HOG, DNN module for YOLO/SSD).
- Face detection and recognition.
- Image segmentation (thresholding, watershed, GrabCut).
- Video analysis (motion detection, object tracking, optical flow).
- Camera calibration and 3D reconstruction.
- Image stitching and panorama creation.
- Real-time applications requiring fast performance.
Reference Documentation
Image as NumPy Array
OpenCV represents images as NumPy arrays with shape (height, width, channels). This allows seamless integration with NumPy operations and other scientific Python libraries.
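A few lines of plain NumPy make the layout concrete; the synthetic array below stands in for what cv2.imread() would return (a minimal sketch — no image file needed):

```python
import numpy as np

# A synthetic 4x6 color "image", laid out exactly as OpenCV returns it:
# shape is (height, width, channels), dtype uint8, channels in BGR order.
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[:, :, 0] = 255  # fill the blue channel (index 0 in BGR)

print(img.shape)           # (4, 6, 3)
print(img.dtype)           # uint8
print(img[0, 0].tolist())  # [255, 0, 0] -- a pure-blue pixel in BGR
```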
BGR Color Space (Not RGB!)
OpenCV uses BGR (Blue-Green-Red) instead of RGB by default. This is critical to remember when displaying images or integrating with other libraries.
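Because channel order is just the last array axis, BGR-to-RGB is a channel reversal. The NumPy slice below is equivalent to cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for 3-channel uint8 images (a minimal sketch with one synthetic pixel):

```python
import numpy as np

# One pure-blue pixel in OpenCV's BGR order: B=255, G=0, R=0
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the channel axis converts BGR -> RGB
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255] -- blue now sits in the last (B) slot
```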
In-Place vs Copy Operations
Many OpenCV functions modify images in-place for performance. Understanding when copies are made is essential for efficient code.
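The copy-vs-view behavior comes straight from NumPy: slicing (e.g. cropping an ROI) returns a view that shares memory with the original image, so writing through the slice mutates the source. A minimal NumPy sketch:

```python
import numpy as np

img = np.full((2, 2), 100, dtype=np.uint8)
roi = img[0:1, 0:1]     # slicing gives a view, not a copy
roi[:] = 0              # writing through the view...
print(int(img[0, 0]))   # 0 -- ...modified the original image

img2 = np.full((2, 2), 100, dtype=np.uint8)
roi2 = img2[0:1, 0:1].copy()  # explicit copy owns its own buffer
roi2[:] = 0
print(int(img2[0, 0]))  # 100 -- original preserved
```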
C++ Performance in Python
OpenCV is written in optimized C++, making it extremely fast even when called from Python. Avoid Python loops when OpenCV vectorized operations exist.
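The same point can be made with plain NumPy: both snippets below compute identical results, but only the second stays in compiled code. (A sketch on a tiny array; on real images the gap is dramatic.)

```python
import numpy as np

img = np.arange(12, dtype=np.uint8).reshape(3, 4)

# Pixel-by-pixel Python loop: interpreted, slow
halved_loop = img.copy()
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        halved_loop[i, j] = img[i, j] // 2

# One vectorized expression: executes in optimized C
halved_vec = img // 2

print(np.array_equal(halved_loop, halved_vec))  # True
```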
Installation
pip install opencv-python
With contrib modules (SIFT, SURF, etc.)
pip install opencv-contrib-python
Headless (no GUI, for servers)
pip install opencv-python-headless
import cv2
import numpy as np
import matplotlib.pyplot as plt
Basic Pattern - Read, Process, Display
1. Read image
img = cv2.imread('image.jpg')
2. Process (convert to grayscale)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
3. Display
cv2.imshow('Grayscale', gray)
cv2.waitKey(0)  # Wait for key press
cv2.destroyAllWindows()
Basic Pattern - Video Processing
1. Open video capture
cap = cv2.VideoCapture(0) # 0 = default camera, or 'video.mp4'
while True:
    # 2. Read frame
    ret, frame = cap.read()
    if not ret:
        break
    # 3. Process frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 4. Display
    cv2.imshow('Video', gray)
    # 5. Exit on 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
- Check Image Loaded - Always verify img is not None after cv2.imread() to catch file errors.
- Use cv2.cvtColor() for Color Conversion - Don't manually rearrange channels; use the provided conversion codes.
- Release Resources - Always call cap.release() and cv2.destroyAllWindows() when done with video/windows.
- Copy Before Modifying - Use img.copy() if you need to preserve the original image.
- Use Appropriate Data Types - Keep images as uint8 (0-255) for display; convert to float32 (0-1) for mathematical operations.
- Validate VideoCapture - Check cap.isOpened() before reading frames.
- Use BGR2RGB for Matplotlib - Convert BGR to RGB when displaying with matplotlib.
- Vectorize Operations - Use OpenCV's built-in functions instead of Python loops over pixels.
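One concrete reason the data-type advice matters: uint8 arithmetic in NumPy wraps around modulo 256 silently, so brightness math on raw uint8 images can corrupt values (cv2.add() saturates at 255 instead). A minimal sketch of the wraparound:

```python
import numpy as np

a = np.array([200], dtype=np.uint8)
b = np.array([100], dtype=np.uint8)

# uint8 addition overflows and wraps: (200 + 100) % 256 = 44
print(int((a + b)[0]))  # 44

# Widening to float32 first gives the mathematically correct result
print(float((a.astype(np.float32) + b)[0]))  # 300.0
```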
- Don't Assume RGB - OpenCV uses BGR by default; convert to RGB for matplotlib or PIL.
- Don't Forget waitKey() - Without cv2.waitKey(), windows won't display properly.
- Don't Mix PIL and OpenCV Directly - Convert between them explicitly (OpenCV uses BGR, PIL uses RGB).
- Don't Process Whole Videos in Memory - Process frame-by-frame to avoid memory issues with large videos.
- Don't Use Python Loops for Pixels - This can be 100x slower than vectorized operations.
- Don't Hardcode Paths - Use os.path.join() or pathlib.Path for cross-platform compatibility.
Anti-Patterns (NEVER)
import cv2
import numpy as np
❌ BAD: Not checking if image loaded
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # Crashes if file doesn't exist!
✅ GOOD: Always validate
img = cv2.imread('image.jpg')
if img is None:
    raise FileNotFoundError("Image not found")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
❌ BAD: Using Python loops for pixel manipulation
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        img[i, j] = img[i, j] * 0.5  # Extremely slow!
✅ GOOD: Vectorized NumPy operations
img = (img * 0.5).astype(np.uint8)
❌ BAD: Displaying BGR image with matplotlib
plt.imshow(img) # Colors will be wrong!
✅ GOOD: Convert to RGB first
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
❌ BAD: Not releasing video capture
cap = cv2.VideoCapture('video.mp4')
while cap.read()[0]:
    pass
Memory leak! Camera still locked!
✅ GOOD: Always release
cap = cv2.VideoCapture('video.mp4')
try:
    while cap.read()[0]:
        pass
finally:
    cap.release()
Image I/O and Display
Reading and Writing Images
Read image (returns None if failed)
img = cv2.imread('image.jpg')
Read as grayscale
gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
Read with alpha channel
img_alpha = cv2.imread('image.png', cv2.IMREAD_UNCHANGED)
Write image
cv2.imwrite('output.jpg', img)
Write with quality (JPEG: 0-100, PNG: 0-9 compression)
cv2.imwrite('output.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 95])
cv2.imwrite('output.png', img, [cv2.IMWRITE_PNG_COMPRESSION, 9])
Check if image loaded
if img is None:
    print("Error: Could not load image")
else:
    print(f"Image shape: {img.shape}")  # (height, width, channels)
Display image in window
cv2.imshow('Window Name', img)
cv2.waitKey(0) # Wait indefinitely for key press
cv2.destroyAllWindows()
Display for specific duration (milliseconds)
cv2.imshow('Image', img)
cv2.waitKey(3000) # Wait 3 seconds
cv2.destroyAllWindows()
Display multiple images
cv2.imshow('Original', img)
cv2.imshow('Gray', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
Display with matplotlib (convert BGR to RGB!)
import matplotlib.pyplot as plt
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
plt.axis('off')
plt.show()
Open camera (0 = default, 1 = second camera, etc.)
cap = cv2.VideoCapture(0)
Open video file
cap = cv2.VideoCapture('video.mp4')
Check if opened successfully
if not cap.isOpened():
    print("Error: Could not open video")
    exit()
Get video properties
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(f"Video: {width}x{height} @ {fps} fps, {total_frames} frames")
Read and process frames
while True:
    ret, frame = cap.read()
    if not ret:
        print("End of video or error")
        break
    # Process frame here
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Writing Video
import cv2
cap = cv2.VideoCapture('input.mp4')
Get video properties
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
Create VideoWriter
fourcc = cv2.VideoWriter_fourcc(*'mp4v') # or 'XVID', 'MJPG'
out = cv2.VideoWriter('output.mp4', fourcc, fps, (width, height))
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Process frame
    processed = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    processed = cv2.cvtColor(processed, cv2.COLOR_GRAY2BGR)  # Convert back to 3-channel
    # Write frame
    out.write(processed)
cap.release()
out.release()
cv2.destroyAllWindows()
Image Transformations
Resizing and Cropping
import cv2
img = cv2.imread('image.jpg')
Resize to specific dimensions
resized = cv2.resize(img, (800, 600)) # (width, height)
Resize by scale factor
scaled = cv2.resize(img, None, fx=0.5, fy=0.5) # 50% of original
Resize with interpolation methods
resized_linear = cv2.resize(img, (800, 600), interpolation=cv2.INTER_LINEAR) # Default
resized_cubic = cv2.resize(img, (800, 600), interpolation=cv2.INTER_CUBIC) # Better quality
resized_area = cv2.resize(img, (400, 300), interpolation=cv2.INTER_AREA) # Best for shrinking
Crop (using NumPy slicing)
height, width = img.shape[:2]
cropped = img[100:400, 200:600] # [y1:y2, x1:x2]
Center crop
crop_size = 300
center_x, center_y = width // 2, height // 2
x1 = center_x - crop_size // 2
y1 = center_y - crop_size // 2
center_cropped = img[y1:y1+crop_size, x1:x1+crop_size]
Rotation and Flipping
Flip horizontally
flipped_h = cv2.flip(img, 1)
Flip vertically
flipped_v = cv2.flip(img, 0)
Flip both axes
flipped_both = cv2.flip(img, -1)
Rotate 90 degrees clockwise
rotated_90 = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
Rotate 180 degrees
rotated_180 = cv2.rotate(img, cv2.ROTATE_180)
Rotate 90 degrees counter-clockwise
rotated_90_ccw = cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)
Rotate by arbitrary angle (around center)
height, width = img.shape[:2]
center = (width // 2, height // 2)
angle = 45 # degrees
Get rotation matrix
M = cv2.getRotationMatrix2D(center, angle, scale=1.0)
Apply the rotation
rotated = cv2.warpAffine(img, M, (width, height))
Rotate and scale at the same time
M_scaled = cv2.getRotationMatrix2D(center, 30, scale=0.8)
rotated_scaled = cv2.warpAffine(img, M_scaled, (width, height))
Color Space Conversions
import cv2
img = cv2.imread('image.jpg')
BGR to RGB (for matplotlib)
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
BGR to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
BGR to HSV (useful for color-based segmentation)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
BGR to LAB
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
Grayscale to BGR (add color channels)
gray_bgr = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
Extract individual channels
b, g, r = cv2.split(img)
Merge channels back together
merged = cv2.merge([b, g, r])
Image Filtering and Enhancement
Blurring and Smoothing
Gaussian blur (reduce noise)
blurred = cv2.GaussianBlur(img, (5, 5), 0) # (kernel_size, sigma)
Median blur (good for salt-and-pepper noise)
median = cv2.medianBlur(img, 5) # kernel_size must be odd
Bilateral filter (edge-preserving smoothing)
bilateral = cv2.bilateralFilter(img, 9, 75, 75) # (d, sigmaColor, sigmaSpace)
Average (mean) blur
avg_blur = cv2.blur(img, (5, 5))
Box filter
box = cv2.boxFilter(img, -1, (5, 5))
Edge Detection
Convert to grayscale first
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Canny edge detection
edges = cv2.Canny(gray, threshold1=50, threshold2=150)
Sobel edge detection (gradient in x and y)
sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3) # X gradient
sobely = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3) # Y gradient
sobel = cv2.magnitude(sobelx, sobely)
Laplacian edge detection
laplacian = cv2.Laplacian(gray, cv2.CV_64F)
Scharr (more accurate than Sobel for small kernels)
scharrx = cv2.Scharr(gray, cv2.CV_64F, 1, 0)
scharry = cv2.Scharr(gray, cv2.CV_64F, 0, 1)
Morphological Operations
import cv2
import numpy as np
Define the structuring element (kernel)
kernel = np.ones((5, 5), np.uint8)
Erosion (shrink white regions)
eroded = cv2.erode(img, kernel, iterations=1)
Dilation (expand white regions)
dilated = cv2.dilate(img, kernel, iterations=1)
Opening (erosion followed by dilation) - removes noise
opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
Closing (dilation followed by erosion) - closes gaps
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
Gradient (difference between dilation and erosion) - outlines
gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
Top hat (difference between input and opening) - bright spots
tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
Black hat (difference between closing and input) - dark spots
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)
Thresholding
import cv2
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Simple binary threshold
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
Inverse binary threshold
ret, thresh_inv = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
Truncate threshold
ret, thresh_trunc = cv2.threshold(gray, 127, 255, cv2.THRESH_TRUNC)
Threshold to zero
ret, thresh_tozero = cv2.threshold(gray, 127, 255, cv2.THRESH_TOZERO)
Otsu's thresholding (automatic threshold calculation)
ret, thresh_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
Adaptive thresholding (different threshold for different regions)
adaptive_mean = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2
)
adaptive_gaussian = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
)
Contours and Shape Detection
Finding and Drawing Contours
Convert to grayscale and threshold
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
Find contours
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Draw all contours
img_contours = img.copy()
cv2.drawContours(img_contours, contours, -1, (0, 255, 0), 2)  # -1 = all contours
Draw specific contour
cv2.drawContours(img_contours, contours, 0, (255, 0, 0), 3) # First contour
Iterate through contours
for i, contour in enumerate(contours):
    # Calculate area
    area = cv2.contourArea(contour)
    # Calculate perimeter
    perimeter = cv2.arcLength(contour, True)
    # Filter by area
    if area > 1000:
        cv2.drawContours(img_contours, [contour], -1, (0, 0, 255), 2)
    # Get bounding rectangle
    x, y, w, h = cv2.boundingRect(contour)
    cv2.rectangle(img_contours, (x, y), (x+w, y+h), (255, 0, 0), 2)
Shape Classification
import cv2
for contour in contours:
    # Approximate contour to polygon
    epsilon = 0.02 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    # Number of vertices
    n_vertices = len(approx)
    # Classify shape
    if n_vertices == 3:
        shape = "Triangle"
    elif n_vertices == 4:
        # Check if rectangle or square
        x, y, w, h = cv2.boundingRect(approx)
        aspect_ratio = float(w) / h
        shape = "Square" if 0.95 <= aspect_ratio <= 1.05 else "Rectangle"
    elif n_vertices > 4:
        shape = "Circle" if n_vertices > 10 else "Polygon"
    else:
        shape = "Unknown"
    # Draw and label
    cv2.drawContours(img, [approx], -1, (0, 255, 0), 2)
    x, y = approx[0][0]
    cv2.putText(img, shape, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
Contour Properties
import cv2
import numpy as np
for contour in contours:
    # Moments (for center of mass)
    M = cv2.moments(contour)
    if M['m00'] != 0:
        cx = int(M['m10'] / M['m00'])
        cy = int(M['m01'] / M['m00'])
        cv2.circle(img, (cx, cy), 5, (255, 0, 0), -1)
    # Minimum enclosing circle
    (x, y), radius = cv2.minEnclosingCircle(contour)
    center = (int(x), int(y))
    radius = int(radius)
    cv2.circle(img, center, radius, (0, 255, 0), 2)
    # Fit ellipse (requires at least 5 points)
    if len(contour) >= 5:
        ellipse = cv2.fitEllipse(contour)
        cv2.ellipse(img, ellipse, (255, 0, 255), 2)
    # Convex hull
    hull = cv2.convexHull(contour)
    cv2.drawContours(img, [hull], -1, (0, 255, 255), 2)
    # Solidity (contour area / convex hull area)
    hull_area = cv2.contourArea(hull)
    contour_area = cv2.contourArea(contour)
    solidity = contour_area / hull_area if hull_area > 0 else 0
Feature Detection and Matching
ORB (Oriented FAST and Rotated BRIEF)
import cv2
img1 = cv2.imread('image1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('image2.jpg', cv2.IMREAD_GRAYSCALE)
Create ORB detector
orb = cv2.ORB_create(nfeatures=1000)
Detect keypoints and compute descriptors
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
Draw keypoints
img1_kp = cv2.drawKeypoints(img1, kp1, None, color=(0, 255, 0))
Match descriptors using BFMatcher
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
Sort matches by distance (best first)
matches = sorted(matches, key=lambda x: x.distance)
Draw top matches
img_matches = cv2.drawMatches(
    img1, kp1, img2, kp2, matches[:50],
    None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
cv2.imshow('Matches', img_matches)
cv2.waitKey(0)
SIFT (Scale-Invariant Feature Transform)
Note: SIFT is in opencv-contrib-python, not opencv-python
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
Create SIFT detector
sift = cv2.SIFT_create()
Detect keypoints and compute descriptors
keypoints, descriptors = sift.detectAndCompute(img, None)
Draw keypoints with size and orientation
img_kp = cv2.drawKeypoints(
    img, keypoints, None,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS
)
print(f"Number of keypoints: {len(keypoints)}")
Feature Matching with FLANN
import cv2
import numpy as np
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
FLANN parameters for SIFT (float descriptors)
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)
Lowe's ratio test
good_matches = []
for m, n in matches:
    if m.distance < 0.7 * n.distance:
        good_matches.append(m)
print(f"Good matches: {len(good_matches)}")
Draw good matches
img_matches = cv2.drawMatches(
    img1, kp1, img2, kp2, good_matches, None,
    flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
Haar Cascade (Face Detection)
Load pre-trained Haar Cascade
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_eye.xml'
)
img = cv2.imread('people.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Detect faces
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30)
)
Draw rectangles around faces
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
    # Detect eyes in face region
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)
cv2.imshow('Faces', img)
cv2.waitKey(0)
Template Matching
import cv2
import numpy as np
img = cv2.imread('image.jpg')
template = cv2.imread('template.jpg')
Convert to grayscale
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
h, w = template_gray.shape
Run template matching
result = cv2.matchTemplate(img_gray, template_gray, cv2.TM_CCOEFF_NORMED)
Find locations above threshold
threshold = 0.8
locations = np.where(result >= threshold)
Draw a rectangle at each match
for pt in zip(*locations[::-1]):
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 2)
cv2.imshow('Matches', img)
cv2.waitKey(0)
1. Document Scanner (Perspective Transform)
import cv2
import numpy as np
def order_points(pts):
    """Order points: top-left, top-right, bottom-right, bottom-left."""
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # Top-left
    rect[2] = pts[np.argmax(s)]  # Bottom-right
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]  # Top-right
    rect[3] = pts[np.argmax(diff)]  # Bottom-left
    return rect

def four_point_transform(image, pts):
    """Apply perspective transform to get a bird's-eye view."""
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Compute output width and height
    widthA = np.linalg.norm(br - bl)
    widthB = np.linalg.norm(tr - tl)
    maxWidth = max(int(widthA), int(widthB))
    heightA = np.linalg.norm(tr - br)
    heightB = np.linalg.norm(tl - bl)
    maxHeight = max(int(heightA), int(heightB))
    # Destination points
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]
    ], dtype="float32")
    # Perspective transform
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    return warped
img = cv2.imread('document.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)
Find document contour (assume largest quadrilateral)
for contour in contours:
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.02 * peri, True)
    if len(approx) == 4:
        pts = approx.reshape(4, 2)
        scanned = four_point_transform(img, pts)
        cv2.imshow('Scanned', scanned)
        cv2.waitKey(0)
        break
2. Motion Detection
python
import cv2

def detect_motion(video_path):
    """Detect motion in video using frame differencing."""
    cap = cv2.VideoCapture(video_path)
    ret, frame1 = cap.read()
    ret, frame2 = cap.read()
    while cap.isOpened():
        # Compute difference between frames
        diff = cv2.absdiff(frame1, frame2)
        gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        blur = cv2.GaussianBlur(gray, (5, 5), 0)
        _, thresh = cv2.threshold(blur, 20, 255, cv2.THRESH_BINARY)
        # Dilate to fill gaps
        dilated = cv2.dilate(thresh, None, iterations=3)
        # Find contours
        contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Draw bounding boxes
        for contour in contours:
            if cv2.contourArea(contour) < 500:
                continue
            x, y, w, h = cv2.boundingRect(contour)
            cv2.rectangle(frame1, (x, y), (x+w, y+h), (0, 255, 0), 2)
            cv2.putText(frame1, "Motion", (x, y-10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        cv2.imshow('Motion Detection', frame1)
        # Update frames
        frame1 = frame2
        ret, frame2 = cap.read()
        if not ret or cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
detect_motion('video.mp4')
3. Color-Based Object Tracking
python
import cv2
import numpy as np

def track_colored_object(video_path, lower_color, upper_color):
    """Track object by color in HSV space."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Convert to HSV
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Create mask for color
        mask = cv2.inRange(hsv, lower_color, upper_color)
        # Remove noise
        mask = cv2.erode(mask, None, iterations=2)
        mask = cv2.dilate(mask, None, iterations=2)
        # Find contours
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            # Find largest contour
            largest = max(contours, key=cv2.contourArea)
            # Get center and radius
            ((x, y), radius) = cv2.minEnclosingCircle(largest)
            if radius > 10:
                # Draw circle and center
                cv2.circle(frame, (int(x), int(y)), int(radius), (0, 255, 0), 2)
                cv2.circle(frame, (int(x), int(y)), 5, (0, 0, 255), -1)
        cv2.imshow('Tracking', frame)
        cv2.imshow('Mask', mask)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
Usage: Track red object
lower_red = np.array([0, 100, 100])
upper_red = np.array([10, 255, 255])
track_colored_object(0, lower_red, upper_red)
4. QR Code Detection
python
import cv2

def detect_qr_code(image_path):
    """Detect and decode QR codes."""
    img = cv2.imread(image_path)
    # Initialize QR code detector
    detector = cv2.QRCodeDetector()
    # Detect and decode
    data, bbox, straight_qrcode = detector.detectAndDecode(img)
    if bbox is not None:
        # bbox has shape (1, 4, 2): one quad of four corner points
        points = bbox[0].astype(int)
        n = len(points)
        for i in range(n):
            cv2.line(img, tuple(points[i]), tuple(points[(i + 1) % n]), (0, 255, 0), 3)
        # Display decoded data
        if data:
            print(f"QR Code data: {data}")
            cv2.putText(img, data, (50, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('QR Code', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
detect_qr_code('qrcode.jpg')
5. Image Stitching (Panorama)
python
import cv2

def create_panorama(images):
    """Stitch multiple images into panorama."""
    # Create stitcher
    stitcher = cv2.Stitcher_create()
    # Stitch images
    status, pano = stitcher.stitch(images)
    if status == cv2.Stitcher_OK:
        print("Panorama created successfully")
        return pano
    else:
        print(f"Error: {status}")
        return None
img1 = cv2.imread('image1.jpg')
img2 = cv2.imread('image2.jpg')
img3 = cv2.imread('image3.jpg')
panorama = create_panorama([img1, img2, img3])
if panorama is not None:
    cv2.imshow('Panorama', panorama)
    cv2.waitKey(0)
Performance Optimization
Use GPU Acceleration
Check CUDA availability
print(f"CUDA devices: {cv2.cuda.getCudaEnabledDeviceCount()}")
gpu_img = cv2.cuda_GpuMat()
gpu_img.upload(img)
GPU operations (must use cv2.cuda module)
gpu_gray = cv2.cuda.cvtColor(gpu_img, cv2.COLOR_BGR2GRAY)
Download from GPU
result = gpu_gray.download()
Vectorize Operations
❌ SLOW: Python loops
for i in range(height):
    for j in range(width):
        img[i, j] = img[i, j] * 0.5
✅ FAST: NumPy vectorization
img = (img * 0.5).astype(np.uint8)
✅ FAST: OpenCV built-in functions
img = cv2.convertScaleAbs(img, alpha=0.5, beta=0)
Multi-threading for Video
python
import cv2
from threading import Thread
from queue import Queue

class VideoCapture:
    """Threaded video capture for better performance."""
    def __init__(self, src):
        self.cap = cv2.VideoCapture(src)
        self.q = Queue(maxsize=128)
        self.stopped = False

    def start(self):
        Thread(target=self._reader, daemon=True).start()
        return self

    def _reader(self):
        while not self.stopped:
            ret, frame = self.cap.read()
            if not ret:
                self.stop()
                break
            self.q.put(frame)

    def read(self):
        return self.q.get()

    def stop(self):
        self.stopped = True
        self.cap.release()
cap = VideoCapture(0).start()
while True:
    frame = cap.read()
    # Process frame...
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.stop()
Common Pitfalls and Solutions
The "BGR vs RGB" Color Confusion
OpenCV uses BGR, most other libraries use RGB.
❌ Problem: Colors look wrong in matplotlib
import matplotlib.pyplot as plt
img = cv2.imread('image.jpg')
plt.imshow(img)  # Blue and red are swapped!
✅ Solution: Convert to RGB
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
✅ Alternative: Use OpenCV's imshow
cv2.imshow('Correct Colors', img)
cv2.waitKey(0)
The "Window Won't Close" Problem
Windows stay open without proper key handling.
❌ Problem: Window frozen
✅ Solution: Always use waitKey
cv2.imshow('Image', img)
cv2.waitKey(0) # Wait for key press
cv2.destroyAllWindows()
The "Video Capture Not Released" Problem
Camera stays locked if not released properly.
❌ Problem: Camera locked after crash
cap = cv2.VideoCapture(0)
... code crashes ...
Camera still locked!
✅ Solution: Use try-finally
cap = cv2.VideoCapture(0)
try:
    while True:
        ret, frame = cap.read()
        # ... process ...
finally:
    cap.release()
    cv2.destroyAllWindows()
The "Image Modification" Confusion
Some operations modify in-place, others return new images.
In-place modification
cv2.rectangle(img, (10, 10), (100, 100), (0, 255, 0), 2) # Modifies img
blurred = cv2.GaussianBlur(img, (5, 5), 0) # img unchanged
✅ Always use .copy() if you need original
img_copy = img.copy()
cv2.rectangle(img_copy, (10, 10), (100, 100), (0, 255, 0), 2)
The "Contour Hierarchy" Misunderstanding
cv2.findContours returns different structures based on retrieval mode.
External contours only (most common)
contours, hierarchy = cv2.findContours(
    thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
)
All contours with full hierarchy
contours, hierarchy = cv2.findContours(
    thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
)
⚠️ hierarchy structure: [Next, Previous, First_Child, Parent]
Most use cases only need RETR_EXTERNAL
OpenCV is the Swiss Army knife of computer vision. Its vast library of optimized algorithms, combined with Python's ease of use, makes it the perfect tool for everything from simple image processing to complex real-time vision systems. Master these fundamentals, and you'll have the foundation to tackle any computer vision challenge.