# Vision Framework

Detect text, faces, barcodes, objects, and body poses in images and video using on-device computer vision. Patterns target iOS 26+ with Swift 6.2, backward-compatible where noted. See `references/vision-requests.md` for complete code patterns and `references/visionkit-scanner.md` for `DataScannerViewController` integration.

## Two API Generations

Vision has two distinct API layers. Prefer the modern API for new code.

| Aspect | Modern (iOS 18+) | Legacy |
| --- | --- | --- |
| Pattern | `let result = try await request.perform(on: image)` | `VNImageRequestHandler` + completion handler |
| Request types | Swift structs and classes (`RecognizeTextRequest`, `DetectFaceRectanglesRequest`) | ObjC classes (`VNRecognizeTextRequest`, `VNDetectFaceRectanglesRequest`) |
| Concurrency | Native async/await | Completion handlers or synchronous `perform` |
| Observations | Typed return values | Cast `results` from `[Any]` |
| Availability | iOS 18+ / macOS 15+ | iOS 11+ |

The modern API uses the `ImageProcessingRequest` protocol. Each request type has a `perform(on:orientation:)` method that accepts `CGImage`, `CIImage`, `CVPixelBuffer`, `CMSampleBuffer`, `Data`, or `URL`. Most requests are structs; stateful requests for video tracking (e.g., `TrackObjectRequest`, `TrackRectangleRequest`, `DetectTrajectoriesRequest`) are final classes.
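
A sketch of that input flexibility, assuming a portrait back-camera frame (`.right` is the usual `CGImagePropertyOrientation` in that case, and the `[FaceObservation]` result type follows the modern API's naming):

```swift
import Vision

// Sketch: the same modern request runs on a CVPixelBuffer camera frame,
// not just a CGImage. `pixelBuffer` is assumed to come from an
// AVCaptureVideoDataOutput callback.
func detectFaces(in pixelBuffer: CVPixelBuffer) async throws -> [FaceObservation] {
    let request = DetectFaceRectanglesRequest()
    // The orientation tells Vision how the buffer is rotated relative to "up".
    return try await request.perform(on: pixelBuffer, orientation: .right)
}
```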

## Request Pattern (Modern API)

All modern Vision requests follow the same pattern: create a request struct, call `perform(on:)`, and handle the typed result.

```swift
import Vision

func recognizeText(in image: CGImage) async throws -> [String] {
    var request = RecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.recognitionLanguages = [Locale.Language(identifier: "en-US")]

    let observations = try await request.perform(on: image)
    return observations.compactMap { observation in
        observation.topCandidates(1).first?.string
    }
}
```

## Legacy Pattern (Pre-iOS 18)

Use `VNImageRequestHandler` with completion-based requests when targeting older deployment versions.

```swift
import Vision

func recognizeTextLegacy(in image: CGImage) throws -> [String] {
    var recognized: [String] = []
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        recognized = observations.compactMap { $0.topCandidates(1).first?.string }
    }
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: image)
    try handler.perform([request])
    return recognized
}
```

## Text Recognition (OCR)

### Modern: RecognizeTextRequest (iOS 18+)

```swift
var request = RecognizeTextRequest()
request.recognitionLevel = .accurate       // .fast for real-time
request.recognitionLanguages = [
    Locale.Language(identifier: "en-US"),
    Locale.Language(identifier: "fr-FR"),
]
request.usesLanguageCorrection = true
request.customWords = ["SwiftUI", "Xcode"] // domain-specific terms

let observations = try await request.perform(on: cgImage)
for observation in observations {
    guard let candidate = observation.topCandidates(1).first else { continue }
    let text = candidate.string
    let confidence = candidate.confidence  // 0.0 ... 1.0
    let bounds = observation.boundingBox   // normalized coordinates
}
```

### Legacy: VNRecognizeTextRequest

```swift
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
request.recognitionLanguages = ["en-US", "fr-FR"]
request.usesLanguageCorrection = true
```

Key differences: the modern API uses `Locale.Language` for languages; legacy uses string identifiers. Both support `.accurate` (best quality) and `.fast` (real-time suitable) recognition levels.

## Face Detection

Detect face rectangles, landmarks (eyes, nose, mouth), and capture quality.

```swift
// Modern API
let faceRequest = DetectFaceRectanglesRequest()
let faces = try await faceRequest.perform(on: cgImage)

for face in faces {
    let boundingBox = face.boundingBox   // normalized rect
    let roll = face.roll                 // Measurement<UnitAngle>
    let yaw = face.yaw                   // Measurement<UnitAngle>
}

// Landmarks (eyes, nose, mouth contours)
let landmarkRequest = DetectFaceLandmarksRequest()
let landmarkFaces = try await landmarkRequest.perform(on: cgImage)
for face in landmarkFaces {
    let landmarks = face.landmarks
    let leftEye = landmarks?.leftEye?.normalizedPoints
    let nose = landmarks?.nose?.normalizedPoints
}
```

## Coordinate System

Vision uses a normalized coordinate system with origin at the bottom-left. Convert to UIKit (top-left origin) before display:

```swift
// Flips a rect from bottom-left to top-left origin. The rect must already be
// scaled from normalized (0...1) space to image coordinates, e.g. via
// VNImageRectForNormalizedRect.
func convertToUIKit(_ rect: CGRect, imageHeight: CGFloat) -> CGRect {
    CGRect(
        x: rect.origin.x,
        y: imageHeight - rect.origin.y - rect.height,
        width: rect.width,
        height: rect.height
    )
}
```
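
A usage sketch, assuming `observation.boundingBox` is a normalized `CGRect` from a legacy request and `imageSize` is the pixel size of the analyzed image:

```swift
// Denormalize first, then flip the origin for UIKit drawing.
let imageRect = VNImageRectForNormalizedRect(
    observation.boundingBox,
    Int(imageSize.width),
    Int(imageSize.height)
)
let uiRect = convertToUIKit(imageRect, imageHeight: imageSize.height)
```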

## Barcode Detection

Detect 1D and 2D barcodes, including QR codes.

```swift
var request = DetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128, .pdf417]

let barcodes = try await request.perform(on: cgImage)
for barcode in barcodes {
    let payload = barcode.payloadString          // decoded content
    let symbology = barcode.symbology            // .qr, .ean13, etc.
    let bounds = barcode.boundingBox             // normalized rect
}
```

Common symbologies: `.qr`, `.aztec`, `.pdf417`, `.dataMatrix`, `.ean8`, `.ean13`, `.code39`, `.code128`, `.upce`, `.itf14`.

## Document Scanning (iOS 26+)

`RecognizeDocumentsRequest` provides structured document reading with layout understanding beyond basic OCR. It returns `DocumentObservation` objects with a nested `Container` structure for paragraphs, tables, lists, and barcodes.

```swift
var request = RecognizeDocumentsRequest()
let documents = try await request.perform(on: cgImage)

for observation in documents {
    let container = observation.document

    // Full text content
    let fullText = container.text

    // Structured access to paragraphs
    for paragraph in container.paragraphs {
        let paragraphText = paragraph.text
    }

    // Tables and lists
    for table in container.tables { /* structured table data */ }
    for list in container.lists { /* structured list data */ }

    // Embedded barcodes detected within the document
    for barcode in container.barcodes { /* barcode data */ }

    // Document title if detected
    if let title = container.title { print(title) }
}
```

For simpler document camera scanning, use VisionKit's `VNDocumentCameraViewController`, which provides a full-screen camera UI with auto-capture, perspective correction, and multi-page scanning.
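
A brief sketch of that camera flow (the `ScanCoordinator` name is illustrative):

```swift
import UIKit
import VisionKit

// Presents the document camera; the delegate receives a VNDocumentCameraScan
// with one perspective-corrected UIImage per captured page.
final class ScanCoordinator: NSObject, VNDocumentCameraViewControllerDelegate {
    func present(from presenter: UIViewController) {
        let camera = VNDocumentCameraViewController()
        camera.delegate = self
        presenter.present(camera, animated: true)
    }

    func documentCameraViewController(
        _ controller: VNDocumentCameraViewController,
        didFinishWith scan: VNDocumentCameraScan
    ) {
        for pageIndex in 0..<scan.pageCount {
            let pageImage = scan.imageOfPage(at: pageIndex)
            // Feed pageImage.cgImage into RecognizeTextRequest, etc.
            _ = pageImage
        }
        controller.dismiss(animated: true)
    }
}
```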

## Image Segmentation

### Modern: GeneratePersonSegmentationRequest (iOS 18+)

```swift
var request = GeneratePersonSegmentationRequest()
request.qualityLevel = .accurate  // .balanced, .fast

let mask = try await request.perform(on: cgImage)
// mask is a PersonSegmentationObservation with a pixelBuffer property
let maskBuffer = mask.pixelBuffer
// Apply mask using Core Image: CIFilter.blendWithMask()
```

### Legacy: VNGeneratePersonSegmentationRequest

```swift
let request = VNGeneratePersonSegmentationRequest()
request.qualityLevel = .accurate  // .balanced, .fast
request.outputPixelFormat = kCVPixelFormatType_OneComponent8

let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])

guard let mask = request.results?.first?.pixelBuffer else { return }
// Apply mask using Core Image: CIFilter.blendWithMask()
```

Quality levels:

- `.accurate` -- best quality, slowest (~1s), full resolution
- `.balanced` -- good quality, moderate speed (~100ms), 960x540
- `.fast` -- lowest quality, fastest (~10ms), 256x144, suitable for real-time
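
A sketch of the Core Image step referenced in both snippets, assuming `original` is the analyzed `CGImage` and `maskBuffer` came from either request; segmentation masks are often smaller than the input, so the mask is scaled to match:

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

func applyPersonMask(to original: CGImage, mask maskBuffer: CVPixelBuffer) -> CIImage? {
    let inputImage = CIImage(cgImage: original)
    var maskImage = CIImage(cvPixelBuffer: maskBuffer)

    // Scale the mask up to the input image's extent.
    let scaleX = inputImage.extent.width / maskImage.extent.width
    let scaleY = inputImage.extent.height / maskImage.extent.height
    maskImage = maskImage.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

    // White mask pixels keep the foreground; the background shows elsewhere.
    let filter = CIFilter.blendWithMask()
    filter.inputImage = inputImage
    filter.backgroundImage = CIImage(color: .clear).cropped(to: inputImage.extent)
    filter.maskImage = maskImage
    return filter.outputImage
}
```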

## Instance Segmentation (iOS 18+)

Separate masks per person for individual effects.

```swift
// Modern API (iOS 18+)
let request = GeneratePersonInstanceMaskRequest()
let observation = try await request.perform(on: cgImage)
let indices = observation.allInstances

for index in indices {
    let mask = try observation.generateMask(forInstances: IndexSet(integer: index))
    // mask is a CVPixelBuffer with only this person visible
}
```

```swift
// Legacy API (iOS 17+)
let request = VNGeneratePersonInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])

guard let result = request.results?.first else { return }
let indices = result.allInstances
for index in indices {
    let instanceMask = try result.generateMaskedImage(
        ofInstances: IndexSet(integer: index),
        from: handler,
        croppedToInstancesExtent: false
    )
}
```

See `references/vision-requests.md` for mask composition and Core Image filter integration patterns.

## Object Tracking

### Modern: TrackObjectRequest (iOS 18+)

`TrackObjectRequest` is a stateful request that maintains tracking context across frames. It conforms to both `ImageProcessingRequest` and `StatefulRequest`.

```swift
// Initialize with a detected object's bounding box
let initialObservation = DetectedObjectObservation(boundingBox: detectedRect)
var request = TrackObjectRequest(observation: initialObservation)
request.trackingLevel = .accurate

// For each video frame:
let results = try await request.perform(on: pixelBuffer)
if let tracked = results.first {
    let updatedBounds = tracked.boundingBox
    let confidence = tracked.confidence
}
```

### Legacy: VNTrackObjectRequest

```swift
let trackRequest = VNTrackObjectRequest(detectedObjectObservation: initialObservation)
trackRequest.trackingLevel = .accurate

let sequenceHandler = VNSequenceRequestHandler()
// For each frame:
try sequenceHandler.perform([trackRequest], on: pixelBuffer)
if let result = trackRequest.results?.first {
    let updatedBounds = result.boundingBox
    trackRequest.inputObservation = result
}
```

## Other Request Types

Vision provides additional requests covered in `references/vision-requests.md`:

| Request | Purpose |
| --- | --- |
| `ClassifyImageRequest` | Classify scene content (outdoor, food, animal, etc.) |
| `GenerateAttentionBasedSaliencyImageRequest` | Heat map of where viewers focus attention |
| `GenerateObjectnessBasedSaliencyImageRequest` | Heat map of object-like regions |
| `GenerateForegroundInstanceMaskRequest` | Foreground object segmentation (not person-specific) |
| `DetectRectanglesRequest` | Detect rectangular shapes (documents, cards, screens) |
| `DetectHorizonRequest` | Detect horizon angle for auto-leveling photos |
| `DetectHumanBodyPoseRequest` | Detect body joints (shoulders, elbows, knees) |
| `DetectHumanBodyPose3DRequest` | 3D human body pose estimation |
| `DetectHumanHandPoseRequest` | Detect hand joints and finger positions |
| `DetectAnimalBodyPoseRequest` | Detect animal body joint positions |
| `DetectFaceCaptureQualityRequest` | Face capture quality scoring (0–1) for photo selection |
| `TrackRectangleRequest` | Track rectangular objects across video frames |
| `TrackOpticalFlowRequest` | Optical flow between video frames |
| `DetectTrajectoriesRequest` | Detect object trajectories in video |

All modern request types above are iOS 18+ / macOS 15+.
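
For instance, a minimal sketch of `ClassifyImageRequest`, filtering to confident labels (the 0.5 cutoff is illustrative, not a recommendation):

```swift
let request = ClassifyImageRequest()
let observations = try await request.perform(on: cgImage)

// Each classification observation carries an identifier and a confidence score.
let labels = observations
    .filter { $0.confidence > 0.5 }
    .map(\.identifier)
```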

## Core ML Integration

Run custom Core ML models through Vision for automatic image preprocessing (resizing, normalization, color space conversion).

```swift
// Modern API (iOS 18+)
let model = try MLModel(contentsOf: modelURL)
let container = try CoreMLModelContainer(model: model)
let request = CoreMLRequest(model: container)
let results = try await request.perform(on: cgImage)

// Classification model
if let classification = results.first as? ClassificationObservation {
    let label = classification.identifier
    let confidence = classification.confidence
}
```

```swift
// Legacy API
let vnModel = try VNCoreMLModel(for: model)
let request = VNCoreMLRequest(model: vnModel) { request, error in
    guard let results = request.results as? [VNClassificationObservation] else { return }
    let topResult = results.first
}
let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])
```

For model conversion and optimization, see the `coreml` skill.

## VisionKit: DataScannerViewController

`DataScannerViewController` provides a full-screen live camera scanner for text and barcodes. See `references/visionkit-scanner.md` for complete patterns.

### Quick Start

```swift
import VisionKit

// Check availability (requires A12+ chip and camera)
guard DataScannerViewController.isSupported,
      DataScannerViewController.isAvailable else { return }

let scanner = DataScannerViewController(
    recognizedDataTypes: [
        .text(languages: ["en"]),
        .barcode(symbologies: [.qr, .ean13])
    ],
    qualityLevel: .balanced,
    recognizesMultipleItems: true,
    isHighFrameRateTrackingEnabled: true,
    isHighlightingEnabled: true
)
scanner.delegate = self
present(scanner, animated: true) {
    try? scanner.startScanning()
}
```

### SwiftUI Integration

Wrap `DataScannerViewController` in `UIViewControllerRepresentable`. See `references/visionkit-scanner.md` for the full implementation.
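
A minimal sketch of that wrapper (delegate wiring and availability checks omitted; start/stop handling is simplified here):

```swift
import SwiftUI
import VisionKit

struct ScannerView: UIViewControllerRepresentable {
    func makeUIViewController(context: Context) -> DataScannerViewController {
        DataScannerViewController(
            recognizedDataTypes: [.barcode(symbologies: [.qr])],
            qualityLevel: .balanced
        )
    }

    func updateUIViewController(_ scanner: DataScannerViewController, context: Context) {
        // A real implementation would handle the thrown error and
        // coordinate scanning state via a Coordinator.
        if !scanner.isScanning {
            try? scanner.startScanning()
        }
    }
}
```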

## Common Mistakes

**DON'T:** Use the legacy `VNImageRequestHandler` API for new iOS 18+ projects. **DO:** Use modern struct-based requests with `perform(on:)` and async/await. **Why:** The modern API provides type safety, better Swift concurrency support, and cleaner error handling.

**DON'T:** Forget to convert normalized coordinates before drawing bounding boxes. **DO:** Use `VNImageRectForNormalizedRect(_:_:_:)` or manual conversion from bottom-left origin to UIKit top-left origin. **Why:** Vision uses normalized coordinates (0...1) with bottom-left origin; UIKit uses points with top-left origin.

**DON'T:** Run Vision requests on the main thread. **DO:** Perform requests on a background thread or use async/await from a detached task (see the sketch after this list). **Why:** Image analysis is CPU/GPU-intensive and blocks the UI if run on the main actor.

**DON'T:** Use the `.accurate` recognition level for real-time camera feeds. **DO:** Use `.fast` for live video, `.accurate` for still images or offline processing. **Why:** Accurate recognition is too slow for 30fps video; fast recognition trades quality for speed.

**DON'T:** Ignore the `confidence` score on observations. **DO:** Filter results by a confidence threshold (e.g., > 0.5) appropriate for your use case. **Why:** Low-confidence results are often incorrect and degrade user experience.

**DON'T:** Create a new `VNImageRequestHandler` for each frame when tracking objects. **DO:** Use `VNSequenceRequestHandler` for video frame sequences. **Why:** The sequence handler maintains temporal context for tracking; per-frame handlers lose state.

**DON'T:** Request all barcode symbologies when you only need QR codes. **DO:** Specify only the symbologies you need in the request. **Why:** Fewer symbologies means faster detection and fewer false positives.

**DON'T:** Assume `DataScannerViewController` is available on all devices. **DO:** Check both `isSupported` (hardware) and `isAvailable` (user permissions) before presenting. **Why:** It requires an A12+ chip; `isAvailable` also checks camera access authorization.
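
A sketch of the off-main-actor pattern recommended above (the `runOCR` helper is hypothetical; production code would catch the error rather than let the detached task fail silently):

```swift
func runOCR(on image: CGImage) {
    Task.detached(priority: .userInitiated) {
        // Vision work stays off the main actor inside the detached task.
        let request = RecognizeTextRequest()
        let observations = try await request.perform(on: image)
        let strings = observations.compactMap { $0.topCandidates(1).first?.string }
        await MainActor.run {
            // Hop back to the main actor only to publish results to the UI.
            print(strings)
        }
    }
}
```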

## Review Checklist

- Uses modern Vision API (iOS 18+) unless targeting older deployments
- Vision requests run off the main thread (async/await or background queue)
- Normalized coordinates converted before UI display
- Confidence threshold applied to filter low-quality observations
- Recognition level matches use case (`.fast` for video, `.accurate` for stills)
- Language hints set for text recognition when input language is known
- Barcode symbologies limited to only those needed
- `DataScannerViewController` availability checked before presentation
- Camera usage description (`NSCameraUsageDescription`) in Info.plist for VisionKit
- Person segmentation quality level appropriate for use case
- `VNSequenceRequestHandler` used for video frame tracking (not a per-frame handler)
- Error handling covers request failures and empty results

## References

- `references/vision-requests.md`: complete code patterns for all Vision request types
- `references/visionkit-scanner.md`: `DataScannerViewController` integration