# Vision Framework
Detect text, faces, barcodes, objects, and body poses in images and video using
on-device computer vision. Patterns target iOS 26+ with Swift 6.2,
backward-compatible where noted.
See `references/vision-requests.md` for complete code patterns and `references/visionkit-scanner.md` for DataScannerViewController integration.
## Two API Generations
Vision has two distinct API layers. Prefer the modern API for new code.
| Aspect | Modern (iOS 18+) | Legacy |
|---|---|---|
| Pattern | `request.perform(on:)` | `VNImageRequestHandler` + `perform([request])` |
| Request types | Swift types — structs and classes (`RecognizeTextRequest`) | ObjC classes (`VNRecognizeTextRequest`) |
| Concurrency | Native async/await | Completion handlers or synchronous |
| Observations | Typed return values | Cast `request.results` |
| Availability | iOS 18+ / macOS 15+ | iOS 11+ |
The modern API uses the `ImageProcessingRequest` protocol. Each request type has a `perform(on:orientation:)` method that accepts `CGImage`, `CIImage`, `CVPixelBuffer`, `CMSampleBuffer`, `Data`, or `URL`. Most requests are structs; stateful requests for video tracking (e.g., `TrackObjectRequest`, `TrackRectangleRequest`, `DetectTrajectoriesRequest`) are final classes.
## Request Pattern (Modern API)
All modern Vision requests follow the same pattern: create a request struct, call `perform(on:)`, and handle the typed result.

```swift
import Vision

func recognizeText(in image: CGImage) async throws -> [String] {
    var request = RecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.recognitionLanguages = [Locale.Language(identifier: "en-US")]
    let observations = try await request.perform(on: image)
    return observations.compactMap { observation in
        observation.topCandidates(1).first?.string
    }
}
```
## Legacy Pattern (Pre-iOS 18)
Use `VNImageRequestHandler` with completion-based requests when targeting older deployment versions.

```swift
import Vision

func recognizeTextLegacy(in image: CGImage) throws -> [String] {
    var recognized: [String] = []
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        recognized = observations.compactMap { $0.topCandidates(1).first?.string }
    }
    request.recognitionLevel = .accurate
    let handler = VNImageRequestHandler(cgImage: image)
    try handler.perform([request])
    return recognized
}
```
## Text Recognition (OCR)

### Modern: RecognizeTextRequest (iOS 18+)
```swift
var request = RecognizeTextRequest()
request.recognitionLevel = .accurate  // .fast for real-time
request.recognitionLanguages = [
    Locale.Language(identifier: "en-US"),
    Locale.Language(identifier: "fr-FR"),
]
request.usesLanguageCorrection = true
request.customWords = ["SwiftUI", "Xcode"]  // domain-specific terms

let observations = try await request.perform(on: cgImage)
for observation in observations {
    guard let candidate = observation.topCandidates(1).first else { continue }
    let text = candidate.string
    let confidence = candidate.confidence  // 0.0 ... 1.0
    let bounds = observation.boundingBox   // normalized coordinates
}
```
### Legacy: VNRecognizeTextRequest
```swift
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
request.recognitionLanguages = ["en-US", "fr-FR"]
request.usesLanguageCorrection = true
```

Key differences: the modern API uses `Locale.Language` for languages; the legacy API uses string identifiers. Both support `.accurate` (best quality) and `.fast` (real-time suitable) recognition levels.
## Face Detection
Detect face rectangles, landmarks (eyes, nose, mouth), and capture quality.
```swift
// Modern API
let faceRequest = DetectFaceRectanglesRequest()
let faces = try await faceRequest.perform(on: cgImage)
for face in faces {
    let boundingBox = face.boundingBox  // normalized CGRect
    let roll = face.roll                // Measurement<UnitAngle>
    let yaw = face.yaw                  // Measurement<UnitAngle>
}

// Landmarks (eyes, nose, mouth contours)
var landmarkRequest = DetectFaceLandmarksRequest()
let landmarkFaces = try await landmarkRequest.perform(on: cgImage)
for face in landmarkFaces {
    let landmarks = face.landmarks
    let leftEye = landmarks?.leftEye?.normalizedPoints
    let nose = landmarks?.nose?.normalizedPoints
}
```
## Coordinate System
Vision uses a normalized coordinate system with origin at the bottom-left.
Convert to UIKit (top-left origin) before display:
```swift
// Flips the y-axis; `rect` must already be scaled to image coordinates
// (not the raw 0...1 normalized rect).
func convertToUIKit(_ rect: CGRect, imageHeight: CGFloat) -> CGRect {
    CGRect(
        x: rect.origin.x,
        y: imageHeight - rect.origin.y - rect.height,
        width: rect.width,
        height: rect.height
    )
}
```
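To go from Vision's normalized rect all the way to UIKit pixel coordinates, the scaling step can be handled by the `VNImageRectForNormalizedRect` helper before the flip — a sketch, where `cgImage` is assumed to be the image that was analyzed:

```swift
import Vision
import CoreGraphics

// Scale a normalized Vision rect (bottom-left origin, 0...1) to image
// pixels, then flip to UIKit's top-left origin.
func uiKitRect(for normalizedBox: CGRect, in cgImage: CGImage) -> CGRect {
    // Denormalize: 0...1 -> pixel coordinates (still bottom-left origin).
    let imageRect = VNImageRectForNormalizedRect(
        normalizedBox, cgImage.width, cgImage.height
    )
    // Flip the y-axis for UIKit.
    return CGRect(
        x: imageRect.origin.x,
        y: CGFloat(cgImage.height) - imageRect.origin.y - imageRect.height,
        width: imageRect.width,
        height: imageRect.height
    )
}
```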
## Barcode Detection
Detect 1D and 2D barcodes including QR codes.
```swift
var request = DetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128, .pdf417]

let barcodes = try await request.perform(on: cgImage)
for barcode in barcodes {
    let payload = barcode.payloadString  // decoded content
    let symbology = barcode.symbology    // .qr, .ean13, etc.
    let bounds = barcode.boundingBox     // normalized rect
}
```

Common symbologies: `.qr`, `.aztec`, `.pdf417`, `.dataMatrix`, `.ean8`, `.ean13`, `.code39`, `.code128`, `.upce`, `.itf14`.
## Document Scanning (iOS 26+)

`RecognizeDocumentsRequest` returns `DocumentObservation` results whose `document` property exposes a structured container of text, paragraphs, tables, lists, and barcodes.
```swift
var request = RecognizeDocumentsRequest()
let documents = try await request.perform(on: cgImage)
for observation in documents {
    let container = observation.document

    // Full text content
    let fullText = container.text

    // Structured access to paragraphs
    for paragraph in container.paragraphs {
        let paragraphText = paragraph.text
    }

    // Tables and lists
    for table in container.tables { /* structured table data */ }
    for list in container.lists { /* structured list data */ }

    // Embedded barcodes detected within the document
    for barcode in container.barcodes { /* barcode data */ }

    // Document title if detected
    if let title = container.title { print(title) }
}
```

For simpler document camera scanning, use VisionKit's `VNDocumentCameraViewController`, which provides a full-screen camera UI with auto-capture, perspective correction, and multi-page scanning.
## Image Segmentation

### Modern: GeneratePersonSegmentationRequest (iOS 18+)
```swift
var request = GeneratePersonSegmentationRequest()
request.qualityLevel = .accurate  // .balanced, .fast

let mask = try await request.perform(on: cgImage)
// mask is a PersonSegmentationObservation with a pixelBuffer property
let maskBuffer = mask.pixelBuffer
// Apply mask using Core Image: CIFilter.blendWithMask()
```
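As a sketch of the masking step mentioned in the comment above — `originalImage` and `background` are assumed `CIImage` inputs, and `maskBuffer` is the pixel buffer from the segmentation request:

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Composite the original image over a background using the person
// segmentation mask.
func applyMask(_ maskBuffer: CVPixelBuffer,
               to originalImage: CIImage,
               over background: CIImage) -> CIImage? {
    var maskImage = CIImage(cvPixelBuffer: maskBuffer)

    // The mask may be lower resolution than the image; scale it up to match.
    let scaleX = originalImage.extent.width / maskImage.extent.width
    let scaleY = originalImage.extent.height / maskImage.extent.height
    maskImage = maskImage.transformed(by: .init(scaleX: scaleX, y: scaleY))

    let blend = CIFilter.blendWithMask()
    blend.inputImage = originalImage
    blend.backgroundImage = background
    blend.maskImage = maskImage
    return blend.outputImage
}
```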
### Legacy: VNGeneratePersonSegmentationRequest
```swift
let request = VNGeneratePersonSegmentationRequest()
request.qualityLevel = .accurate  // .balanced, .fast
request.outputPixelFormat = kCVPixelFormatType_OneComponent8

let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])
guard let mask = request.results?.first?.pixelBuffer else { return }
// Apply mask using Core Image: CIFilter.blendWithMask()
```

Quality levels:
- `.accurate` -- best quality, slowest (~1s), full resolution
- `.balanced` -- good quality, moderate speed (~100ms), 960x540
- `.fast` -- lowest quality, fastest (~10ms), 256x144, suitable for real-time
## Instance Segmentation (iOS 18+)
Separate masks per person for individual effects.
```swift
// Modern API (iOS 18+)
let request = GeneratePersonInstanceMaskRequest()
let observation = try await request.perform(on: cgImage)

let indices = observation.allInstances
for index in indices {
    let mask = try observation.generateMask(forInstances: IndexSet(integer: index))
    // mask is a CVPixelBuffer with only this person visible
}
```
```swift
// Legacy API (iOS 17+)
let request = VNGeneratePersonInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])
guard let result = request.results?.first else { return }

let indices = result.allInstances
for index in indices {
    let instanceMask = try result.generateMaskedImage(
        ofInstances: IndexSet(integer: index),
        from: handler,
        croppedToInstancesExtent: false
    )
}
```

See `references/vision-requests.md` for mask composition and Core Image filter integration patterns.
## Object Tracking

### Modern: TrackObjectRequest (iOS 18+)
Unlike the `ImageProcessingRequest` structs, `TrackObjectRequest` is a stateful class (it conforms to `StatefulRequest`) that carries tracking state across frames.

```swift
// Initialize with a detected object's bounding box
let initialObservation = DetectedObjectObservation(boundingBox: detectedRect)
var request = TrackObjectRequest(observation: initialObservation)
request.trackingLevel = .accurate

// For each video frame:
let results = try await request.perform(on: pixelBuffer)
if let tracked = results.first {
    let updatedBounds = tracked.boundingBox
    let confidence = tracked.confidence
}
```
### Legacy: VNTrackObjectRequest
```swift
let trackRequest = VNTrackObjectRequest(detectedObjectObservation: initialObservation)
trackRequest.trackingLevel = .accurate
let sequenceHandler = VNSequenceRequestHandler()

// For each frame:
try sequenceHandler.perform([trackRequest], on: pixelBuffer)
if let result = trackRequest.results?.first {
    let updatedBounds = result.boundingBox
    trackRequest.inputObservation = result  // feed the result back for the next frame
}
```
## Other Request Types

Vision provides additional requests covered in `references/vision-requests.md`:

| Request | Purpose |
|---|---|
| `ClassifyImageRequest` | Classify scene content (outdoor, food, animal, etc.) |
| `GenerateAttentionBasedSaliencyImageRequest` | Heat map of where viewers focus attention |
| `GenerateObjectnessBasedSaliencyImageRequest` | Heat map of object-like regions |
| `GenerateForegroundInstanceMaskRequest` | Foreground object segmentation (not person-specific) |
| `DetectRectanglesRequest` | Detect rectangular shapes (documents, cards, screens) |
| `DetectHorizonRequest` | Detect horizon angle for auto-leveling photos |
| `DetectHumanBodyPoseRequest` | Detect body joints (shoulders, elbows, knees) |
| `DetectHumanBodyPose3DRequest` | 3D human body pose estimation |
| `DetectHumanHandPoseRequest` | Detect hand joints and finger positions |
| `DetectAnimalBodyPoseRequest` | Detect animal body joint positions |
| `DetectFaceCaptureQualityRequest` | Face capture quality scoring (0–1) for photo selection |
| `TrackRectangleRequest` | Track rectangular objects across video frames |
| `TrackOpticalFlowRequest` | Optical flow between video frames |
| `DetectTrajectoriesRequest` | Detect object trajectories in video |

All modern request types above are iOS 18+ / macOS 15+.
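These all follow the same perform pattern shown earlier. For instance, scene classification might look like this sketch, assuming `cgImage` is the input image:

```swift
import Vision

// Classify scene content and keep only reasonably confident labels.
func classifyScene(in cgImage: CGImage) async throws -> [String] {
    let request = ClassifyImageRequest()
    let observations = try await request.perform(on: cgImage)
    return observations
        .filter { $0.confidence > 0.5 }  // drop low-confidence labels
        .map { $0.identifier }
}
```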
## Core ML Integration
Run custom Core ML models through Vision for automatic image preprocessing
(resizing, normalization, color space conversion).
```swift
// Modern API (iOS 18+)
let model = try MLModel(contentsOf: modelURL)
let container = try CoreMLModelContainer(model: model)
let request = CoreMLRequest(model: container)
let results = try await request.perform(on: cgImage)

// Classification model
if let classification = results.first as? ClassificationObservation {
    let label = classification.identifier
    let confidence = classification.confidence
}
```
```swift
// Legacy API
let vnModel = try VNCoreMLModel(for: model)
let request = VNCoreMLRequest(model: vnModel) { request, error in
    guard let results = request.results as? [VNClassificationObservation] else { return }
    let topResult = results.first
}
let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])
```

For model conversion and optimization, see the `coreml` skill.
## VisionKit: DataScannerViewController
`DataScannerViewController` provides a live camera scanner for text and barcodes; see `references/visionkit-scanner.md` for full integration patterns.

### Quick Start
```swift
import VisionKit

// Check availability (requires A12+ chip and camera access)
guard DataScannerViewController.isSupported,
      DataScannerViewController.isAvailable else { return }

let scanner = DataScannerViewController(
    recognizedDataTypes: [
        .text(languages: ["en"]),
        .barcode(symbologies: [.qr, .ean13])
    ],
    qualityLevel: .balanced,
    recognizesMultipleItems: true,
    isHighFrameRateTrackingEnabled: true,
    isHighlightingEnabled: true
)
scanner.delegate = self
present(scanner, animated: true) {
    try? scanner.startScanning()
}
```
### SwiftUI Integration
Wrap `DataScannerViewController` in `UIViewControllerRepresentable`. See `references/visionkit-scanner.md` for the full implementation.
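A minimal wrapper sketch (delegate wiring and teardown omitted; the reference file has the full version):

```swift
import SwiftUI
import VisionKit

// Minimal SwiftUI wrapper. Production code should also set a delegate,
// handle recognized items, and call stopScanning() on teardown.
struct ScannerView: UIViewControllerRepresentable {
    func makeUIViewController(context: Context) -> DataScannerViewController {
        DataScannerViewController(
            recognizedDataTypes: [.barcode(symbologies: [.qr])],
            qualityLevel: .balanced,
            isHighlightingEnabled: true
        )
    }

    func updateUIViewController(_ scanner: DataScannerViewController, context: Context) {
        // Start scanning once the controller is on screen.
        try? scanner.startScanning()
    }
}
```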
## Common Mistakes
DON'T: Use the legacy `VNImageRequestHandler` API for new iOS 18+ projects.
DO: Use modern struct-based requests with `perform(on:)` and async/await.
Why: The modern API provides type safety, better Swift concurrency support, and cleaner error handling.

DON'T: Forget to convert normalized coordinates before drawing bounding boxes.
DO: Use `VNImageRectForNormalizedRect(_:_:_:)` or manual conversion from bottom-left origin to UIKit top-left origin.
Why: Vision uses normalized coordinates (0...1) with bottom-left origin; UIKit uses points with top-left origin.

DON'T: Run Vision requests on the main thread.
DO: Perform requests on a background thread or use async/await from a detached task.
Why: Image analysis is CPU/GPU-intensive and blocks the UI if run on the main actor.

DON'T: Use the `.accurate` recognition level for real-time camera feeds.
DO: Use `.fast` for live video, `.accurate` for still images or offline processing.
Why: Accurate recognition is too slow for 30fps video; fast recognition trades quality for speed.

DON'T: Ignore the `confidence` score on observations.
DO: Filter results by a confidence threshold (e.g., > 0.5) appropriate for your use case.
Why: Low-confidence results are often incorrect and degrade user experience.

DON'T: Create a new `VNImageRequestHandler` for each frame when tracking objects.
DO: Use `VNSequenceRequestHandler` for video frame sequences.
Why: The sequence handler maintains temporal context for tracking; per-frame handlers lose state.

DON'T: Request all barcode symbologies when you only need QR codes.
DO: Specify only the symbologies you need in the request.
Why: Fewer symbologies means faster detection and fewer false positives.

DON'T: Assume `DataScannerViewController` is available on all devices.
DO: Check both `isSupported` (hardware) and `isAvailable` (user permissions) before presenting.
Why: Requires an A12+ chip; `isAvailable` also checks camera access authorization.
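The main-thread and confidence points above can be combined in one sketch — `image` and the `UILabel` are assumed inputs:

```swift
import Vision
import UIKit

// Run recognition off the main actor, filter by confidence,
// then hop back to the main actor to update UI.
func analyze(_ image: CGImage, into label: UILabel) {
    Task.detached(priority: .userInitiated) {
        let request = RecognizeTextRequest()
        let observations = (try? await request.perform(on: image)) ?? []
        let text = observations
            .compactMap { $0.topCandidates(1).first }
            .filter { $0.confidence > 0.5 }  // drop low-confidence lines
            .map(\.string)
            .joined(separator: "\n")
        await MainActor.run { label.text = text }
    }
}
```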
## Review Checklist
- Uses modern Vision API (iOS 18+) unless targeting older deployments
- Vision requests run off the main thread (async/await or background queue)
- Normalized coordinates converted before UI display
- Confidence threshold applied to filter low-quality observations
- Recognition level matches use case (`.fast` for video, `.accurate` for stills)
- Language hints set for text recognition when input language is known
- Barcode symbologies limited to only those needed
- `DataScannerViewController` availability checked before presentation
- Camera usage description (`NSCameraUsageDescription`) in Info.plist for VisionKit
- Person segmentation quality level appropriate for use case
- `VNSequenceRequestHandler` used for video frame tracking (not per-frame handler)
- Error handling covers request failures and empty results
## References
- Vision request patterns: `references/vision-requests.md`
- VisionKit scanner integration: `references/visionkit-scanner.md`
- Apple docs: Vision | VisionKit | RecognizeTextRequest | DataScannerViewController