axiom-vision-ref


Vision Framework API Reference

Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.

When to Use This Reference

  • Implementing subject lifting using VisionKit or Vision
  • Detecting hand/body poses for gesture recognition or fitness apps
  • Segmenting people from backgrounds or separating multiple individuals
  • Face detection and landmarks for AR effects or authentication
  • Combining Vision APIs to solve complex computer vision problems
  • Looking up specific API signatures and parameter meanings
  • Recognizing text in images (OCR) with VNRecognizeTextRequest
  • Detecting barcodes and QR codes with VNDetectBarcodesRequest
  • Building live scanners with DataScannerViewController
  • Scanning documents with VNDocumentCameraViewController
  • Extracting structured document data with RecognizeDocumentsRequest (iOS 26+)
Related skills: See axiom-vision for decision trees and patterns, axiom-vision-diag for troubleshooting

Vision Framework Overview

Vision provides computer vision algorithms for still images and video:
Core workflow:
  1. Create request (e.g., VNDetectHumanHandPoseRequest())
  2. Create handler with image (VNImageRequestHandler(cgImage: image))
  3. Perform request (try handler.perform([request]))
  4. Access observations from request.results
Coordinate system: Lower-left origin, normalized (0.0-1.0) coordinates
Performance: Run on background queue - resource intensive, blocks UI if on main thread
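The four workflow steps can be sketched end-to-end with the background-queue guidance applied. This is a minimal sketch: the function name and the result handling are placeholders, but the Vision calls match the workflow above.

```swift
import Vision

func detectHandPose(in image: CGImage) {
    // Vision requests are resource intensive - keep them off the main thread
    DispatchQueue.global(qos: .userInitiated).async {
        let request = VNDetectHumanHandPoseRequest()
        let handler = VNImageRequestHandler(cgImage: image)
        do {
            try handler.perform([request])
            let observations = request.results ?? []
            DispatchQueue.main.async {
                // Hop back to the main thread before touching UI
                print("Detected \(observations.count) hand(s)")
            }
        } catch {
            print("Vision request failed: \(error)")
        }
    }
}
```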

Subject Segmentation APIs

VNGenerateForegroundInstanceMaskRequest

Availability: iOS 17+, macOS 14+, tvOS 17+, visionOS 1+
Generates a class-agnostic instance mask of foreground objects (people, pets, buildings, food, shoes, etc.)

Basic Usage

swift
let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: image)

try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else {
    return
}

InstanceMaskObservation

allInstances: IndexSet containing all foreground instance indices (excludes background index 0)
instanceMask: CVPixelBuffer with UInt8 labels (0 = background, 1+ = instance indices)
instanceAtPoint(_:): Returns the instance index at a normalized point
swift
let point = CGPoint(x: 0.5, y: 0.5)  // Center of image
let instance = observation.instanceAtPoint(point)

if instance == 0 {
    print("Background tapped")
} else {
    print("Instance \(instance) tapped")
}

Generating Masks

createScaledMask(for:croppedToInstancesContent:)
Parameters:
  • for: IndexSet of instances to include
  • croppedToInstancesContent:
    • false = Output matches input resolution (for compositing)
    • true = Tight crop around selected instances
Returns: Single-channel floating-point CVPixelBuffer (soft segmentation mask)
swift
// All instances, full resolution
let mask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// Single instance, cropped
let instances = IndexSet(integer: 1)
let croppedMask = try observation.createScaledMask(
    for: instances,
    croppedToInstancesContent: true
)

Instance Mask Hit Testing

Access raw pixel buffer to map tap coordinates to instance labels:
swift
let instanceMask = observation.instanceMask

CVPixelBufferLockBaseAddress(instanceMask, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(instanceMask, .readOnly) }

let baseAddress = CVPixelBufferGetBaseAddress(instanceMask)
let width = CVPixelBufferGetWidth(instanceMask)
let height = CVPixelBufferGetHeight(instanceMask)
let bytesPerRow = CVPixelBufferGetBytesPerRow(instanceMask)

// Convert normalized tap to pixel coordinates in the mask
// (use the mask's dimensions - it is typically lower resolution than the image)
let pixelPoint = VNImagePointForNormalizedPoint(
    CGPoint(x: normalizedX, y: normalizedY),
    width,
    height
)

// Calculate byte offset (one UInt8 label per pixel)
let offset = Int(pixelPoint.y) * bytesPerRow + Int(pixelPoint.x)

// Read instance label
let label = UnsafeRawPointer(baseAddress!).load(
    fromByteOffset: offset,
    as: UInt8.self
)

let instances = label == 0 ? observation.allInstances : IndexSet(integer: Int(label))

VisionKit Subject Lifting

ImageAnalysisInteraction (iOS)

Availability: iOS 16+, iPadOS 16+
Adds system-like subject lifting UI to views:
swift
let interaction = ImageAnalysisInteraction()
interaction.preferredInteractionTypes = .imageSubject  // Or .automatic
imageView.addInteraction(interaction)
Interaction types:
  • .automatic: Subject lifting + Live Text + data detectors
  • .imageSubject: Subject lifting only (no interactive text)

ImageAnalysisOverlayView (macOS)

Availability: macOS 13+
swift
let overlayView = ImageAnalysisOverlayView()
overlayView.preferredInteractionTypes = .imageSubject
nsView.addSubview(overlayView)

Programmatic Access

ImageAnalyzer

swift
let analyzer = ImageAnalyzer()
let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])

let analysis = try await analyzer.analyze(image, configuration: configuration)

ImageAnalysis

subjects: [Subject] - All subjects in the image
highlightedSubjects: Set<Subject> - Currently highlighted subjects (user long-pressed)
subject(at:): Async lookup of the subject at a normalized point (returns nil if none)
swift
// Get all subjects
let subjects = analysis.subjects

// Look up subject at tap
if let subject = try await analysis.subject(at: tapPoint) {
    // Process subject
}

// Change highlight state
analysis.highlightedSubjects = Set([subjects[0], subjects[1]])

Subject Struct

image: UIImage / NSImage - Extracted subject with transparency
bounds: CGRect - Subject boundaries in image coordinates
swift
// Single subject image
let subjectImage = subject.image

// Composite multiple subjects
let compositeImage = try await analysis.image(for: [subject1, subject2])
Out-of-process: VisionKit analysis happens out-of-process (performance benefit, image size limited)

Person Segmentation APIs

VNGeneratePersonSegmentationRequest

Availability: iOS 15+, macOS 12+
Returns single mask containing all people in image:
swift
let request = VNGeneratePersonSegmentationRequest()
// Configure quality level if needed
try handler.perform([request])

guard let observation = request.results?.first as? VNPixelBufferObservation else {
    return
}

let personMask = observation.pixelBuffer  // CVPixelBuffer
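The "configure quality level" comment above can be made concrete. A sketch of the common configuration options; the chosen values here are illustrative, not required:

```swift
import Vision
import CoreVideo

let request = VNGeneratePersonSegmentationRequest()

// .fast suits video frames, .balanced is a middle ground, .accurate suits stills
request.qualityLevel = .balanced

// Single-component 8-bit mask (0-255); other pixel formats are available
request.outputPixelFormat = kCVPixelFormatType_OneComponent8
```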

VNGeneratePersonInstanceMaskRequest

Availability: iOS 17+, macOS 14+
Returns separate masks for up to 4 people:
swift
let request = VNGeneratePersonInstanceMaskRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else {
    return
}

// Same InstanceMaskObservation API as foreground instance masks
let allPeople = observation.allInstances  // Up to 4 people (1-4)

// Get mask for person 1
let person1Mask = try observation.createScaledMask(
    for: IndexSet(integer: 1),
    croppedToInstancesContent: false
)
Limitations:
  • Segments up to 4 people
  • With >4 people: may miss people or combine them (typically background people)
  • Use VNDetectFaceRectanglesRequest to count faces if you need to handle crowded scenes

Hand Pose Detection

VNDetectHumanHandPoseRequest

Availability: iOS 14+, macOS 11+
Detects 21 hand landmarks per hand:
swift
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 2  // Default: 2, increase if needed

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNHumanHandPoseObservation] ?? [] {
    // Process each hand
}
Performance note: maximumHandCount affects latency. Pose is computed only for up to the maximum number of hands, so set it to the lowest acceptable value.

Hand Landmarks (21 points)

Wrist: 1 landmark
Thumb (4 landmarks):
  • .thumbTip
  • .thumbIP (interphalangeal joint)
  • .thumbMP (metacarpophalangeal joint)
  • .thumbCMC (carpometacarpal joint)
Fingers (4 landmarks each):
  • Tip (.indexTip, .middleTip, .ringTip, .littleTip)
  • DIP (distal interphalangeal joint)
  • PIP (proximal interphalangeal joint)
  • MCP (metacarpophalangeal joint)

Group Keys

Access landmark groups:
| Group Key | Points |
| --- | --- |
| .all | All 21 landmarks |
| .thumb | 4 thumb joints |
| .indexFinger | 4 index finger joints |
| .middleFinger | 4 middle finger joints |
| .ringFinger | 4 ring finger joints |
| .littleFinger | 4 little finger joints |
swift
// Get all points
let allPoints = try observation.recognizedPoints(.all)

// Get index finger points only
let indexPoints = try observation.recognizedPoints(.indexFinger)

// Get specific point
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

// Check confidence
guard thumbTip.confidence > 0.5 else { return }

// Access location (normalized coordinates, lower-left origin)
let location = thumbTip.location  // CGPoint

Gesture Recognition Example (Pinch)

swift
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

guard thumbTip.confidence > 0.5, indexTip.confidence > 0.5 else {
    return
}

let distance = hypot(
    thumbTip.location.x - indexTip.location.x,
    thumbTip.location.y - indexTip.location.y
)

let isPinching = distance < 0.05  // Normalized threshold

Chirality (Handedness)

swift
let chirality = observation.chirality  // .left or .right or .unknown

Body Pose Detection

VNDetectHumanBodyPoseRequest (2D)

Availability: iOS 14+, macOS 11+
Detects 18 body landmarks (2D normalized coordinates):
swift
let request = VNDetectHumanBodyPoseRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanBodyPoseObservation] ?? [] {
    // Process each person
}

Body Landmarks (18 points)

Face (5 landmarks):
  • .nose, .leftEye, .rightEye, .leftEar, .rightEar
Arms (6 landmarks):
  • Left: .leftShoulder, .leftElbow, .leftWrist
  • Right: .rightShoulder, .rightElbow, .rightWrist
Torso (7 landmarks):
  • .neck (between shoulders)
  • .leftShoulder, .rightShoulder (also in arm groups)
  • .leftHip, .rightHip
  • .root (between hips)
Legs (6 landmarks):
  • Left: .leftHip, .leftKnee, .leftAnkle
  • Right: .rightHip, .rightKnee, .rightAnkle
Note: Shoulders and hips appear in multiple groups

Group Keys (Body)

| Group Key | Points |
| --- | --- |
| .all | All 18 landmarks |
| .face | 5 face landmarks |
| .leftArm | shoulder, elbow, wrist |
| .rightArm | shoulder, elbow, wrist |
| .torso | neck, shoulders, hips, root |
| .leftLeg | hip, knee, ankle |
| .rightLeg | hip, knee, ankle |
swift
// Get all body points
let allPoints = try observation.recognizedPoints(.all)

// Get left arm only
let leftArmPoints = try observation.recognizedPoints(.leftArm)

// Get specific joint
let leftWrist = try observation.recognizedPoint(.leftWrist)
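For the fitness-style use cases mentioned earlier, a common next step after retrieving joints is computing joint angles from three landmarks (e.g., shoulder-elbow-wrist for an elbow bend). The angle math below is plain 2D vector geometry, not a Vision API; `jointAngle` is a hypothetical helper name.

```swift
import CoreGraphics
import Foundation

// Angle (in degrees) at `vertex` formed by the segments vertex->a and vertex->c.
// Pass normalized Vision landmark locations or converted pixel points - the
// angle is the same either way as long as all three use the same space.
func jointAngle(_ a: CGPoint, _ vertex: CGPoint, _ c: CGPoint) -> CGFloat {
    let v1 = CGVector(dx: a.x - vertex.x, dy: a.y - vertex.y)
    let v2 = CGVector(dx: c.x - vertex.x, dy: c.y - vertex.y)
    let dot = v1.dx * v2.dx + v1.dy * v2.dy
    let mag = hypot(v1.dx, v1.dy) * hypot(v2.dx, v2.dy)
    guard mag > 0 else { return 0 }
    // Clamp to [-1, 1] to guard against floating-point drift before acos
    return acos(max(-1, min(1, dot / mag))) * 180 / .pi
}
```

Check landmark confidence before feeding points in; a low-confidence wrist produces a meaningless angle.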

VNDetectHumanBodyPose3DRequest (3D)

Availability: iOS 17+, macOS 14+
Returns 3D skeleton with 17 joints in meters (real-world coordinates):
swift
let request = VNDetectHumanBodyPose3DRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNHumanBodyPose3DObservation else {
    return
}

// Get 3D joint position
let leftWrist = try observation.recognizedPoint(.leftWrist)
let position = leftWrist.position  // simd_float4x4 matrix
let localPosition = leftWrist.localPosition  // Relative to parent joint
3D Body Landmarks (17 points): Similar to the 2D set, but drops the eye/ear/nose face landmarks and adds head and spine joints (17 vs 18 2D landmarks)

3D Observation Properties

bodyHeight: Estimated height in meters
  • With depth data: Measured height
  • Without depth data: Reference height (1.8 m)
heightEstimation: .measured or .reference
cameraOriginMatrix: simd_float4x4 camera position/orientation relative to the subject
pointInImage(_:): Projects a 3D joint back to 2D image coordinates
swift
let wrist2D = try observation.pointInImage(leftWrist)

3D Point Classes

VNPoint3D: Base class with a simd_float4x4 position matrix
VNRecognizedPoint3D: Adds an identifier (joint name)
VNHumanBodyRecognizedPoint3D: Adds localPosition and parentJoint
swift
// Position relative to skeleton root (center of hip)
let modelPosition = leftWrist.position

// Position relative to parent joint (left elbow)
let relativePosition = leftWrist.localPosition

Depth Input

Vision accepts depth data alongside images:
swift
// From AVDepthData
let handler = VNImageRequestHandler(
    cvPixelBuffer: imageBuffer,
    depthData: depthData,
    orientation: orientation
)

// From file (automatic depth extraction)
let handler = VNImageRequestHandler(url: imageURL)  // Depth auto-fetched
Depth formats: Disparity or Depth (interchangeable via AVFoundation)
LiDAR: Use in live capture sessions for accurate scale/measurement

Face Detection & Landmarks

VNDetectFaceRectanglesRequest

Availability: iOS 11+
Detects face bounding boxes:
swift
let request = VNDetectFaceRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    let faceBounds = observation.boundingBox  // Normalized rect
}

VNDetectFaceLandmarksRequest

Availability: iOS 11+
Detects face with detailed landmarks:
swift
let request = VNDetectFaceLandmarksRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    if let landmarks = observation.landmarks {
        let leftEye = landmarks.leftEye
        let nose = landmarks.nose
        let leftPupil = landmarks.leftPupil  // Revision 3+
    }
}
Revisions:
  • Revision 1: Basic landmarks
  • Revision 2: Detects upside-down faces
  • Revision 3+: Pupil locations
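Each landmark region (VNFaceLandmarkRegion2D) exposes its points as normalized coordinates relative to the face bounding box. A sketch of reading them, assuming a VNFaceObservation from the request above; `eyePoints` is a hypothetical helper name:

```swift
import Vision

func eyePoints(from observation: VNFaceObservation,
               imageSize: CGSize) -> [CGPoint] {
    guard let leftEye = observation.landmarks?.leftEye else { return [] }

    // normalizedPoints are relative to the face bounding box (lower-left origin)
    _ = leftEye.normalizedPoints

    // pointsInImage(imageSize:) converts them into image coordinates for drawing
    return leftEye.pointsInImage(imageSize: imageSize)
}
```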

Person Detection

VNDetectHumanRectanglesRequest

Availability: iOS 13+
Detects human bounding boxes (torso detection):
swift
let request = VNDetectHumanRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanObservation] ?? [] {
    let humanBounds = observation.boundingBox  // Normalized rect
}
Use case: Faster than pose detection when you only need location

CoreImage Integration

CIBlendWithMask Filter

Composite subject on new background using Vision mask:
swift
// 1. Get mask from Vision
guard let observation = request.results?.first as? VNInstanceMaskObservation else {
    return
}
let visionMask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// 2. Convert to CIImage
let maskImage = CIImage(cvPixelBuffer: visionMask)

// 3. Apply filter
let filter = CIFilter(name: "CIBlendWithMask")!
filter.setValue(sourceImage, forKey: kCIInputImageKey)
filter.setValue(maskImage, forKey: kCIInputMaskImageKey)
filter.setValue(newBackground, forKey: kCIInputBackgroundImageKey)

let output = filter.outputImage  // Composited result
Parameters:
  • Input image: Original image to mask
  • Mask image: Vision's soft segmentation mask
  • Background image: New background (or empty image for transparency)
HDR preservation: CoreImage preserves high dynamic range from input (Vision/VisionKit output is SDR)
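To get a drawable image out of the filter, render the output CIImage with a CIContext. This sketch assumes the `filter` configured in the example above; in a real app the context should be created once and reused, since CIContext creation is expensive.

```swift
import CoreImage

let context = CIContext()  // Create once, reuse across renders

if let output = filter.outputImage,
   let cgImage = context.createCGImage(output, from: output.extent) {
    // Use cgImage - e.g., wrap in UIImage(cgImage:) for display or save to disk
}
```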

Text Recognition APIs

VNRecognizeTextRequest

Availability: iOS 13+, macOS 10.15+
Recognizes text in images with configurable accuracy/speed trade-off.

Basic Usage

swift
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate  // Or .fast
request.recognitionLanguages = ["en-US", "de-DE"]  // Order matters
request.usesLanguageCorrection = true

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNRecognizedTextObservation] ?? [] {
    // Get top candidates
    let candidates = observation.topCandidates(3)
    let bestText = candidates.first?.string ?? ""
}

Recognition Levels

| Level | Performance | Accuracy | Best For |
| --- | --- | --- | --- |
| .fast | Real-time | Good | Camera feed, large text, signs |
| .accurate | Slower | Excellent | Documents, receipts, handwriting |
Fast path: Character-by-character recognition (Neural Network → Character Detection)
Accurate path: Full-line ML recognition (Neural Network → Line/Word Recognition)

Properties

| Property | Type | Description |
| --- | --- | --- |
| recognitionLevel | VNRequestTextRecognitionLevel | .fast or .accurate |
| recognitionLanguages | [String] | BCP 47 language codes, order = priority |
| usesLanguageCorrection | Bool | Use language model for correction |
| customWords | [String] | Domain-specific vocabulary |
| automaticallyDetectsLanguage | Bool | Auto-detect language (iOS 16+) |
| minimumTextHeight | Float | Min text height as fraction of image (0-1) |
| revision | Int | API version (affects supported languages) |

Language Support

swift
// Check supported languages for current settings
let languages = try VNRecognizeTextRequest.supportedRecognitionLanguages(
    for: .accurate,
    revision: VNRecognizeTextRequestRevision3
)
Language correction: Improves accuracy but takes processing time. Disable for codes/serial numbers.
Custom words: Add domain-specific vocabulary for better recognition (medical terms, product codes).

VNRecognizedTextObservation

boundingBox: Normalized rect containing the recognized text
topCandidates(_:): Returns [VNRecognizedText] ordered by confidence

VNRecognizedText

| Property | Type | Description |
| --- | --- | --- |
| string | String | Recognized text |
| confidence | VNConfidence | 0.0-1.0 |
| boundingBox(for:) | VNRectangleObservation? | Box for substring range |
swift
// Get bounding box for substring
let text = candidate.string
if let range = text.range(of: "invoice") {
    let box = try candidate.boundingBox(for: range)
}

Barcode Detection APIs

VNDetectBarcodesRequest

Availability: iOS 11+, macOS 10.13+
Detects and decodes barcodes and QR codes.

Basic Usage

swift
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128]  // Specific codes

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for barcode in request.results as? [VNBarcodeObservation] ?? [] {
    let payload = barcode.payloadStringValue
    let type = barcode.symbology
    let bounds = barcode.boundingBox
}

Symbologies

1D Barcodes:
  • .codabar (iOS 15+)
  • .code39, .code39Checksum, .code39FullASCII, .code39FullASCIIChecksum
  • .code93, .code93i
  • .code128
  • .ean8, .ean13
  • .gs1DataBar, .gs1DataBarExpanded, .gs1DataBarLimited (iOS 15+)
  • .i2of5, .i2of5Checksum
  • .itf14
  • .upce
2D Codes:
  • .aztec
  • .dataMatrix
  • .microPDF417 (iOS 15+)
  • .microQR (iOS 15+)
  • .pdf417
  • .qr
Performance: Specifying fewer symbologies = faster detection

Revisions

| Revision | iOS | Features |
|----------|-----|----------|
| 1 | 11+ | Basic detection, one code at a time |
| 2 | 15+ | Codabar, GS1 DataBar, MicroPDF417, MicroQR, better ROI support |
| 3 | 16+ | ML-based, detects multiple codes, better bounding boxes |
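To get consistent behavior across OS updates, a request can be pinned to a specific revision. A minimal sketch (the constant name follows Vision's standard `VNDetectBarcodesRequestRevisionN` pattern):

```swift
import Vision

let request = VNDetectBarcodesRequest()

// Pin revision 3 (iOS 16+) when available; otherwise Vision
// uses the latest revision supported by the linked SDK.
if VNDetectBarcodesRequest.supportedRevisions.contains(VNDetectBarcodesRequestRevision3) {
    request.revision = VNDetectBarcodesRequestRevision3
}
```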

VNBarcodeObservation

| Property | Type | Description |
|----------|------|-------------|
| payloadStringValue | String? | Decoded content |
| symbology | VNBarcodeSymbology | Barcode type |
| boundingBox | CGRect | Normalized bounds |
| topLeft / topRight / bottomLeft / bottomRight | CGPoint | Corner points |
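Because `boundingBox` is normalized with a bottom-left origin, it usually needs converting before drawing. A sketch using Vision's `VNImageRectForNormalizedRect` helper (`barcode` and `imageSize` are assumed from the surrounding code):

```swift
import Vision

// Normalized (0–1, bottom-left origin) → pixel coordinates
let pixelRect = VNImageRectForNormalizedRect(
    barcode.boundingBox,
    Int(imageSize.width),
    Int(imageSize.height)
)

// UIKit uses a top-left origin, so flip the y axis before drawing
let uiRect = CGRect(
    x: pixelRect.minX,
    y: imageSize.height - pixelRect.maxY,
    width: pixelRect.width,
    height: pixelRect.height
)
```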

VisionKit Scanner APIs

DataScannerViewController

Availability: iOS 16+
Camera-based live scanner with built-in UI for text and barcodes.

Check Availability

```swift
// Hardware support
DataScannerViewController.isSupported

// Runtime availability (camera access, parental controls)
DataScannerViewController.isAvailable
```

Configuration

```swift
import VisionKit

let dataTypes: Set<DataScannerViewController.RecognizedDataType> = [
    .barcode(symbologies: [.qr, .ean13]),
    .text(textContentType: .URL),  // Or nil for all text
    // .text(languages: ["ja"])  // Filter by language
]

let scanner = DataScannerViewController(
    recognizedDataTypes: dataTypes,
    qualityLevel: .balanced,  // .fast, .balanced, .accurate
    recognizesMultipleItems: true,
    isHighFrameRateTrackingEnabled: true,
    isPinchToZoomEnabled: true,
    isGuidanceEnabled: true,
    isHighlightingEnabled: true
)

scanner.delegate = self
present(scanner, animated: true) {
    try? scanner.startScanning()
}
```

RecognizedDataType

| Type | Description |
|------|-------------|
| .barcode(symbologies:) | Specific barcode types |
| .text() | All text |
| .text(languages:) | Text filtered by language |
| .text(textContentType:) | Text filtered by content type (URL, phone, email) |

Delegate Protocol

```swift
protocol DataScannerViewControllerDelegate {
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didTapOn item: RecognizedItem)

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didAdd addedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didUpdate updatedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didRemove removedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     becameUnavailableWithError error: DataScannerViewController.ScanningUnavailable)
}
```

RecognizedItem

```swift
enum RecognizedItem {
    case text(RecognizedItem.Text)
    case barcode(RecognizedItem.Barcode)

    var id: UUID { get }
    var bounds: RecognizedItem.Bounds { get }
}

// Text item
struct Text {
    let transcript: String
}

// Barcode item
struct Barcode {
    let payloadStringValue: String?
    let observation: VNBarcodeObservation
}
```
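A typical tap handler switches over the item's case; a minimal sketch:

```swift
func dataScanner(_ dataScanner: DataScannerViewController,
                 didTapOn item: RecognizedItem) {
    switch item {
    case .text(let text):
        print("Tapped text: \(text.transcript)")
    case .barcode(let barcode):
        print("Tapped barcode: \(barcode.payloadStringValue ?? "<no string payload>")")
    @unknown default:
        break
    }
}
```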

Async Stream

```swift
// Alternative to the delegate
for await items in scanner.recognizedItems {
    // Current recognized items
}
```

Custom Highlights

```swift
// Add custom views over recognized items
scanner.overlayContainerView.addSubview(customHighlight)

// Capture a still photo
let photo = try await scanner.capturePhoto()
```

VNDocumentCameraViewController

Availability: iOS 13+
Document scanning with automatic edge detection, perspective correction, and lighting adjustment.

Basic Usage

```swift
import VisionKit

let camera = VNDocumentCameraViewController()
camera.delegate = self
present(camera, animated: true)
```

Delegate Protocol

```swift
protocol VNDocumentCameraViewControllerDelegate {
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                       didFinishWith scan: VNDocumentCameraScan)

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController)

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                       didFailWithError error: Error)
}
```

VNDocumentCameraScan

| Member | Type | Description |
|--------|------|-------------|
| pageCount | Int | Number of scanned pages |
| imageOfPage(at:) | UIImage | Page image at the given index |
| title | String | User-editable title |

```swift
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                   didFinishWith scan: VNDocumentCameraScan) {
    controller.dismiss(animated: true)

    for i in 0..<scan.pageCount {
        let pageImage = scan.imageOfPage(at: i)
        // Process with VNRecognizeTextRequest
    }
}
```
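To extract text from the scanned pages, feed each page image into VNRecognizeTextRequest. A sketch (the helper name is ours, not part of the framework):

```swift
import Vision
import UIKit

// OCR a single scanned page (pageImage comes from scan.imageOfPage(at:))
func recognizeText(in pageImage: UIImage) throws -> [String] {
    guard let cgImage = pageImage.cgImage else { return [] }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])

    // One string per recognized line, using the top candidate
    return request.results?.compactMap { $0.topCandidates(1).first?.string } ?? []
}
```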

Document Analysis APIs

VNDetectDocumentSegmentationRequest

Availability: iOS 15+, macOS 12+
Detects document boundaries for custom camera UIs or post-processing.

```swift
let request = VNDetectDocumentSegmentationRequest()
let handler = VNImageRequestHandler(ciImage: image)
try handler.perform([request])

guard let observation = request.results?.first as? VNRectangleObservation else {
    return  // No document found
}

// Get corner points (normalized)
let corners = [
    observation.topLeft,
    observation.topRight,
    observation.bottomLeft,
    observation.bottomRight
]
```

vs VNDetectRectanglesRequest:
  • Document: ML-based, trained specifically on documents
  • Rectangle: Edge-based, finds any quadrilateral
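The detected corners are commonly fed into Core Image to crop and deskew the document. A hedged sketch using the built-in CIPerspectiveCorrection filter (`image` is the analyzed CIImage and `observation` the result from the request above):

```swift
import CoreImage
import Vision

// Scale normalized corner points to the image's pixel extent
let size = image.extent.size
func scaled(_ p: CGPoint) -> CGPoint {
    CGPoint(x: p.x * size.width, y: p.y * size.height)
}

// Crop and deskew in one pass
let corrected = image.applyingFilter("CIPerspectiveCorrection", parameters: [
    "inputTopLeft": CIVector(cgPoint: scaled(observation.topLeft)),
    "inputTopRight": CIVector(cgPoint: scaled(observation.topRight)),
    "inputBottomLeft": CIVector(cgPoint: scaled(observation.bottomLeft)),
    "inputBottomRight": CIVector(cgPoint: scaled(observation.bottomRight))
])
```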

RecognizeDocumentsRequest (iOS 26+)

Availability: iOS 26+, macOS 26+
Structured document understanding with semantic parsing.

Basic Usage

```swift
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: imageData)

guard let document = observations.first?.document else {
    return
}
```

DocumentObservation Hierarchy

DocumentObservation
└── document: DocumentObservation.Document
    ├── text: TextObservation
    ├── tables: [Container.Table]
    ├── lists: [Container.List]
    └── barcodes: [Container.Barcode]

Table Extraction

```swift
for table in document.tables {
    for row in table.rows {
        for cell in row {
            let text = cell.content.text.transcript
            let detectedData = cell.content.text.detectedData
        }
    }
}
```
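The nested loops above flatten naturally into an export format. A hedged sketch that builds a CSV string, assuming the iOS 26 API shape shown above (unescaped cell text, for illustration only):

```swift
// Flatten every extracted table into one CSV blob
var csv = ""
for table in document.tables {
    for row in table.rows {
        // One transcript per cell, comma-joined
        let cells = row.map { $0.content.text.transcript }
        csv += cells.joined(separator: ",") + "\n"
    }
    csv += "\n"  // Blank line between tables
}
```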

Detected Data Types

```swift
for data in document.text.detectedData {
    switch data.match.details {
    case .emailAddress(let email):
        let address = email.emailAddress
    case .phoneNumber(let phone):
        let number = phone.phoneNumber
    case .link(let url):
        let link = url
    case .address(let address):
        let components = address
    case .date(let date):
        let dateValue = date
    default:
        break
    }
}
```

TextObservation Hierarchy

TextObservation
├── transcript: String
├── lines: [TextObservation.Line]
├── paragraphs: [TextObservation.Paragraph]
├── words: [TextObservation.Word]
└── detectedData: [DetectedDataObservation]

API Quick Reference

Subject Segmentation

| API | Platform | Purpose |
|-----|----------|---------|
| VNGenerateForegroundInstanceMaskRequest | iOS 17+ | Class-agnostic subject instances |
| VNGeneratePersonInstanceMaskRequest | iOS 17+ | Up to 4 people separately |
| VNGeneratePersonSegmentationRequest | iOS 15+ | All people (single mask) |
| ImageAnalysisInteraction (VisionKit) | iOS 16+ | UI for subject lifting |

Pose Detection

| API | Platform | Landmarks | Coordinates |
|-----|----------|-----------|-------------|
| VNDetectHumanHandPoseRequest | iOS 14+ | 21 per hand | 2D normalized |
| VNDetectHumanBodyPoseRequest | iOS 14+ | 18 body joints | 2D normalized |
| VNDetectHumanBodyPose3DRequest | iOS 17+ | 17 body joints | 3D meters |

Face & Person Detection

| API | Platform | Purpose |
|-----|----------|---------|
| VNDetectFaceRectanglesRequest | iOS 11+ | Face bounding boxes |
| VNDetectFaceLandmarksRequest | iOS 11+ | Face with detailed landmarks |
| VNDetectHumanRectanglesRequest | iOS 13+ | Human torso bounding boxes |

Text & Barcode

| API | Platform | Purpose |
|-----|----------|---------|
| VNRecognizeTextRequest | iOS 13+ | Text recognition (OCR) |
| VNDetectBarcodesRequest | iOS 11+ | Barcode/QR detection |
| DataScannerViewController | iOS 16+ | Live camera scanner (text + barcodes) |
| VNDocumentCameraViewController | iOS 13+ | Document scanning with perspective correction |
| VNDetectDocumentSegmentationRequest | iOS 15+ | Programmatic document edge detection |
| RecognizeDocumentsRequest | iOS 26+ | Structured document extraction |

Observation Types

| Observation | Returned By |
|-------------|-------------|
| VNInstanceMaskObservation | Foreground/person instance masks |
| VNPixelBufferObservation | Person segmentation (single mask) |
| VNHumanHandPoseObservation | Hand pose |
| VNHumanBodyPoseObservation | Body pose (2D) |
| VNHumanBodyPose3DObservation | Body pose (3D) |
| VNFaceObservation | Face detection/landmarks |
| VNHumanObservation | Human rectangles |
| VNRecognizedTextObservation | Text recognition |
| VNBarcodeObservation | Barcode detection |
| VNRectangleObservation | Document segmentation |
| DocumentObservation | Structured document (iOS 26+) |

Resources

WWDC: 2019-234, 2021-10041, 2022-10024, 2022-10025, 2025-272, 2023-10176, 2023-111241, 2023-10048, 2020-10653, 2020-10043, 2020-10099
Docs: /vision, /visionkit, /vision/vnrecognizetextrequest, /vision/vndetectbarcodesrequest
Skills: axiom-vision, axiom-vision-diag