axiom-vision-ref


Vision Framework API Reference

Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.

When to Use This Reference

  • Implementing subject lifting using VisionKit or Vision
  • Detecting hand/body poses for gesture recognition or fitness apps
  • Segmenting people from backgrounds or separating multiple individuals
  • Face detection and landmarks for AR effects or authentication
  • Combining Vision APIs to solve complex computer vision problems
  • Looking up specific API signatures and parameter meanings
  • Recognizing text in images (OCR) with VNRecognizeTextRequest
  • Detecting barcodes and QR codes with VNDetectBarcodesRequest
  • Building live scanners with DataScannerViewController
  • Scanning documents with VNDocumentCameraViewController
  • Extracting structured document data with RecognizeDocumentsRequest (iOS 26+)
Related skills: See axiom-vision for decision trees and patterns, axiom-vision-diag for troubleshooting

Vision Framework Overview

Vision provides computer vision algorithms for still images and video:
Core workflow:
  1. Create request (e.g., VNDetectHumanHandPoseRequest())
  2. Create handler with image (VNImageRequestHandler(cgImage: image))
  3. Perform request (try handler.perform([request]))
  4. Access observations from request.results
Coordinate system: Lower-left origin, normalized (0.0-1.0) coordinates
Performance: Run on background queue - resource intensive, blocks UI if on main thread
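The four workflow steps can be sketched end-to-end with the background-queue guidance applied. This is a minimal sketch: the function name and the result handling are placeholders, but the Vision calls match the workflow above.

```swift
import Vision

func detectHandPose(in image: CGImage) {
    // Vision requests are resource intensive - keep them off the main thread
    DispatchQueue.global(qos: .userInitiated).async {
        let request = VNDetectHumanHandPoseRequest()
        let handler = VNImageRequestHandler(cgImage: image)
        do {
            try handler.perform([request])
            let observations = request.results ?? []
            DispatchQueue.main.async {
                // Hop back to the main thread before touching UI
                print("Detected \(observations.count) hand(s)")
            }
        } catch {
            print("Vision request failed: \(error)")
        }
    }
}
```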

Subject Segmentation APIs

VNGenerateForegroundInstanceMaskRequest

Availability: iOS 17+, macOS 14+, tvOS 17+, visionOS 1+
Generates a class-agnostic instance mask of foreground objects (people, pets, buildings, food, shoes, etc.)

Basic Usage

swift
let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: image)

try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else {
    return
}

InstanceMaskObservation

allInstances: IndexSet containing all foreground instance indices (excludes background index 0)
instanceMask: CVPixelBuffer with UInt8 labels (0 = background, 1+ = instance indices)
instanceAtPoint(_:): Returns the instance index at a normalized point
swift
let point = CGPoint(x: 0.5, y: 0.5)  // Center of image
let instance = observation.instanceAtPoint(point)

if instance == 0 {
    print("Background tapped")
} else {
    print("Instance \(instance) tapped")
}

Generating Masks

createScaledMask(for:croppedToInstancesContent:)
Parameters:
  • for: IndexSet of instances to include
  • croppedToInstancesContent:
    • false = Output matches input resolution (for compositing)
    • true = Tight crop around selected instances
Returns: Single-channel floating-point CVPixelBuffer (soft segmentation mask)
swift
// All instances, full resolution
let mask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// Single instance, cropped
let instances = IndexSet(integer: 1)
let croppedMask = try observation.createScaledMask(
    for: instances,
    croppedToInstancesContent: true
)

Instance Mask Hit Testing

Access raw pixel buffer to map tap coordinates to instance labels:
swift
let instanceMask = observation.instanceMask

CVPixelBufferLockBaseAddress(instanceMask, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(instanceMask, .readOnly) }

let baseAddress = CVPixelBufferGetBaseAddress(instanceMask)
let width = CVPixelBufferGetWidth(instanceMask)
let height = CVPixelBufferGetHeight(instanceMask)
let bytesPerRow = CVPixelBufferGetBytesPerRow(instanceMask)

// Convert normalized tap to pixel coordinates in the mask
// (use the mask's dimensions - it is typically lower resolution than the image)
let pixelPoint = VNImagePointForNormalizedPoint(
    CGPoint(x: normalizedX, y: normalizedY),
    width,
    height
)

// Calculate byte offset (one UInt8 label per pixel)
let offset = Int(pixelPoint.y) * bytesPerRow + Int(pixelPoint.x)

// Read instance label
let label = UnsafeRawPointer(baseAddress!).load(
    fromByteOffset: offset,
    as: UInt8.self
)

let instances = label == 0 ? observation.allInstances : IndexSet(integer: Int(label))

VisionKit Subject Lifting

ImageAnalysisInteraction (iOS)

Availability: iOS 16+, iPadOS 16+
Adds system-like subject lifting UI to views:
swift
let interaction = ImageAnalysisInteraction()
interaction.preferredInteractionTypes = .imageSubject  // Or .automatic
imageView.addInteraction(interaction)
Interaction types:
  • .automatic: Subject lifting + Live Text + data detectors
  • .imageSubject: Subject lifting only (no interactive text)

ImageAnalysisOverlayView (macOS)

Availability: macOS 13+
swift
let overlayView = ImageAnalysisOverlayView()
overlayView.preferredInteractionTypes = .imageSubject
nsView.addSubview(overlayView)

Programmatic Access

ImageAnalyzer

swift
let analyzer = ImageAnalyzer()
let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])

let analysis = try await analyzer.analyze(image, configuration: configuration)

ImageAnalysis

subjects: [Subject] - All subjects in the image
highlightedSubjects: Set<Subject> - Currently highlighted subjects (user long-pressed)
subject(at:): Async lookup of the subject at a normalized point (returns nil if none)
swift
// Get all subjects
let subjects = analysis.subjects

// Look up subject at tap
if let subject = try await analysis.subject(at: tapPoint) {
    // Process subject
}

// Change highlight state
analysis.highlightedSubjects = Set([subjects[0], subjects[1]])

Subject Struct

image: UIImage / NSImage - Extracted subject with transparency
bounds: CGRect - Subject boundaries in image coordinates
swift
// Single subject image
let subjectImage = subject.image

// Composite multiple subjects
let compositeImage = try await analysis.image(for: [subject1, subject2])
Out-of-process: VisionKit analysis happens out-of-process (performance benefit, image size limited)

Person Segmentation APIs

VNGeneratePersonSegmentationRequest

Availability: iOS 15+, macOS 12+
Returns single mask containing all people in image:
swift
let request = VNGeneratePersonSegmentationRequest()
// Configure quality level if needed
try handler.perform([request])

guard let observation = request.results?.first as? VNPixelBufferObservation else {
    return
}

let personMask = observation.pixelBuffer  // CVPixelBuffer
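The "configure quality level" comment above can be made concrete. A sketch of the common configuration options; the chosen values here are illustrative, not required:

```swift
import Vision
import CoreVideo

let request = VNGeneratePersonSegmentationRequest()

// .fast suits video frames, .balanced is a middle ground, .accurate suits stills
request.qualityLevel = .balanced

// Single-component 8-bit mask (0-255); other pixel formats are available
request.outputPixelFormat = kCVPixelFormatType_OneComponent8
```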

VNGeneratePersonInstanceMaskRequest

Availability: iOS 17+, macOS 14+
Returns separate masks for up to 4 people:
swift
let request = VNGeneratePersonInstanceMaskRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNInstanceMaskObservation else {
    return
}

// Same InstanceMaskObservation API as foreground instance masks
let allPeople = observation.allInstances  // Up to 4 people (1-4)

// Get mask for person 1
let person1Mask = try observation.createScaledMask(
    for: IndexSet(integer: 1),
    croppedToInstancesContent: false
)
Limitations:
  • Segments up to 4 people
  • With >4 people: may miss people or combine them (typically background people)
  • Use VNDetectFaceRectanglesRequest to count faces if you need to handle crowded scenes

Hand Pose Detection

VNDetectHumanHandPoseRequest

Availability: iOS 14+, macOS 11+
Detects 21 hand landmarks per hand:
swift
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 2  // Default: 2, increase if needed

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNHumanHandPoseObservation] ?? [] {
    // Process each hand
}
Performance note: maximumHandCount affects latency. Pose is computed only for up to the maximum number of hands, so set it to the lowest acceptable value.

Hand Landmarks (21 points)

Wrist: 1 landmark
Thumb (4 landmarks):
  • .thumbTip
  • .thumbIP (interphalangeal joint)
  • .thumbMP (metacarpophalangeal joint)
  • .thumbCMC (carpometacarpal joint)
Fingers (4 landmarks each):
  • Tip (.indexTip, .middleTip, .ringTip, .littleTip)
  • DIP (distal interphalangeal joint)
  • PIP (proximal interphalangeal joint)
  • MCP (metacarpophalangeal joint)

Group Keys

Access landmark groups:
| Group Key | Points |
| --- | --- |
| .all | All 21 landmarks |
| .thumb | 4 thumb joints |
| .indexFinger | 4 index finger joints |
| .middleFinger | 4 middle finger joints |
| .ringFinger | 4 ring finger joints |
| .littleFinger | 4 little finger joints |
swift
// Get all points
let allPoints = try observation.recognizedPoints(.all)

// Get index finger points only
let indexPoints = try observation.recognizedPoints(.indexFinger)

// Get specific point
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

// Check confidence
guard thumbTip.confidence > 0.5 else { return }

// Access location (normalized coordinates, lower-left origin)
let location = thumbTip.location  // CGPoint

Gesture Recognition Example (Pinch)

swift
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)

guard thumbTip.confidence > 0.5, indexTip.confidence > 0.5 else {
    return
}

let distance = hypot(
    thumbTip.location.x - indexTip.location.x,
    thumbTip.location.y - indexTip.location.y
)

let isPinching = distance < 0.05  // Normalized threshold

Chirality (Handedness)

swift
let chirality = observation.chirality  // .left or .right or .unknown

Body Pose Detection

VNDetectHumanBodyPoseRequest (2D)

Availability: iOS 14+, macOS 11+
Detects 18 body landmarks (2D normalized coordinates):
swift
let request = VNDetectHumanBodyPoseRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanBodyPoseObservation] ?? [] {
    // Process each person
}

Body Landmarks (18 points)

Face (5 landmarks):
  • .nose, .leftEye, .rightEye, .leftEar, .rightEar
Arms (6 landmarks):
  • Left: .leftShoulder, .leftElbow, .leftWrist
  • Right: .rightShoulder, .rightElbow, .rightWrist
Torso (7 landmarks):
  • .neck (between shoulders)
  • .leftShoulder, .rightShoulder (also in arm groups)
  • .leftHip, .rightHip
  • .root (between hips)
Legs (6 landmarks):
  • Left: .leftHip, .leftKnee, .leftAnkle
  • Right: .rightHip, .rightKnee, .rightAnkle
Note: Shoulders and hips appear in multiple groups

Group Keys (Body)

| Group Key | Points |
| --- | --- |
| .all | All 18 landmarks |
| .face | 5 face landmarks |
| .leftArm | shoulder, elbow, wrist |
| .rightArm | shoulder, elbow, wrist |
| .torso | neck, shoulders, hips, root |
| .leftLeg | hip, knee, ankle |
| .rightLeg | hip, knee, ankle |
swift
// Get all body points
let allPoints = try observation.recognizedPoints(.all)

// Get left arm only
let leftArmPoints = try observation.recognizedPoints(.leftArm)

// Get specific joint
let leftWrist = try observation.recognizedPoint(.leftWrist)
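For the fitness-style use cases mentioned earlier, a common next step after retrieving joints is computing joint angles from three landmarks (e.g., shoulder-elbow-wrist for an elbow bend). The angle math below is plain 2D vector geometry, not a Vision API; `jointAngle` is a hypothetical helper name.

```swift
import CoreGraphics
import Foundation

// Angle (in degrees) at `vertex` formed by the segments vertex->a and vertex->c.
// Pass normalized Vision landmark locations or converted pixel points - the
// angle is the same either way as long as all three use the same space.
func jointAngle(_ a: CGPoint, _ vertex: CGPoint, _ c: CGPoint) -> CGFloat {
    let v1 = CGVector(dx: a.x - vertex.x, dy: a.y - vertex.y)
    let v2 = CGVector(dx: c.x - vertex.x, dy: c.y - vertex.y)
    let dot = v1.dx * v2.dx + v1.dy * v2.dy
    let mag = hypot(v1.dx, v1.dy) * hypot(v2.dx, v2.dy)
    guard mag > 0 else { return 0 }
    // Clamp to [-1, 1] to guard against floating-point drift before acos
    return acos(max(-1, min(1, dot / mag))) * 180 / .pi
}
```

Check landmark confidence before feeding points in; a low-confidence wrist produces a meaningless angle.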

VNDetectHumanBodyPose3DRequest (3D)

Availability: iOS 17+, macOS 14+
Returns 3D skeleton with 17 joints in meters (real-world coordinates):
swift
let request = VNDetectHumanBodyPose3DRequest()
try handler.perform([request])

guard let observation = request.results?.first as? VNHumanBodyPose3DObservation else {
    return
}

// Get 3D joint position
let leftWrist = try observation.recognizedPoint(.leftWrist)
let position = leftWrist.position  // simd_float4x4 matrix
let localPosition = leftWrist.localPosition  // Relative to parent joint
3D Body Landmarks (17 points): Similar to the 2D set, but drops the eye/ear/nose face landmarks and adds head and spine joints (17 vs 18 2D landmarks)

3D Observation Properties

bodyHeight: Estimated height in meters
  • With depth data: Measured height
  • Without depth data: Reference height (1.8 m)
heightEstimation: .measured or .reference
cameraOriginMatrix: simd_float4x4 camera position/orientation relative to the subject
pointInImage(_:): Projects a 3D joint back to 2D image coordinates
swift
let wrist2D = try observation.pointInImage(leftWrist)

3D Point Classes

VNPoint3D: Base class with a simd_float4x4 position matrix
VNRecognizedPoint3D: Adds an identifier (joint name)
VNHumanBodyRecognizedPoint3D: Adds localPosition and parentJoint
swift
// Position relative to skeleton root (center of hip)
let modelPosition = leftWrist.position

// Position relative to parent joint (left elbow)
let relativePosition = leftWrist.localPosition

Depth Input

Vision accepts depth data alongside images:
swift
// From AVDepthData
let handler = VNImageRequestHandler(
    cvPixelBuffer: imageBuffer,
    depthData: depthData,
    orientation: orientation
)

// From file (automatic depth extraction)
let handler = VNImageRequestHandler(url: imageURL)  // Depth auto-fetched
Depth formats: Disparity or Depth (interchangeable via AVFoundation)
LiDAR: Use in live capture sessions for accurate scale/measurement

Face Detection & Landmarks

VNDetectFaceRectanglesRequest

Availability: iOS 11+
Detects face bounding boxes:
swift
let request = VNDetectFaceRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    let faceBounds = observation.boundingBox  // Normalized rect
}

VNDetectFaceLandmarksRequest

Availability: iOS 11+
Detects face with detailed landmarks:
swift
let request = VNDetectFaceLandmarksRequest()
try handler.perform([request])

for observation in request.results as? [VNFaceObservation] ?? [] {
    if let landmarks = observation.landmarks {
        let leftEye = landmarks.leftEye
        let nose = landmarks.nose
        let leftPupil = landmarks.leftPupil  // Revision 3+
    }
}
Revisions:
  • Revision 1: Basic landmarks
  • Revision 2: Detects upside-down faces
  • Revision 3+: Pupil locations
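Each landmark region (VNFaceLandmarkRegion2D) exposes its points as normalized coordinates relative to the face bounding box. A sketch of reading them, assuming a VNFaceObservation from the request above; `eyePoints` is a hypothetical helper name:

```swift
import Vision

func eyePoints(from observation: VNFaceObservation,
               imageSize: CGSize) -> [CGPoint] {
    guard let leftEye = observation.landmarks?.leftEye else { return [] }

    // normalizedPoints are relative to the face bounding box (lower-left origin)
    _ = leftEye.normalizedPoints

    // pointsInImage(imageSize:) converts them into image coordinates for drawing
    return leftEye.pointsInImage(imageSize: imageSize)
}
```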

Person Detection

VNDetectHumanRectanglesRequest

Availability: iOS 13+
Detects human bounding boxes (torso detection):
swift
let request = VNDetectHumanRectanglesRequest()
try handler.perform([request])

for observation in request.results as? [VNHumanObservation] ?? [] {
    let humanBounds = observation.boundingBox  // Normalized rect
}
Use case: Faster than pose detection when you only need location

CoreImage Integration

CIBlendWithMask Filter

Composite subject on new background using Vision mask:
swift
// 1. Get mask from Vision
guard let observation = request.results?.first as? VNInstanceMaskObservation else {
    return
}
let visionMask = try observation.createScaledMask(
    for: observation.allInstances,
    croppedToInstancesContent: false
)

// 2. Convert to CIImage
let maskImage = CIImage(cvPixelBuffer: visionMask)

// 3. Apply filter
let filter = CIFilter(name: "CIBlendWithMask")!
filter.setValue(sourceImage, forKey: kCIInputImageKey)
filter.setValue(maskImage, forKey: kCIInputMaskImageKey)
filter.setValue(newBackground, forKey: kCIInputBackgroundImageKey)

let output = filter.outputImage  // Composited result
Parameters:
  • Input image: Original image to mask
  • Mask image: Vision's soft segmentation mask
  • Background image: New background (or empty image for transparency)
HDR preservation: CoreImage preserves high dynamic range from input (Vision/VisionKit output is SDR)
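To get a drawable image out of the filter, render the output CIImage with a CIContext. This sketch assumes the `filter` configured in the example above; in a real app the context should be created once and reused, since CIContext creation is expensive.

```swift
import CoreImage

let context = CIContext()  // Create once, reuse across renders

if let output = filter.outputImage,
   let cgImage = context.createCGImage(output, from: output.extent) {
    // Use cgImage - e.g., wrap in UIImage(cgImage:) for display or save to disk
}
```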

Text Recognition APIs

VNRecognizeTextRequest

Availability: iOS 13+, macOS 10.15+
Recognizes text in images with configurable accuracy/speed trade-off.

Basic Usage

swift
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate  // Or .fast
request.recognitionLanguages = ["en-US", "de-DE"]  // Order matters
request.usesLanguageCorrection = true

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for observation in request.results as? [VNRecognizedTextObservation] ?? [] {
    // Get top candidates
    let candidates = observation.topCandidates(3)
    let bestText = candidates.first?.string ?? ""
}

Recognition Levels

| Level | Performance | Accuracy | Best For |
| --- | --- | --- | --- |
| .fast | Real-time | Good | Camera feed, large text, signs |
| .accurate | Slower | Excellent | Documents, receipts, handwriting |
Fast path: Character-by-character recognition (Neural Network → Character Detection)
Accurate path: Full-line ML recognition (Neural Network → Line/Word Recognition)

Properties

| Property | Type | Description |
| --- | --- | --- |
| recognitionLevel | VNRequestTextRecognitionLevel | .fast or .accurate |
| recognitionLanguages | [String] | BCP 47 language codes, order = priority |
| usesLanguageCorrection | Bool | Use language model for correction |
| customWords | [String] | Domain-specific vocabulary |
| automaticallyDetectsLanguage | Bool | Auto-detect language (iOS 16+) |
| minimumTextHeight | Float | Min text height as fraction of image (0-1) |
| revision | Int | API version (affects supported languages) |

Language Support

swift
// Check supported languages for current settings
let languages = try VNRecognizeTextRequest.supportedRecognitionLanguages(
    for: .accurate,
    revision: VNRecognizeTextRequestRevision3
)
Language correction: Improves accuracy but takes processing time. Disable for codes/serial numbers.
Custom words: Add domain-specific vocabulary for better recognition (medical terms, product codes).

VNRecognizedTextObservation

boundingBox: Normalized rect containing the recognized text
topCandidates(_:): Returns [VNRecognizedText] ordered by confidence

VNRecognizedText

| Property | Type | Description |
| --- | --- | --- |
| string | String | Recognized text |
| confidence | VNConfidence | 0.0-1.0 |
| boundingBox(for:) | VNRectangleObservation? | Box for substring range |
swift
// Get bounding box for substring
let text = candidate.string
if let range = text.range(of: "invoice") {
    let box = try candidate.boundingBox(for: range)
}

Barcode Detection APIs

VNDetectBarcodesRequest

Availability: iOS 11+, macOS 10.13+
Detects and decodes barcodes and QR codes.

Basic Usage

swift
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128]  // Specific codes

let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])

for barcode in request.results as? [VNBarcodeObservation] ?? [] {
    let payload = barcode.payloadStringValue
    let type = barcode.symbology
    let bounds = barcode.boundingBox
}

Symbologies

1D Barcodes:
  • .codabar (iOS 15+)
  • .code39, .code39Checksum, .code39FullASCII, .code39FullASCIIChecksum
  • .code93, .code93i
  • .code128
  • .ean8, .ean13
  • .gs1DataBar, .gs1DataBarExpanded, .gs1DataBarLimited (iOS 15+)
  • .i2of5, .i2of5Checksum
  • .itf14
  • .upce
2D Codes:
  • .aztec
  • .dataMatrix
  • .microPDF417 (iOS 15+)
  • .microQR (iOS 15+)
  • .pdf417
  • .qr
Performance: Specifying fewer symbologies = faster detection

Revisions

| Revision | iOS | Features |
|----------|-----|----------|
| 1 | 11+ | Basic detection, one code at a time |
| 2 | 15+ | Codabar, GS1 DataBar, MicroPDF417, MicroQR, better ROI support |
| 3 | 16+ | ML-based, detects multiple codes, better bounding boxes |
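To get consistent behavior across OS updates, a request can be pinned to a specific revision. A minimal sketch (the constant name follows Vision's standard `VNDetectBarcodesRequestRevisionN` pattern):

```swift
import Vision

let request = VNDetectBarcodesRequest()

// Pin revision 3 (iOS 16+) when available; otherwise Vision
// uses the latest revision supported by the linked SDK.
if VNDetectBarcodesRequest.supportedRevisions.contains(VNDetectBarcodesRequestRevision3) {
    request.revision = VNDetectBarcodesRequestRevision3
}
```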

VNBarcodeObservation

| Property | Type | Description |
|----------|------|-------------|
| payloadStringValue | String? | Decoded content |
| symbology | VNBarcodeSymbology | Barcode type |
| boundingBox | CGRect | Normalized bounds |
| topLeft / topRight / bottomLeft / bottomRight | CGPoint | Corner points |
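Because `boundingBox` is normalized with a bottom-left origin, it usually needs converting before drawing. A sketch using Vision's `VNImageRectForNormalizedRect` helper (`barcode` and `imageSize` are assumed from the surrounding code):

```swift
import Vision

// Normalized (0–1, bottom-left origin) → pixel coordinates
let pixelRect = VNImageRectForNormalizedRect(
    barcode.boundingBox,
    Int(imageSize.width),
    Int(imageSize.height)
)

// UIKit uses a top-left origin, so flip the y axis before drawing
let uiRect = CGRect(
    x: pixelRect.minX,
    y: imageSize.height - pixelRect.maxY,
    width: pixelRect.width,
    height: pixelRect.height
)
```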

VisionKit Scanner APIs

DataScannerViewController

Availability: iOS 16+
Camera-based live scanner with built-in UI for text and barcodes.

Check Availability

```swift
// Hardware support
DataScannerViewController.isSupported

// Runtime availability (camera access, parental controls)
DataScannerViewController.isAvailable
```

Configuration

```swift
import VisionKit

let dataTypes: Set<DataScannerViewController.RecognizedDataType> = [
    .barcode(symbologies: [.qr, .ean13]),
    .text(textContentType: .URL),  // Or nil for all text
    // .text(languages: ["ja"])  // Filter by language
]

let scanner = DataScannerViewController(
    recognizedDataTypes: dataTypes,
    qualityLevel: .balanced,  // .fast, .balanced, .accurate
    recognizesMultipleItems: true,
    isHighFrameRateTrackingEnabled: true,
    isPinchToZoomEnabled: true,
    isGuidanceEnabled: true,
    isHighlightingEnabled: true
)

scanner.delegate = self
present(scanner, animated: true) {
    try? scanner.startScanning()
}
```

RecognizedDataType

| Type | Description |
|------|-------------|
| .barcode(symbologies:) | Specific barcode types |
| .text() | All text |
| .text(languages:) | Text filtered by language |
| .text(textContentType:) | Text filtered by content type (URL, phone, email) |

Delegate Protocol

```swift
protocol DataScannerViewControllerDelegate {
    func dataScanner(_ dataScanner: DataScannerViewController,
                     didTapOn item: RecognizedItem)

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didAdd addedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didUpdate updatedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     didRemove removedItems: [RecognizedItem],
                     allItems: [RecognizedItem])

    func dataScanner(_ dataScanner: DataScannerViewController,
                     becameUnavailableWithError error: DataScannerViewController.ScanningUnavailable)
}
```

RecognizedItem

```swift
enum RecognizedItem {
    case text(RecognizedItem.Text)
    case barcode(RecognizedItem.Barcode)

    var id: UUID { get }
    var bounds: RecognizedItem.Bounds { get }
}

// Text item
struct Text {
    let transcript: String
}

// Barcode item
struct Barcode {
    let payloadStringValue: String?
    let observation: VNBarcodeObservation
}
```
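A typical tap handler switches over the item's case; a minimal sketch:

```swift
func dataScanner(_ dataScanner: DataScannerViewController,
                 didTapOn item: RecognizedItem) {
    switch item {
    case .text(let text):
        print("Tapped text: \(text.transcript)")
    case .barcode(let barcode):
        print("Tapped barcode: \(barcode.payloadStringValue ?? "<no string payload>")")
    @unknown default:
        break
    }
}
```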

Async Stream

```swift
// Alternative to the delegate
for await items in scanner.recognizedItems {
    // Current recognized items
}
```

Custom Highlights

```swift
// Add custom views over recognized items
scanner.overlayContainerView.addSubview(customHighlight)

// Capture a still photo
let photo = try await scanner.capturePhoto()
```

VNDocumentCameraViewController

Availability: iOS 13+
Document scanning with automatic edge detection, perspective correction, and lighting adjustment.

Basic Usage

```swift
import VisionKit

let camera = VNDocumentCameraViewController()
camera.delegate = self
present(camera, animated: true)
```

Delegate Protocol

```swift
protocol VNDocumentCameraViewControllerDelegate {
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                       didFinishWith scan: VNDocumentCameraScan)

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController)

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                       didFailWithError error: Error)
}
```

VNDocumentCameraScan

| Member | Type | Description |
|--------|------|-------------|
| pageCount | Int | Number of scanned pages |
| imageOfPage(at:) | UIImage | Page image at the given index |
| title | String | User-editable title |

```swift
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                   didFinishWith scan: VNDocumentCameraScan) {
    controller.dismiss(animated: true)

    for i in 0..<scan.pageCount {
        let pageImage = scan.imageOfPage(at: i)
        // Process with VNRecognizeTextRequest
    }
}
```
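To extract text from the scanned pages, feed each page image into VNRecognizeTextRequest. A sketch (the helper name is ours, not part of the framework):

```swift
import Vision
import UIKit

// OCR a single scanned page (pageImage comes from scan.imageOfPage(at:))
func recognizeText(in pageImage: UIImage) throws -> [String] {
    guard let cgImage = pageImage.cgImage else { return [] }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])

    // One string per recognized line, using the top candidate
    return request.results?.compactMap { $0.topCandidates(1).first?.string } ?? []
}
```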

Document Analysis APIs

VNDetectDocumentSegmentationRequest

Availability: iOS 15+, macOS 12+
Detects document boundaries for custom camera UIs or post-processing.

```swift
let request = VNDetectDocumentSegmentationRequest()
let handler = VNImageRequestHandler(ciImage: image)
try handler.perform([request])

guard let observation = request.results?.first as? VNRectangleObservation else {
    return  // No document found
}

// Get corner points (normalized)
let corners = [
    observation.topLeft,
    observation.topRight,
    observation.bottomLeft,
    observation.bottomRight
]
```

vs VNDetectRectanglesRequest:
  • Document: ML-based, trained specifically on documents
  • Rectangle: Edge-based, finds any quadrilateral
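The detected corners are commonly fed into Core Image to crop and deskew the document. A hedged sketch using the built-in CIPerspectiveCorrection filter (`image` is the analyzed CIImage and `observation` the result from the request above):

```swift
import CoreImage
import Vision

// Scale normalized corner points to the image's pixel extent
let size = image.extent.size
func scaled(_ p: CGPoint) -> CGPoint {
    CGPoint(x: p.x * size.width, y: p.y * size.height)
}

// Crop and deskew in one pass
let corrected = image.applyingFilter("CIPerspectiveCorrection", parameters: [
    "inputTopLeft": CIVector(cgPoint: scaled(observation.topLeft)),
    "inputTopRight": CIVector(cgPoint: scaled(observation.topRight)),
    "inputBottomLeft": CIVector(cgPoint: scaled(observation.bottomLeft)),
    "inputBottomRight": CIVector(cgPoint: scaled(observation.bottomRight))
])
```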

RecognizeDocumentsRequest (iOS 26+)

Availability: iOS 26+, macOS 26+
Structured document understanding with semantic parsing.

Basic Usage

```swift
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: imageData)

guard let document = observations.first?.document else {
    return
}
```

DocumentObservation Hierarchy

DocumentObservation
└── document: DocumentObservation.Document
    ├── text: TextObservation
    ├── tables: [Container.Table]
    ├── lists: [Container.List]
    └── barcodes: [Container.Barcode]

Table Extraction

```swift
for table in document.tables {
    for row in table.rows {
        for cell in row {
            let text = cell.content.text.transcript
            let detectedData = cell.content.text.detectedData
        }
    }
}
```
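The nested loops above flatten naturally into an export format. A hedged sketch that builds a CSV string, assuming the iOS 26 API shape shown above (unescaped cell text, for illustration only):

```swift
// Flatten every extracted table into one CSV blob
var csv = ""
for table in document.tables {
    for row in table.rows {
        // One transcript per cell, comma-joined
        let cells = row.map { $0.content.text.transcript }
        csv += cells.joined(separator: ",") + "\n"
    }
    csv += "\n"  // Blank line between tables
}
```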

Detected Data Types

```swift
for data in document.text.detectedData {
    switch data.match.details {
    case .emailAddress(let email):
        let address = email.emailAddress
    case .phoneNumber(let phone):
        let number = phone.phoneNumber
    case .link(let url):
        let link = url
    case .address(let address):
        let components = address
    case .date(let date):
        let dateValue = date
    default:
        break
    }
}
```

TextObservation Hierarchy

TextObservation
├── transcript: String
├── lines: [TextObservation.Line]
├── paragraphs: [TextObservation.Paragraph]
├── words: [TextObservation.Word]
└── detectedData: [DetectedDataObservation]

API Quick Reference

Subject Segmentation

| API | Platform | Purpose |
|-----|----------|---------|
| VNGenerateForegroundInstanceMaskRequest | iOS 17+ | Class-agnostic subject instances |
| VNGeneratePersonInstanceMaskRequest | iOS 17+ | Up to 4 people separately |
| VNGeneratePersonSegmentationRequest | iOS 15+ | All people (single mask) |
| ImageAnalysisInteraction (VisionKit) | iOS 16+ | UI for subject lifting |

Pose Detection

| API | Platform | Landmarks | Coordinates |
|-----|----------|-----------|-------------|
| VNDetectHumanHandPoseRequest | iOS 14+ | 21 per hand | 2D normalized |
| VNDetectHumanBodyPoseRequest | iOS 14+ | 18 body joints | 2D normalized |
| VNDetectHumanBodyPose3DRequest | iOS 17+ | 17 body joints | 3D meters |

Face & Person Detection

| API | Platform | Purpose |
|-----|----------|---------|
| VNDetectFaceRectanglesRequest | iOS 11+ | Face bounding boxes |
| VNDetectFaceLandmarksRequest | iOS 11+ | Face with detailed landmarks |
| VNDetectHumanRectanglesRequest | iOS 13+ | Human torso bounding boxes |

Text & Barcode

| API | Platform | Purpose |
|-----|----------|---------|
| VNRecognizeTextRequest | iOS 13+ | Text recognition (OCR) |
| VNDetectBarcodesRequest | iOS 11+ | Barcode/QR detection |
| DataScannerViewController | iOS 16+ | Live camera scanner (text + barcodes) |
| VNDocumentCameraViewController | iOS 13+ | Document scanning with perspective correction |
| VNDetectDocumentSegmentationRequest | iOS 15+ | Programmatic document edge detection |
| RecognizeDocumentsRequest | iOS 26+ | Structured document extraction |

Observation Types

| Observation | Returned By |
|-------------|-------------|
| VNInstanceMaskObservation | Foreground/person instance masks |
| VNPixelBufferObservation | Person segmentation (single mask) |
| VNHumanHandPoseObservation | Hand pose |
| VNHumanBodyPoseObservation | Body pose (2D) |
| VNHumanBodyPose3DObservation | Body pose (3D) |
| VNFaceObservation | Face detection/landmarks |
| VNHumanObservation | Human rectangles |
| VNRecognizedTextObservation | Text recognition |
| VNBarcodeObservation | Barcode detection |
| VNRectangleObservation | Document segmentation |
| DocumentObservation | Structured document (iOS 26+) |

Resources

WWDC: 2019-234, 2021-10041, 2022-10024, 2022-10025, 2025-272, 2023-10176, 2023-111241, 2023-10048, 2020-10653, 2020-10043, 2020-10099
Docs: /vision, /visionkit, /vision/vnrecognizetextrequest, /vision/vndetectbarcodesrequest
Skills: axiom-vision, axiom-vision-diag