
Affinda — AI Document Processing Platform

Affinda extracts structured data from documents (invoices, resumes, receipts, contracts, and any custom document type) using machine learning. The API turns uploaded files into clean JSON. Affinda has processed over 250 million documents for 500+ organisations in 40 countries.

Full documentation: https://docs.affinda.com
OpenAPI spec: https://api.affinda.com/static/v3/api_spec.yaml
Support: support@affinda.com

Core Concepts

| Concept | Description |
|---|---|
| Organization | Top-level account. Contains users, billing, document types, and workspaces. |
| Workspace | Logical container for documents. Scopes permissions, webhooks, and processing settings. |
| Document Type | A model configuration defining how a specific kind of document is parsed (invoice, resume, custom). |
| Document | An uploaded file (PDF, image, DOCX, etc.) plus its extracted data and metadata. |

The workflow is: Upload -> Pre-process -> Split -> Classify -> Extract -> Validate -> Export.

API Basics

Base URLs

| Region | API Base URL | App URL |
|---|---|---|
| Australia (Global) | https://api.affinda.com | https://app.affinda.com |
| United States | https://api.us1.affinda.com | https://app.us1.affinda.com |
| European Union | https://api.eu1.affinda.com | https://app.eu1.affinda.com |

Use the base URL matching the region where the user's account was created.
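
Keeping the region choice in one lookup helps avoid hard-coding the global URL throughout an integration. This is a minimal sketch. The short region keys (`au`, `us`, `eu`) and the `api_base_url` helper are hypothetical names chosen for illustration. The URLs themselves come from the table above.

```python
# API base URLs from the region table; the short keys are hypothetical labels.
API_BASE_URLS = {
    "au": "https://api.affinda.com",      # Australia (Global)
    "us": "https://api.us1.affinda.com",  # United States
    "eu": "https://api.eu1.affinda.com",  # European Union
}


def api_base_url(region: str) -> str:
    """Return the API base URL for the region where the account was created."""
    try:
        return API_BASE_URLS[region.lower()]
    except KeyError:
        raise ValueError(f"Unknown Affinda region: {region!r}") from None


print(api_base_url("EU") + "/v3/documents")  # https://api.eu1.affinda.com/v3/documents
```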

Authentication

All requests require a Bearer token in the `Authorization` header:
`Authorization: Bearer <API_KEY>`
API keys are per-user and are managed at Settings -> API Keys in the Affinda dashboard. Each user can have up to 3 keys. Keys can have custom names and expiry dates. A key is shown only once, at creation, so store it securely.
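
Without an SDK, attaching the token is a single header. The sketch below uses only the Python standard library. It falls back to the `AFFINDA_API_KEY` environment variable, which the cURL example also uses. `authorized_request` is a hypothetical helper that builds the request but does not send it. `urllib.request.urlopen(req)` would send it.

```python
import os
import urllib.request
from typing import Optional


def authorized_request(
    url: str, api_key: Optional[str] = None, method: str = "GET"
) -> urllib.request.Request:
    """Build (but do not send) a request that carries the Bearer token."""
    # Fall back to the AFFINDA_API_KEY environment variable
    key = api_key or os.environ.get("AFFINDA_API_KEY")
    if not key:
        raise RuntimeError("Pass api_key or set AFFINDA_API_KEY")
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {key}"}
    )


req = authorized_request("https://api.affinda.com/v3/workspaces", api_key="YOUR_API_KEY")
print(req.get_header("Authorization"))  # Bearer YOUR_API_KEY
```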

Rate Limits and File Constraints

  • High-priority queue: 30 documents/minute (exceeding this returns HTTP `429`)
  • Low-priority queue: no submission limit (set `lowPriority: true`)
  • Max file size: 20 MB (5 MB for resumes)
  • Default page limit: 20 pages per document (can be increased on request)
  • Supported formats: PDF, DOC, DOCX, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG
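
A local check before upload can catch files that would be rejected for size or format. This is a sketch under stated assumptions. `check_upload` is a hypothetical helper. It assumes 1 MB means 1024 * 1024 bytes, which may not be exactly how Affinda counts. It only accepts the extensions listed above. It cannot enforce the 20-page limit without opening the document.

```python
from pathlib import Path
from typing import List

# Limits from the list above; 1 MB = 1024 * 1024 bytes is an assumption.
MAX_BYTES = 20 * 1024 * 1024         # general limit
MAX_RESUME_BYTES = 5 * 1024 * 1024   # resume limit
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xlsx", ".odt", ".rtf", ".txt",
    ".html", ".png", ".jpg", ".jpeg", ".tiff",
}


def check_upload(path: Path, is_resume: bool = False) -> List[str]:
    """Return the problems found before uploading (an empty list means OK)."""
    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {path.suffix or '(none)'}")
    limit = MAX_RESUME_BYTES if is_resume else MAX_BYTES
    size = path.stat().st_size
    if size > limit:
        problems.append(f"file is {size} bytes; limit is {limit} bytes")
    return problems
```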

Client Libraries

Python (recommended)

```bash
pip install affinda
```

```python
from pathlib import Path
from affinda import AffindaAPI, TokenCredential

credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(file=f, workspace="YOUR_WORKSPACE_ID")

print(doc.data)  # Extracted JSON
```

TypeScript / JavaScript (recommended)

```bash
npm install @affinda/affinda
```

```typescript
import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";

const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);

const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});

console.log(doc.data); // Extracted JSON
```

Other Libraries

Note: The .NET and Java libraries may lag behind the Python and TypeScript libraries in feature parity.

Direct HTTP (cURL)

```bash
curl -X POST https://api.affinda.com/v3/documents \
  -H "Authorization: Bearer $AFFINDA_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "workspace=YOUR_WORKSPACE_ID"
```

Structured Outputs (Type-Safe Responses)

This is the recommended approach for building robust integrations. Affinda can generate typed models from your document type configuration, giving you auto-completion, validation, and type safety.

Python -- Pydantic Models

Generate Pydantic v2 models that match your document type's field schema:

```bash
# Set your API key (or export AFFINDA_API_KEY)
python -m affinda generate_models --workspace-id=YOUR_WORKSPACE_ID
```

This creates a `./affinda_models/` directory with one `.py` file per document type. Each file contains Pydantic `BaseModel` classes with all your configured fields as typed, optional attributes.

**Use the generated models when calling the API:**

```python
from pathlib import Path
from affinda import AffindaAPI, TokenCredential
from affinda_models.invoice import Invoice  # Generated model

credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,  # Enables Pydantic validation
    )

# doc.parsed is a typed Invoice instance
print(doc.parsed.invoice_number)
print(doc.parsed.total_amount)

# doc.data is still available as raw JSON
print(doc.data)
```

**Handling validation errors gracefully:**

```python
with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,
        ignore_validation_errors=True,  # Don't raise on schema mismatch
    )

if doc.parsed:
    print(doc.parsed.invoice_number)  # Type-safe access
else:
    print("Validation failed, falling back to raw data")
    print(doc.data)
```

CLI options:

```bash
python -m affinda generate_models --workspace-id=ID        # All types in a workspace
python -m affinda generate_models --document-type-id=ID    # Single document type
python -m affinda generate_models --organization-id=ID     # All types in an org
python -m affinda generate_models --output-dir=./my_models # Custom output path
python -m affinda generate_models --help                   # All options
```

TypeScript -- Generated Interfaces

Generate TypeScript interfaces that match your document type's field schema:

```bash
# Set your API key (or export AFFINDA_API_KEY)
npm exec affinda-generate-interfaces -- --workspace-id=YOUR_WORKSPACE_ID
```

This creates an `./affinda-interfaces/` directory with one `.ts` file per document type. Each file contains TypeScript interfaces with all your configured fields.

**Use the generated interfaces for type-safe access:**

```typescript
import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";
import { Invoice } from "./affinda-interfaces/Invoice";

const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);

const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});

// Compile-time type assertion only: no runtime validation happens here
const parsed = doc.data as Invoice;
console.log(parsed.invoiceNumber); // Type-safe access
console.log(parsed.totalAmount);
```

CLI options:

```bash
npm exec affinda-generate-interfaces -- --workspace-id=ID       # All types in workspace
npm exec affinda-generate-interfaces -- --document-type-id=ID   # Single document type
npm exec affinda-generate-interfaces -- --output-dir=./types    # Custom output path
npm exec affinda-generate-interfaces -- --help                  # All options
```

Why Use Structured Outputs?

  • Type safety: Catch field name typos and type mismatches at compile/lint time
  • Auto-completion: IDE support for all extracted fields
  • Validation: Pydantic automatically validates the API response structure (Python)
  • Schema-driven: Models mirror your document type configuration; regenerate them after schema changes to keep them in sync
  • Documentation as code: The generated models serve as living documentation of your extraction schema

Document Upload Options

There are three patterns for submitting documents and retrieving results:

1. Synchronous (simplest)

Upload and block until parsing completes. The response contains the extracted data.

```python
doc = client.create_document(file=f, workspace="WORKSPACE_ID")
# wait defaults to True, so this call blocks until the document is ready
print(doc.data)
```

**Best for**: Interactive apps, low volume, quick prototyping.
**Limitation**: Can time out on large or complex documents.

2. Asynchronous with Polling

Upload with `wait=false`, receive a document ID, then poll `GET /documents/{id}` until `ready` is `true`.

```python
doc = client.create_document(file=f, workspace="WORKSPACE_ID", wait=False)
# doc.data is empty at this point; poll until ready
doc = client.get_document(doc.meta.identifier)
```

**Best for**: Batch processing, large documents, high volume.
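
A polling loop needs a stopping rule and a growing delay so it does not hammer the API. The sketch below assumes three things. First, `GET /documents/{id}` returns JSON with `ready` and `failed` flags under a `meta` key, which is an assumption about the exact response shape. Second, `fetch` is a callable you supply that performs that GET and returns the decoded JSON. Third, the backoff and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable, Dict


def wait_until_ready(
    fetch: Callable[[str], Dict[str, Any]],
    identifier: str,
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 300.0,
) -> Dict[str, Any]:
    """Poll fetch(identifier) until the document is ready, with capped exponential backoff."""
    deadline = time.monotonic() + timeout
    while True:
        doc = fetch(identifier)
        meta = doc.get("meta", {})  # assumed location of the ready/failed flags
        if meta.get("failed"):
            raise RuntimeError(f"Document {identifier} failed to process")
        if meta.get("ready"):
            return doc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Document {identifier} not ready after {timeout} s")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # double the wait, up to the cap
```

Checking `failed` before `ready` stops a failed document from being treated as a success.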

3. Asynchronous with Webhooks (recommended for production)

Upload the document, then receive a webhook notification when processing completes. This is the most efficient pattern for production systems.

```python
# 1. Upload
doc = client.create_document(file=f, workspace="WORKSPACE_ID", wait=False)

# 2. Receive webhook at your endpoint when ready

# 3. Fetch full data
doc = client.get_document(identifier_from_webhook)
```

**Best for**: Real-time workflows, event-driven architectures, production systems.

See the [Webhooks section](#webhooks) below for setup details.

Upload Parameters

| Parameter | Type | Description |
|---|---|---|
| `file` | binary | The document file. Mutually exclusive with `url`. |
| `url` | string | URL to download and process. Mutually exclusive with `file`. |
| `workspace` | string | Workspace identifier (required). |
| `documentType` | string | Document type identifier (optional; supplying it skips classification). |
| `wait` | boolean | `true` (default): block until done. `false`: return immediately. |
| `customIdentifier` | string | Your internal ID for the document. |
| `expiryTime` | ISO-8601 | Auto-delete the document at this time. |
| `rejectDuplicates` | boolean | Reject the upload if it duplicates an existing document. |
| `lowPriority` | boolean | Route to the low-priority queue (no rate limit). |
| `compact` | boolean | Return a compact response (applies with `wait=true`). |
| `deleteAfterParse` | boolean | Delete data after parsing (requires `wait=true`). |
| `enableValidationTool` | boolean | Make the document viewable in the validation UI. Set `false` for faster processing. |
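
Some of these parameters constrain each other: `file` and `url` are mutually exclusive, and `deleteAfterParse` needs `wait=true`. Checking those rules before sending gives a clearer error than a rejected request. The sketch below builds the non-file form fields for a multipart upload, like the `-F` fields in the cURL example. `upload_fields` is a hypothetical helper. Sending booleans as the strings `"true"`/`"false"` is an assumption about form encoding. The file itself would travel as the separate `file` part.

```python
from typing import Dict, Optional


def upload_fields(
    workspace: str,
    *,
    has_file: bool = False,
    url: Optional[str] = None,
    document_type: Optional[str] = None,
    wait: bool = True,
    low_priority: bool = False,
    delete_after_parse: bool = False,
    custom_identifier: Optional[str] = None,
) -> Dict[str, str]:
    """Check parameter combinations from the table and return multipart form fields."""
    if has_file == (url is not None):
        raise ValueError("Provide exactly one of file or url")
    if delete_after_parse and not wait:
        raise ValueError("deleteAfterParse requires wait=true")

    fields = {"workspace": workspace, "wait": "true" if wait else "false"}
    if url is not None:
        fields["url"] = url
    if document_type is not None:
        fields["documentType"] = document_type  # skips classification
    if custom_identifier is not None:
        fields["customIdentifier"] = custom_identifier
    if low_priority:
        fields["lowPriority"] = "true"  # low-priority queue, no rate limit
    if delete_after_parse:
        fields["deleteAfterParse"] = "true"
    return fields


print(upload_fields("YOUR_WORKSPACE_ID", has_file=True, wait=False, low_priority=True))
# {'workspace': 'YOUR_WORKSPACE_ID', 'wait': 'false', 'lowPriority': 'true'}
```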

Response Structure

Each extracted field in the response includes metadata:

| Field | Description |
|---|---|
| `raw` | Raw extracted text before processing |
| `parsed` | Processed value after formatting and mapping |
| `confidence` | Overall confidence score (0-1) |
| `classificationConfidence` | Confidence that the field was correctly classified |
| `textExtractionConfidence` | Confidence that the text was correctly extracted |
| `isVerified` | Whether the value has been validated (by any means) |
| `isClientVerified` | Whether the value was validated by a human |
| `isAutoVerified` | Whether the value was auto-validated by rules |
| `rectangle` | Bounding box coordinates on the page |
| `pageIndex` | Which page the data appears on |

Document-level metadata includes `ready`, `failed`, `language`, `pages`, `isOcrd`, `ocrConfidence`, `reviewUrl`, `isConfirmed`, `isRejected`, `isArchived`, `errorCode`, and `errorDetail`.
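
A common use of these per-field attributes is flagging values for human review. The sketch below picks out unverified fields whose `confidence` falls below a threshold. Several things in it are hypothetical: the data shape, where each field is a dict with the attributes above and repeated fields are lists of such dicts; the field names in the example; and the 0.8 default threshold.

```python
from typing import Any, Dict, Iterator, Tuple


def low_confidence_fields(
    data: Dict[str, Any], threshold: float = 0.8
) -> Iterator[Tuple[str, Any]]:
    """Yield (field name, parsed value) for unverified fields below the threshold."""
    for name, value in data.items():
        # Assumed shape: a field is a dict; repeated fields are lists of such dicts
        items = value if isinstance(value, list) else [value]
        for item in items:
            if not isinstance(item, dict) or "confidence" not in item:
                continue  # not an extracted-field object
            if item.get("isVerified"):
                continue  # already validated by a human or by rules
            confidence = item["confidence"]
            if confidence is None or confidence < threshold:
                yield name, item.get("parsed")


sample = {  # hypothetical field names and values
    "invoiceNumber": {"raw": "INV-001", "parsed": "INV-001", "confidence": 0.97, "isVerified": False},
    "totalAmount": {"raw": "1,200.00", "parsed": 1200.0, "confidence": 0.55, "isVerified": False},
    "dueDate": {"raw": "1 Jan", "parsed": "2024-01-01", "confidence": 0.40, "isVerified": True},
}
print(list(low_confidence_fields(sample)))  # [('totalAmount', 1200.0)]
```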

Webhooks

Affinda uses RESTHooks -- webhook subscriptions managed via REST API. Webhooks can be scoped to an organization or workspace.

Available Events

| Event | Description |
|---|---|
| `document.parse.completed` | Parsing finished (succeeded or failed) |
| `document.parse.succeeded` | Parsing succeeded |
| `document.parse.failed` | Parsing failed |
| `document.validate.completed` | Document confirmed (manually or automatically) |
| `document.classify.completed` | Classification finished |
| `document.classify.succeeded` | Classification succeeded |
| `document.classify.failed` | Classification failed |
| `document.rejected` | Document rejected |

Setup Flow

  1. Subscribe -- `POST /v3/resthook_subscriptions` with `targetUrl`, `event`, and `organization` (or `workspace`).
  2. Confirm -- Affinda sends a `POST` to your `targetUrl` with an `X-Hook-Secret` header. Respond with `200`, then call `POST /v3/resthook_subscriptions/activate` with that secret.
  3. Receive -- Affinda sends webhook payloads to your endpoint. Respond with `200` to acknowledge.
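
Step 2 splits into two actions: answering the confirmation request and then activating the subscription. A minimal, framework-agnostic sketch follows, assuming the following:
  • The activate call takes the secret as an `X-Hook-Secret` request header. This is an assumption based on the endpoint table below, so check the API reference.
  • `ACTIVATE_URL` uses the global base URL. Substitute the one for your region.
  • `handle_confirmation` and `activation_request` are hypothetical helper names.

```python
import urllib.request
from typing import Mapping, Optional, Tuple

# Use the API base URL for your region (see the Base URLs table).
ACTIVATE_URL = "https://api.affinda.com/v3/resthook_subscriptions/activate"


def handle_confirmation(headers: Mapping[str, str]) -> Tuple[int, Optional[str]]:
    """Acknowledge Affinda's confirmation POST with 200 and capture X-Hook-Secret."""
    return 200, headers.get("X-Hook-Secret")


def activation_request(
    secret: str, api_key: str, url: str = ACTIVATE_URL
) -> urllib.request.Request:
    """Build (not send) the activate call, passing the secret as a header (assumed)."""
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "X-Hook-Secret": secret},
    )
```

Because the docs say to respond with `200` first and then activate, a web handler would return the response and schedule `urllib.request.urlopen(activation_request(...))` as a background task rather than calling it inline.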

Signature Verification

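Enable payload signing via Organization Settings -> Webhook Signature Key. Incoming webhooks then include an `X-Hook-Signature` header in the form `<timestamp>.<signature>`. Verify it with HMAC-SHA256:

```python
import hashlib
import hmac
import json
import time


def verify_webhook(request, sig_key: bytes) -> bool:
    """Return True if the signature matches the raw body and the event is recent.

    `request` is any object exposing `.headers` and the raw `.body` bytes
    (Django's HttpRequest does, for example).
    """
    sig_header = request.headers.get("X-Hook-Signature", "")
    # Header format: "<timestamp>.<signature>"
    _timestamp, _, sig_received = sig_header.partition(".")
    # HMAC-SHA256 over the raw request body, hex-encoded
    sig_calculated = hmac.new(sig_key, msg=request.body, digestmod=hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking timing information
    if not hmac.compare_digest(sig_received, sig_calculated):
        return False
    body = json.loads(request.body)
    return (time.time() - body["timestamp"]) < 600  # 10 min window against replays
```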

Webhook Payload

The payload contains document metadata (not the full parsed data). Use the `identifier` to fetch full results:

```json
{
  "id": "e3bd1942-...",
  "event": "document.parse.completed",
  "timestamp": 1665637107,
  "payload": {
    "identifier": "abcdXYZ",
    "ready": true,
    "failed": false,
    "fileName": "invoice.pdf",
    "workspace": { "identifier": "...", "name": "..." }
  }
}
```
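
After the signature check passes, the handler mostly needs to decide whether there is a document to fetch. The sketch below reads the fields shown in the example payload. The set of events worth fetching is a choice made for this sketch, not an Affinda rule. `identifier_to_fetch` is a hypothetical helper. Its result would then go to `client.get_document(...)`, as in the webhook upload flow.

```python
import json
from typing import Optional

# Events after which fetching the document is worthwhile (this sketch's choice).
FETCH_EVENTS = {
    "document.parse.completed",
    "document.parse.succeeded",
    "document.validate.completed",
}


def identifier_to_fetch(raw_body: bytes) -> Optional[str]:
    """Return the document identifier to fetch, or None if there is nothing to fetch."""
    message = json.loads(raw_body)
    if message.get("event") not in FETCH_EVENTS:
        return None
    payload = message.get("payload", {})
    if payload.get("failed"):
        return None  # parsing failed: inspect errorCode/errorDetail on the document instead
    return payload.get("identifier")
```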

Retry Behavior

  • `200` -- Success, delivery confirmed
  • `410` -- Subscription auto-deleted (endpoint "gone")
  • Other 4xx/5xx -- Retried with exponential backoff for about 1 day
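
With retries lasting about a day, a handler can plausibly see the same event twice. For example, your endpoint might process it and then fail to return `200` in time. A small idempotency guard keyed on the payload's top-level `id` keeps processing to once per event. This rests on two assumptions: that `id` identifies an event and stays the same across retries, and that the guard is in-memory only. A deployment with several processes would keep seen ids in a shared store instead. `DeliveryDeduplicator` is a hypothetical name.

```python
from collections import OrderedDict


class DeliveryDeduplicator:
    """Remember recently seen webhook event ids so retried deliveries are processed once."""

    def __init__(self, max_entries: int = 10_000) -> None:
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self._max_entries = max_entries

    def first_delivery(self, event_id: str) -> bool:
        """Return True the first time an id is seen, False for repeats."""
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # keep recently repeated ids around longer
            return False
        self._seen[event_id] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the oldest id
        return True
```

A handler would return `200` for a repeat without reprocessing it, since any other status would trigger yet another retry.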

Embedded Validation UI

Affinda provides a human-in-the-loop validation interface that can be embedded in your application via iframe. Each document response includes a `reviewUrl`, a signed URL valid for 60 minutes.
Implementation pattern:
  1. Store only the Affinda document `identifier` in your system.
  2. When a user needs to review, fetch a fresh `reviewUrl` via `GET /documents/{id}`.
  3. Embed the URL in an iframe.
  4. Do not persist the URL; treat it as ephemeral.
The UI supports custom theming (colors, fonts, border radius) in embedded mode. Contact Affinda to configure.
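
Steps 2 and 3 can be combined into one server-side function, so the signed URL only ever exists for the duration of a request. In this sketch, `get_review_url` is a hypothetical callable you supply that performs `GET /documents/{id}` and returns its `reviewUrl`. The iframe's `title`, `width` and `height` are arbitrary. HTML-escaping the URL keeps query-string characters such as `&` valid inside the attribute.

```python
import html
from typing import Callable


def review_iframe(get_review_url: Callable[[str], str], identifier: str) -> str:
    """Fetch a fresh reviewUrl for a stored identifier and return iframe markup."""
    # reviewUrl is signed and expires after 60 minutes: fetch per request, never store it
    review_url = get_review_url(identifier)
    if not review_url.startswith("https://"):
        raise ValueError("Expected an https reviewUrl")
    src = html.escape(review_url, quote=True)
    return (
        f'<iframe src="{src}" title="Document review" '
        'width="100%" height="800"></iframe>'
    )
```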

Key API Methods

Documents

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v3/documents` | Upload and parse a document |
| `GET` | `/v3/documents/{id}` | Retrieve a document and its data |
| `PATCH` | `/v3/documents/{id}` | Update document fields/status |
| `DELETE` | `/v3/documents/{id}` | Delete a document |
| `GET` | `/v3/documents` | List documents (with filtering) |
| `GET` | `/v3/documents/{id}/redacted` | Download redacted PDF |

Workspaces

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/v3/workspaces` | List workspaces |
| `POST` | `/v3/workspaces` | Create a workspace |
| `GET` | `/v3/workspaces/{id}` | Get workspace details |
| `PATCH` | `/v3/workspaces/{id}` | Update workspace |
| `DELETE` | `/v3/workspaces/{id}` | Delete workspace |

Annotations

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/v3/annotations` | List annotations for a document |
| `POST` | `/v3/annotations` | Create an annotation |
| `PATCH` | `/v3/annotations/{id}` | Update an annotation |
| `POST` | `/v3/annotations/batch_create` | Batch create annotations |
| `POST` | `/v3/annotations/batch_update` | Batch update annotations |
| `POST` | `/v3/annotations/batch_delete` | Batch delete annotations |

Webhooks

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/v3/resthook_subscriptions` | Create subscription |
| `POST` | `/v3/resthook_subscriptions/activate` | Activate with `X-Hook-Secret` |
| `GET` | `/v3/resthook_subscriptions` | List subscriptions |
| `PATCH` | `/v3/resthook_subscriptions/{id}` | Update subscription |
| `DELETE` | `/v3/resthook_subscriptions/{id}` | Delete subscription |
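
Creating a subscription is a single `POST`, using the fields named in the webhook setup flow: `targetUrl`, `event`, and `organization` or `workspace`. The sketch below builds that request with the standard library, assuming the following:
  • The body is sent as JSON. This is an assumption, so check the OpenAPI spec for the accepted content types.
  • A subscription is scoped to exactly one of `workspace` or `organization`.
  • `subscription_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional


def subscription_request(
    api_base: str,
    api_key: str,
    target_url: str,
    event: str,
    *,
    workspace: Optional[str] = None,
    organization: Optional[str] = None,
) -> urllib.request.Request:
    """Build (not send) POST /v3/resthook_subscriptions for one event."""
    if (workspace is None) == (organization is None):
        raise ValueError("Scope the subscription to exactly one of workspace or organization")
    body = {"targetUrl": target_url, "event": event}
    if workspace is not None:
        body["workspace"] = workspace
    else:
        body["organization"] = organization
    return urllib.request.Request(
        f"{api_base}/v3/resthook_subscriptions",
        data=json.dumps(body).encode(),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
```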

Common Integration Patterns

Affinda supports six integration workflow patterns, depending on where validation logic lives and where exceptions are handled:

| Pattern | Description | Webhook event |
|---|---|---|
| W1 -- No validation | Upload -> get JSON. No rules, no human review. | `document.parse.completed` |
| W2 -- Client-side validation | Same as W1; your system applies rules after export. | `document.parse.completed` |
| W3 -- Affinda validation logic | Affinda validates automatically; no human review. | `document.validate.completed` |
| W4 -- Review all in Affinda | Humans review every document in the Affinda UI. | `document.validate.completed` |
| W5 -- Client rules + Affinda review | Your rules run in your system and their results are pushed back to Affinda as warnings; flagged documents are reviewed in Affinda. | `document.parse.completed`, then `document.validate.completed` |
| W6 -- Full Affinda validation | Affinda validates; exceptions are reviewed in the Affinda UI. | `document.validate.completed` |
For most new integrations, W1 or W2 is the simplest starting point. W6 provides the most automation with human-in-the-loop for exceptions.
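
The table translates directly into the webhook subscriptions an integration needs. The setup flow names a single `event` per subscription, so W5 appears to need two subscriptions. `PATTERN_EVENTS` and `events_for` are hypothetical names for this sketch.

```python
from typing import Tuple

# Webhook events each integration pattern relies on (from the table above).
PATTERN_EVENTS = {
    "W1": ("document.parse.completed",),
    "W2": ("document.parse.completed",),
    "W3": ("document.validate.completed",),
    "W4": ("document.validate.completed",),
    "W5": ("document.parse.completed", "document.validate.completed"),
    "W6": ("document.validate.completed",),
}


def events_for(pattern: str) -> Tuple[str, ...]:
    """Return the webhook events a pattern relies on, for example events_for("w5")."""
    try:
        return PATTERN_EVENTS[pattern.upper()]
    except KeyError:
        raise ValueError(f"Unknown pattern {pattern!r}; expected W1 to W6") from None
```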

Full solution design guide: https://docs.affinda.com/academy/solution-design

Common Errors

| Error code | Meaning | Resolution |
|---|---|---|
| `duplicate_document_error` | Document rejected as a duplicate | Disable "Reject duplicates" or upload unique files |
| `no_text_found` | No extractable text | Check that the file is not a photo of a physical object; try OCR |
| `file_corrupted` | File is corrupted | Re-upload a valid file |
| `file_too_large` | Exceeds the 20 MB limit | Reduce the file size |
| `invalid_file_type` | Unsupported format | Use PDF, DOC, DOCX, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG |
| `no_parsing_credits` | Out of credits | Purchase more credits and reparse |
| `password_protected` | File is password-protected | Remove the password and re-upload |
| `document_classification_failed` | No matching document type | Check the document type configuration or disable "Reject Documents" |
| `capacity_exceeded` | System capacity exceeded | Wait and retry |
| `parse_terminated` | Exceeded the processing timeout | Contact Affinda for custom limits |

Documentation Map

Use this index to find detailed information on specific topics. Each link goes to the full documentation page.

Affinda Academy (Tutorials)

Configuration Guide

Overview & Workflow:
  • Workflow -- End-to-end document processing pipeline stages.
  • Glossary -- Platform terminology definitions.
  • Document Status -- For Review, Confirmed, Archived, Rejected states.
Ingestion & Pre-Processing:
  • Ingestion -- Upload methods: manual, email, API.
  • Email Upload -- Email-to-workspace document ingestion.
  • Pre-Processing -- Automated cleaning before extraction.
  • OCR -- OCR modes: Skip, Auto-detect, Partial, Full.
  • Duplicates -- Duplicate detection and rejection.
Splitting, Classification & Extraction:
Validation & Export:

API Reference

Resume Parsing Guide

Additional Resources
