Affinda -- AI Document Processing Platform
Affinda extracts structured data from documents (invoices, resumes, receipts, contracts, and any custom document type) using machine learning. The API turns uploaded files into clean JSON. Over 250 million documents processed for 500+ organisations in 40 countries.
Full documentation: https://docs.affinda.com
OpenAPI spec: https://api.affinda.com/static/v3/api_spec.yaml
Support: support@affinda.com
Core Concepts
| Concept | Description |
|---|---|
| Organization | Top-level account. Contains users, billing, document types, and workspaces. |
| Workspace | Logical container for documents. Scopes permissions, webhooks, and processing settings. |
| Document Type | A model configuration defining how a specific kind of document is parsed (invoice, resume, custom). |
| Document | An uploaded file (PDF, image, DOCX, etc.) plus its extracted data and metadata. |
The workflow is: Upload -> Pre-process -> Split -> Classify -> Extract -> Validate -> Export.
API Basics
Base URLs
| Region | API Base URL | App URL |
|---|---|---|
| Australia (Global) | | |
| United States | | |
| European Union | | |
Use the base URL matching the region where the user's account was created.
Authentication
All requests require a Bearer token:
`Authorization: Bearer <API_KEY>`
API keys are per-user, managed at Settings -> API Keys in the Affinda dashboard. Up to 3 keys per user. Keys can have custom names and expiry dates. A key is only visible once at creation -- store it securely.
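As a minimal sketch of the header format above, the helper below builds the `Authorization` header for use with any HTTP client. The function name `auth_headers` is hypothetical, and reading the key from an `AFFINDA_API_KEY` environment variable is an assumption borrowed from the cURL example in this document.

```python
import os


def auth_headers(api_key=None):
    """Return the Authorization header for an Affinda API request.

    Hypothetical helper: falls back to the AFFINDA_API_KEY environment
    variable (an assumed convention) when no key is passed in.
    """
    key = api_key or os.environ.get("AFFINDA_API_KEY")
    if not key:
        raise ValueError("No API key supplied and AFFINDA_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

Keeping the key out of source code fits the advice above: it is only shown once at creation, so it belongs in a secret store or environment variable.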
Rate Limits and File Constraints
- High-priority queue: 30 documents/minute (exceeding returns `429`)
- Low-priority queue: No submission limit (set `lowPriority: true`)
- Max file size: 20 MB (5 MB for resumes)
- Default page limit: 20 pages per document (can be increased on request)
- Supported formats: PDF, DOC, DOCX, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG
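The limits above can be checked locally before uploading. The sketch below is not part of the Affinda SDK: `check_upload` is a hypothetical helper, the extension set is an assumed mapping from the format names listed above, and treating 1 MB as 1024 * 1024 bytes is also an assumption.

```python
from pathlib import Path

MAX_BYTES = 20 * 1024 * 1024        # 20 MB general limit (assumes binary megabytes)
MAX_RESUME_BYTES = 5 * 1024 * 1024  # 5 MB limit for resumes
# Assumed extensions for the formats listed in this section.
SUPPORTED = {".pdf", ".doc", ".docx", ".xlsx", ".odt", ".rtf", ".txt",
             ".html", ".png", ".jpg", ".jpeg", ".tiff"}


def check_upload(name: str, size_bytes: int, is_resume: bool = False) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in SUPPORTED:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    limit = MAX_RESUME_BYTES if is_resume else MAX_BYTES
    if size_bytes > limit:
        problems.append(f"file is {size_bytes} bytes; limit is {limit}")
    return problems
```

A caller might run `check_upload(path.name, path.stat().st_size)` and skip the upload when the list is non-empty. The page limit is not checked here because it depends on parsing the file.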
Client Libraries
Python (recommended)
```bash
pip install affinda
```
```python
from pathlib import Path
from affinda import AffindaAPI, TokenCredential

credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(file=f, workspace="YOUR_WORKSPACE_ID")

print(doc.data)  # Extracted JSON
```
TypeScript / JavaScript (recommended)
```bash
npm install @affinda/affinda
```
```typescript
import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";

const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);

const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});

console.log(doc.data); // Extracted JSON
```
Other Libraries
- .NET: `dotnet add package Affinda.API` -- GitHub
- Java: Maven repository -- GitHub
Note: The .NET and Java libraries may lag behind the Python and TypeScript libraries in feature parity.
Direct HTTP (cURL)
```bash
curl -X POST https://api.affinda.com/v3/documents \
  -H "Authorization: Bearer $AFFINDA_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "workspace=YOUR_WORKSPACE_ID"
```
Structured Outputs (Type-Safe Responses)
This is the recommended approach for building robust integrations. Affinda can generate typed models from your document type configuration, giving you auto-completion, validation, and type safety.
Python -- Pydantic Models
Generate Pydantic v2 models that match your document type's field schema:
```bash
# Set your API key (or export AFFINDA_API_KEY)
python -m affinda generate_models --workspace-id=YOUR_WORKSPACE_ID
```
This creates a `./affinda_models/` directory with one `.py` file per document type. Each file contains Pydantic `BaseModel` classes with all your configured fields as typed, optional attributes.
**Use the generated models when calling the API:**
```python
from pathlib import Path
from affinda import AffindaAPI, TokenCredential
from affinda_models.invoice import Invoice  # Generated model

credential = TokenCredential(token="YOUR_API_KEY")
client = AffindaAPI(credential=credential)

with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,  # Enables Pydantic validation
    )

# doc.parsed is a typed Invoice instance
print(doc.parsed.invoice_number)
print(doc.parsed.total_amount)

# doc.data is still available as raw JSON
print(doc.data)
```
**Handling validation errors gracefully:**
```python
with Path("invoice.pdf").open("rb") as f:
    doc = client.create_document(
        file=f,
        workspace="YOUR_WORKSPACE_ID",
        data_model=Invoice,
        ignore_validation_errors=True,  # Don't raise on schema mismatch
    )

if doc.parsed:
    print(doc.parsed.invoice_number)  # Type-safe access
else:
    print("Validation failed, falling back to raw data")
    print(doc.data)
```
CLI options:
```bash
python -m affinda generate_models --workspace-id=ID          # All types in a workspace
python -m affinda generate_models --document-type-id=ID      # Single document type
python -m affinda generate_models --organization-id=ID       # All types in an org
python -m affinda generate_models --output-dir=./my_models   # Custom output path
python -m affinda generate_models --help                     # All options
```
TypeScript -- Generated Interfaces
Generate TypeScript interfaces that match your document type's field schema:
```bash
# Set your API key (or export AFFINDA_API_KEY)
npm exec affinda-generate-interfaces -- --workspace-id=YOUR_WORKSPACE_ID
```
This creates an `./affinda-interfaces/` directory with one `.ts` file per document type. Each file contains TypeScript interfaces with all your configured fields.
**Use the generated interfaces for type-safe access:**
```typescript
import { AffindaAPI, AffindaCredential } from "@affinda/affinda";
import * as fs from "fs";
import { Invoice } from "./affinda-interfaces/Invoice";

const credential = new AffindaCredential("YOUR_API_KEY");
const client = new AffindaAPI(credential);

const doc = await client.createDocument({
  file: fs.createReadStream("invoice.pdf"),
  workspace: "YOUR_WORKSPACE_ID",
});

const parsed = doc.data as Invoice;
console.log(parsed.invoiceNumber); // Type-safe access
console.log(parsed.totalAmount);
```
CLI options:
```bash
npm exec affinda-generate-interfaces -- --workspace-id=ID        # All types in workspace
npm exec affinda-generate-interfaces -- --document-type-id=ID    # Single document type
npm exec affinda-generate-interfaces -- --output-dir=./types     # Custom output path
npm exec affinda-generate-interfaces -- --help                   # All options
```
Why Use Structured Outputs?
- Type safety: Catch field name typos and type mismatches at compile/lint time
- Auto-completion: IDE support for all extracted fields
- Validation: Pydantic automatically validates the API response structure
- Schema-driven: Models stay in sync with your document type configuration -- regenerate after schema changes
- Documentation as code: The generated models serve as living documentation of your extraction schema
Document Upload Options
There are three patterns for submitting documents and retrieving results:
1. Synchronous (simplest)
Upload and block until parsing completes. The response contains the extracted data.
```python
doc = client.create_document(file=f, workspace="WORKSPACE_ID")
# wait defaults to True -- blocks until ready
print(doc.data)
```
**Best for**: Interactive apps, low volume, quick prototyping.
**Limitation**: Can time out on large or complex documents.
2. Asynchronous with Polling
Upload with `wait=false`, receive a document ID, then poll `GET /documents/{id}` until `ready` is `true`.
```python
doc = client.create_document(file=f, workspace="WORKSPACE_ID", wait=False)
# doc.data is empty -- poll until ready
doc = client.get_document(doc.meta.identifier)
```
**Best for**: Batch processing, large documents, high volume.
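The polling step can be wrapped in a small loop. The sketch below is one possible shape, not SDK code: `poll_until_ready` is a hypothetical helper, the interval and timeout defaults are arbitrary, and it assumes the fetched document exposes `meta.ready` and `meta.failed` in the same way the example above exposes `meta.identifier`.

```python
import time


def poll_until_ready(fetch, identifier, interval=2.0, timeout=300.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch(identifier) until the document is ready or has failed.

    fetch is any callable returning an object with meta.ready and
    meta.failed (assumed attribute names), e.g. client.get_document.
    sleep and clock are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout
    while True:
        doc = fetch(identifier)
        if doc.meta.failed:
            raise RuntimeError(f"document {identifier} failed to parse")
        if doc.meta.ready:
            return doc
        if clock() >= deadline:
            raise TimeoutError(f"document {identifier} not ready after {timeout}s")
        sleep(interval)
```

With the client from the earlier examples, usage would look like `doc = poll_until_ready(client.get_document, doc.meta.identifier)`.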
3. Asynchronous with Webhooks (recommended for production)
Upload the document, then receive a webhook notification when processing completes. This is the most efficient pattern for production systems.
```python
# 1. Upload
doc = client.create_document(file=f, workspace="WORKSPACE_ID", wait=False)

# 2. Receive webhook at your endpoint when ready

# 3. Fetch full data
doc = client.get_document(identifier_from_webhook)
```
**Best for**: Real-time workflows, event-driven architectures, production systems.
See the [Webhooks section](#webhooks) below for setup details.
Upload Parameters
| Parameter | Type | Description |
|---|---|---|
| | binary | The document file. Mutually exclusive with |
| | string | URL to download and process. Mutually exclusive with |
| | string | Workspace identifier (required). |
| | string | Document type identifier (optional -- enables skip-classification). |
| | boolean | |
| | string | Your internal ID for the document. |
| | ISO-8601 | Auto-delete the document at this time. |
| | boolean | Reject if duplicate of existing document. |
| | boolean | Route to low-priority queue (no rate limit). |
| | boolean | Return compact response (with |
| | boolean | Delete data after parsing (requires |
| | boolean | Make document viewable in validation UI. Set |
Response Structure
Each extracted field in the response includes metadata:
| Field | Description |
|---|---|
| | Raw extracted text before processing |
| | Processed value after formatting and mapping |
| | Overall confidence score (0-1) |
| | Confidence the field was correctly classified |
| | Confidence text was correctly extracted |
| | Whether the value has been validated (any means) |
| | Whether validated by a human |
| | Whether auto-validated by rules |
| | Bounding box coordinates on the page |
| | Which page the data appears on |
Document-level metadata includes `ready`, `failed`, `language`, `pages`, `isOcrd`, `ocrConfidence`, `reviewUrl`, `isConfirmed`, `isRejected`, `isArchived`, `errorCode`, and `errorDetail`.
Full metadata reference: https://docs.affinda.com/reference/metadata
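Per-field confidence scores make it possible to route only uncertain values to a reviewer. The sketch below is illustrative only: `fields_needing_review` is a hypothetical helper, the flat `{field name: metadata}` input is a simplification of the real response, and the key name `confidence` is a placeholder because the field-name column is not shown in the table above. Check the metadata reference for the actual names.

```python
def fields_needing_review(fields, threshold=0.9, confidence_key="confidence"):
    """Return names of fields whose confidence is missing or below threshold.

    fields: mapping of field name -> metadata dict (simplified, assumed shape).
    confidence_key: placeholder key name; pass the real one from the API docs.
    """
    flagged = []
    for name, meta in fields.items():
        score = meta.get(confidence_key)
        if score is None or score < threshold:
            flagged.append(name)
    return flagged
```

The threshold of 0.9 is arbitrary; a suitable value depends on the document type and on how costly an undetected error is.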
Webhooks
Affinda uses RESTHooks -- webhook subscriptions managed via REST API. Webhooks can be scoped to an organization or workspace.
Available Events
| Event | Description |
|---|---|
| | Parsing finished (succeeded or failed) |
| | Parsing succeeded |
| | Parsing failed |
| | Document confirmed (manually or auto) |
| | Classification finished |
| | Classification succeeded |
| | Classification failed |
| | Document rejected |
Setup Flow
- Subscribe -- `POST /v3/resthook_subscriptions` with `targetUrl`, `event`, and `organization` (or `workspace`).
- Confirm -- Affinda sends a `POST` to your `targetUrl` with an `X-Hook-Secret` header. Respond with `200`, then call `POST /v3/resthook_subscriptions/activate` with that secret.
- Receive -- Affinda sends webhook payloads to your endpoint. Respond `200` to acknowledge.
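The confirm and receive steps can share one endpoint. The sketch below is framework-agnostic and makes two assumptions that the text above does not spell out: the confirmation request is recognised by the presence of the `X-Hook-Secret` header, and the caller performs the activation call itself. `handle_hook_request` is a hypothetical name.

```python
import json


def handle_hook_request(headers, body, on_event):
    """Handle one incoming POST from Affinda.

    Returns (status_code, secret). secret is not None only for the
    confirmation request; the caller should then send it to
    POST /v3/resthook_subscriptions/activate.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    secret = lowered.get("x-hook-secret")
    if secret is not None:
        return 200, secret           # Confirm step: acknowledge, then activate
    on_event(json.loads(body))       # Receive step: hand the event to the app
    return 200, None
```

Returning the secret rather than activating inside the handler keeps the HTTP response fast and leaves the outbound call to whatever client the application already uses.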
Signature Verification
Enable payload signing via Organization Settings -> Webhook Signature Key. Incoming webhooks include an `X-Hook-Signature` header (`<timestamp>.<signature>`). Verify using HMAC-SHA256:
```python
import hmac, hashlib, json, time

def verify_webhook(request, sig_key: bytes) -> bool:
    sig_header = request.headers["X-Hook-Signature"]
    timestamp, sig_received = sig_header.split(".")
    sig_calculated = hmac.new(sig_key, msg=request.body, digestmod=hashlib.sha256).hexdigest()
    sig_ok = hmac.compare_digest(sig_received, sig_calculated)
    body = json.loads(request.body)
    time_ok = (time.time() - body["timestamp"]) < 600  # 10 min window
    return sig_ok and time_ok
```
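To exercise this check without a live webhook, a test can sign a sample body locally and confirm that tampering and stale timestamps are rejected. The sketch below restates the same logic over plain values so it runs on its own; `sign_body` and `verify` are hypothetical helpers, and it assumes, as the snippet above does, that the signature is the hex HMAC-SHA256 of the raw body and that the body carries a `timestamp` field.

```python
import hashlib
import hmac
import json
import time


def sign_body(body: bytes, sig_key: bytes, timestamp: int) -> str:
    """Test helper: build a header value in <timestamp>.<signature> form."""
    digest = hmac.new(sig_key, msg=body, digestmod=hashlib.sha256).hexdigest()
    return f"{timestamp}.{digest}"


def verify(body: bytes, sig_header: str, sig_key: bytes,
           now=None, max_age: int = 600) -> bool:
    """Check the signature first, then the age of the payload."""
    _timestamp, sig_received = sig_header.split(".", 1)
    sig_calculated = hmac.new(sig_key, msg=body, digestmod=hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig_received, sig_calculated):
        return False  # Do not parse a body that failed authentication
    sent_at = json.loads(body)["timestamp"]
    now = time.time() if now is None else now
    return (now - sent_at) < max_age
```

Passing `now` explicitly keeps the test deterministic; in production the default of the current time applies.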
Webhook Payload
The payload contains document metadata (not the full parsed data). Use the `identifier` to fetch full results:
```json
{
  "id": "e3bd1942-...",
  "event": "document.parse.completed",
  "timestamp": 1665637107,
  "payload": {
    "identifier": "abcdXYZ",
    "ready": true,
    "failed": false,
    "fileName": "invoice.pdf",
    "workspace": { "identifier": "...", "name": "..." }
  }
}
```
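A receiver typically pulls the identifier out of this payload and then fetches the full document. The sketch below shows one way to do that, using only the keys that appear in the sample payload; `identifier_to_fetch` is a hypothetical helper, and skipping failed documents is a design choice rather than something the API requires.

```python
import json


def identifier_to_fetch(raw_body: bytes):
    """Return the document identifier if the event reports a finished,
    successful parse; otherwise return None."""
    event = json.loads(raw_body)
    payload = event.get("payload", {})
    if payload.get("ready") and not payload.get("failed"):
        return payload.get("identifier")
    return None
```

With the client from the earlier examples, the result would be passed on as `client.get_document(identifier)` when it is not `None`.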
Retry Behavior
- `200` -- Success, delivery confirmed
- `410` -- Subscription auto-deleted (endpoint "gone")
- Other 4xx/5xx -- Retried with exponential backoff for ~1 day
Full webhook docs: https://docs.affinda.com/reference/webhooks
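Because failed deliveries are retried, a receiver may see the same event more than once, for example when it processed the event but its `200` response was lost. The sketch below de-duplicates on the top-level `id` from the webhook payload. That the `id` stays the same across retries is an assumption not stated in this document, `SeenEvents` is a hypothetical class, and the store is in-memory only.

```python
from collections import OrderedDict


class SeenEvents:
    """Remember recently handled event ids so retried deliveries run once."""

    def __init__(self, max_size=10000):
        self._ids = OrderedDict()
        self._max = max_size

    def first_time(self, event_id):
        """Return True the first time an id is seen, False on repeats."""
        if event_id in self._ids:
            return False
        self._ids[event_id] = True
        if len(self._ids) > self._max:
            self._ids.popitem(last=False)  # Evict the oldest id
        return True
```

A handler would call `seen.first_time(event["id"])` and return `200` without further work when it is `False`. A multi-process deployment would need a shared store instead of this in-memory one.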
Embedded Validation UI
Affinda provides a human-in-the-loop validation interface that can be embedded in your application via iframe. Each document response includes a `reviewUrl` -- a signed URL valid for 60 minutes.
Implementation pattern:
- Store only the Affinda document `identifier` in your system
- When a user needs to review, fetch a fresh `reviewUrl` via `GET /documents/{id}`
- Embed the URL in an iframe
- Do not persist the URL -- treat it as ephemeral
The UI supports custom theming (colors, fonts, border radius) in embedded mode. Contact Affinda to configure.
Full embedded docs: https://docs.affinda.com/reference/embedded
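The pattern above can be sketched as a render helper that fetches the URL at render time and never stores it. `review_iframe` and `fetch_review_url` are hypothetical names: the callable stands in for whatever code calls `GET /documents/{id}` and reads `reviewUrl` from the response, and the iframe attributes are arbitrary.

```python
from html import escape


def review_iframe(fetch_review_url, identifier, height=800):
    """Return iframe HTML built from a freshly fetched reviewUrl.

    fetch_review_url: callable taking the stored document identifier and
    returning a current signed URL (hypothetical; supplied by the caller).
    """
    url = fetch_review_url(identifier)  # Fetch on every render; never cache
    # Escape the URL so query-string characters are safe inside the attribute.
    return (f'<iframe src="{escape(url, quote=True)}" '
            f'width="100%" height="{int(height)}"></iframe>')
```

Since the signed URL expires after 60 minutes, a page left open longer than that would need to call the helper again to obtain a fresh one.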
Key API Methods
Documents
| Method | Endpoint | Description |
|---|---|---|
| POST | | Upload and parse a document |
| GET | | Retrieve a document and its data |
| PATCH | | Update document fields/status |
| DELETE | | Delete a document |
| GET | | List documents (with filtering) |
| GET | | Download redacted PDF |
Workspaces
| Method | Endpoint | Description |
|---|---|---|
| GET | | List workspaces |
| POST | | Create a workspace |
| GET | | Get workspace details |
| PATCH | | Update workspace |
| DELETE | | Delete workspace |
Annotations
| Method | Endpoint | Description |
|---|---|---|
| GET | | List annotations for a document |
| POST | | Create an annotation |
| PATCH | | Update an annotation |
| POST | | Batch create annotations |
| POST | | Batch update annotations |
| POST | | Batch delete annotations |
Webhooks
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v3/resthook_subscriptions` | Create subscription |
| POST | `/v3/resthook_subscriptions/activate` | Activate with X-Hook-Secret |
| GET | | List subscriptions |
| PATCH | | Update subscription |
| DELETE | | Delete subscription |
Full API reference: https://docs.affinda.com/reference/getting-started
OpenAPI spec: https://api.affinda.com/static/v3/api_spec.yaml
Common Integration Patterns
Affinda supports six integration workflow patterns depending on where validation logic lives and where exceptions are handled:
| Pattern | Description | Webhook Event |
|---|---|---|
| W1 -- No validation | Upload -> get JSON. No rules, no human review. | |
| W2 -- Client-side validation | Same as W1; your system applies rules after export. | |
| W3 -- Affinda validation logic | Affinda validates automatically; no human review. | |
| W4 -- Review all in Affinda | Humans review every document in Affinda UI. | |
| W5 -- Client rules + Affinda review | Your rules, pushed back as warnings; flagged docs reviewed in Affinda. | |
| W6 -- Full Affinda validation | Affinda validates; exceptions reviewed in Affinda UI. | |
For most new integrations, W1 or W2 is the simplest starting point. W6 provides the most automation with human-in-the-loop for exceptions.
Full solution design guide: https://docs.affinda.com/academy/solution-design
Common Errors
| Error Code | Meaning | Resolution |
|---|---|---|
| | Document rejected as duplicate | Disable "Reject duplicates" or upload unique files |
| | No extractable text | Check file is not a photo of an object; try OCR |
| | File is corrupted | Re-upload a valid file |
| | Exceeds 20 MB limit | Reduce file size |
| | Unsupported format | Use PDF, DOC, DOCX, XLSX, ODT, RTF, TXT, HTML, PNG, JPG, TIFF, JPEG |
| | Out of credits | Purchase more credits and reparse |
| | File is password-protected | Remove password and re-upload |
| | No matching document type | Check document type configuration or disable "Reject Documents" |
| | System capacity exceeded | Wait and retry |
| | Exceeded timeout | Contact Affinda for custom limits |
Full error reference: https://docs.affinda.com/error-glossary
Documentation Map
Use this index to find detailed information on specific topics. Each link goes to the full documentation page.
Affinda Academy (Tutorials)
- Getting Started -- Core concepts: organizations, workspaces, document types, statuses, and the processing workflow.
- Creating a New Model -- Step-by-step guide to creating extraction models from scratch.
- Improving Accuracy -- Strategies for 99%+ accuracy via model memory, field prompts, and OCR settings.
- User Validation of Extracted Data -- How to validate and correct extractions in the Affinda UI.
- Table Editor -- Grid and freeform modes for validating table extractions.
- Reviewing Splitting & Classification -- How to correct document splitting and classification.
- Schema Design Best Practices -- Field configuration trade-offs, advanced options, and schema design guidance.
- Straight-Through Processing -- Data mapping, validation rules, and auto-confirmation for full automation.
- Integration Workflows -- Six workflow patterns (W1-W6) for different integration scenarios.
- Integration Agent -- No-code integrations using AI agent and Pipedream.
Configuration Guide
Overview & Workflow:
- Workflow -- End-to-end document processing pipeline stages.
- Glossary -- Platform terminology definitions.
- Document Status -- For Review, Confirmed, Archived, Rejected states.
Ingestion & Pre-Processing:
- Ingestion -- Upload methods: manual, email, API.
- Email Upload -- Email-to-workspace document ingestion.
- Pre-Processing -- Automated cleaning before extraction.
- OCR -- OCR modes: Skip, Auto-detect, Partial, Full.
- Duplicates -- Duplicate detection and rejection.
Splitting, Classification & Extraction:
- Splitting -- Auto-separate multi-document files.
- Classification -- Auto-categorize documents by type.
- Field Configuration -- Field names, types, and settings.
- Standard Fields -- Text, numbers, dates, location, phone, URL types.
- Groups & Tables -- Repeating structures and line items.
- Picklists & Data Sources -- Controlled vocabularies and master data matching.
- Checkboxes -- Label and true/false checkbox extraction.
- Image Fields -- Signature, headshot, and seal extraction.
- Model Memory -- RAG-based learning from validated documents.
Validation & Export:
- Machine Validation -- Automated validation overview.
- Validation Rules -- Natural-language business rule creation.
- Confidence -- Confidence scoring and thresholds.
- User Validation -- Human review interface.
- Data Export -- JSON, XML, CSV export options.
- Redaction -- PDF redaction of sensitive data.
- User Management -- Roles and permissions.
API Reference
- Quick Start -- First API call walkthrough with code examples.
- Authentication -- API key management and rotation.
- Upload Options -- Sync, async polling, and webhook patterns.
- Metadata -- Field-level and document-level metadata reference.
- Limits -- Rate limits, file size limits, page limits.
- Webhooks -- Webhook setup, events, signature verification.
- Embedded Mode -- Embedding validation UI via iframe.
- Client Libraries -- Python, JavaScript, .NET, Java SDKs.
- Structured Outputs (Pydantic) -- Generate Python Pydantic models from document types.
- TypeScript Interfaces -- Generate TypeScript interfaces from document types.
Resume Parsing Guide
- Getting Started -- Resume parsing product overview and workspace setup.
- Integration -- Resume parser API integration with code examples.
- Credits -- Per-document credit system for resume parsing.
- Data Extracted -- All fields extracted from resumes with sample JSON.
- Taxonomies -- Skills, job titles, and occupation standardization.
- Resume Redactor -- Automated PII redaction for unbiased hiring.
- Resume Summary -- AI-generated candidate summaries.
- Job Description Parser -- Structured extraction from job descriptions.
- Search & Match -- Candidate/job matching with scoring and search UI.
Additional Resources
- Error Glossary -- Error codes and resolutions.
- FAQs -- Common questions on capabilities, configuration, and troubleshooting.
- Billing -- Credits, pricing, and payment.
- Data Retention -- Document deletion and expiry policies.
- Deployment & Data Residency -- Regional servers and enterprise options.
- Product Updates -- Changelog and release notes.
- Status -- Service availability dashboard.