agentic-doc-parse-and-extract

agentic-doc-parse-and-extract Skill

agentic-doc-parse-and-extract is the official command-line tool for Laiye Technology's ADP (Agentic Document Processing) product. It lets both humans and AI agents invoke ADP capabilities from the terminal for document parsing and extraction.

Quick Start Guide for AI Agents

Core Workflow

Install dependencies: On first execution, install the ADP CLI tool and its dependencies by following the instructions in references/examples.md.
Discover commands: Run `adp schema` to get the machine-readable JSON spec of all commands, parameters, types, and defaults.
Authentication: On first execution, run `adp config get` to verify credentials. If no valid configuration exists, prompt the user to provide an API Key.
Check Application: On first execution, retrieve the application list via `adp app-id list`. For subsequent executions, prefer `adp app-id cache` (cached in context). If the cache is unavailable, refresh it by calling `adp app-id list` again.
Execute: Run `adp extract url <URL> --app-id <ID>` or `adp parse url <URL> --app-id <ID>`.
Query: Check results asynchronously with `adp extract query <task_id>` or `adp parse query <task_id>`.
Error handling: When a command fails, parse the stderr JSON to determine the error type and recovery action. See references/error-handling.md.
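
As a minimal sketch of the error-handling step, the shell function below runs a command and keeps stdout and stderr apart, so the stderr JSON can be inspected when the command fails. The names `run_and_capture`, `RUN_STDOUT`, and `RUN_STDERR` are hypothetical helpers for illustration, not part of the ADP CLI, and the structure of the error JSON is left to references/error-handling.md.

```bash
# run_and_capture CMD [ARGS...]
# Hypothetical helper: runs CMD, stores its stdout in RUN_STDOUT and its
# stderr in RUN_STDERR, and returns CMD's exit status unchanged.
run_and_capture() {
  local err_file status
  err_file=$(mktemp) || return 1
  RUN_STDOUT=$("$@" 2>"$err_file")
  status=$?
  RUN_STDERR=$(cat "$err_file")
  rm -f "$err_file"
  return "$status"
}
```

One possible use is `run_and_capture adp parse url <URL> --app-id <ID>`, after which a non-zero return code signals that `RUN_STDERR` holds the error payload to parse.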

Common Scenarios → Command Mapping

| User Intent | Recommended Command | Handling Rules |
| --- | --- | --- |
| - Read full document content<br>- Parse layout & structure<br>- Convert document to text<br>- Process / analyze full document | `adp parse` | - Sync processing for small files<br>- Async processing (`--async` parameter) for files >20MB or >200 pages |
| - Extract key fields (amount, date, name, ID, etc.)<br>- Output structured results (JSON/table) | `adp extract` | - Use matched existing app<br>- Create a custom extraction app if the document type is not in the known app list |
| Batch process multiple files | `adp parse` / `adp extract` (batch mode) | - Directly use the local folder path<br>- Or save the file URL list to a text file, then pass the text file path |
| Document content available as base64 string | `adp parse base64` / `adp extract base64` | - Use `--file-name` to specify original filename |
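
The sync/async rule for `adp parse` can be sketched as a small decision function. This is an illustration only: `choose_mode` is a hypothetical helper, it assumes the caller already knows the file size and page count, and it assumes 1MB means 1024 × 1024 bytes.

```bash
# Thresholds from the handling rules: async for files >20MB or >200 pages.
# Assumption: 1MB = 1024 * 1024 bytes.
ASYNC_SIZE_BYTES=$((20 * 1024 * 1024))
ASYNC_PAGES=200

# choose_mode SIZE_BYTES PAGE_COUNT
# Prints "async" when either threshold is exceeded, otherwise "sync".
choose_mode() {
  local size_bytes="$1" pages="$2"
  if [ "$size_bytes" -gt "$ASYNC_SIZE_BYTES" ] || [ "$pages" -gt "$ASYNC_PAGES" ]; then
    echo async
  else
    echo sync
  fi
}
```

When the result is `async`, the `--async` parameter from the table would be added to the command.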

Quick Reference for Common Commands

Command Discovery (for Agent introspection)

adp schema

Configuration Check

adp config get

Query Applications (First Use)

adp app-id list

Document Extraction (Invoice/Receipt)

adp extract url <file URL> --app-id <app_id>

Document Parsing (Long Document)

adp parse url <file URL> --app-id <app_id>

Base64 Input

adp extract base64 <base64_string> --app-id <app_id> --file-name invoice.pdf

adp parse base64 <base64_string> --app-id <app_id> --file-name document.pdf

Asynchronous Query

adp extract query <task_id>

adp parse query <task_id>

adp parse query <task_id1> <task_id2> --watch # batch query with auto-poll

Batch Processing

adp extract local <folder path> --app-id <app_id> --export <folder path> --concurrency 2

adp parse local <folder path> --app-id <app_id> --export <folder path> --concurrency 2

Performance Optimization Suggestions

Reuse APP_ID: Cache it in the context after the first query to avoid calling `app-id list` every time.
Sync First: For small files (<20MB), prefer synchronous calls to avoid asynchronous polling.
Batch Processing: Process multiple documents in a single run via `url <URL list file path>` or `local <folder path>` instead of looping over single invocations. The default is `--concurrency 2`.
Local Cache: Store commonly used APP_IDs in environment variables or configuration files.
Priority Extraction: If only key information is needed, use `extract` instead of `parse` (faster).
Use --retry for batch: Set `--retry 2` for batch processing to auto-recover from transient failures.
Use --timeout for large files: Increase `--timeout` for files >20MB. The default is 900s.
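
The batch suggestions can be combined into one invocation. The wrapper below is a sketch under stated assumptions: `run_batch` and the `ADP_BIN` variable are hypothetical names introduced here (setting `ADP_BIN=echo` gives a dry run), while the subcommands and flags are the ones listed in this document.

```bash
# ADP_BIN is a hypothetical override for the executable name;
# set ADP_BIN=echo to print the arguments instead of running the CLI.
ADP_BIN=${ADP_BIN:-adp}

# run_batch ACTION INPUT_DIR APP_ID EXPORT_DIR [CONCURRENCY] [RETRY]
# ACTION is "parse" or "extract". CONCURRENCY and RETRY both default to 2,
# matching the suggestions above.
run_batch() {
  local action="$1" input_dir="$2" app_id="$3" export_dir="$4"
  local concurrency="${5:-2}" retry="${6:-2}"
  case "$action" in
    parse|extract) ;;
    *) echo "action must be parse or extract" >&2; return 1 ;;
  esac
  "$ADP_BIN" "$action" local "$input_dir" --app-id "$app_id" \
    --export "$export_dir" --concurrency "$concurrency" --retry "$retry"
}
```

A single call such as `run_batch extract ./invoices <app_id> ./results` then replaces a loop of per-file invocations.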

Detailed Product Introduction

Core Function Definition

parse: Parses the entire document to retrieve full text, layout, structure, and content.
extract: Extracts specific structured fields from the document, such as amount, date, company name, and order number.

Application Scenarios

Long Document Parsing: Parses long documents quickly and accurately extracts elements such as text, tables, and images, replacing manual extraction.
Structured Extraction for Scanned/Photographed Documents: Performs structured extraction on scans and photos in reading order and produces clear, editable electronic documents, eliminating manual entry errors.
Intelligent Invoice Extraction: After an invoice image or document is uploaded, the AI automatically invokes a preset application to extract 10+ key fields such as invoice number and amount. Suitable for financial filing scenarios.
Intelligent Order Extraction: Supports batch upload of orders from multiple distributors. The AI extracts 10+ key fields such as order number and buyer/seller information and automatically identifies currencies, reducing manual verification costs.
Domestic ID Document Extraction: Processes documents in seconds and supports more than 10 common types of Chinese documents. For example, core information such as name and ID number can be extracted from an ID card scan.
Automatic Splitting and Extraction of Mixed Documents: Accepts batches of mixed documents such as contracts and invoices. The AI automatically classifies and splits them, then completes structured extraction.
Batch Document Processing: Supports batch upload of various business documents, extracting information and outputting standardized structured data to reduce repetitive manual work.

Detailed Usage Steps

Step 1: Obtain the Installation Package

For details, see references/examples.md

Step 2: Obtain and Configure API Key

1. Access the ADP Portal to Obtain Credentials

ADP provides separate Public Cloud endpoints for users in Chinese Mainland and for international users, and each must be configured according to region. Using the endpoint nearest to you gives faster and more stable calls.

| Region | Login Address | API Base URL |
| --- | --- | --- |
| Chinese Mainland | https://adp.laiye.com/ | `https://adp.laiye.com/` |
| Overseas Region | https://adp-global.laiye.com/ | `https://adp-global.laiye.com/` |
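
The region table can be expressed as a small lookup. This is only a sketch: the function name `base_url_for` and the region keys `cn` and `global` are hypothetical labels chosen for illustration, while the URLs come from the table.

```bash
# base_url_for REGION
# REGION keys "cn" and "global" are hypothetical labels for the two rows
# of the region table; the URLs themselves are taken from that table.
base_url_for() {
  case "$1" in
    cn) echo "https://adp.laiye.com/" ;;
    global) echo "https://adp-global.laiye.com/" ;;
    *) echo "unknown region: $1" >&2; return 1 ;;
  esac
}
```

Rejecting unknown keys, rather than falling back to one endpoint, avoids silently sending requests to the wrong region.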

2. Get API Key after registration/login

New users must first register an ADP account. Registered accounts receive 100 free credits per month.

After logging in, click your avatar to open the `API_Key` entry.

3. Complete the authentication configuration

For details, see references/examples.md

4. Verify the configuration

For details, see references/examples.md

Notes:

If the API Key and API Base URL have already been configured, store them in environment variables so they do not have to be supplied on every use.
If the API Key and API Base URL have not been configured yet, configure them by following the steps above.

Step 3: Upload Documents

Once the API Key is authenticated, guide the user to upload a local file or specify a file URL. The user can then query the applications ADP supports and select a suitable one for document parsing and extraction. If no suitable application is found, the user can create a custom extraction application and configure its fields and parsing mode to meet specific document processing requirements.
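
Before a document is submitted, it can be checked against the limits listed under Precautions (50MB per file and the supported format list). The function below is a minimal sketch: `is_supported_file` is a hypothetical helper, matching extensions case-insensitively is an assumption, and 1MB is assumed to mean 1024 × 1024 bytes.

```bash
# Limits from the Precautions section. Assumption: 1MB = 1024 * 1024 bytes.
MAX_FILE_BYTES=$((50 * 1024 * 1024))
SUPPORTED_EXTS="jpg jpeg png bmp tiff tif pdf doc docx xls xlsx ppt pptx"

# is_supported_file FILE_NAME SIZE_BYTES
# Returns 0 when the extension is in the supported list and the size is
# within the limit, 1 otherwise. Case-insensitive matching is an assumption.
is_supported_file() {
  local name="$1" size="$2" ext e
  ext="${name##*.}"
  [ "$ext" = "$name" ] && return 1   # no extension at all
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  [ "$size" -le "$MAX_FILE_BYTES" ] || return 1
  for e in $SUPPORTED_EXTS; do
    [ "$e" = "$ext" ] && return 0
  done
  return 1
}
```

Running this check locally avoids spending a request on a file the service would not accept.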

Step 4: Query Available Applications

This step queries the built-in applications under the user's account, which cover standardized documents such as invoices/receipts, orders, and common Chinese cards and certificates. Use the `app-label` to help filter for suitable application IDs. If no suitable application is found, create a custom extraction application and configure its fields and parsing mode to meet specific document processing requirements.

Notes:

For the first execution, use `adp app-id list`. For subsequent executions, prefer `adp app-id cache` (the application ID is cached in the context). If the cache becomes invalid or contains no suitable application, call `adp app-id list` again to update the cache.
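
The cache-first pattern in this note can be illustrated with a file-based analogue. This sketch does not describe how `adp app-id cache` works internally: the helper names, the tab-separated `label<TAB>app_id` cache format, and the fetch command passed in by the caller (for instance, a wrapper that converts `adp app-id list` output into that format) are all hypothetical.

```bash
# find_in_cache LABEL CACHE_FILE
# Prints the app ID stored for LABEL, or nothing if it is absent.
# The "label<TAB>app_id" line format is a hypothetical local convention.
find_in_cache() {
  [ -f "$2" ] || return 0
  awk -F '\t' -v want="$1" '$1 == want { print $2; exit }' "$2"
}

# lookup_app_id LABEL CACHE_FILE FETCH_CMD [ARGS...]
# Reads the cache first; on a miss, runs FETCH_CMD to rebuild the cache
# and looks again. Returns 1 if the label is still unknown.
lookup_app_id() {
  local label="$1" cache="$2" id
  shift 2
  id=$(find_in_cache "$label" "$cache")
  if [ -z "$id" ]; then
    if "$@" > "$cache.tmp"; then
      mv "$cache.tmp" "$cache"
    else
      rm -f "$cache.tmp"
      return 1
    fi
    id=$(find_in_cache "$label" "$cache")
  fi
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}
```

Writing to a temporary file and renaming it keeps the previous cache intact when the refresh fails.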

For detailed examples of commands and responses, see references/examples.md.

Step 5: Add custom extraction application

ADP supports creating custom extraction applications. Add the business-specific extraction fields you need and give each field a detailed description. The system identifies document content based on the configured fields and definitions, enabling customized information extraction from personalized documents and non-standard forms.

For example commands, responses, and detailed parameter descriptions, please refer to references/examples.md

Step 6: Execute Document Processing

Single Document Parsing

Parsing a document with the selected application ID returns a formatted JSON result containing information such as the document content, element position coordinates, and OCR confidence levels.

For examples of commands and responses, please refer to references/examples.md

Single Document Extraction

Extracting from a document with the selected application ID returns a formatted JSON result containing information such as the extraction fields, extraction results, and confidence levels.

For examples of commands and responses, please refer to references/examples.md

Batch Document Processing

ADP supports batch processing capabilities. Users can upload multiple file URLs or local folder paths at once, and the system will automatically identify each document type and match the most suitable application for processing, greatly improving the efficiency of batch document processing.

For detailed command examples, see references/examples.md

Note: Free users are limited to 1 concurrent request. Enterprise users can adjust the limit as needed, up to a maximum of 2.

Asynchronous Processing (Suitable for Large Documents)

ADP provides asynchronous processing capabilities, allowing users to choose asynchronous mode to perform document parsing and extraction. The system will return a task ID, and users can periodically query the task status and results through the query interface, which is suitable for processing complex documents or batch documents with long processing times. If the document uploaded by the user is larger than 20MB or contains more than 200 pages, it is recommended to use the asynchronous processing mode.
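
Periodic querying can be sketched as a generic polling loop. The helper `poll_until` is hypothetical and deliberately knows nothing about ADP's response format: the caller supplies a check command that returns 0 once the task is finished, for example a wrapper around `adp parse query <task_id>` whose status inspection follows references/response-schema.md. The `--watch` option shown in the quick reference may make a manual loop unnecessary.

```bash
# poll_until MAX_ATTEMPTS INTERVAL_SECONDS CHECK_CMD [ARGS...]
# Runs CHECK_CMD until it succeeds or MAX_ATTEMPTS is reached, sleeping
# INTERVAL_SECONDS between attempts. Returns 0 on success, 1 on give-up.
poll_until() {
  local max="$1" interval="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

Bounding the number of attempts keeps an agent from waiting indefinitely on a task that never completes.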

For examples of commands and responses, see references/examples.md

Complete Command List

For a complete list of all available commands with full parameter specs, see references/commands.md

Response Schema Reference

For the output structure of each command (including batch processing output mechanism), see references/response-schema.md

Error Handling Guide

For error codes, types, and Agent auto-recovery strategies, see references/error-handling.md

Precautions

When using ADP output, always present the returned data as-is. Do not modify, add, or remove any fields during extraction or parsing to ensure data integrity.

API Key Security: Please keep your API Key secure and avoid disclosing it to unauthorized third parties.
API Base URL Configuration: Select the address for your region. For Chinese Mainland, use `https://adp.laiye.com/`; for overseas regions, use `https://adp-global.laiye.com/`.
File Size Limit: The maximum size of a single file is 50MB.
Supported Formats: .jpg, .jpeg, .png, .bmp, .tiff, .tif, .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx
Free Quota: New users receive 100 free credits per month, which are reset at the beginning of each month. Credits can be used for document parsing and extraction processing.
Check Balance: Run `adp credit` to check the current account's credit balance.
Billing Rules:
- Document parsing: 0.5 credits per page
- Invoice/receipt extraction: 1.5 credits per page
- Order extraction: 1.5 credits per page
- Custom extraction: 1 credit per page
App ID Reuse: Remember the app ID a user has already used so it can be reused directly, without querying for the app_id before every call. Each app ID is unique and fixed for a user: unless the user deletes the app, the app_id does not change, so a previously queried app_id can be used directly for document processing calls.
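
The billing rules lend themselves to a quick cost estimate before a job is submitted. The function below is a sketch: `credits_for` and its kind labels (`parse`, `invoice`, `order`, `custom`) are hypothetical names, and the per-page rates are the ones listed under Billing Rules. Rates are held in tenths of a credit so that plain integer arithmetic suffices.

```bash
# credits_for KIND PAGES
# KIND labels are hypothetical: parse | invoice | order | custom.
# Rates are stored in tenths of a credit per page (0.5 -> 5, 1.5 -> 15, 1 -> 10).
credits_for() {
  local kind="$1" pages="$2" tenths_per_page total
  case "$kind" in
    parse) tenths_per_page=5 ;;
    invoice|order) tenths_per_page=15 ;;
    custom) tenths_per_page=10 ;;
    *) echo "unknown kind: $kind" >&2; return 1 ;;
  esac
  total=$((tenths_per_page * pages))
  echo "$((total / 10)).$((total % 10))"
}
```

Comparing the estimate with the balance reported by `adp credit` shows whether a job fits within the remaining credits.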
