nutrient-document-processing

Nutrient Document Processing

Note: This skill integrates with the Nutrient commercial API. Review their terms before use.
Process documents with the Nutrient DWS Processor API. Convert formats, extract text and tables, OCR scanned documents, redact PII, add watermarks, digitally sign, and fill PDF forms.

Setup

Get a free API key at nutrient.io:

```bash
export NUTRIENT_API_KEY="pdf_live_..."
```

All requests go to `https://api.nutrient.io/build` as a multipart POST with an `instructions` JSON field.
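
Since every operation shares the same endpoint, auth header, and `instructions` field, the request can be wrapped once. The following is a minimal sketch, not part of the Nutrient tooling: the function name `nutrient_build` and the `NUTRIENT_DRY_RUN` switch are hypothetical, and the upload field is assumed to be named after the file's base name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: nutrient_build <input-file> <instructions-json> <output-file>
# The endpoint, Bearer header, and "instructions" field come from the Setup text;
# the function name and NUTRIENT_DRY_RUN are illustrative assumptions.
nutrient_build() {
  local input="$1" instructions="$2" output="$3"
  local name
  name="$(basename "$input")"

  # Fail early when the key from the Setup step is missing.
  if [ -z "${NUTRIENT_API_KEY:-}" ]; then
    echo "NUTRIENT_API_KEY is not set" >&2
    return 1
  fi

  # With NUTRIENT_DRY_RUN=1, print a summary of the request instead of sending it.
  if [ "${NUTRIENT_DRY_RUN:-0}" = "1" ]; then
    printf 'POST https://api.nutrient.io/build file=%s instructions=%s -> %s\n' \
      "$name" "$instructions" "$output"
    return 0
  fi

  # Send the file and the instructions as multipart form fields.
  curl -sS --fail -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$name=@$input" \
    -F "instructions=$instructions" \
    -o "$output"
}

# Usage (sends a real request):
#   nutrient_build document.docx '{"parts":[{"file":"document.docx"}]}' output.pdf
```

The dry-run branch is there only so the request can be inspected before any document leaves the machine.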

Operations

Convert Documents

```bash
# DOCX to PDF
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.docx=@document.docx" \
  -F 'instructions={"parts":[{"file":"document.docx"}]}' \
  -o output.pdf
```

```bash
# PDF to DOCX
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
  -o output.docx
```

```bash
# HTML to PDF
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "index.html=@index.html" \
  -F 'instructions={"parts":[{"html":"index.html"}]}' \
  -o output.pdf
```

Supported inputs: PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS.
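
Before uploading, it can help to check a file against the list above. This is a rough sketch with a hypothetical helper, `is_supported_input`. It matches only the extension spellings exactly as listed, so variants such as `jpeg` or `tif` are rejected here even though the API may accept them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the file extension matches one of the
# input formats listed above. The extension spellings are an assumption.
is_supported_input() {
  local ext
  # A name without a dot has no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Take the text after the last dot and lowercase it.
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf|docx|xlsx|pptx|doc|xls|ppt|pps|ppsx|odt|rtf|html|jpg|png|tiff|heic|gif|webp|svg|tga|eps)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

for f in report.DOCX photo.heic notes.md; do
  if is_supported_input "$f"; then
    echo "$f: supported"
  else
    echo "$f: not in the supported list"
  fi
done
```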

Extract Text and Data

```bash
# Extract plain text
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
  -o output.txt
```

```bash
# Extract tables as Excel
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
  -o tables.xlsx
```

OCR Scanned Documents

```bash
# OCR to searchable PDF (supports 100+ languages)
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "scanned.pdf=@scanned.pdf" \
  -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
  -o searchable.pdf
```

Languages: Supports 100+ languages via ISO 639-2 codes (e.g., `eng`, `deu`, `fra`, `spa`, `jpn`, `kor`, `chi_sim`, `chi_tra`, `ara`, `hin`, `rus`). Full language names like `english` or `german` also work. See the [complete OCR language table](https://www.nutrient.io/guides/document-engine/ocr/language-support/) for all supported codes.
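
The codes listed above slot into the same `ocr` action in place of a full language name. A minimal sketch, assuming a hypothetical `ocr_instructions` helper (not part of the Nutrient API) that only assembles the `instructions` JSON for a given file name and language code, and does no JSON escaping of its arguments:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the instructions JSON for an OCR request.
# The JSON shape mirrors the OCR example in this section; only the language
# value changes. The default of "eng" is an illustrative choice.
ocr_instructions() {
  local file="$1" language="${2:-eng}"
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"ocr","language":"%s"}]}\n' \
    "$file" "$language"
}

# Example: build instructions for a German scan using the code `deu`.
ocr_instructions scanned.pdf deu
```

The result can be passed to curl as `-F "instructions=$(ocr_instructions scanned.pdf deu)"`.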

Redact Sensitive Information

```bash
# Pattern-based (SSN, email)
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
  -o redacted.pdf
```

```bash
# Regex-based
# Backslashes are doubled because the pattern sits inside a JSON string.
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
  -o redacted.pdf
```

Presets: `social-security-number`, `email-address`, `credit-card-number`, `international-phone-number`, `north-american-phone-number`, `date`, `time`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`.
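
Several presets can be combined by listing one `redaction` action per preset, as the pattern-based example does for SSNs and email addresses. The sketch below generates that JSON for any set of preset names. `redaction_instructions` is a hypothetical helper, and it does not check the names against the preset list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: redaction_instructions <file> <preset>...
# Emits one preset redaction action per preset name, in the order given.
redaction_instructions() {
  local file="$1" actions="" preset
  shift
  for preset in "$@"; do
    # Separate actions with a comma after the first one.
    if [ -n "$actions" ]; then
      actions="${actions},"
    fi
    actions="${actions}{\"type\":\"redaction\",\"strategy\":\"preset\",\"strategyOptions\":{\"preset\":\"${preset}\"}}"
  done
  printf '{"parts":[{"file":"%s"}],"actions":[%s]}\n' "$file" "$actions"
}

# Example: redact credit card numbers and IPv4 addresses in one request.
redaction_instructions document.pdf credit-card-number ipv4
```

The output can be supplied to curl as `-F "instructions=$(redaction_instructions document.pdf credit-card-number ipv4)"`.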

Add Watermarks

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
  -o watermarked.pdf
```

Digital Signatures

```bash
# Self-signed CMS signature
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
  -o signed.pdf
```

Fill PDF Forms

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "form.pdf=@form.pdf" \
  -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \
  -o filled.pdf
```

MCP Server (Alternative)

For native tool integration, use the MCP server instead of curl:

```json
{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}
```

When to Use

  • Converting documents between formats (PDF, DOCX, XLSX, PPTX, HTML, images)
  • Extracting text, tables, or key-value pairs from PDFs
  • OCR on scanned documents or images
  • Redacting PII before sharing documents
  • Adding watermarks to drafts or confidential documents
  • Digitally signing contracts or agreements
  • Filling PDF forms programmatically

Links
