tabular-review-lawvable

Tabular Review

Extract structured data from multiple documents into an Excel matrix with citations.

Required Skills

pdf - For reading PDF documents
docx - For reading Word documents
xlsx - For creating the Excel output

Workflow

Step 1: Gather User Requirements

Use AskUserQuestion to collect:

Document folder path - Where are the documents?
Output filename - Name for the Excel file
Columns to extract - What information to pull from each document

Example column definitions:

- Parties: Names of all parties to the agreement
- Effective Date: When the agreement becomes effective
- Term: Duration of the agreement
- Governing Law: Jurisdiction for disputes

Step 2: Discover Documents

Use Glob to find all documents:

Glob(pattern: "**/*.pdf", path: "<folder>")
Glob(pattern: "**/*.docx", path: "<folder>")
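
Outside the agent environment, the same discovery step can be approximated with Python's standard library. This is a minimal sketch, assuming the documents sit in a local folder; `discover_documents` is a hypothetical helper name and not part of the skill itself.

```python
from pathlib import Path


def discover_documents(folder: str) -> list[Path]:
    """Recursively collect .pdf and .docx files under folder, sorted by path."""
    root = Path(folder).expanduser()
    found = []
    for pattern in ("*.pdf", "*.docx"):
        # rglob("*.pdf") walks subfolders, like the "**/*.pdf" Glob pattern above
        found.extend(p for p in root.rglob(pattern) if p.is_file())
    return sorted(found)
```

Sorting keeps the document order stable between runs, which makes the later batch assignment reproducible.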

Step 3: Process Documents in Parallel

Launch background agents to process documents concurrently. Each agent:

Reads assigned documents using pdf or docx skill
Extracts values for each column
Captures page/paragraph citations
Returns structured JSON

Launch agents:

Task(
  prompt: "<agent_prompt>",
  subagent_type: "general-purpose",
  run_in_background: true
)

Agent prompt template:

You are processing documents for a tabular review.

DOCUMENTS TO PROCESS:
<list of document paths>

COLUMNS TO EXTRACT:
<column definitions>

For each document:
1. Read the document using the pdf skill (for .pdf) or docx skill (for .docx)
2. Extract the requested information for each column
3. Note the page number (PDF) or section (DOCX) where you found the information
4. Include a brief quote (30-50 chars) showing the source text

Return your results as JSON:
{
  "results": [
    {
      "document": "<filename>",
      "path": "<absolute_path>",
      "extractions": [
        {
          "column": "<column_name>",
          "value": "<extracted_value>",
          "page": <page_number>,
          "quote": "<brief_context_quote>"
        }
      ]
    }
  ]
}

If you cannot find information for a column, set value to "Not found" and explain in the quote field.

Distribution strategy:

For N documents and M agents, each agent processes at most ceil(N/M) documents
Default: 10 agents maximum
Adjust the number of agents to the document count (for example, 5 agents for 15 documents)
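
The distribution rule can be sketched in a few lines of Python. This assumes documents are assigned in contiguous batches; `distribute` and its `max_agents` parameter are illustrative names, and the skill does not prescribe a particular batching order.

```python
import math


def distribute(documents: list[str], max_agents: int = 10) -> list[list[str]]:
    """Split documents into at most max_agents batches of ceil(N/M) items."""
    if not documents:
        return []
    # Never plan for more agents than there are documents
    agents = min(max_agents, len(documents))
    batch_size = math.ceil(len(documents) / agents)  # ceil(N/M)
    return [
        documents[i:i + batch_size]
        for i in range(0, len(documents), batch_size)
    ]
```

With 15 documents and `max_agents=5`, this yields 5 batches of 3 documents. Because batches are filled to ceil(N/M), the final count of batches can be lower than `max_agents`, and the last batch may be smaller than the others.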

Step 4: Collect Results

Wait for all background agents to complete:

TaskOutput(task_id: "<agent_id>", block: true)

Aggregate all results into a single array of document extractions.
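
The aggregation step might look like the following sketch. It assumes each agent's output is available as a raw JSON string in the format from the agent prompt template; in practice an agent may wrap the JSON in extra text, which this hypothetical `aggregate` helper does not handle.

```python
import json


def aggregate(agent_outputs: list[str]) -> list[dict]:
    """Merge the "results" arrays from every agent reply into one list."""
    merged = []
    for raw in agent_outputs:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            # Assumption: unparseable replies are skipped; a real run should log them
            continue
        if isinstance(payload, dict):
            merged.extend(payload.get("results", []))
    return merged
```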

Step 5: Generate Excel Output

Invoke the xlsx skill to create the output file:

Create an Excel workbook at <output_path>:

SHEET 1: "Document Review"
- Header row: Document | <Column1> | <Column2> | ...
- Data rows: One row per document

For each extraction cell:
- Cell value: The extracted text
- Cell hyperlink: file://<document_path>#page=<N> (for PDFs)
- Cell comment: "Page <N>: '<quote>'"

SHEET 2: "Summary"
- Total documents: <count>
- Documents processed: <count>
- Extraction date: <today>
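
To make the per-cell rules concrete, here is a small sketch that derives the value, hyperlink, and comment for one extraction. It assumes POSIX-style absolute paths and uses only the standard library; `cell_citation` is a hypothetical helper, and writing the cell into the workbook is left to the xlsx skill.

```python
from urllib.parse import quote


def cell_citation(path: str, extraction: dict) -> dict:
    """Build the value, hyperlink, and comment for one extraction cell."""
    page = extraction["page"]
    # Percent-encode spaces and other unsafe characters in the path
    link = "file://" + quote(path)
    if path.lower().endswith(".pdf"):
        # The #page=<N> fragment is only specified for PDFs
        link += f"#page={page}"
    return {
        "value": extraction["value"],
        "hyperlink": link,
        "comment": f"Page {page}: '{extraction['quote']}'",
    }
```

Whether a `#page=` fragment is honored depends on the application that opens the link, so the comment text remains the more reliable citation.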

JSON Schema

Extraction result format:

{
  "document": "Contract_ABC.pdf",
  "path": "/path/to/Contract_ABC.pdf",
  "extractions": [
    {
      "column": "Parties",
      "value": "Acme Corp and Beta Inc",
      "page": 1,
      "quote": "entered into between Acme Corp and Beta Inc"
    },
    {
      "column": "Effective Date",
      "value": "January 15, 2025",
      "page": 1,
      "quote": "effective as of January 15, 2025"
    }
  ]
}
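
Before building the spreadsheet, it can help to check each result against this format. The following is a minimal sketch rather than a full JSON Schema validator; the key sets are read off the example above, and `validation_errors` is a hypothetical name.

```python
REQUIRED_DOCUMENT_KEYS = {"document", "path", "extractions"}
REQUIRED_EXTRACTION_KEYS = {"column", "value", "page", "quote"}


def validation_errors(result: dict) -> list[str]:
    """Return a list of problems found in one extraction result (empty if none)."""
    errors = []
    missing = REQUIRED_DOCUMENT_KEYS - result.keys()
    if missing:
        errors.append(f"missing document keys: {sorted(missing)}")
    for i, item in enumerate(result.get("extractions", [])):
        missing = REQUIRED_EXTRACTION_KEYS - item.keys()
        if missing:
            errors.append(f"extraction {i} missing keys: {sorted(missing)}")
    return errors
```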

Excel Output Format

Cell with citation:

Value: "Acme Corp and Beta Inc"
Hyperlink:
```
file:///path/to/Contract_ABC.pdf#page=1
```

Comment:

Page 1: "entered into between Acme Corp and Beta Inc"

Color coding (optional):

Green: Value found with high confidence
Yellow: Value found but uncertain
Red: Value not found

Error Handling

Scenario	Action
Document unreadable	Log error, mark row as failed, continue
Column not found	Set value to "Not found", explain in comment
Agent timeout	Collect partial results, note incomplete
Missing skill	Prompt user to install required skill
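
The first two rows of the table can be illustrated with a short sketch that turns one document's result into a sheet row. The `status` key and the `build_row` helper are assumptions made for this example; the skill only states that failed rows are marked and missing columns read "Not found".

```python
from typing import Optional


def build_row(document: str, columns: list[str], result: Optional[dict]) -> dict:
    """Return one sheet row; result is None when the document was unreadable."""
    if result is None:
        # Document unreadable: mark the row as failed and keep going
        return {"Document": document, "status": "failed", **{c: "" for c in columns}}
    found = {e["column"]: e["value"] for e in result.get("extractions", [])}
    row = {"Document": document, "status": "ok"}
    for column in columns:
        # Column not found: fall back to the "Not found" marker
        row[column] = found.get(column, "Not found")
    return row
```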

Example Usage

User: I want to do a tabular review of my contracts

Claude: [Uses AskUserQuestion]
  - What folder contains your documents?
  - What should I name the output Excel file?
  - What columns do you want to extract?

User: ~/Contracts, review.xlsx, Parties/Date/Term/Governing Law

Claude: [Discovers 15 documents via Glob]
Claude: [Launches 5 background agents, 3 docs each]
Claude: [Collects results via TaskOutput]
Claude: [Creates review.xlsx via xlsx skill]

Output: review.xlsx with 15 rows, 4 columns, hyperlinks and citations
