tabular-review-lawvable
Tabular Review
Extract structured data from multiple documents into an Excel matrix with citations.
Required Skills
- pdf - For reading PDF documents
- docx - For reading Word documents
- xlsx - For creating the Excel output
Workflow
Step 1: Gather User Requirements
Use AskUserQuestion to collect:
- Document folder path - Where are the documents?
- Output filename - Name for the Excel file
- Columns to extract - What information to pull from each document
Example column definitions:
- Parties: Names of all parties to the agreement
- Effective Date: When the agreement becomes effective
- Term: Duration of the agreement
- Governing Law: Jurisdiction for disputes
Step 2: Discover Documents
Use Glob to find all documents:
Glob(pattern: "**/*.pdf", path: "<folder>")
Glob(pattern: "**/*.docx", path: "<folder>")
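Outside the agent environment, the same discovery step can be approximated with the Python standard library. This is a minimal sketch; `discover_documents` is a hypothetical helper name and not part of the skill.

```python
from pathlib import Path


def discover_documents(folder: str) -> list[Path]:
    """Recursively collect .pdf and .docx files under `folder`, sorted by path."""
    root = Path(folder).expanduser()  # allows inputs such as "~/Contracts"
    found: list[Path] = []
    for pattern in ("*.pdf", "*.docx"):
        # rglob("*.pdf") searches all subfolders, like the "**/*.pdf" pattern.
        found.extend(p for p in root.rglob(pattern) if p.is_file())
    return sorted(found)
```

Matching is case-sensitive on most Linux filesystems, so files named `.PDF` would need an extra pattern.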
Step 3: Process Documents in Parallel
Launch background agents to process documents concurrently. Each agent:
- Reads its assigned documents using the pdf or docx skill
- Extracts values for each column
- Captures page/paragraph citations
- Returns structured JSON
Launch agents:
Task(
  prompt: "<agent_prompt>",
  subagent_type: "general-purpose",
  run_in_background: true
)
Agent prompt template:
You are processing documents for a tabular review.
DOCUMENTS TO PROCESS:
<list of document paths>
COLUMNS TO EXTRACT:
<column definitions>
For each document:
1. Read the document using the pdf skill (for .pdf) or docx skill (for .docx)
2. Extract the requested information for each column
3. Note the page number (PDF) or section (DOCX) where you found the information
4. Include a brief quote (30-50 chars) showing the source text
Return your results as JSON:
{
  "results": [
    {
      "document": "<filename>",
      "path": "<absolute_path>",
      "extractions": [
        {
          "column": "<column_name>",
          "value": "<extracted_value>",
          "page": <page_number>,
          "quote": "<brief_context_quote>"
        }
      ]
    }
  ]
}
If you cannot find information for a column, set value to "Not found" and explain in the quote field.
Distribution strategy:
- For N documents and M agents, each agent processes at most ceil(N/M) documents
- Default: 10 agents maximum
- Adjust based on document count
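The distribution rule can be sketched as a small batching function. The name `assign_documents` and the choice to cap the agent count at the document count are assumptions made for illustration.

```python
import math


def assign_documents(paths: list[str], max_agents: int = 10) -> list[list[str]]:
    """Split `paths` into at most `max_agents` batches of at most ceil(N/M) items."""
    if not paths:
        return []
    agents = min(max_agents, len(paths))  # never plan more agents than documents
    batch_size = math.ceil(len(paths) / agents)
    # Slice the list into consecutive batches; the last one may be shorter.
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]
```

With 15 documents and `max_agents=5`, this yields five batches of three documents each.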
Step 4: Collect Results
Wait for all background agents to complete:
TaskOutput(task_id: "<agent_id>", block: true)
Aggregate all results into a single array of document extractions.
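One possible way to aggregate, assuming each agent returns its JSON as a string in the shape given by the prompt template. `aggregate_results` is a hypothetical helper, and reporting unparsable outputs by index is an illustrative choice rather than part of the skill.

```python
import json


def aggregate_results(agent_outputs: list[str]) -> tuple[list[dict], list[int]]:
    """Merge every agent's "results" array; also return indices of bad outputs."""
    merged: list[dict] = []
    failed: list[int] = []
    for index, raw in enumerate(agent_outputs):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            failed.append(index)  # keep going so one bad agent does not block the rest
            continue
        if not isinstance(payload, dict):
            failed.append(index)
            continue
        merged.extend(payload.get("results", []))
    return merged, failed
```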
Step 5: Generate Excel Output
Invoke the xlsx skill to create the output file:
Create an Excel workbook at <output_path>:
SHEET 1: "Document Review"
- Header row: Document | <Column1> | <Column2> | ...
- Data rows: One row per document
For each extraction cell:
- Cell value: The extracted text
- Cell hyperlink: file://<document_path>#page=<N> (for PDFs)
- Cell comment: "Page <N>: '<quote>'"
SHEET 2: "Summary"
- Total documents: <count>
- Documents processed: <count>
- Extraction date: <today>
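Before handing data to the xlsx skill, the "Document Review" sheet can be laid out as plain rows. The sketch below assumes the aggregated records follow the extraction format used in this document; `build_review_rows` is a hypothetical name.

```python
def build_review_rows(results: list[dict], columns: list[str]) -> list[list[str]]:
    """Return the header row followed by one row per document."""
    rows = [["Document", *columns]]
    for doc in results:
        # Index this document's extractions by column name for quick lookup.
        by_column = {e["column"]: e["value"] for e in doc.get("extractions", [])}
        # Columns the agent did not report fall back to "Not found".
        rows.append([doc["document"], *(by_column.get(c, "Not found") for c in columns)])
    return rows
```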
JSON Schema
Extraction result format:
{
  "document": "Contract_ABC.pdf",
  "path": "/path/to/Contract_ABC.pdf",
  "extractions": [
    {
      "column": "Parties",
      "value": "Acme Corp and Beta Inc",
      "page": 1,
      "quote": "entered into between Acme Corp and Beta Inc"
    },
    {
      "column": "Effective Date",
      "value": "January 15, 2025",
      "page": 1,
      "quote": "effective as of January 15, 2025"
    }
  ]
}
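A lightweight check that a record carries the keys shown in this format might look as follows. It is a sketch only: `validate_record` is a hypothetical helper and it checks key presence, not value types.

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one extraction record (empty if none)."""
    problems: list[str] = []
    for key in ("document", "path", "extractions"):
        if key not in record:
            problems.append(f"missing key: {key}")
    for i, item in enumerate(record.get("extractions", [])):
        for key in ("column", "value", "page", "quote"):
            if key not in item:
                problems.append(f"extractions[{i}] missing key: {key}")
    return problems
```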
Excel Output Format
Cell with citation:
- Value: "Acme Corp and Beta Inc"
- Hyperlink: file:///path/to/Contract_ABC.pdf#page=1
- Comment: Page 1: "entered into between Acme Corp and Beta Inc"
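The three parts of a citation cell can be derived from one extraction entry. The sketch assumes an absolute POSIX-style path and uses a hypothetical `citation_cell` helper; writing the values into the workbook is left to the xlsx skill.

```python
from urllib.parse import quote as url_quote


def citation_cell(value: str, path: str, page: int, quote: str) -> dict:
    """Build the value, hyperlink, and comment for one extraction cell."""
    # Percent-encode spaces and other unsafe characters; "/" is kept as-is.
    link = f"file://{url_quote(path)}#page={page}"
    return {"value": value, "hyperlink": link, "comment": f'Page {page}: "{quote}"'}
```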
Color coding (optional):
- Green: Value found with high confidence
- Yellow: Value found but uncertain
- Red: Value not found
Error Handling
| Scenario | Action |
|---|---|
| Document unreadable | Log error, mark row as failed, continue |
| Column not found | Set value to "Not found", explain in comment |
| Agent timeout | Collect partial results, note incomplete |
| Missing skill | Prompt user to install required skill |
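The "Document unreadable" row could be handled by wrapping each per-document extraction so that a failure is logged and recorded instead of stopping the run. In this sketch the `status` and `error` fields, the `extract_safely` name, and the `extract` callback are all assumptions, not part of the skill's JSON format.

```python
import logging
from pathlib import PurePath
from typing import Callable

logger = logging.getLogger("tabular_review")


def extract_safely(path: str, columns: list[str],
                   extract: Callable[[str, list[str]], list[dict]]) -> dict:
    """Run `extract` for one document; on any error, mark the row as failed."""
    record = {"document": PurePath(path).name, "path": path}
    try:
        record["extractions"] = extract(path, columns)
        record["status"] = "ok"
    except Exception as exc:  # e.g. unreadable or corrupt file
        logger.error("Could not process %s: %s", path, exc)
        record["extractions"] = []
        record["status"] = "failed"
        record["error"] = str(exc)
    return record
```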
Example Usage
User: I want to do a tabular review of my contracts
Claude: [Uses AskUserQuestion]
- What folder contains your documents?
- What should I name the output Excel file?
- What columns do you want to extract?
User: ~/Contracts, review.xlsx, Parties/Date/Term/Governing Law
Claude: [Discovers 15 documents via Glob]
Claude: [Launches 5 background agents, 3 docs each]
Claude: [Collects results via TaskOutput]
Claude: [Creates review.xlsx via xlsx skill]
Output: review.xlsx with 15 rows, 4 columns, hyperlinks and citations