extraction-form


Goal: create a consistent, analysis-ready extraction table that is directly grounded in the protocol.


Required:

Optional:


Determine the included set
- From `papers/screening_log.csv`, collect all rows with `decision=include`.
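As a rough illustration of this step, the sketch below filters screening rows down to the included set. It assumes the log has a header row with a `decision` column and a `paper_id` column; the sample rows and the case-insensitive, whitespace-tolerant match are assumptions for illustration, not something the protocol specifies.

```python
import csv

def included_rows(lines):
    """Return the screening rows whose decision is 'include'.

    `lines` is any iterable of CSV text lines, e.g. an open file handle
    for papers/screening_log.csv.
    """
    reader = csv.DictReader(lines)
    return [
        row for row in reader
        # Assumption: tolerate stray whitespace and capitalisation.
        if (row.get("decision") or "").strip().lower() == "include"
    ]

# Hypothetical sample standing in for papers/screening_log.csv.
sample = [
    "paper_id,title,decision",
    "P001,First paper,include",
    "P002,Second paper,exclude",
    "P003,Third paper,Include ",
]

included = included_rows(sample)
included_ids = [row["paper_id"] for row in included]
print(included_ids)
```

With a real log, the same function could be called as `included_rows(open("papers/screening_log.csv", newline="", encoding="utf-8"))`.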
Build/confirm the schema
- Use the extraction schema defined in `output/PROTOCOL.md`.
- If the protocol does not define fields yet, stop and update `output/PROTOCOL.md` first.
Populate `papers/extraction_table.csv`
- One row per included paper.
- If `papers/paper_notes.jsonl` exists, use it as a structured source for values/provenance (but keep the table schema governed by `output/PROTOCOL.md`).
- Always include provenance columns:
  - `paper_id`, `title`, `year`, `url`
- For each protocol-defined field:
  - fill concrete values (units explicit)
  - use an explicit sentinel for unknowns (recommended: empty cell + `notes`)
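The row-building rules above could be sketched roughly as follows. The field names `sample_size` and `followup_weeks`, the sample paper, and the "not reported" wording in `notes` are hypothetical; in practice the field list would come from `output/PROTOCOL.md`.

```python
import csv
import io

PROVENANCE = ["paper_id", "title", "year", "url"]

def build_row(paper, protocol_fields):
    """Build one extraction row: provenance columns, protocol fields, notes."""
    row = {col: paper.get(col, "") for col in PROVENANCE}
    missing = []
    for field in protocol_fields:
        value = paper.get(field)
        if value is None or value == "":
            row[field] = ""          # sentinel for unknowns: empty cell
            missing.append(field)    # ...explained in the notes column
        else:
            row[field] = value
    row["notes"] = "; ".join(f"{name}: not reported" for name in missing)
    return row

# Hypothetical protocol fields; unit is carried in the column name.
protocol_fields = ["sample_size", "followup_weeks"]
paper = {
    "paper_id": "P001",
    "title": "First paper",
    "year": "2021",
    "url": "https://example.org/p001",
    "sample_size": "120",
}
row = build_row(paper, protocol_fields)

# Write to an in-memory buffer; a real run would target
# papers/extraction_table.csv instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=PROVENANCE + protocol_fields + ["notes"])
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Putting the unit in the column name is one way to keep units explicit; a separate unit column would work equally well if the protocol defines one.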
Keep it auditable
- If a value is inferred (not directly stated), mark it in a notes column.
- Do not write synthesis; only extraction.
Quick QA
- Ensure 1:1 coverage: included papers == extraction rows.
- Spot-check a few rows against the paper text/notes.
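The 1:1 coverage check can be automated with a small comparison of identifiers. This is a minimal sketch that assumes both files share a `paper_id` column; the function name `coverage_report` and the sample IDs are made up for illustration.

```python
from collections import Counter

def coverage_report(included_ids, extraction_ids):
    """Compare included paper IDs against extraction-table row IDs."""
    counts = Counter(extraction_ids)
    included = set(included_ids)
    return {
        # Included papers with no extraction row.
        "missing": sorted(included - set(counts)),
        # Extraction rows for papers that were not included.
        "unexpected": sorted(set(counts) - included),
        # Papers that appear in more than one extraction row.
        "duplicated": sorted(pid for pid, n in counts.items() if n > 1),
    }

report = coverage_report(
    included_ids=["P001", "P003"],
    extraction_ids=["P001", "P001", "P004"],
)
print(report)
```

Coverage is 1:1 only when all three lists in the report are empty.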

- `papers/extraction_table.csv` exists.
- Every included paper from `papers/screening_log.csv` has exactly one extraction row.
- Column meanings match `output/PROTOCOL.md` (no ad-hoc columns without updating the protocol).
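One way to catch ad-hoc columns is to compare the table header against the protocol's field list. The sketch below assumes the protocol fields have already been read out of `output/PROTOCOL.md` into a plain list (how that is done depends on the protocol's layout, which is not specified here); `adhoc_columns` and the example column names are hypothetical.

```python
def adhoc_columns(header, protocol_fields,
                  always_allowed=("paper_id", "title", "year", "url", "notes")):
    """Return header columns that are neither protocol-defined nor standard."""
    allowed = set(protocol_fields) | set(always_allowed)
    return [col for col in header if col not in allowed]

# Hypothetical header from papers/extraction_table.csv.
header = ["paper_id", "title", "year", "url", "sample_size", "my_extra", "notes"]
extras = adhoc_columns(header, protocol_fields=["sample_size"])
print(extras)
```

A non-empty result means either the column should be dropped or the protocol should be updated before extraction continues.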

Fix:

Update `output/PROTOCOL.md` (extraction schema section) and re-run extraction.


Fix:

Move narrative into a `notes` column and keep the rest as atomic values (numbers/enums/short strings).
