extraction-form

Extraction Form (systematic review)

Goal: create a consistent, analysis-ready extraction table that is directly grounded in the protocol.

Inputs

Required:
  • papers/screening_log.csv
  • output/PROTOCOL.md
Optional:
  • papers/paper_notes.jsonl (if you already have structured notes)

Outputs

  • papers/extraction_table.csv

Workflow

  1. Determine the included set
    • From papers/screening_log.csv, collect all rows with decision=include.
  2. Build/confirm the schema
    • Use the extraction schema defined in output/PROTOCOL.md.
    • If the protocol does not define fields yet, stop and update output/PROTOCOL.md first.
  3. Populate papers/extraction_table.csv
    • One row per included paper.
    • If papers/paper_notes.jsonl exists, use it as a structured source for values/provenance (but keep the table schema governed by output/PROTOCOL.md).
    • Always include provenance columns: paper_id, title, year, url
    • For each protocol-defined field:
      • fill concrete values (units explicit)
      • use an explicit sentinel for unknowns (recommended: empty cell + notes)
  4. Keep it auditable
    • If a value is inferred (not directly stated), mark it in a notes column.
    • Do not write synthesis; only extraction.
  5. Quick QA
    • Ensure 1:1 coverage: included papers == extraction rows.
    • Spot-check a few rows against the paper text/notes.
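
Steps 1 and 3 of the workflow can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes the screening log carries paper_id, title, year, and url columns alongside decision, that each line of papers/paper_notes.jsonl is a JSON object keyed by paper_id, and that the protocol's field names are passed in as a plain list (the source does not specify how output/PROTOCOL.md encodes them). The function names and the "not reported" note wording are hypothetical.

```python
import csv
import json
from pathlib import Path

# Provenance columns named in the workflow.
PROVENANCE = ["paper_id", "title", "year", "url"]


def included_rows(screening_rows):
    """Step 1: keep only rows whose decision is 'include'."""
    return [r for r in screening_rows
            if (r.get("decision") or "").strip().lower() == "include"]


def load_notes(path):
    """Read the optional JSONL notes into {paper_id: record}; empty if the file is absent."""
    notes = {}
    p = Path(path)
    if p.exists():
        with p.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    rec = json.loads(line)
                    notes[rec["paper_id"]] = rec  # assumed key name
    return notes


def build_table(screening_rows, protocol_fields, notes_by_id=None):
    """Step 3: one row per included paper; unknown values stay empty and are noted."""
    if not protocol_fields:
        # Step 2: no schema means stop and fix the protocol first.
        raise ValueError("no extraction fields; update output/PROTOCOL.md first")
    notes_by_id = notes_by_id or {}
    columns = PROVENANCE + list(protocol_fields) + ["notes"]
    table = []
    for row in included_rows(screening_rows):
        rec = notes_by_id.get(row["paper_id"], {})
        out = {c: "" for c in columns}
        for c in PROVENANCE:
            out[c] = row.get(c) or rec.get(c) or ""
        missing = []
        for field in protocol_fields:
            value = rec.get(field)
            if value is None or value == "":
                missing.append(field)  # sentinel: empty cell + notes
            else:
                out[field] = value
        if missing:
            out["notes"] = "not reported: " + ", ".join(missing)
        table.append(out)
    return columns, table


def write_table(path, columns, table):
    """Write the extraction table as CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(table)
```

The screening rows would typically come from csv.DictReader over papers/screening_log.csv. Keeping the column list derived from the protocol fields, rather than from whatever keys appear in the notes, is what keeps the schema governed by output/PROTOCOL.md.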

Definition of Done

  • papers/extraction_table.csv exists.
  • Every included paper from papers/screening_log.csv has exactly one extraction row.
  • Column meanings match output/PROTOCOL.md (no ad-hoc columns without updating the protocol).
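
The last two criteria lend themselves to a mechanical check. The sketch below is one possible way to do it, assuming the included paper IDs, the table rows, the table header, and the protocol field names have already been loaded into memory. The function name check_done and the message strings are hypothetical, and treating notes as an always-allowed column is an assumption based on the workflow's recommendation.

```python
from collections import Counter

# Provenance columns named in the workflow.
PROVENANCE = ["paper_id", "title", "year", "url"]


def check_done(included_ids, extraction_rows, header, protocol_fields):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    counts = Counter(r["paper_id"] for r in extraction_rows)

    # Every included paper must have exactly one extraction row.
    for pid in included_ids:
        if counts[pid] == 0:
            problems.append(f"missing row for {pid}")
        elif counts[pid] > 1:
            problems.append(f"duplicate rows for {pid} ({counts[pid]})")

    # No rows for papers that were not included.
    included = set(included_ids)
    for pid in counts:
        if pid not in included:
            problems.append(f"row for non-included paper {pid}")

    # Columns must come from provenance, the protocol, or notes (assumed allowed).
    allowed = set(PROVENANCE) | set(protocol_fields) | {"notes"}
    for col in header:
        if col not in allowed:
            problems.append(f"ad-hoc column not in protocol: {col}")
    for field in protocol_fields:
        if field not in header:
            problems.append(f"protocol field missing from table: {field}")
    return problems
```

This only compares column names; whether a column's meaning matches the protocol still needs a human spot-check.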

Troubleshooting

Issue: the protocol does not specify extraction fields

Fix:
  • Update output/PROTOCOL.md (extraction schema section) and re-run extraction.

Issue: extraction table mixes narrative text with fields

Fix:
  • Move narrative into a notes column and keep the rest as atomic values (numbers/enums/short strings).
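
To find cells that likely hold narrative, a rough heuristic can help. The sketch below flags any non-exempt cell that is longer than a word limit or contains a sentence break. The 8-word threshold, the exempt columns (notes, title, url), and the function name are all assumptions chosen for illustration; the flagged cells still need manual review before anything is moved.

```python
def find_narrative_cells(rows, max_words=8, skip=("notes", "title", "url")):
    """Return (row_index, column) pairs whose value looks like prose, not an atomic value."""
    flagged = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            if col in skip:
                continue  # free-text columns are exempt
            text = str(value or "").strip()
            # Heuristic: many words, or a period followed by more text.
            if len(text.split()) > max_words or ". " in text:
                flagged.append((i, col))
    return flagged
```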