source-ingestion
Source Ingestion
Ingest sources so future analysis compounds instead of rediscovering the same
document. The native source remains immutable; derived Markdown and wiki pages
are working artifacts.
Read First
- references/repository-contract.md
- references/source-ledger.md
- references/pdf-markdown-policy.md
Workflow
- Identify source type, provenance, identifiers, and destination.
- Store immutable originals under sources/, data/raw/, data/external/, or reports/.
- If conversion is needed, hand off the conversion procedure to document-conversion; keep this workflow responsible for provenance and repository registration.
- Add or update sources/source-ledger.csv.
- Create or update wiki/sources/<source_id>.md.
- Extract claims, methods, datasets, limitations, and open questions.
- Update wiki/index.md, wiki/log.md, and any affected concept, claim, method, or question pages.
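The ledger step can be sketched as an idempotent "upsert", so that re-ingesting a source replaces its row instead of duplicating it. This is a minimal sketch under stated assumptions: the helper name `upsert_ledger_row` and the column names in `LEDGER_FIELDS` are hypothetical, and the real schema is whatever references/source-ledger.md defines.

```python
import csv
from pathlib import Path

# Hypothetical columns for illustration only; the authoritative schema
# is the one described in references/source-ledger.md.
LEDGER_FIELDS = ["source_id", "source_type", "provenance", "native_path"]


def upsert_ledger_row(ledger: Path, row: dict) -> None:
    """Add a row to the ledger, replacing any existing row with the same source_id."""
    fields, rows = list(LEDGER_FIELDS), []
    if ledger.exists():
        with ledger.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Keep the existing header so columns already in use are preserved.
            fields = list(reader.fieldnames or LEDGER_FIELDS)
            rows = [r for r in reader if r.get("source_id") != row["source_id"]]
    rows.append(row)
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("w", newline="", encoding="utf-8") as f:
        # Missing values are written as empty strings; unknown keys are ignored.
        writer = csv.DictWriter(f, fieldnames=fields, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

Calling it twice with the same `source_id` leaves a single row holding the latest values, which keeps the ledger safe to update on every ingestion pass.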
Source ID
Prefer stable slugs:
- smith-2024-topic
- arxiv-2401-12345
- doi-10-1145-short-topic
- proposal-short-topic-2026
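One way to keep slugs stable is to derive them mechanically from the identifying parts. The sketch below assumes a hypothetical helper, `make_source_id`; the normalization rule (lowercase, with every run of non-alphanumeric characters collapsed to a single hyphen) is inferred from the examples above, not taken from the repository contract.

```python
import re


def slugify(text: str) -> str:
    """Lowercase and collapse each run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_source_id(*parts: str) -> str:
    """Join the slugified, non-empty parts into one source ID (hypothetical helper)."""
    slugs = [slugify(p) for p in parts]
    return "-".join(s for s in slugs if s)


print(make_source_id("Smith", "2024", "Topic"))          # smith-2024-topic
print(make_source_id("arXiv", "2401.12345"))             # arxiv-2401-12345
print(make_source_id("doi", "10.1145", "short topic"))   # doi-10-1145-short-topic
```

Because the function is deterministic, ingesting the same source twice yields the same ID, which is what lets the ledger and wiki pages be updated in place.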
Conversion Handoff
- Preserve the native PDF.
- Save derived Markdown under sources/markdown/<source_id>.md only after the conversion workflow records quality and command details.
- Verify exact claims, tables, equations, and quotes against the native file.
- If extraction quality is poor, mark it and avoid using the Markdown for exact evidence.
Output Contract
The task is not complete until these are updated when applicable:
- source file path
- derived Markdown path
- metadata or BibTeX path
- sources/source-ledger.csv
- wiki/sources/<source_id>.md
- wiki/index.md
- wiki/log.md
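The fixed-path part of this contract can be checked mechanically before a task is declared done. The sketch below assumes a hypothetical helper, `missing_outputs`, and only tests that the four named files exist; it cannot tell whether they were actually updated, and it leaves out the per-source paths (native file, derived Markdown, metadata) because those apply only in some cases.

```python
from pathlib import Path


def missing_outputs(repo: Path, source_id: str) -> list[str]:
    """Return the contract files that do not yet exist under the repo root."""
    required = [
        "sources/source-ledger.csv",
        f"wiki/sources/{source_id}.md",
        "wiki/index.md",
        "wiki/log.md",
    ]
    return [p for p in required if not (repo / p).is_file()]
```

An empty list is a necessary condition for completion, not a sufficient one: the content of each file still needs to reflect the newly ingested source.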