# paper2code — Orchestration
You are executing the paper2code skill. This file governs the high-level flow. Each stage dispatches to a detailed reasoning protocol in `pipeline/`. Do NOT skip stages. Do NOT combine stages. Execute them in order.

## Parse arguments
Extract from the user's input:
- `ARXIV_ID`: the arxiv paper ID (e.g., `2106.09685`). Strip any URL prefix.
- `MODE`: one of `minimal` (default), `full`, `educational`.
- `FRAMEWORK`: one of `pytorch` (default), `jax`, `numpy`.

If the user provided a full URL like `https://arxiv.org/abs/2106.09685`, extract the ID `2106.09685`.
If the user provided a versioned ID like `2106.09685v2`, keep the version.
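The ID normalization above can be sketched as follows (a minimal illustration; the skill does not prescribe an exact helper, and the function name is invented here):

```python
import re

def normalize_arxiv_id(raw: str) -> str:
    """Strip any arxiv URL prefix, keeping an optional version suffix like v2."""
    raw = raw.strip()
    # Remove URL prefixes such as https://arxiv.org/abs/ or arxiv.org/pdf/
    raw = re.sub(r"^(https?://)?(www\.)?arxiv\.org/(abs|pdf)/", "", raw)
    raw = re.sub(r"\.pdf$", "", raw)  # pdf links sometimes carry a .pdf suffix
    return raw

print(normalize_arxiv_id("https://arxiv.org/abs/2106.09685"))  # 2106.09685
print(normalize_arxiv_id("2106.09685v2"))                      # 2106.09685v2
```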
## Set up working directory
Create a temporary working directory: `.paper2code_work/{ARXIV_ID}/`

This is where intermediate artifacts go. The final output goes in the current directory under `{paper_slug}/`.
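A minimal sketch of the directory setup, using Python's standard library (the ID is the example from above; substitute the parsed `ARXIV_ID`):

```python
from pathlib import Path

arxiv_id = "2106.09685"  # example; substitute the parsed ARXIV_ID
work_dir = Path(".paper2code_work") / arxiv_id
work_dir.mkdir(parents=True, exist_ok=True)  # idempotent: safe to re-run
print(work_dir)
```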
## Install dependencies
Run via Bash:

```bash
pip install pymupdf4llm pdfplumber requests pyyaml
```

## Execute pipeline
### Stage 1 — Paper Acquisition and Parsing
Read and follow: `pipeline/01_paper_acquisition.md`

Run the helper script to fetch and parse the paper:

```bash
python skills/paper2code/scripts/fetch_paper.py {ARXIV_ID} .paper2code_work/{ARXIV_ID}/
```

Then run structure extraction:

```bash
python skills/paper2code/scripts/extract_structure.py .paper2code_work/{ARXIV_ID}/paper_text.md .paper2code_work/{ARXIV_ID}/
```

Verify the outputs exist before proceeding. If extraction failed, follow the fallback protocol in `pipeline/01_paper_acquisition.md`.

The script also searches for official code repositories (in the paper text and on the arxiv page) and saves any found links to `paper_metadata.json` under the `official_code` key. Verify these links before relying on them — see Step 8 in `pipeline/01_paper_acquisition.md`.
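The "verify the outputs exist" step might look like this (a sketch; `paper_text.md` and `paper_metadata.json` are the two artifacts named in this stage, and any other outputs would be checked the same way):

```python
from pathlib import Path

def missing_stage1_outputs(work_dir: str) -> list:
    """Return the Stage 1 artifacts that do not exist yet."""
    expected = ["paper_text.md", "paper_metadata.json"]
    return [name for name in expected if not (Path(work_dir) / name).exists()]

missing = missing_stage1_outputs(".paper2code_work/2106.09685")
if missing:
    print("missing:", missing, "- follow the fallback protocol in pipeline/01_paper_acquisition.md")
```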
### Stage 2 — Contribution Identification
Read and follow: `pipeline/02_contribution_identification.md`

Read the parsed paper sections. Identify the single core contribution. Classify the paper type. Write the contribution statement. Save it to `.paper2code_work/{ARXIV_ID}/contribution.md`.
### Stage 3 — Ambiguity Audit
Read and follow: `pipeline/03_ambiguity_audit.md`

Before reading this stage, also read: `guardrails/hallucination_prevention.md`

Go through every implementation-relevant detail. Classify each as SPECIFIED, PARTIALLY_SPECIFIED, or UNSPECIFIED. Save the audit to `.paper2code_work/{ARXIV_ID}/ambiguity_audit.md`.
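As an illustration of the classification above, an audit entry might record the detail, its status, and the supporting evidence (the example entry here is hypothetical, not from any particular paper):

```python
from dataclasses import dataclass

@dataclass
class AuditItem:
    detail: str    # the implementation-relevant detail
    status: str    # SPECIFIED | PARTIALLY_SPECIFIED | UNSPECIFIED
    evidence: str  # where the paper states it, or "not stated"

item = AuditItem(
    detail="learning-rate schedule",
    status="UNSPECIFIED",
    evidence="not stated; only the base learning rate is given",
)
print(item.status)
```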
### Stage 4 — Code Generation
Read and follow: `pipeline/04_code_generation.md`

Before writing code, read:
- `guardrails/scope_enforcement.md` — to determine what's in and out of scope
- `guardrails/badly_written_papers.md` — if the paper is vague or inconsistent
- The relevant knowledge files in `knowledge/` for the paper's domain
- The scaffold templates in `scaffolds/` for the expected file structure

Determine the `paper_slug` from the paper title (lowercase, underscores, no special chars).
Generate all files under `{paper_slug}/` in the current working directory.
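The slug rule above (lowercase, underscores, no special characters) can be sketched as (the helper name is illustrative):

```python
import re

def make_paper_slug(title: str) -> str:
    """Lowercase the title and collapse runs of non-alphanumerics into underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower())
    return slug.strip("_")

print(make_paper_slug("LoRA: Low-Rank Adaptation of Large Language Models"))
# lora_low_rank_adaptation_of_large_language_models
```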
### Stage 5 — Walkthrough Notebook
Read and follow: `pipeline/05_walkthrough_notebook.md`

Generate the walkthrough notebook that connects paper sections to code with runnable sanity checks. Save to `{paper_slug}/notebooks/walkthrough.ipynb`.
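For example, for the LoRA paper used as the example ID above (2106.09685), a sanity-check cell might verify the low-rank update without any framework dependency (a sketch; the sizes are arbitrary, and the zero-initialization of B follows that paper):

```python
# Sanity check for a LoRA-style low-rank update delta_W = B @ A (pure Python)
d, k, r = 4, 3, 2
A = [[0.1] * k for _ in range(r)]   # r x k
B = [[0.0] * r for _ in range(d)]   # d x r, zero-initialized
delta_W = [[sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
           for i in range(d)]
assert len(delta_W) == d and len(delta_W[0]) == k
assert all(v == 0.0 for row in delta_W for v in row)  # zero update at init
print("sanity checks passed")
```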
## Cleanup
Remove the `.paper2code_work/` directory after successful completion.
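For example (the success flag is a stand-in for the real completion condition; the guard makes the removal safe to run even if the directory was never created):

```python
import shutil
from pathlib import Path

work_root = Path(".paper2code_work")
pipeline_succeeded = True  # stand-in for the actual success condition
if pipeline_succeeded and work_root.exists():
    shutil.rmtree(work_root)
print(work_root.exists())  # False
```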
## Final output
Print a summary:

```
✓ paper2code complete for: {paper_title}
Output directory: {paper_slug}/
Files generated: {list of files}
Unspecified choices: {count} (see REPRODUCTION_NOTES.md)
Mode: {MODE} | Framework: {FRAMEWORK}
```
## Mode-specific behavior
- minimal (default): Core contribution only. Training loop only if the contribution involves training. No data pipeline beyond a Dataset skeleton.
- full: Core contribution + full training loop + data pipeline + evaluation pipeline. More code, same citation rigor.
- educational: Same as minimal but with extra inline comments explaining ML concepts, an expanded walkthrough notebook with theory sections, and a `PAPER_GUIDE.md` that walks through the paper section by section.
## Guardrails — always active
These apply at ALL stages. Read them if you haven't already:
- `guardrails/hallucination_prevention.md` — the most important file in this skill
- `guardrails/scope_enforcement.md` — what to implement and what to skip
- `guardrails/badly_written_papers.md` — what to do when the paper is unclear
## Knowledge base — consult as needed
Before implementing any of these components, read the corresponding knowledge file:
- Transformer layers, attention, positional encoding → `knowledge/transformer_components.md`
- Optimizers, LR schedules, batch size semantics → `knowledge/training_recipes.md`
- Cross-entropy, contrastive loss, diffusion loss, ELBO → `knowledge/loss_functions.md`
- Framework-specific pitfalls, notation mismatches → `knowledge/paper_to_code_mistakes.md`
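The routing above can be sketched as a lookup table (the component keys are illustrative; the file paths are the ones listed above):

```python
KNOWLEDGE_MAP = {
    "attention": "knowledge/transformer_components.md",
    "positional_encoding": "knowledge/transformer_components.md",
    "optimizer": "knowledge/training_recipes.md",
    "lr_schedule": "knowledge/training_recipes.md",
    "cross_entropy": "knowledge/loss_functions.md",
    "contrastive_loss": "knowledge/loss_functions.md",
    "notation_mismatch": "knowledge/paper_to_code_mistakes.md",
}

print(KNOWLEDGE_MAP["attention"])
```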