
# paper2code — Orchestration


You are executing the paper2code skill. This file governs the high-level flow. Each stage dispatches to a detailed reasoning protocol in `pipeline/`. Do NOT skip stages. Do NOT combine stages. Execute them in order.

## Parse arguments


Extract from the user's input:

- `ARXIV_ID`: the arXiv paper ID (e.g., `2106.09685`). Strip any URL prefix.
- `MODE`: one of `minimal` (default), `full`, `educational`.
- `FRAMEWORK`: one of `pytorch` (default), `jax`, `numpy`.

If the user provided a full URL like `https://arxiv.org/abs/2106.09685`, extract the ID `2106.09685`. If the user provided a versioned ID like `2106.09685v2`, keep the version.
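One way to sketch the ID extraction above (the helper name and regex pattern are assumptions for illustration, not part of the skill's scripts):

```python
import re

def parse_arxiv_id(raw: str) -> str:
    """Extract an arXiv ID (optionally versioned) from a bare ID or a URL."""
    # Strip whitespace and any trailing slash, then search for the ID shape:
    # four digits, a dot, four-to-five digits, optional version suffix.
    raw = raw.strip().rstrip("/")
    match = re.search(r"(\d{4}\.\d{4,5}(?:v\d+)?)", raw)
    if not match:
        raise ValueError(f"could not find an arXiv ID in {raw!r}")
    return match.group(1)

print(parse_arxiv_id("https://arxiv.org/abs/2106.09685"))  # 2106.09685
print(parse_arxiv_id("2106.09685v2"))                      # 2106.09685v2
```

The optional `(?:v\d+)?` group is what keeps the version suffix when one is present.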

## Set up working directory


Create a temporary working directory: `.paper2code_work/{ARXIV_ID}/`. This is where intermediate artifacts go. The final output goes in the current directory under `{paper_slug}/`.
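A minimal sketch of the setup step, assuming a pathlib-based approach (the variable names are placeholders):

```python
from pathlib import Path

arxiv_id = "2106.09685"  # placeholder; use the parsed ARXIV_ID
work_dir = Path(".paper2code_work") / arxiv_id

# parents=True also creates .paper2code_work/ itself;
# exist_ok=True makes reruns of the pipeline harmless.
work_dir.mkdir(parents=True, exist_ok=True)
print(work_dir)
```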

## Install dependencies


Run via Bash:

```bash
pip install pymupdf4llm pdfplumber requests pyyaml
```

## Execute pipeline


### Stage 1 — Paper Acquisition and Parsing


Read and follow: `pipeline/01_paper_acquisition.md`

Run the helper script to fetch and parse the paper:

```bash
python skills/paper2code/scripts/fetch_paper.py {ARXIV_ID} .paper2code_work/{ARXIV_ID}/
```

Then run structure extraction:

```bash
python skills/paper2code/scripts/extract_structure.py .paper2code_work/{ARXIV_ID}/paper_text.md .paper2code_work/{ARXIV_ID}/
```

Verify the outputs exist before proceeding. If extraction fails, follow the fallback protocol in `pipeline/01_paper_acquisition.md`.

The script also searches for official code repositories (in the paper text and on the arXiv page) and saves any found links to `paper_metadata.json` under the `official_code` key. Verify these links before relying on them — see Step 8 in `pipeline/01_paper_acquisition.md`.
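The verification step might be sketched as follows; the helper name is hypothetical, and only `paper_text.md` and `paper_metadata.json` are filenames confirmed by this document:

```python
import json
from pathlib import Path

def verify_stage1_outputs(work_dir: Path) -> list:
    """Check Stage 1 artifacts exist; return any official code links found."""
    # Stage 1 must have produced at least the parsed text and the metadata.
    for name in ("paper_text.md", "paper_metadata.json"):
        if not (work_dir / name).is_file():
            raise FileNotFoundError(
                f"{name} missing: follow the fallback protocol "
                "in pipeline/01_paper_acquisition.md")
    metadata = json.loads((work_dir / "paper_metadata.json").read_text())
    # Links found in the paper text or on the arXiv page, if any;
    # these must be verified before being relied on.
    return metadata.get("official_code", [])
```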

### Stage 2 — Contribution Identification


Read and follow: `pipeline/02_contribution_identification.md`

Read the parsed paper sections. Identify the single core contribution. Classify the paper type. Write the contribution statement. Save it to `.paper2code_work/{ARXIV_ID}/contribution.md`.

### Stage 3 — Ambiguity Audit


Read and follow: `pipeline/03_ambiguity_audit.md`

Before reading this stage, also read: `guardrails/hallucination_prevention.md`

Go through every implementation-relevant detail. Classify each as SPECIFIED, PARTIALLY_SPECIFIED, or UNSPECIFIED. Save the audit to `.paper2code_work/{ARXIV_ID}/ambiguity_audit.md`.

### Stage 4 — Code Generation


Read and follow: `pipeline/04_code_generation.md`

Before writing code, read:

- `guardrails/scope_enforcement.md` — to determine what's in and out of scope
- `guardrails/badly_written_papers.md` — if the paper is vague or inconsistent
- The relevant knowledge files in `knowledge/` for the paper's domain
- The scaffold templates in `scaffolds/` for the expected file structure

Determine the `paper_slug` from the paper title (lowercase, underscores, no special chars). Generate all files under `{paper_slug}/` in the current working directory.
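One plausible way to implement the slug rule above (the function name and exact regex are assumptions; the stated rule is just lowercase, underscores, no special characters):

```python
import re

def paper_slug(title: str) -> str:
    """Derive a slug: lowercase, underscores, no special characters."""
    slug = title.lower()
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", slug)
    return slug.strip("_")

print(paper_slug("LoRA: Low-Rank Adaptation of Large Language Models"))
# lora_low_rank_adaptation_of_large_language_models
```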

### Stage 5 — Walkthrough Notebook


Read and follow: `pipeline/05_walkthrough_notebook.md`

Generate the walkthrough notebook that connects paper sections to code with runnable sanity checks. Save to `{paper_slug}/notebooks/walkthrough.ipynb`.

## Cleanup


Remove the `.paper2code_work/` directory after successful completion.

## Final output


Print a summary:

```
✓ paper2code complete for: {paper_title}
  Output directory: {paper_slug}/
  Files generated: {list of files}
  Unspecified choices: {count} (see REPRODUCTION_NOTES.md)
  Mode: {MODE} | Framework: {FRAMEWORK}
```

## Mode-specific behavior


- **minimal** (default): Core contribution only. Training loop only if the contribution involves training. No data pipeline beyond a Dataset skeleton.
- **full**: Core contribution + full training loop + data pipeline + evaluation pipeline. More code, same citation rigor.
- **educational**: Same as minimal but with extra inline comments explaining ML concepts, an expanded walkthrough notebook with theory sections, and a `PAPER_GUIDE.md` that walks through the paper section by section.

## Guardrails — always active


These apply at ALL stages. Read them if you haven't already:

- `guardrails/hallucination_prevention.md` — the most important file in this skill
- `guardrails/scope_enforcement.md` — what to implement and what to skip
- `guardrails/badly_written_papers.md` — what to do when the paper is unclear

## Knowledge base — consult as needed


Before implementing any of these components, read the corresponding knowledge file:

- Transformer layers, attention, positional encoding → `knowledge/transformer_components.md`
- Optimizers, LR schedules, batch size semantics → `knowledge/training_recipes.md`
- Cross-entropy, contrastive loss, diffusion loss, ELBO → `knowledge/loss_functions.md`
- Framework-specific pitfalls, notation mismatches → `knowledge/paper_to_code_mistakes.md`