
# paper2code — Orchestration


You are executing the paper2code skill. This file governs the high-level flow. Each stage dispatches to a detailed reasoning protocol in `pipeline/`. Do NOT skip stages. Do NOT combine stages. Execute them in order.

## Parse arguments


Extract from the user's input:

- `ARXIV_ID`: the arXiv paper ID (e.g., `2106.09685`). Strip any URL prefix.
- `MODE`: one of `minimal` (default), `full`, `educational`.
- `FRAMEWORK`: one of `pytorch` (default), `jax`, `numpy`.

If the user provided a full URL like `https://arxiv.org/abs/2106.09685`, extract the ID `2106.09685`. If the user provided a versioned ID like `2106.09685v2`, keep the version.
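One way to sketch the ID extraction above (the helper name and regex pattern are assumptions for illustration, not part of the skill's scripts):

```python
import re

def parse_arxiv_id(raw: str) -> str:
    """Extract an arXiv ID (optionally versioned) from a bare ID or a URL."""
    # Strip whitespace and any trailing slash, then search for the ID shape:
    # four digits, a dot, four-to-five digits, optional version suffix.
    raw = raw.strip().rstrip("/")
    match = re.search(r"(\d{4}\.\d{4,5}(?:v\d+)?)", raw)
    if not match:
        raise ValueError(f"could not find an arXiv ID in {raw!r}")
    return match.group(1)

print(parse_arxiv_id("https://arxiv.org/abs/2106.09685"))  # 2106.09685
print(parse_arxiv_id("2106.09685v2"))                      # 2106.09685v2
```

The optional `(?:v\d+)?` group is what keeps the version suffix when one is present.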

## Set up working directory


Create a temporary working directory: `.paper2code_work/{ARXIV_ID}/`. This is where intermediate artifacts go. The final output goes in the current directory under `{paper_slug}/`.
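A minimal sketch of the setup step, assuming a pathlib-based approach (the variable names are placeholders):

```python
from pathlib import Path

arxiv_id = "2106.09685"  # placeholder; use the parsed ARXIV_ID
work_dir = Path(".paper2code_work") / arxiv_id

# parents=True also creates .paper2code_work/ itself;
# exist_ok=True makes reruns of the pipeline harmless.
work_dir.mkdir(parents=True, exist_ok=True)
print(work_dir)
```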

## Install dependencies


Run via Bash:

```bash
pip install pymupdf4llm pdfplumber requests pyyaml
```

## Execute pipeline


### Stage 1 — Paper Acquisition and Parsing


Read and follow: `pipeline/01_paper_acquisition.md`

Run the helper script to fetch and parse the paper:

```bash
python skills/paper2code/scripts/fetch_paper.py {ARXIV_ID} .paper2code_work/{ARXIV_ID}/
```

Then run structure extraction:

```bash
python skills/paper2code/scripts/extract_structure.py .paper2code_work/{ARXIV_ID}/paper_text.md .paper2code_work/{ARXIV_ID}/
```

Verify the outputs exist before proceeding. If extraction fails, follow the fallback protocol in `pipeline/01_paper_acquisition.md`.

The script also searches for official code repositories (in the paper text and on the arXiv page) and saves any found links to `paper_metadata.json` under the `official_code` key. Verify these links before relying on them — see Step 8 in `pipeline/01_paper_acquisition.md`.
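The verification step might be sketched as follows; the helper name is hypothetical, and only `paper_text.md` and `paper_metadata.json` are filenames confirmed by this document:

```python
import json
from pathlib import Path

def verify_stage1_outputs(work_dir: Path) -> list:
    """Check Stage 1 artifacts exist; return any official code links found."""
    # Stage 1 must have produced at least the parsed text and the metadata.
    for name in ("paper_text.md", "paper_metadata.json"):
        if not (work_dir / name).is_file():
            raise FileNotFoundError(
                f"{name} missing: follow the fallback protocol "
                "in pipeline/01_paper_acquisition.md")
    metadata = json.loads((work_dir / "paper_metadata.json").read_text())
    # Links found in the paper text or on the arXiv page, if any;
    # these must be verified before being relied on.
    return metadata.get("official_code", [])
```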

### Stage 2 — Contribution Identification


Read and follow: `pipeline/02_contribution_identification.md`

Read the parsed paper sections. Identify the single core contribution. Classify the paper type. Write the contribution statement. Save it to `.paper2code_work/{ARXIV_ID}/contribution.md`.

### Stage 3 — Ambiguity Audit


Read and follow: `pipeline/03_ambiguity_audit.md`

Before reading this stage, also read: `guardrails/hallucination_prevention.md`

Go through every implementation-relevant detail. Classify each as SPECIFIED, PARTIALLY_SPECIFIED, or UNSPECIFIED. Save the audit to `.paper2code_work/{ARXIV_ID}/ambiguity_audit.md`.

### Stage 4 — Code Generation


Read and follow: `pipeline/04_code_generation.md`

Before writing code, read:

- `guardrails/scope_enforcement.md` — to determine what's in and out of scope
- `guardrails/badly_written_papers.md` — if the paper is vague or inconsistent
- The relevant knowledge files in `knowledge/` for the paper's domain
- The scaffold templates in `scaffolds/` for the expected file structure

Determine the `paper_slug` from the paper title (lowercase, underscores, no special chars). Generate all files under `{paper_slug}/` in the current working directory.
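One plausible way to implement the slug rule above (the function name and exact regex are assumptions; the stated rule is just lowercase, underscores, no special characters):

```python
import re

def paper_slug(title: str) -> str:
    """Derive a slug: lowercase, underscores, no special characters."""
    slug = title.lower()
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", slug)
    return slug.strip("_")

print(paper_slug("LoRA: Low-Rank Adaptation of Large Language Models"))
# lora_low_rank_adaptation_of_large_language_models
```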

### Stage 5 — Walkthrough Notebook


Read and follow: `pipeline/05_walkthrough_notebook.md`

Generate the walkthrough notebook that connects paper sections to code with runnable sanity checks. Save to `{paper_slug}/notebooks/walkthrough.ipynb`.

## Cleanup


Remove the `.paper2code_work/` directory after successful completion.

## Final output


Print a summary:

```
✓ paper2code complete for: {paper_title}
  Output directory: {paper_slug}/
  Files generated: {list of files}
  Unspecified choices: {count} (see REPRODUCTION_NOTES.md)
  Mode: {MODE} | Framework: {FRAMEWORK}
```

## Mode-specific behavior


- **minimal** (default): Core contribution only. Training loop only if the contribution involves training. No data pipeline beyond a Dataset skeleton.
- **full**: Core contribution + full training loop + data pipeline + evaluation pipeline. More code, same citation rigor.
- **educational**: Same as minimal but with extra inline comments explaining ML concepts, an expanded walkthrough notebook with theory sections, and a `PAPER_GUIDE.md` that walks through the paper section by section.

## Guardrails — always active


These apply at ALL stages. Read them if you haven't already:

- `guardrails/hallucination_prevention.md` — the most important file in this skill
- `guardrails/scope_enforcement.md` — what to implement and what to skip
- `guardrails/badly_written_papers.md` — what to do when the paper is unclear

## Knowledge base — consult as needed


Before implementing any of these components, read the corresponding knowledge file:

- Transformer layers, attention, positional encoding → `knowledge/transformer_components.md`
- Optimizers, LR schedules, batch size semantics → `knowledge/training_recipes.md`
- Cross-entropy, contrastive loss, diffusion loss, ELBO → `knowledge/loss_functions.md`
- Framework-specific pitfalls, notation mismatches → `knowledge/paper_to_code_mistakes.md`