paper2code-arxiv-implementation

paper2code — Arxiv Paper to Working Implementation

Skill by ara.so — Daily 2026 Skills collection.

paper2code is a Claude Code agent skill that converts any arxiv paper URL into a citation-anchored Python implementation. Every code decision references the exact paper section and equation it implements, and all gaps/ambiguities are explicitly flagged rather than silently filled in.

Install

```bash
npx skills add PrathamLearnsToCode/paper2code/skills/paper2code
```

During install you'll choose:

Agents: which coding agents get the skill (e.g., Claude Code)
Scope: Global (recommended) or project-level
Method: Symlink (recommended) or copy

Then launch your agent:

```bash
claude
```

Core Commands

Basic usage

```
/paper2code https://arxiv.org/abs/1706.03762
```

With framework override

```
/paper2code https://arxiv.org/abs/2006.11239 --framework jax
/paper2code https://arxiv.org/abs/2006.11239 --framework pytorch   # default
/paper2code https://arxiv.org/abs/2006.11239 --framework tensorflow
```

With mode flag

```
/paper2code 1706.03762 --mode minimal       # architecture only (default)
/paper2code 1706.03762 --mode full          # includes training loop + data pipeline
/paper2code 1706.03762 --mode educational   # extra comments + pedagogical notebook
```

Bare arxiv ID (no URL required)

```
/paper2code 1706.03762
/paper2code 2106.09685
```
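
Both input forms identify the same paper. The following is a minimal sketch of how a URL or bare ID could be reduced to one canonical identifier. The helper name `normalize_arxiv_id` and its regular expression are illustrative assumptions, not the skill's actual parsing code, and only new-style `YYMM.NNNNN` identifiers are handled.

```python
import re

# Hypothetical helper, not part of paper2code. Handles only new-style arXiv
# identifiers (YYMM.NNNN or YYMM.NNNNN), optionally with a version suffix.
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def normalize_arxiv_id(text: str) -> str:
    """Return the bare arXiv ID found in a URL or ID string."""
    candidate = text.strip()
    # Drop a trailing ".pdf" so PDF links resolve to the same ID.
    if candidate.endswith(".pdf"):
        candidate = candidate[: -len(".pdf")]
    # Keep only the last path segment of a URL such as https://arxiv.org/abs/<ID>.
    candidate = candidate.rstrip("/").rsplit("/", 1)[-1]
    match = _ARXIV_ID.fullmatch(candidate)
    if match is None:
        raise ValueError(f"not a recognizable arXiv ID: {text!r}")
    return match.group(1)  # the version suffix, if any, is discarded


print(normalize_arxiv_id("https://arxiv.org/abs/1706.03762"))
print(normalize_arxiv_id("1706.03762"))
```

Both calls print the same identifier, which is why the URL and the bare ID can be used interchangeably on the command line.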

Output Structure

Every run produces a directory named after the paper slug:

```
attention_is_all_you_need/
├── README.md                  # Paper summary + quick-start
├── REPRODUCTION_NOTES.md      # Ambiguity audit, unspecified choices, known deviations
├── requirements.txt           # Pinned dependencies
├── src/
│   ├── model.py               # Architecture: every layer cited to a paper section
│   ├── loss.py                # Loss functions with equation references
│   ├── data.py                # Dataset skeleton with preprocessing TODOs
│   ├── train.py               # Training loop (full/educational mode)
│   ├── evaluate.py            # Metric computation
│   └── utils.py               # Shared utilities
├── configs/
│   └── base.yaml              # All hyperparams, each cited or flagged [UNSPECIFIED]
└── notebooks/
    └── walkthrough.ipynb      # Paper section → code → shape checks
```
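
A quick way to confirm that a run produced this layout is to check for the listed files. The sketch below is an assumption-laden helper, not something the skill ships: the file list is copied from the tree above, and `train.py` and the notebook are left out because they appear to depend on the selected mode.

```python
from pathlib import Path

# Files taken from the directory tree above. train.py and the notebook are
# omitted here because they appear to depend on the selected mode.
EXPECTED_FILES = [
    "README.md",
    "REPRODUCTION_NOTES.md",
    "requirements.txt",
    "src/model.py",
    "src/loss.py",
    "src/data.py",
    "src/evaluate.py",
    "src/utils.py",
    "configs/base.yaml",
]


def missing_outputs(run_dir: str) -> list[str]:
    """Return the expected files that are absent from run_dir."""
    root = Path(run_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Report anything missing from the example output directory.
for rel in missing_outputs("attention_is_all_you_need"):
    print(f"missing: {rel}")
```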

Citation Anchoring Convention

The core value of paper2code is traceability. Every non-trivial decision is tagged:

| Tag | Meaning |
|---|---|
| `§X.Y` | Directly specified in section X.Y |
| `§X.Y, Eq. N` | Implements equation N from section X.Y |
| `[UNSPECIFIED]` | Paper doesn't state this; a choice is made and alternatives are listed |
| `[PARTIALLY_SPECIFIED]` | Paper mentions it but is ambiguous; the relevant quote is included |
| `[ASSUMPTION]` | Reasonable inference; the reasoning is explained |
| `[FROM_OFFICIAL_CODE]` | Taken from the authors' official implementation |
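
Because the tags are plain text in comments, they are easy to audit mechanically. The sketch below counts them in a source string; the regular expressions and the `count_tags` helper are assumptions about how the tags appear in `#` comments, not a paper2code API, and tags inside docstrings are deliberately not counted.

```python
import re
from collections import Counter

# Patterns for the tags in the table above. These are an assumption about how
# the tags appear in comments, not a paper2code API.
BRACKET_TAG = re.compile(
    r"\[(UNSPECIFIED|PARTIALLY_SPECIFIED|ASSUMPTION|FROM_OFFICIAL_CODE)\]"
)
SECTION_TAG = re.compile(r"§\d+(?:\.\d+)*")


def count_tags(source: str) -> Counter:
    """Count citation and ambiguity tags found in '#' comments."""
    counts = Counter()
    for line in source.splitlines():
        # Only look at the comment part of each line (text after the first '#').
        _, sep, comment = line.partition("#")
        if not sep:
            continue
        for match in BRACKET_TAG.finditer(comment):
            counts[match.group(1)] += 1
        counts["SECTION"] += len(SECTION_TAG.findall(comment))
    return counts


sample = """
self.d_k = d_model // num_heads  # §3.2 - d_k = d_v = d_model / h
# [UNSPECIFIED] Dropout rate for attention weights not stated in §3.2
self.norm1 = nn.LayerNorm(d_model)  # [ASSUMPTION] pre-norm
"""
print(count_tags(sample))
```

A high `UNSPECIFIED` or `ASSUMPTION` count relative to section citations is a hint that the paper leaves a lot open and that `REPRODUCTION_NOTES.md` deserves a careful read.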

Example — model.py with citation anchors

```python
import math

import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    """§3.2.2: Multi-Head Attention

    Implements MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O
    where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
    """

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        # §3.1, §3.2.2: d_model = 512, h = 8 (base model, Table 3)
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads  # §3.2.2: d_k = d_v = d_model / h = 64
        self.num_heads = num_heads

        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)  # §3.2.2: W^O projection

        # [UNSPECIFIED] Dropout rate for attention weights not stated in §3.2
        # Using 0.1 matching the model-wide dropout (§5.4, Table 3)
        self.dropout = nn.Dropout(dropout)

    def forward(self, q, k, v, mask=None):
        batch_size = q.size(0)

        # §3.2.2: project into h heads -> (batch, heads, seq, d_k)
        Q = self.W_q(q).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        K = self.W_k(k).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)
        V = self.W_v(v).view(batch_size, -1, self.num_heads, self.d_k).transpose(1, 2)

        # §3.2.1, Eq. 1: Attention(Q,K,V) = softmax(QK^T / sqrt(d_k)) V
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)

        if mask is not None:
            # §3.2.3: decoder masks future positions with -inf before softmax
            scores = scores.masked_fill(mask == 0, float('-inf'))

        attn_weights = torch.softmax(scores, dim=-1)
        attn_weights = self.dropout(attn_weights)

        out = torch.matmul(attn_weights, V)  # (batch, heads, seq, d_k)
        # Merge heads back into (batch, seq, d_model)
        out = out.transpose(1, 2).contiguous().view(batch_size, -1, self.num_heads * self.d_k)
        return self.W_o(out)  # §3.2.2: W^O output projection


class TransformerBlock(nn.Module):
    """§3.1: Encoder/Decoder layer structure"""

    def __init__(self, d_model: int, num_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.attention = MultiHeadAttention(d_model, num_heads, dropout)

        # [ASSUMPTION] Using pre-norm based on stability; paper Figure 1 shows post-norm
        # Post-norm: x = LayerNorm(x + sublayer(x)) (§3.1)
        # [PARTIALLY_SPECIFIED] "We apply layer normalization": position ambiguous
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

        # §3.3, Eq. 2: FFN(x) = max(0, xW_1 + b_1)W_2 + b_2
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),  # §3.3: "ReLU activation"
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # §3.1: residual connection around each sub-layer
        normed = self.norm1(x)  # normalize once and reuse for Q, K, V
        attn_out = self.attention(normed, normed, normed, mask)
        x = x + self.dropout(attn_out)
        x = x + self.dropout(self.ff(self.norm2(x)))
        return x
```

Example — configs/base.yaml with citations

```yaml
# base.yaml: all hyperparameters for attention_is_all_you_need
# Each value is either cited from the paper or flagged [UNSPECIFIED]

model:
  d_model: 512            # §3.1, Table 3: "d_model = 512"
  num_heads: 8            # §3.2.2, Table 3: "h = 8"
  d_ff: 2048              # §3.3, Table 3: "d_ff = 2048"
  num_encoder_layers: 6   # §3.1, Table 3: "N = 6"
  num_decoder_layers: 6   # §3.1, Table 3: "N = 6"
  dropout: 0.1            # §5.4, Table 3: "P_drop = 0.1"
  max_seq_len: 512        # [UNSPECIFIED] not stated; using 512 (common default)
                          # Alternatives: 256, 1024

training:
  batch_size: 25000       # §5.1: ~25,000 source tokens and ~25,000 target tokens per batch
  optimizer: adam         # §5.3: "Adam optimizer"
  beta1: 0.9              # §5.3: "β1 = 0.9"
  beta2: 0.98             # §5.3: "β2 = 0.98"
  epsilon: 1.0e-9         # §5.3: "ε = 10^-9"
  warmup_steps: 4000      # §5.3: "warmup_steps = 4000"
  label_smoothing: 0.1    # §5.4: "ε_ls = 0.1"
```
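
The `warmup_steps` value only makes sense together with the learning-rate schedule from §5.3, Eq. 3 of the paper: lrate = d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5). A minimal sketch of that formula follows; the function name `transformer_lr` is an illustrative assumption, and generated code may wire the schedule into an optimizer differently.

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate from §5.3, Eq. 3 of the paper (step counts from 1)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear warmup for the first warmup_steps steps, then inverse-sqrt decay.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1, 1000, 4000, 16000):
    print(step, f"{transformer_lr(step):.2e}")
```

The rate rises linearly until `warmup_steps`, peaks there, and then decays with the inverse square root of the step number, so quadrupling the step count after warmup halves the rate.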

Example — REPRODUCTION_NOTES.md structure

```markdown
# Reproduction Notes: Attention Is All You Need

## Ambiguity Audit

### SPECIFIED (high confidence)

| Choice | Value | Source |
|---|---|---|
| d_model | 512 | §3.1, Table 3 |
| num_heads | 8 | §3.2.2, Table 3 |
| optimizer | Adam β1=0.9, β2=0.98 | §5.3 |

### PARTIALLY_SPECIFIED (judgment call made)

| Choice | Our Decision | Paper Quote | Alternatives |
|---|---|---|---|
| Norm position | pre-norm | "layer norm before each sub-layer" (§3.1) conflicts with Figure 1 | post-norm |

### UNSPECIFIED (our defaults)

| Choice | Our Default | Rationale | Alternatives |
|---|---|---|---|
| LayerNorm epsilon | 1e-6 | common default | 1e-5, 1e-8 |
| max_seq_len | 512 | common for WMT | 256, 1024 |

## Known Deviations

- data.py provides a skeleton only; WMT14 preprocessing is not implemented
- Beam search decoding is not implemented (§6.1 reports a beam size of 4)
```

---

What paper2code Will NOT Do

Understanding limits prevents wasted debugging time:

Won't guarantee correctness: the code matches what the paper describes; if the paper is wrong, the code is wrong
Won't invent details silently: gaps are always flagged `[UNSPECIFIED]`, never filled confidently
Won't download datasets: `data.py` gives a `Dataset` skeleton with instructions
Won't set up training infrastructure: no distributed training, no experiment tracking
Won't implement baselines: only the paper's core contribution
Won't reimplement standard components: it imports them or notes the dependency
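
To make the "skeleton with instructions" point concrete, here is a rough, framework-free sketch of what such a dataset stub might look like. The class name, the `load` method, and the `data/wmt14` path are hypothetical; real generated code may differ and would normally subclass the chosen framework's dataset type.

```python
class TranslationDatasetSkeleton:
    """Hypothetical skeleton; real generated code may differ.

    Mirrors the map-style dataset protocol (__len__ / __getitem__) so it can
    be adapted to a framework dataset class later.
    """

    def __init__(self, data_dir: str):
        # TODO: download the dataset manually and point data_dir at it.
        self.data_dir = data_dir
        self.examples: list[tuple[str, str]] = []  # (source, target) pairs

    def load(self) -> None:
        # TODO: [UNSPECIFIED] preprocessing is not implemented in this sketch.
        raise NotImplementedError("fill in dataset loading and preprocessing")

    def __len__(self) -> int:
        return len(self.examples)

    def __getitem__(self, index: int) -> tuple[str, str]:
        return self.examples[index]


dataset = TranslationDatasetSkeleton("data/wmt14")  # hypothetical path
print(len(dataset))  # empty until load() is implemented
```

The stub runs and exposes the right interface, but it contains no data until the marked TODOs are filled in by hand.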

Common Patterns

Pattern 1 — Implement a new architecture paper

```
/paper2code https://arxiv.org/abs/2010.11929 --mode minimal
```

Focus: `src/model.py` will contain the full architecture. Review `REPRODUCTION_NOTES.md` to understand every ambiguous choice before running.
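
The paper in this example (arXiv 2010.11929) is the Vision Transformer, so one sanity check before running the generated model is to work out the token sequence length it should produce. The sketch below assumes the standard ViT setup of non-overlapping square patches plus one class token; `vit_sequence_length` is a hypothetical helper, and the generated model's default configuration may differ.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16,
                        class_token: bool = True) -> int:
    """Number of tokens the encoder sees for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    # One token per patch, plus an optional learnable class token.
    return patches_per_side ** 2 + (1 if class_token else 0)


print(vit_sequence_length())        # 224x224 image, 16x16 patches
print(vit_sequence_length(224, 32))  # same image, 32x32 patches
```

Comparing this number against the sequence dimension of an intermediate tensor is a cheap way to catch a patch-embedding mistake early.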

Pattern 2 — Reproduce a training method

```
/paper2code https://arxiv.org/abs/2006.11239 --mode full --framework pytorch
```

Focus: `src/train.py` will contain the full training loop. `configs/base.yaml` will list every hyperparameter with paper citations.

Pattern 3 — Educational deep-dive

```
/paper2code 1706.03762 --mode educational
```

Focus: `notebooks/walkthrough.ipynb` walks through each paper section, shows the corresponding code, and runs CPU-safe shape checks.

Pattern 4 — Quick architecture prototype

```
/paper2code 2010.11929  # ViT
```

Then inspect and run:

```bash
cd vision_transformer/
pip install -r requirements.txt
python -c "
from src.model import VisionTransformer
import torch
model = VisionTransformer()  # toy config
x = torch.randn(2, 3, 224, 224)
print(model(x).shape)
"
```

Troubleshooting

Skill not triggering

Confirm the install completed: `npx skills list` should show `paper2code-arxiv-implementation`
Use the explicit trigger: `/paper2code <url>`
Try the bare arxiv ID format: `/paper2code 1706.03762`

Generated code has import errors

Run `pip install -r requirements.txt` first
Check `REPRODUCTION_NOTES.md` for noted dependencies
Standard components (e.g., HuggingFace transformers) are imported, not reimplemented; install them separately

"Paper not found" or fetch errors

Confirm the arxiv ID exists: `https://arxiv.org/abs/<ID>`
Try the full URL instead of the bare ID
Some very new papers (hours old) may not be indexed yet

Silent assumptions in generated code

This should not happen by design; if you find one, it's a bug
Check `REPRODUCTION_NOTES.md` first; the assumption may be documented there
Report via the repo issues if a gap was genuinely filled silently

Framework-specific issues

Default framework is PyTorch: omitting `--framework` gives PyTorch output
JAX output requires `jax`, `flax`, and `optax` (listed in `requirements.txt`)
TensorFlow output requires `tensorflow>=2.x`
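
When a framework-specific import fails, it can help to check which of the packages above are actually installed. This is a small sketch using only the standard library; the `FRAMEWORK_MODULES` mapping and the `missing_modules` helper are illustrative assumptions (PyTorch's import name is `torch`), not part of the skill.

```python
import importlib.util

# Import names per framework, following the list above. The mapping itself is
# an illustrative assumption; PyTorch is imported as "torch".
FRAMEWORK_MODULES = {
    "pytorch": ["torch"],
    "jax": ["jax", "flax", "optax"],
    "tensorflow": ["tensorflow"],
}


def missing_modules(framework: str) -> list[str]:
    """Return the modules for a framework that cannot be found."""
    return [
        name
        for name in FRAMEWORK_MODULES[framework]
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None
    ]


for framework in FRAMEWORK_MODULES:
    missing = missing_modules(framework)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{framework}: {status}")
```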

Contributing

Add a worked example

Run `/paper2code https://arxiv.org/abs/XXXX.XXXXX`
Save the output to `skills/paper2code/worked/{paper_slug}/`
Write `review.md` evaluating correctness, flagged ambiguities, and any mistakes
Submit a PR

Improve guardrails

Add patterns where the skill makes silent assumptions to `guardrails/`.

Add domain knowledge

Do papers in your subfield reference common components? Add a knowledge file to `knowledge/` (e.g., `knowledge/graph_neural_networks.md`).
