paper2code

Paper2Code: AI Agent for Converting Research Papers into Code

Overview

概述

This Skill runs a 4+2 stage pipeline that systematically analyzes research papers and converts them into executable code.

Core Principle: Do not simply read the paper and generate code; generate a structured intermediate representation (YAML) first, then write the code.

⚠️ Critical Behavioral Control Rules (CRITICAL)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ MANDATORY BEHAVIORAL RULES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. Implement one file at a time
2. Proceed to the next file only after completing the current file, without asking for confirmation
3. Original paper specifications always take precedence over reference code
4. Perform a Self-Check for each Phase before completion
5. Save all intermediate results as YAML files

DO:
✓ Implement exactly what is stated in the paper
✓ Write simple and direct code
✓ Get working code first, elegant code later
✓ Test each component immediately
✓ Move to the next file immediately after implementation is complete

DON'T:
✗ Ask "Shall I implement the next file?" between files
✗ Write extensive documentation that core functionality does not require
✗ Add optimizations that reproducibility does not need
✗ Introduce excessive abstraction or design patterns
✗ Give instructions without writing actual code
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Input Processing

Supported Formats

arXiv URL:

https://arxiv.org/abs/xxxx.xxxxx

https://arxiv.org/pdf/xxxx.xxxxx.pdf

PDF File Path:
```
/path/to/paper.pdf
```
Converted Text/Markdown: used when the paper content is already provided as text
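
The three input kinds above can be told apart before any download or conversion happens. The following is a minimal sketch, not part of the Skill itself: the helper names `arxiv_pdf_url` and `classify_input` and the regular expression are assumptions based only on the URL shapes listed above.

```python
import re
from pathlib import Path

# Assumed pattern: matches the two arXiv URL shapes listed above (abs and pdf).
ARXIV_RE = re.compile(r"^https?://arxiv\.org/(?:abs|pdf)/(?P<id>[^?#]+?)(?:\.pdf)?/?$")


def arxiv_pdf_url(url: str) -> str:
    """Map an arXiv abs or pdf URL to the direct PDF URL (hypothetical helper)."""
    match = ARXIV_RE.match(url.strip())
    if match is None:
        raise ValueError(f"not an arXiv URL: {url!r}")
    return f"https://arxiv.org/pdf/{match.group('id')}.pdf"


def classify_input(source: str) -> str:
    """Return 'arxiv', 'pdf', or 'text' for a user-supplied paper source."""
    candidate = source.strip()
    if ARXIV_RE.match(candidate):
        return "arxiv"
    # A single-line string ending in .pdf is treated as a file path.
    if "\n" not in candidate and Path(candidate).suffix.lower() == ".pdf":
        return "pdf"
    # Anything else is assumed to be the paper content itself.
    return "text"
```

Anything classified as `arxiv` would then be downloaded and converted, `pdf` would be converted directly, and `text` would be used as-is.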

Input Processing Method

**For arXiv URL:**
```bash
# Convert to PDF URL and download
curl -L "https://arxiv.org/pdf/xxxx.xxxxx.pdf" -o paper.pdf

# Convert PDF to text (using pdftotext)
pdftotext -layout paper.pdf paper.txt
```

**For PDF File:**
```bash
pdftotext -layout "/path/to/paper.pdf" paper.txt
```

Pipeline Overview

[User Input: Paper URL/File]
        │
        ▼
┌─────────────────────────────────────────────┐
│ Step 0: Acquire Paper Text                  │
│ - arXiv URL → Download PDF                  │
│ - PDF → Convert to Text                     │
└─────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────┐
│ Phase 0: Search Reference Code (Optional)   │
│ @[05_reference_search.md]                   │
│ Output: reference_search.yaml               │
└─────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────┐
│ Phase 1: Algorithm Extraction               │
│ @[01_algorithm_extraction.md]               │
│ Output: 01_algorithm_extraction.yaml        │
└─────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────┐
│ Phase 2: Concept Analysis                   │
│ @[02_concept_analysis.md]                   │
│ Output: 02_concept_analysis.yaml            │
└─────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────┐
│ Phase 3: Implementation Plan                │
│ @[03_code_planning.md]                      │
│ Output: 03_implementation_plan.yaml         │
└─────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────┐
│ Phase 4: Code Implementation                │
│ @[04_implementation_guide.md]               │
│ Output: Complete Project Directory          │
└─────────────────────────────────────────────┘

Data Transfer Format Between Stages

Phase 1 → Phase 2 Transfer

```yaml
phase1_to_phase2:
  algorithms_found: "[Number of found algorithms]"
  key_algorithms:
    - name: "[Algorithm Name]"
      section: "[Paper Section]"
      complexity: "[Simple/Medium/Complex]"
  hyperparameters_count: "[Number of collected hyperparameters]"
  critical_equations: "[List of critical equation numbers]"
  missing_info: "[List of missing information]"
```
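
Before Phase 2 starts, it can help to confirm that the handoff carries every field of this template. The sketch below assumes the YAML has already been loaded into a plain dictionary. The key names are copied from the template above, while the function `handoff_problems` is a hypothetical helper, not something the Skill defines.

```python
# Key names copied from the Phase 1 -> Phase 2 template; the checker is hypothetical.
PHASE1_TO_PHASE2_KEYS = {
    "algorithms_found",
    "key_algorithms",
    "hyperparameters_count",
    "critical_equations",
    "missing_info",
}
ALGORITHM_KEYS = {"name", "section", "complexity"}


def handoff_problems(handoff: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the handoff is complete."""
    body = handoff.get("phase1_to_phase2")
    if not isinstance(body, dict):
        return ["missing top-level key: phase1_to_phase2"]
    problems = [f"missing key: {key}" for key in sorted(PHASE1_TO_PHASE2_KEYS - set(body))]
    for index, algorithm in enumerate(body.get("key_algorithms") or []):
        if not isinstance(algorithm, dict):
            problems.append(f"key_algorithms[{index}] is not a mapping")
            continue
        for key in sorted(ALGORITHM_KEYS - set(algorithm)):
            problems.append(f"key_algorithms[{index}] missing key: {key}")
    return problems
```

The same pattern would apply to the later handoffs by swapping in their key sets.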

Phase 2 → Phase 3 Transfer

```yaml
phase2_to_phase3:
  components_count: "[Number of identified components]"
  implementation_complexity: "[Low/Medium/High]"
  key_dependencies:
    - "[Component A] → [Component B]"
  experiments_to_reproduce:
    - "[Experiment Name]: [Expected Result]"
  success_criteria:
    - "[Specific Success Criteria]"
```

Phase 3 → Phase 4 Transfer

```yaml
phase3_to_phase4:
  file_order: "[List of files in implementation order]"
  current_file: "[Currently implementing file]"
  completed_files: "[List of completed files]"
  blocking_dependencies: "[Dependencies to resolve]"
```
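
The Phase 3 → Phase 4 fields describe a small piece of state that moves forward one file at a time, which matches the rule of proceeding to the next file without asking for confirmation. A minimal sketch of that update follows. It assumes the state is held as a dictionary with the field names above, and `advance` is a hypothetical helper name.

```python
def advance(state: dict) -> dict:
    """Mark current_file as completed and select the next pending file from file_order."""
    completed = list(state["completed_files"])
    current = state.get("current_file")
    if current and current not in completed:
        completed.append(current)
    # Files keep the order fixed by the implementation plan.
    pending = [name for name in state["file_order"] if name not in completed]
    return {
        **state,
        "completed_files": completed,
        "current_file": pending[0] if pending else None,  # None means every file is done
    }
```

Calling `advance` after each finished file keeps `current_file` and `completed_files` consistent, and a `current_file` of `None` signals that the final checklist is next.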

Detail of Each Phase

Phase 0: Reference Code Search (Optional)

Using the @05_reference_search.md prompt:

- Search for and evaluate 5 similar implementations
- Collect references to improve implementation quality
- Output: Reference list in YAML format

Phase 1: Algorithm Extraction

Using the @01_algorithm_extraction.md prompt:

- Extract all algorithms, equations, and pseudocode
- Collect hyperparameters and configuration values
- Organize training procedures and optimization methods
- Output: Complete algorithm specification in YAML format

Phase 2: Concept Analysis

Using the @02_concept_analysis.md prompt:

- Map paper structure and sections
- Analyze system architecture
- Identify component relationships and data flow
- Organize experiment and validation requirements
- Output: Implementation requirements specification in YAML format

Phase 3: Establish Implementation Plan

Using the @03_code_planning.md prompt:

- Integrate results from Phase 1 and 2
- Generate detailed implementation plans for 5 essential sections:
  1. `file_structure`: Project file structure
  2. `implementation_components`: Implementation component details
  3. `validation_approach`: Validation and testing methods
  4. `environment_setup`: Environment and dependencies
  5. `implementation_strategy`: Step-by-step implementation strategy
- Output: Complete YAML implementation plan (8000-10000 characters)
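
The two stated properties of the plan, the five section names and the 8000-10000 character range, can be checked mechanically as part of the Phase 3 Self-Check. The sketch below works on the raw plan text. It assumes each section name appears as a YAML key at some indentation level, which the source does not specify, and `plan_problems` is a hypothetical helper.

```python
import re

# Section names copied from the list above.
REQUIRED_SECTIONS = [
    "file_structure",
    "implementation_components",
    "validation_approach",
    "environment_setup",
    "implementation_strategy",
]


def plan_problems(plan_text: str, min_chars: int = 8000, max_chars: int = 10000) -> list[str]:
    """Return problems found in the plan text; an empty list means both checks pass."""
    problems = []
    for section in REQUIRED_SECTIONS:
        # Assumption: a section is present if its name appears as a key ("name:") on some line.
        if not re.search(rf"^\s*{re.escape(section)}\s*:", plan_text, flags=re.MULTILINE):
            problems.append(f"missing section: {section}")
    if not min_chars <= len(plan_text) <= max_chars:
        problems.append(f"length {len(plan_text)} outside {min_chars}-{max_chars} characters")
    return problems
```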

Phase 4: Code Implementation

Following the guide @04_implementation_guide.md:

- Generate code file by file according to the plan
- Implement in dependency order
- Each file must be complete and executable
- Output: Executable codebase
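
"Dependency order" here amounts to a topological sort of the planned files: a file is implemented only after every file it depends on. One way to sketch this uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The function name `implementation_order` and the file names in the usage example are illustrative assumptions, not taken from the Skill.

```python
from graphlib import TopologicalSorter


def implementation_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order files so that each one comes after all files it depends on.

    `dependencies` maps a file to the set of files it needs.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical project layout, for illustration only.
example = {
    "train.py": {"model.py", "config.py"},
    "model.py": {"utils.py", "config.py"},
    "utils.py": set(),
    "config.py": set(),
}
order = implementation_order(example)
```

The resulting list could populate `file_order` in the Phase 3 → Phase 4 handoff. A `CycleError` would indicate that the plan itself needs to be revised before coding starts.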

Memory Management

Refer to the guide @06_memory_management.md:

- Context management when processing long papers
- Saving step-by-step outputs
- Recovery protocol in case of interruption
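
Because every phase saves its output as a YAML file, recovery after an interruption can start by looking at which outputs already exist. The sketch below is one possible reading, not the protocol defined in @06_memory_management.md. The file names come from the pipeline overview, and the helper `next_phase` and the "missing or empty" test are assumptions.

```python
from pathlib import Path

# Output file names as listed in the pipeline overview.
PHASE_OUTPUTS = [
    ("Phase 1", "01_algorithm_extraction.yaml"),
    ("Phase 2", "02_concept_analysis.yaml"),
    ("Phase 3", "03_implementation_plan.yaml"),
]


def next_phase(output_dir: str) -> str:
    """Return the first phase whose YAML output is missing or empty, else 'Phase 4'."""
    root = Path(output_dir)
    for phase, filename in PHASE_OUTPUTS:
        path = root / filename
        if not path.is_file() or path.stat().st_size == 0:
            return phase
    return "Phase 4"
```

The optional Phase 0 is left out of the check on purpose, since a missing `reference_search.yaml` does not block the later phases.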

Quality Standards

Principles that Must Be Followed

- Completeness: Implement everything, with no placeholders or TODOs
- Accuracy: Reflect the equations and parameters specified in the paper exactly
- Executability: The code must run immediately, as delivered
- Reproducibility: The code must be able to reproduce the results of the paper
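
The completeness principle lends itself to a quick automated scan of each generated file. The sketch below is a heuristic only: the marker list (TODO, FIXME, `NotImplementedError`, a bare `...` line) is an assumption about what counts as a placeholder in Python code, and `find_placeholders` is a hypothetical helper.

```python
import re

# Assumed placeholder markers; extend or trim to fit the project.
PLACEHOLDER_RE = re.compile(r"\b(?:TODO|FIXME)\b|\bNotImplementedError\b|^\s*\.\.\.\s*$")


def find_placeholders(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line that looks like a placeholder."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if PLACEHOLDER_RE.search(line):
            hits.append((number, line.strip()))
    return hits
```

An empty result does not prove the file is complete, but any hit is a concrete line to fix before moving to the next file.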

File Implementation Order

1. Configuration and environment files (config, requirements.txt initialization)
2. Core utilities and base classes
3. Main algorithm/model implementation
4. Training and evaluation scripts
5. Documentation (README.md, requirements.txt finalization)

✅ Final Completion Checklist (MANDATORY)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ BEFORE DECLARING COMPLETE - ALL MUST BE YES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

□ All algorithms in the paper implemented?       → YES / NO
□ Correct versions of environment/datasets set?  → YES / NO
□ All comparison methods referenced implemented? → YES / NO
□ Working integration to run paper experiments?  → YES / NO
□ All metrics, figures, tables reproducible?     → YES / NO
□ Basic docs explaining how to reproduce?        → YES / NO
□ Code runs without errors?                      → YES / NO

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ If even one is NO, it is NOT complete!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
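
The checklist is an all-or-nothing gate, which can be expressed in a few lines. The sketch below assumes the answers are recorded as a mapping from checklist item to a boolean. The helper name `completion_status` is hypothetical.

```python
def completion_status(answers: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (complete, failing items); complete is True only if every answer is YES."""
    failing = [item for item, ok in answers.items() if not ok]
    return (not failing, failing)
```

Returning the failing items alongside the verdict makes it clear which part of the work to resume.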

Usage Examples

Example 1: arXiv Paper

User: Implement this paper https://arxiv.org/abs/2301.12345

Claude: I will analyze the paper and convert it to code.

[Phase 0: Reference Code Search (Optional)...]
[Phase 1: Algorithm Extraction...]
[Phase 2: Concept Analysis...]
[Phase 3: Establish Implementation Plan...]
[Phase 4: Code Generation...]

Example 2: PDF File

User: Implement the algorithms from this paper /home/user/papers/attention.pdf

Example 3: Specific Request

User: Implement only the algorithm in Section 3 of this paper

Related Files

Precautions

⚠️ REMEMBER:

1. Read the paper thoroughly: Start implementation after understanding the entire content
2. Save detailed results: Save YAML output of each Phase as a file
3. Incremental implementation: Do not generate all code at once, proceed file by file
4. Include verification: Include simple test code if possible
5. Reference is inspiration: Reference code is for understanding and application, not copying
