paper2code

Paper2Code: AI Agent for Converting Research Papers into Code

Overview

This Skill runs a 4+2 stage pipeline that systematically analyzes research papers and converts them into executable code.
Core Principle: Do not simply read the paper and generate code; first generate a structured intermediate representation (YAML), then write the code.

⚠️ Critical Behavioral Control Rules (CRITICAL)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ MANDATORY BEHAVIORAL RULES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. Implement one file at a time
2. Proceed to the next file only after completing the current file, without asking for confirmation
3. Original paper specifications always take precedence over reference code
4. Perform a Self-Check for each Phase before completion
5. Save all intermediate results as YAML files

DO:
✓ Implement exactly what is stated in the paper
✓ Write simple and direct code
✓ Get the code working first; make it elegant later
✓ Test each component immediately
✓ Move to the next file immediately after implementation is complete

DON'T:
✗ Ask "Shall I implement the next file?" between files
✗ Write extensive documentation that core functionality does not require
✗ Optimize beyond what reproducibility needs
✗ Add excessive abstraction or design patterns
✗ Provide instructions without writing actual code
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Input Processing

Supported Formats

  1. arXiv URL:
    https://arxiv.org/abs/xxxx.xxxxx
    or
    https://arxiv.org/pdf/xxxx.xxxxx.pdf
  2. PDF File Path:
    /path/to/paper.pdf
  3. Converted Text/Markdown: When paper content is provided as text
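
The three input forms above can be told apart with a small dispatcher. The following is a minimal sketch, not part of the Skill itself: the function name `classify_input` is hypothetical, and it assumes new-style arXiv identifiers (four digits, a dot, four or five digits, optional version suffix).

```python
import re

# Assumption: new-style arXiv identifiers such as 2301.12345 or 2301.12345v2.
_ARXIV_URL = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/"
    r"(\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?$"
)


def classify_input(source: str) -> tuple[str, str]:
    """Return (kind, value), where kind is 'arxiv', 'pdf', or 'text'.

    For 'arxiv' the value is the PDF URL to download, for 'pdf' it is the
    file path, and for 'text' it is the paper content unchanged.
    """
    stripped = source.strip()
    match = _ARXIV_URL.match(stripped)
    if match:
        # Both the abs and the pdf form map to the same PDF URL.
        return "arxiv", f"https://arxiv.org/pdf/{match.group(1)}.pdf"
    if "\n" not in stripped and stripped.lower().endswith(".pdf"):
        return "pdf", stripped
    return "text", source
```

Anything that is neither an arXiv URL nor a single-line path ending in `.pdf` falls through to the text case, which matches the third supported format.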

Input Processing Method

For arXiv URL:
```bash
# Convert to PDF URL and download

# Convert PDF to text (using pdftotext)
pdftotext -layout paper.pdf paper.txt
```

For PDF File:
```bash
pdftotext -layout "/path/to/paper.pdf" paper.txt
```

Pipeline Overview

[User Input: Paper URL/File]
┌─────────────────────────────────────────────┐
│ Step 0: Acquire Paper Text                  │
│ - arXiv URL → Download PDF                  │
│ - PDF → Convert to Text                     │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ Phase 0: Search Reference Code (Optional)   │
│ @[05_reference_search.md]                   │
│ Output: reference_search.yaml               │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ Phase 1: Algorithm Extraction               │
│ @[01_algorithm_extraction.md]               │
│ Output: 01_algorithm_extraction.yaml        │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ Phase 2: Concept Analysis                   │
│ @[02_concept_analysis.md]                   │
│ Output: 02_concept_analysis.yaml            │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ Phase 3: Implementation Plan                │
│ @[03_code_planning.md]                      │
│ Output: 03_implementation_plan.yaml         │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ Phase 4: Code Implementation                │
│ @[04_implementation_guide.md]               │
│ Output: Complete Project Directory          │
└─────────────────────────────────────────────┘

Data Transfer Format Between Stages

Phase 1 → Phase 2 Transfer

```yaml
phase1_to_phase2:
  algorithms_found: "[Number of found algorithms]"
  key_algorithms:
    - name: "[Algorithm Name]"
      section: "[Paper Section]"
      complexity: "[Simple/Medium/Complex]"
  hyperparameters_count: "[Number of collected hyperparameters]"
  critical_equations: "[List of critical equation numbers]"
  missing_info: "[List of missing information]"
```
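
Before Phase 2 consumes this handoff, it may help to confirm that every field in the template is present. The sketch below works on an already-parsed YAML document (a plain `dict`); the function name `validate_phase1_handoff` is hypothetical, and treating `Simple`, `Medium`, and `Complex` as the only valid complexity values is an assumption drawn from the template.

```python
# Field names are taken from the phase1_to_phase2 template.
TOP_LEVEL_KEYS = (
    "algorithms_found",
    "key_algorithms",
    "hyperparameters_count",
    "critical_equations",
    "missing_info",
)
ALGORITHM_KEYS = ("name", "section", "complexity")
# Assumption: the template's "[Simple/Medium/Complex]" is an exhaustive list.
COMPLEXITY_LEVELS = {"Simple", "Medium", "Complex"}


def validate_phase1_handoff(document: dict) -> list[str]:
    """Return a list of problems; an empty list means the handoff is usable."""
    payload = document.get("phase1_to_phase2")
    if not isinstance(payload, dict):
        return ["missing top-level key: phase1_to_phase2"]

    problems = []
    for key in TOP_LEVEL_KEYS:
        if key not in payload:
            problems.append(f"missing key: {key}")

    for index, algorithm in enumerate(payload.get("key_algorithms") or []):
        for key in ALGORITHM_KEYS:
            if key not in algorithm:
                problems.append(f"key_algorithms[{index}] missing key: {key}")
        level = algorithm.get("complexity")
        if level is not None and level not in COMPLEXITY_LEVELS:
            problems.append(
                f"key_algorithms[{index}] has unknown complexity: {level}"
            )
    return problems
```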

Phase 2 → Phase 3 Transfer

```yaml
phase2_to_phase3:
  components_count: "[Number of identified components]"
  implementation_complexity: "[Low/Medium/High]"
  key_dependencies:
    - "[Component A] → [Component B]"
  experiments_to_reproduce:
    - "[Experiment Name]: [Expected Result]"
  success_criteria:
    - "[Specific Success Criteria]"
```
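
The `key_dependencies` entries can feed directly into an implementation order for Phase 3. The sketch below is one possible reading, not the Skill's defined behavior: it assumes `"A → B"` means that A must exist before B can be built, and the function name `implementation_order` is hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def implementation_order(key_dependencies: list[str]) -> list[str]:
    """Order components so that each one comes after what it depends on.

    Assumption: an entry "A → B" means A is a prerequisite of B.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter()
    for entry in key_dependencies:
        upstream, downstream = (part.strip() for part in entry.split("→"))
        # add(node, *predecessors): downstream is built after upstream.
        sorter.add(downstream, upstream)
    return list(sorter.static_order())
```

If the opposite reading of the arrow is intended, swapping the two arguments to `sorter.add` is the only change needed.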

Phase 3 → Phase 4 Transfer

```yaml
phase3_to_phase4:
  file_order: "[List of files in implementation order]"
  current_file: "[Currently implementing file]"
  completed_files: "[List of completed files]"
  blocking_dependencies: "[Dependencies to resolve]"
```

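
The `phase3_to_phase4` fields are enough to drive the one-file-at-a-time rule without asking for confirmation between files. A minimal sketch, assuming the state has been parsed into a `dict` with real lists in place of the bracketed placeholders; the function name `advance` and the file names in the usage note are hypothetical.

```python
def advance(state: dict) -> dict:
    """Mark current_file as completed and select the next file in file_order.

    Returns a new state; current_file becomes None once every file is done.
    The input state is left unmodified.
    """
    completed = list(state["completed_files"])
    if state["current_file"] is not None:
        completed.append(state["current_file"])
    remaining = [name for name in state["file_order"] if name not in completed]
    return {
        **state,
        "completed_files": completed,
        "current_file": remaining[0] if remaining else None,
    }
```

For example, starting from `file_order = ["config.py", "model.py", "train.py"]` with `current_file = "config.py"`, one call moves `current_file` to `"model.py"` and records `"config.py"` as completed.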

Detail of Each Phase

Phase 0: Reference Code Search (Optional)

Using the @05_reference_search.md prompt:
  • Search for and evaluate 5 similar implementations
  • Gather references to improve implementation quality
  • Output: Reference list in YAML format

Phase 1: Algorithm Extraction

Using the @01_algorithm_extraction.md prompt:
  • Extract all algorithms, equations, and pseudocode
  • Collect hyperparameters and configuration values
  • Organize training procedures and optimization methods
  • Output: Complete algorithm specification in YAML format

Phase 2: Concept Analysis

Using the @02_concept_analysis.md prompt:
  • Map paper structure and sections
  • Analyze system architecture
  • Identify component relationships and data flow
  • Organize experiment and validation requirements
  • Output: Implementation requirements specification in YAML format

Phase 3: Establish Implementation Plan

Using the @03_code_planning.md prompt:
  • Integrate results from Phase 1 and 2
  • Generate detailed implementation plans for 5 essential sections:
    1. file_structure: Project file structure
    2. implementation_components: Implementation component details
    3. validation_approach: Validation and testing methods
    4. environment_setup: Environment and dependencies
    5. implementation_strategy: Step-by-step implementation strategy
  • Output: Complete YAML implementation plan (8000-10000 characters)
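
The two concrete requirements on the Phase 3 output (five named sections, 8000-10000 characters) can be checked before moving on to Phase 4. This is a rough sketch that inspects the raw plan text rather than parsing YAML; the function name `check_plan` is hypothetical, and it assumes each section appears as a mapping key at the start of a line (at any indentation).

```python
import re

REQUIRED_SECTIONS = (
    "file_structure",
    "implementation_components",
    "validation_approach",
    "environment_setup",
    "implementation_strategy",
)
MIN_CHARS, MAX_CHARS = 8000, 10000


def check_plan(plan_text: str) -> list[str]:
    """Return a list of problems with the plan; empty means it passes."""
    problems = []
    for section in REQUIRED_SECTIONS:
        # Assumption: a section shows up as "<name>:" at the start of a line.
        if not re.search(rf"^\s*{section}\s*:", plan_text, flags=re.MULTILINE):
            problems.append(f"missing section: {section}")
    if not MIN_CHARS <= len(plan_text) <= MAX_CHARS:
        problems.append(
            f"length {len(plan_text)} outside {MIN_CHARS}-{MAX_CHARS} characters"
        )
    return problems
```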

Phase 4: Code Implementation

Following the guide @04_implementation_guide.md:
  • Generate code file by file according to the plan
  • Implement in dependency order
  • Each file must be complete and executable
  • Output: Executable codebase

Memory Management

Refer to the guide @06_memory_management.md:
  • Context management when processing long papers
  • Saving step-by-step outputs
  • Recovery protocol in case of interruption
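
Because each phase saves its output as a file, recovery after an interruption can start from whatever is already on disk. The sketch below is an assumption about how that could work, not the protocol defined in 06_memory_management.md: it uses the output file names from the pipeline overview, treats a missing or empty file as "phase not done", and the function name `next_phase` is hypothetical.

```python
from pathlib import Path

# Output file names as listed in the pipeline overview (Phase 0 is optional
# and therefore not required for resuming).
PHASE_OUTPUTS = [
    ("Phase 1", "01_algorithm_extraction.yaml"),
    ("Phase 2", "02_concept_analysis.yaml"),
    ("Phase 3", "03_implementation_plan.yaml"),
]


def next_phase(workdir: Path) -> str:
    """Return the first phase whose output file is missing or empty."""
    for phase, filename in PHASE_OUTPUTS:
        path = workdir / filename
        if not path.is_file() or path.stat().st_size == 0:
            return phase
    # All intermediate YAML files exist, so only implementation remains.
    return "Phase 4"
```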

Quality Standards

Principles that Must Be Followed

  • Completeness: Complete implementation without placeholders or TODOs
  • Accuracy: Accurately reflect equations and parameters specified in the paper
  • Executability: Code that can be executed immediately
  • Reproducibility: Must be able to reproduce the results of the paper
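
The completeness principle can be partly checked mechanically by scanning generated files for leftover markers. A minimal sketch follows; the marker list (`TODO`, `FIXME`, `XXX`, `NotImplementedError`, a bare `...` line) is an illustrative assumption rather than something the Skill specifies, and the function name `find_placeholders` is hypothetical.

```python
import re

# Assumption: these markers indicate unfinished code; extend as needed.
_PLACEHOLDER = re.compile(
    r"\b(?:TODO|FIXME|XXX)\b|NotImplementedError|^\s*\.\.\.\s*$"
)


def find_placeholders(source: str) -> list[int]:
    """Return the 1-based line numbers that contain a placeholder marker."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if _PLACEHOLDER.search(line)
    ]
```

A file passes this check when the returned list is empty; the check says nothing about accuracy or reproducibility, which still require running the code against the paper's results.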

File Implementation Order

  1. Configuration and environment files (config, requirements.txt initialization)
  2. Core utilities and base classes
  3. Main algorithm/model implementation
  4. Training and evaluation scripts
  5. Documentation (README.md, requirements.txt finalization)

✅ Final Completion Checklist (MANDATORY)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ BEFORE DECLARING COMPLETE - ALL MUST BE YES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

□ All algorithms in the paper implemented?       → YES / NO
□ Correct versions of environment/datasets set?  → YES / NO
□ All comparison methods referenced implemented? → YES / NO
□ Working integration to run paper experiments?  → YES / NO
□ All metrics, figures, tables reproducible?     → YES / NO
□ Basic docs explaining how to reproduce?        → YES / NO
□ Code runs without errors?                      → YES / NO

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ If even one is NO, it is NOT complete!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
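
The all-or-nothing rule of this checklist can be expressed as a simple gate. In the sketch below the item texts are copied from the checklist, while the function names `unmet_items` and `is_complete` are hypothetical; an item that has not been answered is treated as NO.

```python
CHECKLIST = (
    "All algorithms in the paper implemented?",
    "Correct versions of environment/datasets set?",
    "All comparison methods referenced implemented?",
    "Working integration to run paper experiments?",
    "All metrics, figures, tables reproducible?",
    "Basic docs explaining how to reproduce?",
    "Code runs without errors?",
)


def unmet_items(answers: dict[str, bool]) -> list[str]:
    """Return every checklist item that is not explicitly answered YES."""
    return [item for item in CHECKLIST if answers.get(item) is not True]


def is_complete(answers: dict[str, bool]) -> bool:
    """The work is complete only when no checklist item is unmet."""
    return not unmet_items(answers)
```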

Usage Examples

Example 1: arXiv Paper

User: Implement this paper https://arxiv.org/abs/2301.12345

Claude: I will analyze the paper and convert it to code.

[Phase 0: Reference Code Search (Optional)...]
[Phase 1: Algorithm Extraction...]
[Phase 2: Concept Analysis...]
[Phase 3: Establish Implementation Plan...]
[Phase 4: Code Generation...]

Example 2: PDF File

User: Implement the algorithms from this paper /home/user/papers/attention.pdf

Example 3: Specific Request

User: Implement only the algorithm in Section 3 of this paper

Related Files

  • 01_algorithm_extraction.md - Phase 1: Algorithm Extraction
  • 02_concept_analysis.md - Phase 2: Concept Analysis
  • 03_code_planning.md - Phase 3: Implementation Plan
  • 04_implementation_guide.md - Phase 4: Implementation Guide
  • 05_reference_search.md - Phase 0: Reference Search (Optional)
  • 06_memory_management.md - Memory Management Guide

Precautions

⚠️ REMEMBER:

1. Read the paper thoroughly: Start implementation after understanding the entire content
2. Save detailed results: Save YAML output of each Phase as a file
3. Incremental implementation: Do not generate all code at once, proceed file by file
4. Include verification: Include simple test code if possible
5. Reference is inspiration: Reference code is for understanding and application, not copying