ai-paper-reproduction

Use when

  • The user wants Codex to reproduce an AI paper repository.
  • The target is a code repository with a README, scripts, configs, or documented commands.
  • The goal is a minimal trustworthy run, not unlimited experimentation.
  • The user needs standardized outputs that another human or model can audit quickly.
  • The task spans more than one stage, such as intake plus setup, or setup plus execution plus reporting.

Do not use when

  • The task is a general literature review or paper summary.
  • The task is to design a new model, benchmark suite, or training pipeline from scratch.
  • The repository is not centered on AI or does not expose a documented reproduction path.
  • The user primarily wants a deep code refactor rather than README-first reproduction.
  • The user is explicitly asking for only one narrow phase that a sub-skill already covers cleanly.

Success criteria

  • README is treated as the primary source of reproduction intent.
  • A minimum trustworthy target is selected and justified.
  • Documented inference is preferred over evaluation, and evaluation is preferred over training.
  • Any repo edits remain conservative, explicit, and auditable.
  • repro_outputs/ is generated with consistent structure and stable machine-readable fields.
  • Final user-facing explanation is short and follows the user's language when practical.

Interaction and usability policy

  • Keep the workflow simple enough for a new user to understand quickly.
  • Prefer short, concrete plans over exhaustive research.
  • Expose commands, assumptions, blockers, and evidence.
  • Avoid turning the skill into an opaque automation layer.
  • Preserve a low learning cost for both humans and downstream agents.

Language policy

  • Human-readable Markdown outputs should follow the user's language when it is clear.
  • If the user's language is unclear, default to concise English.
  • Machine-readable fields, filenames, keys, and enum values stay in stable English.
  • Paths, package names, CLI commands, config keys, and code identifiers remain unchanged.
See references/language-policy.md.

Reproduction policy

Core priority order:
  1. documented inference
  2. documented evaluation
  3. documented training startup or partial verification
  4. full training only when the user explicitly asks later
Rules:
  • README-first: use repository files to clarify, not casually override, the README.
  • Aim for minimal trustworthy reproduction rather than maximum task coverage.
  • Treat smoke tests, startup verification, and early-step checks as valid training evidence when full training is not appropriate.
  • Record unresolved gaps rather than fabricating confidence.
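The priority order above can be sketched as a small selection function. This is only an illustration under stated assumptions: the `DocumentedCommand` type, the `kind` labels, and the example commands are hypothetical and are not defined by this skill.

```python
from dataclasses import dataclass
from typing import List, Optional

# Priority order taken from the policy above; earlier entries are preferred.
PRIORITY = ["inference", "evaluation", "training_startup"]


@dataclass
class DocumentedCommand:
    """Hypothetical record for a command found in the README or repo docs."""
    kind: str     # one of PRIORITY, or "full_training"
    command: str  # command text exactly as documented
    source: str   # where it was found, e.g. "README.md"


def select_target(
    commands: List[DocumentedCommand],
    allow_full_training: bool = False,
) -> Optional[DocumentedCommand]:
    """Return the highest-priority documented command, or None if nothing fits.

    Full training is considered only when the caller opts in, mirroring
    "full training only when the user explicitly asks later".
    """
    order = PRIORITY + (["full_training"] if allow_full_training else [])
    for kind in order:
        for cmd in commands:
            if cmd.kind == kind:
                return cmd
    return None


if __name__ == "__main__":
    found = [
        DocumentedCommand("full_training", "python train.py", "README.md"),
        DocumentedCommand("evaluation", "python eval.py", "README.md"),
    ]
    print(select_target(found).command)  # python eval.py
```

Returning `None` when nothing qualifies keeps the gap visible, so it can be recorded rather than papered over.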

Patch policy

  • Prefer no code changes.
  • Prefer safer adjustments first:
    • command-line arguments
    • environment variables
    • path fixes
    • dependency version fixes
    • dependency file fixes such as requirements.txt or environment.yml
  • Avoid changing:
    • model architecture
    • core inference semantics
    • core training logic
    • loss functions
    • experiment meaning
  • If repository files must change:
    • create a patch branch first using repro/YYYY-MM-DD-short-task
    • apply low-risk changes before medium-risk changes
    • avoid high-risk changes by default
    • commit only verified groups of changes
    • keep verified patch commits sparse, usually 0-2
    • use commit messages in the form repro: <scope> for documented <command>
See references/patch-policy.md.
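As a rough illustration of the naming conventions above, the branch name and commit message could be built as follows. The helper names and the slug rule (lowercase, hyphen-separated) are assumptions; the authoritative rules live in references/patch-policy.md, which is not reproduced here.

```python
import re
from datetime import date


def patch_branch_name(task: str, day: date) -> str:
    """Build a branch name of the form repro/YYYY-MM-DD-short-task.

    The slug rule (lowercase, non-alphanumerics collapsed to "-") is an
    assumption made for this sketch.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return f"repro/{day.isoformat()}-{slug}"


def patch_commit_message(scope: str, command: str) -> str:
    """Build a message of the form "repro: <scope> for documented <command>"."""
    return f"repro: {scope} for documented {command}"


if __name__ == "__main__":
    print(patch_branch_name("Fix CUDA path", date(2024, 1, 15)))
    # repro/2024-01-15-fix-cuda-path
    print(patch_commit_message("pin dependency version", "inference"))
    # repro: pin dependency version for documented inference
```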

Workflow

  1. Read README and repo signals.
  2. Call repo-intake-and-plan to scan the repository and extract documented commands.
  3. Select the smallest trustworthy reproduction target.
  4. Call env-and-assets-bootstrap to prepare environment assumptions and asset paths.
  5. Run a conservative smoke check or documented command with minimal-run-and-audit.
  6. Use paper-context-resolver only if README and repo files leave a narrow reproduction-critical gap that blocks the current target.
  7. Write the standardized outputs.
  8. Give the user a short final note in the user's language.

Required outputs

Always target:

    repro_outputs/
      SUMMARY.md
      COMMANDS.md
      LOG.md
      status.json
      PATCHES.md   # only if patches were applied

Use the templates under assets/ and the field rules in references/output-spec.md.
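A minimal sketch of creating this layout is shown below. It is not the skill's own tooling: the `init_outputs` helper is hypothetical, and it writes empty placeholders (an empty JSON object for status.json) because the real templates and field rules come from assets/ and references/output-spec.md.

```python
import json
from pathlib import Path
from typing import List

# Files that are always expected under repro_outputs/.
REQUIRED = ["SUMMARY.md", "COMMANDS.md", "LOG.md", "status.json"]


def init_outputs(root: str, patches_applied: bool = False) -> List[str]:
    """Create repro_outputs/ with placeholder files and return their names.

    Existing files are left untouched. PATCHES.md is created only when
    patches were applied. Placeholder contents are an assumption of this
    sketch, not the templates shipped with the skill.
    """
    out = Path(root) / "repro_outputs"
    out.mkdir(parents=True, exist_ok=True)
    names = REQUIRED + (["PATCHES.md"] if patches_applied else [])
    for name in names:
        path = out / name
        if path.exists():
            continue
        if name.endswith(".json"):
            path.write_text(json.dumps({}) + "\n", encoding="utf-8")
        else:
            path.write_text("", encoding="utf-8")
    return sorted(p.name for p in out.iterdir())


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(init_outputs(tmp))
```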

Reporting policy

  • Put the shortest high-value summary in SUMMARY.md.
  • Put copyable commands in COMMANDS.md.
  • Put process evidence, assumptions, failures, and decisions in LOG.md.
  • Put durable machine-readable state in status.json.
  • Put branch, commit, validation, and README-fidelity impact in PATCHES.md when needed.
  • Distinguish verified facts from inferred guesses.
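One possible way to keep verified facts and inferred guesses apart in LOG.md is to tag each entry explicitly. The `Finding` type and the `[verified]` / `[inferred]` line format below are assumptions made for illustration; the actual format is governed by the templates under assets/.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    """Hypothetical log entry; not a structure defined by this skill."""
    statement: str
    verified: bool
    evidence: str = ""  # pointer to supporting evidence, if any


def render_log(findings: List[Finding]) -> str:
    """Render findings as list lines tagged [verified] or [inferred]."""
    lines = []
    for f in findings:
        tag = "verified" if f.verified else "inferred"
        suffix = f" (evidence: {f.evidence})" if f.evidence else ""
        lines.append(f"- [{tag}] {f.statement}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_log([
        Finding("documented inference command exited with code 0", True, "run 1"),
        Finding("default checkpoint corresponds to the paper's main model", False),
    ]))
```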

Maintainability notes

  • Keep this skill narrow: README-first AI repo reproduction only.
  • Push specialized logic into sub-skills or helper scripts.
  • Prefer stable templates and simple schemas over ad hoc prose.
  • Keep machine-readable outputs backward compatible when possible.
  • Add new evidence sources only when they improve auditability without raising learning cost.