ai-research-reproduction

Use when

  • The user wants the agent to reproduce an AI paper repository.
  • The target is a code repository with a README, scripts, configs, or documented commands.
  • The goal is a minimal trustworthy run, not unlimited experimentation.
  • The user needs standardized outputs that another human or model can audit quickly.
  • The task spans more than one stage, such as intake plus setup, or setup plus execution plus reporting.

Do not use when

  • The task is a general literature review or paper summary.
  • The task is to design a new model, benchmark suite, or training pipeline from scratch.
  • The repository is not centered on AI or does not expose a documented reproduction path.
  • The user primarily wants a deep code refactor rather than README-first reproduction.
  • The user is explicitly asking for only one narrow phase that a sub-skill already covers cleanly.
  • The user is explicitly authorizing exploratory branch-only experimentation instead of trusted reproduction.

Success criteria

  • README is treated as the primary source of reproduction intent.
  • A minimum trustworthy target is selected and justified.
  • Documented inference is preferred over evaluation, and evaluation is preferred over training.
  • Any repo edits remain conservative, explicit, and auditable.
  • Assumptions, protocol deviations, and human decision points are surfaced rather than hidden.
  • repro_outputs/ is generated with consistent structure and stable machine-readable fields.
  • Final user-facing explanation is short and follows the user's language when practical.

Interaction and usability policy

  • Keep the workflow simple enough for a new user to understand quickly.
  • Prefer short, concrete plans over exhaustive research.
  • Expose commands, assumptions, blockers, and evidence.
  • Avoid turning the skill into an opaque automation layer.
  • Preserve a low learning cost for both humans and downstream agents.

Language policy

  • Human-readable Markdown outputs should follow the user's language when it is clear.
  • If the user's language is unclear, default to concise English.
  • Machine-readable fields, filenames, keys, and enum values stay in stable English.
  • Paths, package names, CLI commands, config keys, and code identifiers remain unchanged.
See references/language-policy.md.

Reproduction policy

Core priority order:
  1. documented inference
  2. documented evaluation
  3. documented training startup or partial verification
  4. full training only when the user explicitly asks later
Rules:
  • README-first: use repository files to clarify, not casually override, the README.
  • Aim for minimal trustworthy reproduction rather than maximum task coverage.
  • Treat smoke tests, startup verification, and early-step checks as valid training evidence when full training is not appropriate.
  • In trusted reproduction, a documented training command should first be checked through startup verification or a short monitoring window, then paused for explicit human confirmation before broader training continues.
  • In explicitly authorized explore-lane execution, the training record can continue without the trusted-lane confirmation pause, but it must stay isolated from trusted conclusions.
  • Record unresolved gaps rather than fabricating confidence.
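The priority order above can be sketched as a small selection function. This is a minimal illustration, not the skill's actual implementation: the `DocumentedCommand` type, the `kind` labels, and the `full_training_requested` flag are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Priority order taken from the policy above; earlier entries win.
# The string labels themselves are hypothetical.
PRIORITY = ("inference", "evaluation", "training_startup")


@dataclass(frozen=True)
class DocumentedCommand:
    kind: str     # one of the labels in PRIORITY, or "full_training"
    command: str  # command text exactly as documented in the README


def select_target(
    commands: Sequence[DocumentedCommand],
    full_training_requested: bool = False,
) -> Optional[DocumentedCommand]:
    """Return the highest-priority documented command, or None."""
    order = list(PRIORITY)
    if full_training_requested:
        # Full training is considered only on explicit user request.
        order.append("full_training")
    for kind in order:
        for cmd in commands:
            if cmd.kind == kind:
                return cmd
    # Nothing documented matches: record a blocker instead of guessing.
    return None
```

Returning `None` instead of falling back to an undocumented command mirrors the rule to record unresolved gaps rather than fabricate confidence.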

Patch policy

  • Prefer no code changes.
  • Prefer safer adjustments first:
    • command-line arguments
    • environment variables
    • path fixes
    • dependency version fixes
    • dependency file fixes such as requirements.txt or environment.yml
  • Avoid changing:
    • model architecture
    • core inference semantics
    • core training logic
    • loss functions
    • experiment meaning
  • If repository files must change:
    • create a patch branch first using repro/YYYY-MM-DD-short-task
    • apply low-risk changes before medium-risk changes
    • avoid high-risk changes by default
    • commit only verified groups of changes
    • keep verified patch commits sparse, usually 0-2
    • use commit messages in the form repro: <scope> for documented <command>
See references/patch-policy.md.
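The branch and commit naming conventions above could be generated with helpers like the following. This is a sketch under assumptions: the function names are hypothetical, and the slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) is one plausible reading of "short-task", not something the policy specifies.

```python
import re
from datetime import date


def patch_branch_name(task: str, day: date) -> str:
    """Build a branch name of the form repro/YYYY-MM-DD-short-task."""
    # Assumed slug rule: lowercase, collapse non-alphanumerics to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return f"repro/{day.isoformat()}-{slug}"


def patch_commit_message(scope: str, command: str) -> str:
    """Build a message of the form 'repro: <scope> for documented <command>'."""
    return f"repro: {scope} for documented {command}"
```

For example, `patch_branch_name("Fix CUDA path", date(2024, 1, 5))` yields `repro/2024-01-05-fix-cuda-path`.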

Research safety boundary

  • Preserve experiment meaning over convenience.
  • Do not silently change dataset, split, checkpoint, preprocessing, metric, loss, or model semantics.
  • Distinguish direct evidence from inference and from user-approved decisions.
  • Prefer a recorded blocker over an unrecorded workaround.
  • Escalate for explicit human review before any change that could alter scientific meaning or reported conclusions.
See references/research-safety-principles.md.

Workflow

  1. Read README and repo signals.
  2. Call repo-intake-and-plan to scan the repository and extract documented commands.
  3. Select the smallest trustworthy reproduction target.
  4. Call env-and-assets-bootstrap to prepare environment assumptions and asset paths.
  5. Call analyze-project only when repo structure, insertion points, or suspicious implementation patterns need a read-only pass before continuing.
  6. Run a conservative smoke check or a documented inference or evaluation command with minimal-run-and-audit.
  7. If the selected trustworthy target is documented training startup, short-run verification, or resume, hand execution to run-train instead of minimal-run-and-audit.
  8. When training is selected inside trusted reproduction, let run-train capture the startup evidence first, then surface a human review checkpoint before any fuller training claim.
  9. Stop for human review if protocol meaning, model semantics, or result interpretation would otherwise be changed implicitly.
  10. Use paper-context-resolver only if README and repo files leave a narrow reproduction-critical gap that blocks the current target.
  11. Never auto-route into explore-code or explore-run; exploration requires explicit user authorization.
  12. Write the standardized outputs with evidence, assumptions, deviations, and next safe action.
  13. Give the user a short final note in the user's language.
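The routing decisions in steps 6, 7, and 11 can be summarized in a short sketch. The sub-skill names come from the workflow above; the target labels, function names, and the `user_authorized_exploration` flag are hypothetical and only illustrate the rule.

```python
# Hypothetical target labels; the sub-skill names are from the workflow above.
TRAINING_TARGETS = {"training_startup", "short_run_verification", "resume"}
MINIMAL_RUN_TARGETS = {"smoke_check", "inference", "evaluation"}
EXPLORE_SKILLS = {"explore-code", "explore-run"}


def route_execution(target_kind: str) -> str:
    """Pick the sub-skill that executes the selected trusted target."""
    if target_kind in TRAINING_TARGETS:
        return "run-train"
    if target_kind in MINIMAL_RUN_TARGETS:
        return "minimal-run-and-audit"
    # Unknown targets are surfaced, not silently routed.
    raise ValueError(f"no documented route for target kind: {target_kind!r}")


def may_enter(skill: str, user_authorized_exploration: bool) -> bool:
    """Exploration sub-skills require explicit user authorization."""
    if skill in EXPLORE_SKILLS:
        return user_authorized_exploration
    return True
```

Raising on an unknown target keeps the workflow from auto-routing into a lane the user did not choose.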

Required outputs

Always target:

repro_outputs/
  SUMMARY.md
  COMMANDS.md
  LOG.md
  status.json
  PATCHES.md   # only if patches were applied

Use the templates under assets/ and the field rules in references/output-spec.md.
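A quick completeness check for the layout above might look like the following. This is a sketch only: the function name and the `patches_applied` flag are hypothetical, and it checks file presence, not the field rules in references/output-spec.md.

```python
from pathlib import Path
from typing import List

# File names taken from the required layout above.
REQUIRED_FILES = ("SUMMARY.md", "COMMANDS.md", "LOG.md", "status.json")


def missing_outputs(root: Path, patches_applied: bool) -> List[str]:
    """List expected files that are absent from the output directory."""
    expected = list(REQUIRED_FILES)
    if patches_applied:
        # PATCHES.md is expected only when patches were applied.
        expected.append("PATCHES.md")
    return [name for name in expected if not (root / name).is_file()]
```

An empty list means every expected file is present, for example `missing_outputs(Path("repro_outputs"), patches_applied=False)`.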

Reporting policy

  • Put the shortest high-value summary in SUMMARY.md.
  • Put copyable commands in COMMANDS.md.
  • Put process evidence, assumptions, failures, and decisions in LOG.md.
  • Put durable machine-readable state in status.json.
  • Put branch, commit, validation, and README-fidelity impact in PATCHES.md when needed.
  • Distinguish verified facts from inferred guesses.

Maintainability notes

  • Keep this skill narrow: README-first AI repo reproduction only.
  • Push specialized logic into sub-skills or helper scripts.
  • Prefer stable templates and simple schemas over ad hoc prose.
  • Keep machine-readable outputs backward compatible when possible.
  • Add new evidence sources only when they improve auditability without raising learning cost.
  • Treat repo-intake-and-plan and paper-context-resolver as narrow helpers, not primary public entrypoints.