ai-research-reproduction

Purpose

This is the Rigor Reproduce-compatible skill slug for README-first reproduction of deep learning repositories. The installed slug remains `ai-research-reproduction` for compatibility. The skill guides the agent toward a minimal, trustworthy run with auditable evidence. It should not micromanage implementation details that the model can infer from the repository. Reproduction does not mean "make it run by changing anything." It means faithfully following the README, environment, weights, datasets, and documented commands, then recording the results and any deviations.
Start from the shared operating principles in `../../references/agent-operating-principles.md`, then load `../../references/research-rigor-principles.md` and `../../references/deep-learning-experiment-principles.md` when scientific meaning, comparability, or experiment details are at stake.

Fit

Use this skill when all of the following are true:
  • The target is an AI code repository with a README, scripts, configs, or documented commands.
  • The request spans multiple trusted phases such as intake, setup, execution, training verification, analysis, paper-gap resolution, and reporting.
  • The desired result is a small reproducible target, not broad experimentation.
Do not use this skill for paper summaries, generic environment setup, isolated repo scanning, standalone command execution, open-ended research design, or explicit candidate-only exploration.
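The three Fit conditions can be read as a single conjunction. The sketch below is illustrative only. `RequestProfile`, its field names, and the phase labels are hypothetical. It also assumes that "spans multiple phases" means two or more of the phases listed above.

```python
from dataclasses import dataclass, field

# Phases named in the Fit section (labels are hypothetical spellings).
KNOWN_PHASES = {
    "intake", "setup", "execution", "training-verification",
    "analysis", "paper-gap-resolution", "reporting",
}


@dataclass
class RequestProfile:
    """Hypothetical summary of a user request; field names are illustrative."""
    has_documented_repo_entrypoints: bool  # README, scripts, configs, or commands
    phases: set = field(default_factory=set)
    wants_small_reproducible_target: bool = True


def skill_fits(req: RequestProfile) -> bool:
    """Return True only when all three Fit conditions hold."""
    # Assumption: "multiple phases" is read as two or more known phases.
    spans_multiple_phases = len(req.phases & KNOWN_PHASES) >= 2
    return (
        req.has_documented_repo_entrypoints
        and spans_multiple_phases
        and req.wants_small_reproducible_target
    )
```

A request that only asks to run one command spans a single phase, so it falls through to the "do not use" cases.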

Trusted Target Selection

Choose the smallest target that can honestly demonstrate repository-grounded reproduction:
  1. documented inference
  2. documented evaluation
  3. documented training startup or partial verification
  4. full training only after explicit user confirmation
Treat README guidance as the primary reproduction intent. Use repository files to clarify the README, not to silently replace it. When the README and paper conflict, record the conflict and use `paper-context-resolver` only for the narrow, reproduction-critical gap.
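The ordering above works like a priority list with a confirmation gate on the last entry. The following is a minimal sketch under that reading. `Target` and `select_target` are hypothetical names. The code can only check what is documented, so the agent still has to judge whether a target "honestly" demonstrates reproduction.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Target(IntEnum):
    """Trusted targets, ordered smallest first as in the list above."""
    INFERENCE = 1
    EVALUATION = 2
    TRAINING_STARTUP = 3  # training startup or partial verification
    FULL_TRAINING = 4


def select_target(
    documented: Iterable[Target],
    user_confirmed_full_training: bool = False,
) -> Optional[Target]:
    """Pick the smallest documented target; gate full training on confirmation.

    Returns None when the only documented option is unconfirmed full training,
    meaning the agent should ask the user before going further.
    """
    for target in sorted(set(documented)):
        if target is Target.FULL_TRAINING and not user_confirmed_full_training:
            continue
        return target
    return None
```

If a README documents both evaluation and inference, this picks inference. If it documents only full training, the function returns None until the user confirms.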

Workflow

  1. Read the README and nearby repository signals.
  2. Use `repo-intake-and-plan` to extract documented commands and candidate targets.
  3. Select and justify the minimum trustworthy target.
  4. Use `env-and-assets-bootstrap` only for target-specific environment, checkpoint, dataset, and cache assumptions.
  5. Use `analyze-project` only when the repository structure, insertion points, or suspicious implementation patterns need read-only clarification.
  6. Use `minimal-run-and-audit` for documented inference, evaluation, smoke-test, or sanity-check execution.
  7. Use `run-train` instead when the selected trusted target is training startup, short-run verification, a full training kickoff, or a resume.
  8. Pause for human review before making fuller training claims or any change that could alter the dataset, split, checkpoint, preprocessing, metric, loss, model semantics, or result interpretation.
  9. Write the standardized outputs and, when practical, give a concise final note in the user's language.
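Steps 6 to 8 reduce to two small decisions: which execution sub-skill handles the selected target, and whether a proposed change must pause for review. Here is a sketch of both. The target-kind and aspect labels are hypothetical strings chosen to mirror the wording of the steps. Only the sub-skill names come from this document.

```python
# Target kinds handled by each execution sub-skill (steps 6 and 7).
RUN_AND_AUDIT_TARGETS = {"inference", "evaluation", "smoke", "sanity"}
RUN_TRAIN_TARGETS = {"training-startup", "short-run-verification", "full-kickoff", "resume"}

# Aspects whose change requires pausing for human review (step 8).
REVIEW_SENSITIVE = {
    "dataset", "split", "checkpoint", "preprocessing",
    "metric", "loss", "model-semantics", "result-interpretation",
}


def execution_skill(target_kind: str) -> str:
    """Route a selected target to the sub-skill that should execute it."""
    if target_kind in RUN_AND_AUDIT_TARGETS:
        return "minimal-run-and-audit"
    if target_kind in RUN_TRAIN_TARGETS:
        return "run-train"
    raise ValueError(f"unrecognized target kind: {target_kind!r}")


def needs_human_review(changed_aspects: set, claims_fuller_training: bool = False) -> bool:
    """Step 8: pause before fuller training claims or protocol-sensitive changes."""
    return claims_fuller_training or bool(changed_aspects & REVIEW_SENSITIVE)
```

The router raises on unknown kinds rather than guessing, in keeping with the rule that the agent should not silently substitute a target.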

Patch Boundary

Prefer no repository edits. If edits are needed, keep them conservative and auditable:
  • Try command-line arguments, environment variables, path fixes, dependency version fixes, or dependency-file fixes before changing code.
  • Reproduction fixes are allowed when needed, but they must not be hidden. State what changed, why it was necessary, whether it changes the scientific meaning, and whether it affects comparability with the paper, README, or baseline.
  • Avoid changing the model architecture, core inference semantics, training logic, loss functions, or experiment meaning.
  • If repository files must change, create a branch named `repro/YYYY-MM-DD-short-task`, keep verified patch commits sparse, and record the README-fidelity impact in `PATCHES.md`.
See `references/patch-policy.md`.
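The sketch below covers the branch-name convention, the escalation order for fixes, and the disclosure fields each fix must state. The slug rules (lowercase, non-alphanumeric runs collapsed to `-`) and the `patch_entry` layout are assumptions. The authoritative format is in `references/patch-policy.md` and the `assets/` templates.

```python
import datetime
import re
from typing import Optional

# Fix categories in the order the policy asks to try them; code changes last.
FIX_ESCALATION = [
    "command-line-arguments",
    "environment-variables",
    "path-fix",
    "dependency-version-fix",
    "dependency-file-fix",
    "code-change",
]


def repro_branch_name(short_task: str, day: Optional[datetime.date] = None) -> str:
    """Build a branch name of the form repro/YYYY-MM-DD-short-task."""
    day = day or datetime.date.today()
    # Lowercase and collapse anything that is not a letter or digit into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", short_task.lower()).strip("-")
    if not slug:
        raise ValueError("short_task must contain at least one letter or digit")
    return f"repro/{day.isoformat()}-{slug}"


def patch_entry(fix_kind: str, what: str, why: str,
                changes_meaning: bool, affects_comparability: bool) -> str:
    """Render the disclosure fields the policy requires for each fix (hypothetical layout)."""
    if fix_kind not in FIX_ESCALATION:
        raise ValueError(f"unknown fix kind: {fix_kind!r}")
    yes_no = {True: "yes", False: "no"}
    return "\n".join([
        f"- Fix kind: {fix_kind}",
        f"- What changed: {what}",
        f"- Why necessary: {why}",
        f"- Changes scientific meaning: {yes_no[changes_meaning]}",
        f"- Affects comparability: {yes_no[affects_comparability]}",
    ])
```

For example, `repro_branch_name("Fix CUDA path", datetime.date(2024, 5, 1))` yields `repro/2024-05-01-fix-cuda-path`.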

Outputs

Always target `repro_outputs/`:
```text
SUMMARY.md
COMMANDS.md
LOG.md
SCIENTIFIC_CHANGELOG.md
COMPARABILITY_REPORT.md
status.json
PATCHES.md   # only if patches were applied
```
Use the templates under `assets/` and the field rules in `references/output-spec.md`.
  • Put the shortest high-value summary in `SUMMARY.md`.
  • Put copyable commands in `COMMANDS.md`.
  • Put process evidence, assumptions, failures, and decisions in `LOG.md`.
  • Put scientific meaning and the effects of changes in `SCIENTIFIC_CHANGELOG.md`.
  • Put comparison anchors and protocol deviations in `COMPARABILITY_REPORT.md`.
  • Put durable machine-readable state in `status.json`.
  • Put the branch, commits, validation, and README-fidelity impact in `PATCHES.md` when needed.
  • Distinguish verified facts from inferred guesses.
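The output layout can be scaffolded mechanically, with `PATCHES.md` created only when patches were applied. This is a sketch, not the skill's tooling. It writes empty placeholders where the real files should come from the `assets/` templates. The `status.json` fields in the example are hypothetical, and the real schema is defined in `references/output-spec.md`. Separate `verified_facts` and `inferred` lists illustrate one way to keep facts apart from guesses.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_OUTPUTS = [
    "SUMMARY.md",
    "COMMANDS.md",
    "LOG.md",
    "SCIENTIFIC_CHANGELOG.md",
    "COMPARABILITY_REPORT.md",
    "status.json",
]


def scaffold_outputs(root, status: dict, patches_applied: bool = False) -> list:
    """Create repro_outputs/ under root and return the sorted file names."""
    out = Path(root) / "repro_outputs"
    out.mkdir(parents=True, exist_ok=True)
    names = REQUIRED_OUTPUTS + (["PATCHES.md"] if patches_applied else [])
    for name in names:
        path = out / name
        if name == "status.json":
            # status.json is always rewritten so it reflects the latest state.
            path.write_text(json.dumps(status, indent=2) + "\n", encoding="utf-8")
        elif not path.exists():
            # Placeholder only; real content should come from the assets/ templates.
            path.touch()
    return sorted(p.name for p in out.iterdir())


if __name__ == "__main__":
    # Hypothetical status fields with placeholder values, not real results.
    example_status = {
        "target": "documented-evaluation",
        "state": "in-progress",
        "patches_applied": False,
        "verified_facts": ["<fact backed by a command log>"],
        "inferred": ["<assumption not yet checked>"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold_outputs(tmp, example_status))
```

Existing Markdown files are left untouched on re-runs, so earlier evidence in `LOG.md` is never clobbered by the scaffold.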

Reference Loading

  • Load `references/language-policy.md` when writing human-readable outputs.
  • Load `../../references/research-rigor-principles.md` before making comparability, contribution, or research-result claims.
  • Load `../../references/deep-learning-experiment-principles.md` when dataset, split, metric, checkpoint, training, or evaluation details matter.
  • Load `references/research-safety-principles.md` before protocol-sensitive decisions.
  • Load `references/patch-policy.md` before modifying repository files.
  • Keep specialized logic in sub-skills, scripts, templates, or references rather than expanding this entrypoint.
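The loading rules above map situations to files. Here is a sketch of that mapping, assuming hypothetical trigger labels for each listed situation. The shared operating principles from the Purpose section are always loaded first. Unknown triggers raise instead of being ignored, so a typo does not silently skip a reference.

```python
# Always loaded first, per the Purpose section.
BASE_REFERENCE = "../../references/agent-operating-principles.md"

# Trigger label -> reference file, following the list above.
# The trigger labels are hypothetical names for the listed situations.
REFERENCE_TRIGGERS = {
    "writing-human-readable-output": "references/language-policy.md",
    "comparability-or-result-claim": "../../references/research-rigor-principles.md",
    "dl-experiment-details": "../../references/deep-learning-experiment-principles.md",
    "protocol-sensitive-decision": "references/research-safety-principles.md",
    "modifying-repo-files": "references/patch-policy.md",
}


def references_to_load(active_triggers) -> list:
    """Return reference paths to load, in order, without duplicates."""
    paths = [BASE_REFERENCE]
    for trigger in active_triggers:
        path = REFERENCE_TRIGGERS[trigger]  # KeyError flags an unknown trigger
        if path not in paths:
            paths.append(path)
    return paths
```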