ai-research-reproduction

Use when

The user wants the agent to reproduce an AI paper repository.
The target is a code repository with a README, scripts, configs, or documented commands.
The goal is a minimal trustworthy run, not unlimited experimentation.
The user needs standardized outputs that another human or model can audit quickly.
The task spans more than one stage, such as intake plus setup, or setup plus execution plus reporting.

Do not use when

The task is a general literature review or paper summary.
The task is to design a new model, benchmark suite, or training pipeline from scratch.
The repository is not centered on AI or does not expose a documented reproduction path.
The user primarily wants a deep code refactor rather than README-first reproduction.
The user is explicitly asking for only one narrow phase that a sub-skill already covers cleanly.
The user is explicitly authorizing exploratory branch-only experimentation instead of trusted reproduction.

Success criteria

README is treated as the primary source of reproduction intent.
A minimum trustworthy target is selected and justified.
Documented inference is preferred over evaluation, and evaluation is preferred over training.
Any repo edits remain conservative, explicit, and auditable.
Assumptions, protocol deviations, and human decision points are surfaced rather than hidden.
`repro_outputs/` is generated with consistent structure and stable machine-readable fields.
Final user-facing explanation is short and follows the user's language when practical.

Interaction and usability policy

Keep the workflow simple enough for a new user to understand quickly.
Prefer short, concrete plans over exhaustive research.
Expose commands, assumptions, blockers, and evidence.
Avoid turning the skill into an opaque automation layer.
Preserve a low learning cost for both humans and downstream agents.

Language policy

Human-readable Markdown outputs should follow the user's language when it is clear.
If the user's language is unclear, default to concise English.
Machine-readable fields, filenames, keys, and enum values stay in stable English.
Paths, package names, CLI commands, config keys, and code identifiers remain unchanged.

See `references/language-policy.md`.

Reproduction policy

Core priority order:

documented inference
documented evaluation
documented training startup or partial verification
full training only when the user explicitly asks later
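
The priority order above can be sketched as a small selection function. This is only an illustration: the kind labels (`inference`, `evaluation`, `training_startup`, `full_training`) and the `select_target` helper are hypothetical names, not part of the skill.

```python
from typing import Dict, Optional, Tuple

# Lower index = higher priority, mirroring the core priority order.
# The kind labels are hypothetical; the skill does not define them.
PRIORITY = ["inference", "evaluation", "training_startup"]


def select_target(
    documented: Dict[str, str],
    full_training_requested: bool = False,
) -> Optional[Tuple[str, str]]:
    """Return (kind, command) for the smallest trustworthy target, or None.

    `documented` maps a kind label to the command found in the README.
    """
    for kind in PRIORITY:
        if kind in documented:
            return kind, documented[kind]
    # Full training is considered only when the user explicitly asks for it.
    if full_training_requested and "full_training" in documented:
        return "full_training", documented["full_training"]
    return None
```

Returning `None` instead of guessing keeps an undocumented repository visible as a blocker rather than hiding it behind an improvised target.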

Rules:

README-first: use repository files to clarify, not casually override, the README.
Aim for minimal trustworthy reproduction rather than maximum task coverage.
Treat smoke tests, startup verification, and early-step checks as valid training evidence when full training is not appropriate.
In trusted reproduction, a documented training command should first be checked through startup verification or a short monitoring window, then paused for explicit human confirmation before broader training continues.
In explicitly authorized explore-lane execution, the training record can continue without the trusted-lane confirmation pause, but it must stay isolated from trusted conclusions.
Record unresolved gaps rather than fabricating confidence.

Patch policy

Prefer no code changes.
Prefer safer adjustments first:
- command-line arguments
- environment variables
- path fixes
- dependency version fixes
- dependency file fixes such as `requirements.txt` or `environment.yml`
Avoid changing:
- model architecture
- core inference semantics
- core training logic
- loss functions
- experiment meaning
If repository files must change:
- create a patch branch first using `repro/YYYY-MM-DD-short-task`
- apply low-risk changes before medium-risk changes
- avoid high-risk changes by default
- commit only verified groups of changes
- keep verified patch commits sparse, usually `0-2`
- use commit messages in the form `repro: <scope> for documented <command>`
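
As a minimal sketch, the branch and commit-message formats above could be produced by two helpers. The function names and the slug rule (lowercase, runs of non-alphanumerics collapsed to `-`) are assumptions made for illustration; the source fixes only the two format strings.

```python
import re
from datetime import date


def patch_branch_name(task: str, day: date) -> str:
    """Build a branch name of the form repro/YYYY-MM-DD-short-task."""
    # Assumed slug rule: lowercase, non-alphanumeric runs become a single "-".
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return f"repro/{day.isoformat()}-{slug}"


def patch_commit_message(scope: str, command: str) -> str:
    """Build a message of the form 'repro: <scope> for documented <command>'."""
    return f"repro: {scope} for documented {command}"
```

For example, `patch_branch_name("Fix CUDA path", date(2024, 5, 1))` yields `repro/2024-05-01-fix-cuda-path`.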

See `references/patch-policy.md`.

Research safety boundary

Preserve experiment meaning over convenience.
Do not silently change dataset, split, checkpoint, preprocessing, metric, loss, or model semantics.
Distinguish direct evidence from inference and from user-approved decisions.
Prefer a recorded blocker over an unrecorded workaround.
Escalate for explicit human review before any change that could alter scientific meaning or reported conclusions.
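
One way to make silent changes hard to miss is to compare the documented settings with the settings actually used and record every difference in a protected field. The sketch below assumes both are available as flat dictionaries; the key spellings and the `find_protocol_deviations` helper are hypothetical.

```python
from typing import Any, Dict, List

# Fields named in the safety boundary; the exact key spellings are assumptions.
PROTECTED_KEYS = (
    "dataset", "split", "checkpoint", "preprocessing", "metric", "loss", "model",
)


def find_protocol_deviations(
    documented: Dict[str, Any], actual: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """List protected fields whose actual value differs from the documented one."""
    deviations = []
    for key in PROTECTED_KEYS:
        if key in documented and documented[key] != actual.get(key):
            deviations.append({
                "field": key,
                "documented": documented[key],
                "actual": actual.get(key),
                # Any change to a protected field is escalated, never auto-accepted.
                "needs_human_review": True,
            })
    return deviations
```

An empty result means no protected field drifted from what was documented; it does not by itself prove the run is faithful.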

See `references/research-safety-principles.md`.

Workflow

Read README and repo signals.
Call `repo-intake-and-plan` to scan the repository and extract documented commands.
Select the smallest trustworthy reproduction target.
Call `env-and-assets-bootstrap` to prepare environment assumptions and asset paths.
Call `analyze-project` only when repo structure, insertion points, or suspicious implementation patterns need a read-only pass before continuing.
Run a conservative smoke check or documented inference or evaluation command with `minimal-run-and-audit`.
If the selected trustworthy target is documented training startup, short-run verification, or resume, hand execution to `run-train` instead of `minimal-run-and-audit`.
When training is selected inside trusted reproduction, let `run-train` capture the startup evidence first, then surface a human review checkpoint before any fuller training claim.
Stop for human review if protocol meaning, model semantics, or result interpretation would otherwise be changed implicitly.
Use `paper-context-resolver` only if README and repo files leave a narrow reproduction-critical gap that blocks the current target.
Never auto-route into `explore-code` or `explore-run`; exploration requires explicit user authorization.
Write the standardized outputs with evidence, assumptions, deviations, and next safe action.
Give the user a short final note in the user's language.
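
The execution routing in the workflow above can be sketched as a lookup table plus an authorization check. The sub-skill names come from the workflow; the target-kind labels and the `route` and `may_enter` helpers are hypothetical.

```python
# Target kinds are hypothetical labels; sub-skill names are from the workflow.
SUB_SKILL_BY_TARGET = {
    "smoke_check": "minimal-run-and-audit",
    "inference": "minimal-run-and-audit",
    "evaluation": "minimal-run-and-audit",
    "training_startup": "run-train",
    "short_run_verification": "run-train",
    "resume": "run-train",
}

EXPLORE_SKILLS = {"explore-code", "explore-run"}


def route(target_kind: str) -> str:
    """Return the sub-skill that should execute the selected target."""
    if target_kind not in SUB_SKILL_BY_TARGET:
        # Unknown targets are surfaced as errors instead of being guessed.
        raise ValueError(f"no documented route for target kind: {target_kind!r}")
    return SUB_SKILL_BY_TARGET[target_kind]


def may_enter(skill: str, user_authorized_exploration: bool) -> bool:
    """Exploration skills need explicit user authorization; others do not."""
    if skill in EXPLORE_SKILLS:
        return user_authorized_exploration
    return True
```

Because the table never maps a target to an exploration skill, `route` cannot auto-route into `explore-code` or `explore-run`.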

Required outputs

Always target:

repro_outputs/
  SUMMARY.md
  COMMANDS.md
  LOG.md
  status.json
  PATCHES.md   # only if patches were applied

Use the templates under `assets/` and the field rules in `references/output-spec.md`.
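
A rough sketch of writing this layout follows. It does not use the real templates or field rules, which live in `assets/` and `references/output-spec.md`; the `write_repro_outputs` helper and any keys passed in `status` are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional


def write_repro_outputs(
    root: Path,
    summary: str,
    commands: List[str],
    log: str,
    status: dict,
    patches: Optional[str] = None,
) -> List[str]:
    """Write the repro_outputs/ files and return the sorted file names."""
    out = root / "repro_outputs"
    out.mkdir(parents=True, exist_ok=True)
    (out / "SUMMARY.md").write_text(summary, encoding="utf-8")
    (out / "COMMANDS.md").write_text("\n".join(commands) + "\n", encoding="utf-8")
    (out / "LOG.md").write_text(log, encoding="utf-8")
    # Sorted keys keep status.json stable across runs, which helps diffing.
    (out / "status.json").write_text(
        json.dumps(status, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    # PATCHES.md is written only if patches were applied.
    if patches is not None:
        (out / "PATCHES.md").write_text(patches, encoding="utf-8")
    return sorted(p.name for p in out.iterdir())
```

Passing `patches=None` leaves `PATCHES.md` out entirely, matching the "only if patches were applied" note in the layout.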

Reporting policy

Put the shortest high-value summary in `SUMMARY.md`.
Put copyable commands in `COMMANDS.md`.
Put process evidence, assumptions, failures, and decisions in `LOG.md`.
Put durable machine-readable state in `status.json`.
Put branch, commit, validation, and README-fidelity impact in `PATCHES.md` when needed.
Distinguish verified facts from inferred guesses.

Maintainability notes

Keep this skill narrow: README-first AI repo reproduction only.
Push specialized logic into sub-skills or helper scripts.
Prefer stable templates and simple schemas over ad hoc prose.
Keep machine-readable outputs backward compatible when possible.
Add new evidence sources only when they improve auditability without raising learning cost.
Treat `repo-intake-and-plan` and `paper-context-resolver` as narrow helpers, not primary public entrypoints.
