ai-research-reproduction
Use when
- The user wants the agent to reproduce an AI paper repository.
- The target is a code repository with a README, scripts, configs, or documented commands.
- The goal is a minimal trustworthy run, not unlimited experimentation.
- The user needs standardized outputs that another human or model can audit quickly.
- The task spans more than one stage, such as intake plus setup, or setup plus execution plus reporting.
Do not use when
- The task is a general literature review or paper summary.
- The task is to design a new model, benchmark suite, or training pipeline from scratch.
- The repository is not centered on AI or does not expose a documented reproduction path.
- The user primarily wants a deep code refactor rather than README-first reproduction.
- The user is explicitly asking for only one narrow phase that a sub-skill already covers cleanly.
- The user is explicitly authorizing exploratory branch-only experimentation instead of trusted reproduction.
Success criteria
- README is treated as the primary source of reproduction intent.
- A minimum trustworthy target is selected and justified.
- Documented inference is preferred over evaluation, and evaluation is preferred over training.
- Any repo edits remain conservative, explicit, and auditable.
- Assumptions, protocol deviations, and human decision points are surfaced rather than hidden.
- `repro_outputs/` is generated with consistent structure and stable machine-readable fields.
- Final user-facing explanation is short and follows the user's language when practical.
Interaction and usability policy
- Keep the workflow simple enough for a new user to understand quickly.
- Prefer short, concrete plans over exhaustive research.
- Expose commands, assumptions, blockers, and evidence.
- Avoid turning the skill into an opaque automation layer.
- Preserve a low learning cost for both humans and downstream agents.
Language policy
- Human-readable Markdown outputs should follow the user's language when it is clear.
- If the user's language is unclear, default to concise English.
- Machine-readable fields, filenames, keys, and enum values stay in stable English.
- Paths, package names, CLI commands, config keys, and code identifiers remain unchanged.
See `references/language-policy.md`.
Reproduction policy
Core priority order:
- documented inference
- documented evaluation
- documented training startup or partial verification
- full training only when the user explicitly asks later
Rules:
- README-first: use repository files to clarify, not casually override, the README.
- Aim for minimal trustworthy reproduction rather than maximum task coverage.
- Treat smoke tests, startup verification, and early-step checks as valid training evidence when full training is not appropriate.
- In trusted reproduction, a documented training command should first be checked through startup verification or a short monitoring window, then paused for explicit human confirmation before broader training continues.
- In explicitly authorized explore-lane execution, the training record can continue without the trusted-lane confirmation pause, but it must stay isolated from trusted conclusions.
- Record unresolved gaps rather than fabricating confidence.
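The priority order above can be sketched as a small selection function. This is a minimal illustration, not part of the skill itself: the `DocumentedCommand` type, the kind labels, and the example commands are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass

# Priority order from the policy above; a lower index wins.
# The kind labels are hypothetical names chosen for this sketch.
PRIORITY = ["inference", "evaluation", "training_startup", "full_training"]


@dataclass(frozen=True)
class DocumentedCommand:
    kind: str     # one of PRIORITY
    command: str  # command text as documented in the README


def select_target(commands, user_requested_full_training=False):
    """Return the highest-priority documented command, or None if nothing qualifies."""
    allowed = [
        c for c in commands
        if c.kind in PRIORITY
        and (c.kind != "full_training" or user_requested_full_training)
    ]
    if not allowed:
        return None  # record a blocker instead of inventing a target
    return min(allowed, key=lambda c: PRIORITY.index(c.kind))


# Hypothetical commands, as they might appear in a README.
commands = [
    DocumentedCommand("full_training", "python train.py --config configs/base.yaml"),
    DocumentedCommand("evaluation", "python eval.py --ckpt model.pt"),
    DocumentedCommand("inference", "python demo.py --input sample.png"),
]
target = select_target(commands)
print(target.kind)  # inference
```

Returning `None` when nothing qualifies keeps the caller honest: the gap is recorded rather than papered over with a guessed command.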
Patch policy
- Prefer no code changes.
- Prefer safer adjustments first:
  - command-line arguments
  - environment variables
  - path fixes
  - dependency version fixes
  - dependency file fixes such as `requirements.txt` or `environment.yml`
- Avoid changing:
  - model architecture
  - core inference semantics
  - core training logic
  - loss functions
  - experiment meaning
- If repository files must change:
  - create a patch branch first using `repro/YYYY-MM-DD-short-task`
  - apply low-risk changes before medium-risk changes
  - avoid high-risk changes by default
  - commit only verified groups of changes
  - keep verified patch commits sparse, usually `0-2`
  - use commit messages in the form `repro: <scope> for documented <command>`

See `references/patch-policy.md`.
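The branch and commit naming conventions can be sketched with two small helpers. The function names, the slug rule (lowercase, hyphen-separated), and the example task and scope strings are assumptions made for illustration; only the two output formats come from the policy.

```python
import re
from datetime import date


def patch_branch_name(task, today=None):
    """Build a branch name in the form repro/YYYY-MM-DD-short-task."""
    today = today or date.today()
    # Slug rule is an assumption: lowercase, non-alphanumerics collapsed to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return f"repro/{today.isoformat()}-{slug}"


def patch_commit_message(scope, command):
    """Build a commit message in the form 'repro: <scope> for documented <command>'."""
    return f"repro: {scope} for documented {command}"


# Hypothetical task and scope, with a fixed date so the output is reproducible.
print(patch_branch_name("Fix data path", date(2025, 1, 15)))
# repro/2025-01-15-fix-data-path
print(patch_commit_message("pin torch version", "inference"))
# repro: pin torch version for documented inference
```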
Research safety boundary
- Preserve experiment meaning over convenience.
- Do not silently change dataset, split, checkpoint, preprocessing, metric, loss, or model semantics.
- Distinguish direct evidence from inference and from user-approved decisions.
- Prefer a recorded blocker over an unrecorded workaround.
- Escalate for explicit human review before any change that could alter scientific meaning or reported conclusions.
See `references/research-safety-principles.md`.
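One way to make the "do not silently change" rule checkable is to compare the documented settings against the planned run and flag any difference in a protected field. This is a rough sketch only: the dict-based config shape, the `protected_changes` helper, and the example values are assumptions, not something the skill prescribes.

```python
# Fields named in the safety boundary; the dict-based config shape is an assumption.
PROTECTED_FIELDS = (
    "dataset", "split", "checkpoint", "preprocessing", "metric", "loss", "model",
)


def protected_changes(documented, planned):
    """Return protected fields whose planned value differs from the documented one."""
    return [
        field for field in PROTECTED_FIELDS
        if documented.get(field) != planned.get(field)
    ]


# Hypothetical settings: the split changes (protected), batch_size changes (not protected).
documented = {"dataset": "cifar10", "split": "test", "metric": "accuracy", "batch_size": 128}
planned = {"dataset": "cifar10", "split": "val", "metric": "accuracy", "batch_size": 32}

changed = protected_changes(documented, planned)
if changed:
    # Escalate rather than proceed: these edits could alter scientific meaning.
    print("needs human review:", changed)
```

A non-empty result is a reason to stop and record a blocker or ask for review, not a list of things to fix automatically.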
Workflow
- Read README and repo signals.
- Call `repo-intake-and-plan` to scan the repository and extract documented commands.
- Select the smallest trustworthy reproduction target.
- Call `env-and-assets-bootstrap` to prepare environment assumptions and asset paths.
- Call `analyze-project` only when repo structure, insertion points, or suspicious implementation patterns need a read-only pass before continuing.
- Run a conservative smoke check or documented inference or evaluation command with `minimal-run-and-audit`.
- If the selected trustworthy target is documented training startup, short-run verification, or resume, hand execution to `run-train` instead of `minimal-run-and-audit`.
- When training is selected inside trusted reproduction, let `run-train` capture the startup evidence first, then surface a human review checkpoint before any fuller training claim.
- Stop for human review if protocol meaning, model semantics, or result interpretation would otherwise be changed implicitly.
- Use `paper-context-resolver` only if README and repo files leave a narrow reproduction-critical gap that blocks the current target.
- Never auto-route into `explore-code` or `explore-run`; exploration requires explicit user authorization.
- Write the standardized outputs with evidence, assumptions, deviations, and next safe action.
- Give the user a short final note in the user's language.
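The execution routing in these steps can be sketched as a lookup plus an authorization guard. The sub-skill names are taken from the workflow; the target-kind labels, the `route` function, and its parameters are hypothetical and exist only to illustrate the rule.

```python
# Sub-skill names come from the workflow; the target-kind labels are hypothetical.
TRUSTED_ROUTES = {
    "smoke_check": "minimal-run-and-audit",
    "inference": "minimal-run-and-audit",
    "evaluation": "minimal-run-and-audit",
    "training_startup": "run-train",
    "short_run_verification": "run-train",
    "resume": "run-train",
}
EXPLORE_SKILLS = {"explore-code", "explore-run"}


def route(target_kind, requested_skill=None, explore_authorized=False):
    """Pick the sub-skill for a selected target, refusing unauthorized exploration."""
    if requested_skill in EXPLORE_SKILLS:
        if not explore_authorized:
            raise PermissionError("exploration requires explicit user authorization")
        return requested_skill
    if target_kind not in TRUSTED_ROUTES:
        raise ValueError(f"no trusted route for target kind: {target_kind}")
    return TRUSTED_ROUTES[target_kind]


print(route("evaluation"))        # minimal-run-and-audit
print(route("training_startup"))  # run-train
```

Raising on an unauthorized exploration request, rather than falling back to a trusted route, keeps the refusal visible in the log.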
Required outputs
Always target:
```text
repro_outputs/
  SUMMARY.md
  COMMANDS.md
  LOG.md
  status.json
  PATCHES.md   # only if patches were applied
```

Use the templates under `assets/` and the field rules in `references/output-spec.md`.
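As a rough sketch of how this layout might be produced, the helper below writes the four required files and adds `PATCHES.md` only when patch notes are supplied. The `write_outputs` function and the `status` fields are hypothetical; the actual templates and schema live in `assets/` and `references/output-spec.md` and are not reproduced here.

```python
import json
import tempfile
from pathlib import Path


def write_outputs(root, summary, commands, log, status, patches=None):
    """Write the standardized files under <root>/repro_outputs/ and return that directory."""
    out = Path(root) / "repro_outputs"
    out.mkdir(parents=True, exist_ok=True)
    (out / "SUMMARY.md").write_text(summary, encoding="utf-8")
    (out / "COMMANDS.md").write_text(commands, encoding="utf-8")
    (out / "LOG.md").write_text(log, encoding="utf-8")
    # sort_keys keeps the machine-readable file stable across runs.
    (out / "status.json").write_text(
        json.dumps(status, indent=2, sort_keys=True), encoding="utf-8"
    )
    if patches is not None:  # PATCHES.md only if patches were applied
        (out / "PATCHES.md").write_text(patches, encoding="utf-8")
    return out


# Hypothetical status fields; the real schema is defined in references/output-spec.md.
status = {"target": "inference", "result": "blocked", "patches_applied": False}

with tempfile.TemporaryDirectory() as tmp:
    out = write_outputs(tmp, "# Summary\n", "# Commands\n", "# Log\n", status)
    written = sorted(p.name for p in out.iterdir())
    print(written)  # ['COMMANDS.md', 'LOG.md', 'SUMMARY.md', 'status.json']
```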
Reporting policy
- Put the shortest high-value summary in `SUMMARY.md`.
- Put copyable commands in `COMMANDS.md`.
- Put process evidence, assumptions, failures, and decisions in `LOG.md`.
- Put durable machine-readable state in `status.json`.
- Put branch, commit, validation, and README-fidelity impact in `PATCHES.md` when needed.
- Distinguish verified facts from inferred guesses.
Maintainability notes
- Keep this skill narrow: README-first AI repo reproduction only.
- Push specialized logic into sub-skills or helper scripts.
- Prefer stable templates and simple schemas over ad hoc prose.
- Keep machine-readable outputs backward compatible when possible.
- Add new evidence sources only when they improve auditability without raising learning cost.
- Treat `repo-intake-and-plan` and `paper-context-resolver` as narrow helpers, not primary public entrypoints.