ai-research-reproduction
Original:🇺🇸 English
Translated
1 scripts
Main orchestrator for README-first AI repo reproduction. Use when the user wants an end-to-end, minimal-trustworthy reproduction flow that reads the repository first, selects the smallest documented inference or evaluation target, coordinates intake, setup, trusted execution, optional trusted training, optional repository analysis, and optional paper-gap resolution, enforces conservative patch rules, records evidence assumptions deviations and human decision points, and writes the standardized `repro_outputs/` bundle. Do not use for paper summary, generic environment setup, isolated repo scanning, standalone command execution, silent protocol changes, or broad research assistance outside repository-grounded reproduction.
2installs
Added on
NPX Install
npx skill4agent add lllllllama/ai-paper-reproduction-skill ai-research-reproductionTags
Translated version includes tags in frontmatterSKILL.md Content
View Translation Comparison →ai-research-reproduction
Use when
- The user wants the agent to reproduce an AI paper repository.
- The target is a code repository with a README, scripts, configs, or documented commands.
- The goal is a minimal trustworthy run, not unlimited experimentation.
- The user needs standardized outputs that another human or model can audit quickly.
- The task spans more than one stage, such as intake plus setup, or setup plus execution plus reporting.
Do not use when
- The task is a general literature review or paper summary.
- The task is to design a new model, benchmark suite, or training pipeline from scratch.
- The repository is not centered on AI or does not expose a documented reproduction path.
- The user primarily wants a deep code refactor rather than README-first reproduction.
- The user is explicitly asking for only one narrow phase that a sub-skill already covers cleanly.
- The user is explicitly authorizing exploratory branch-only experimentation instead of trusted reproduction.
Success criteria
- README is treated as the primary source of reproduction intent.
- A minimum trustworthy target is selected and justified.
- Documented inference is preferred over evaluation, and evaluation is preferred over training.
- Any repo edits remain conservative, explicit, and auditable.
- Assumptions, protocol deviations, and human decision points are surfaced rather than hidden.
- is generated with consistent structure and stable machine-readable fields.
repro_outputs/ - Final user-facing explanation is short and follows the user's language when practical.
Interaction and usability policy
- Keep the workflow simple enough for a new user to understand quickly.
- Prefer short, concrete plans over exhaustive research.
- Expose commands, assumptions, blockers, and evidence.
- Avoid turning the skill into an opaque automation layer.
- Preserve a low learning cost for both humans and downstream agents.
Language policy
- Human-readable Markdown outputs should follow the user's language when it is clear.
- If the user's language is unclear, default to concise English.
- Machine-readable fields, filenames, keys, and enum values stay in stable English.
- Paths, package names, CLI commands, config keys, and code identifiers remain unchanged.
See .
references/language-policy.mdReproduction policy
Core priority order:
- documented inference
- documented evaluation
- documented training startup or partial verification
- full training only when the user explicitly asks later
Rules:
- README-first: use repository files to clarify, not casually override, the README.
- Aim for minimal trustworthy reproduction rather than maximum task coverage.
- Treat smoke tests, startup verification, and early-step checks as valid training evidence when full training is not appropriate.
- In trusted reproduction, a documented training command should first be checked through startup verification or a short monitoring window, then paused for explicit human confirmation before broader training continues.
- In explicitly authorized explore-lane execution, the training record can continue without the trusted-lane confirmation pause, but it must stay isolated from trusted conclusions.
- Record unresolved gaps rather than fabricating confidence.
Patch policy
- Prefer no code changes.
- Prefer safer adjustments first:
- command-line arguments
- environment variables
- path fixes
- dependency version fixes
- dependency file fixes such as or
requirements.txtenvironment.yml
- Avoid changing:
- model architecture
- core inference semantics
- core training logic
- loss functions
- experiment meaning
- If repository files must change:
- create a patch branch first using
repro/YYYY-MM-DD-short-task - apply low-risk changes before medium-risk changes
- avoid high-risk changes by default
- commit only verified groups of changes
- keep verified patch commits sparse, usually
0-2 - use commit messages in the form
repro: <scope> for documented <command>
- create a patch branch first using
See .
references/patch-policy.mdResearch safety boundary
- Preserve experiment meaning over convenience.
- Do not silently change dataset, split, checkpoint, preprocessing, metric, loss, or model semantics.
- Distinguish direct evidence from inference and from user-approved decisions.
- Prefer a recorded blocker over an unrecorded workaround.
- Escalate for explicit human review before any change that could alter scientific meaning or reported conclusions.
See .
references/research-safety-principles.mdWorkflow
- Read README and repo signals.
- Call to scan the repository and extract documented commands.
repo-intake-and-plan - Select the smallest trustworthy reproduction target.
- Call to prepare environment assumptions and asset paths.
env-and-assets-bootstrap - Call only when repo structure, insertion points, or suspicious implementation patterns need a read-only pass before continuing.
analyze-project - Run a conservative smoke check or documented inference or evaluation command with .
minimal-run-and-audit - If the selected trustworthy target is documented training startup, short-run verification, or resume, hand execution to instead of
run-train.minimal-run-and-audit - When training is selected inside trusted reproduction, let capture the startup evidence first, then surface a human review checkpoint before any fuller training claim.
run-train - Stop for human review if protocol meaning, model semantics, or result interpretation would otherwise be changed implicitly.
- Use only if README and repo files leave a narrow reproduction-critical gap that blocks the current target.
paper-context-resolver - Never auto-route into or
explore-code; exploration requires explicit user authorization.explore-run - Write the standardized outputs with evidence, assumptions, deviations, and next safe action.
- Give the user a short final note in the user's language.
Required outputs
Always target:
text
repro_outputs/
SUMMARY.md
COMMANDS.md
LOG.md
status.json
PATCHES.md # only if patches were appliedUse the templates under and the field rules in .
assets/references/output-spec.mdReporting policy
- Put the shortest high-value summary in .
SUMMARY.md - Put copyable commands in .
COMMANDS.md - Put process evidence, assumptions, failures, and decisions in .
LOG.md - Put durable machine-readable state in .
status.json - Put branch, commit, validation, and README-fidelity impact in when needed.
PATCHES.md - Distinguish verified facts from inferred guesses.
Maintainability notes
- Keep this skill narrow: README-first AI repo reproduction only.
- Push specialized logic into sub-skills or helper scripts.
- Prefer stable templates and simple schemas over ad hoc prose.
- Keep machine-readable outputs backward compatible when possible.
- Add new evidence sources only when they improve auditability without raising learning cost.
- Treat and
repo-intake-and-planas narrow helpers, not primary public entrypoints.paper-context-resolver