research-repo-reproduction

Research Repo Reproduction

Reproduction means a faithful, documented attempt to run or inspect the research artifact. It does not mean changing things until the result looks good.

Read First

  • references/repository-contract.md
  • references/output-contracts.md

Trusted Reproduction Flow

  1. Read README, configs, environment files, scripts, and docs.
  2. Inventory documented commands.
  3. Identify the smallest trustworthy target: smoke test, inference, evaluation, or training startup.
  4. Record assumptions about data, weights, environment, and hardware.
  5. Run only documented or clearly justified commands.
  6. If patching is required, isolate patch notes from scientific contribution claims.
  7. Write standardized evidence under repro_outputs/.
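
Steps 5 and 7 can be sketched as a small wrapper that runs one command and appends what happened to the evidence files. This is a minimal sketch, not a prescribed tool: the function name `run_documented_command`, its `source` parameter, and the layout of the entries it writes are assumptions for illustration. Only the file names COMMANDS.md and LOG.md and the repro_outputs/ directory come from this document.

```python
import shlex
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_documented_command(command, source, out_dir="repro_outputs"):
    """Run one command and append the evidence to COMMANDS.md and LOG.md.

    `command` is a list of arguments. `source` names where the command is
    documented (for example "README.md"), so a reader can verify that it
    was not invented. Both names are hypothetical, chosen for this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    started = datetime.now(timezone.utc).isoformat(timespec="seconds")
    cmd_text = shlex.join(command)

    # Capture output instead of discarding it: the log is the evidence.
    result = subprocess.run(command, capture_output=True, text=True)

    # One line per command, including where it came from and how it exited.
    with open(out / "COMMANDS.md", "a", encoding="utf-8") as f:
        f.write(f"- `{cmd_text}` (source: {source}, exit code: {result.returncode})\n")

    # Full stdout and stderr, appended so earlier runs are never overwritten.
    with open(out / "LOG.md", "a", encoding="utf-8") as f:
        f.write(f"## {started} {cmd_text}\n\n")
        f.write("stdout:\n\n" + result.stdout + "\n")
        f.write("stderr:\n\n" + result.stderr + "\n")

    return result.returncode
```

Appending rather than overwriting keeps failed attempts in the record, which matters when a later run succeeds only after a patch.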

Output Files

  • repro_outputs/SUMMARY.md
  • repro_outputs/COMMANDS.md
  • repro_outputs/LOG.md
  • repro_outputs/PATCHES.md
  • repro_outputs/status.json
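
As an illustration, the five files can be scaffolded in one call so that none is forgotten. The sketch below is hedged: `init_outputs` and every field in the status.json payload (`target`, `status`, `patched`) are hypothetical. The actual schema is defined in references/output-contracts.md, which this sketch does not reproduce.

```python
import json
from pathlib import Path

# File names as listed in this document.
OUTPUT_FILES = ["SUMMARY.md", "COMMANDS.md", "LOG.md", "PATCHES.md", "status.json"]


def init_outputs(target, status, patched=False, out_dir="repro_outputs"):
    """Create any missing evidence files and (re)write status.json.

    The payload fields are assumptions made for this sketch; consult
    references/output-contracts.md for the real contract.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    for name in OUTPUT_FILES:
        path = out / name
        if name.endswith(".md") and not path.exists():
            # A title line makes an empty file distinguishable from a
            # missing one; existing files are left untouched.
            path.write_text(f"# {name[:-3]}\n\n", encoding="utf-8")

    payload = {"target": target, "status": status, "patched": patched}
    (out / "status.json").write_text(
        json.dumps(payload, indent=2) + "\n", encoding="utf-8"
    )
    return payload
```

For example, `init_outputs("smoke test", "passed")` would create repro_outputs/ with all five files and record the smoke test as the chosen target.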

Boundaries

  • Do not silently change datasets, splits, metrics, checkpoints, or evaluation code.
  • Do not convert a failed reproduction into an exploratory refactor without naming the transition.
  • Do not claim SOTA or correctness from a smoke test.
  • Keep trusted reproduction separate from explore_outputs/.
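
One way to make the first boundary checkable is to hash the files that must not change before a run and compare afterwards. The sketch below assumes SHA-256 checksums are an acceptable fingerprint; the helper names `sha256_of`, `snapshot`, and `changed_files` are hypothetical and not part of any contract in this document.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Chunked reads keep memory use flat for large checkpoints.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Map each path (as a string) to its current digest."""
    return {str(p): sha256_of(p) for p in paths}


def changed_files(before, after):
    """List paths whose digest differs from, or is absent in, `after`."""
    return sorted(p for p in before if after.get(p) != before[p])
```

Taking a `snapshot` of dataset splits, checkpoints, and evaluation scripts before the run, then calling `changed_files` afterwards, turns a silent change into a visible one. Any path it reports would need an entry in PATCHES.md.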

Repository Cleanup

For poorly structured academic repos, prioritize:
  • package structure
  • clear CLI/scripts
  • configs
  • environment lock or setup docs
  • tests for structural invariants
  • reproducibility commands
  • data provenance
  • paper/proposal placement
  • wiki/source summaries