research-repo-reproduction
Research Repo Reproduction
Reproduction means a faithful, documented attempt to run or inspect the research
artifact. It does not mean changing code, data, or settings until the result looks good.
Read First
- `references/repository-contract.md`
- `references/output-contracts.md`
Trusted Reproduction Flow
- Read README, configs, environment files, scripts, and docs.
- Inventory documented commands.
- Identify the smallest trustworthy target: smoke test, inference, evaluation, or training startup.
- Record assumptions about data, weights, environment, and hardware.
- Run only documented or clearly justified commands.
- If patching is required, isolate patch notes from scientific contribution claims.
- Write standardized evidence under `repro_outputs/`.
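The inventory and assumption-recording steps above can be sketched as follows. This assumes the repo documents its commands in shell-labeled fenced blocks inside a Markdown README. `inventory_commands` and `record_environment` are hypothetical helpers and are not part of any referenced contract. Hardware details such as GPU model and driver version still need to be recorded by hand or with vendor tools.

```python
import platform
import re
import sys

# Three backticks: the Markdown fence marker.
FENCE_MARK = "`" * 3
FENCE_RE = re.compile("^" + FENCE_MARK + r"([\w-]*)\s*$")

# Assumption: only fences labeled with one of these languages hold commands.
# Unlabeled and console fences are skipped because they often mix in output.
SHELL_LANGS = {"bash", "sh", "shell", "zsh"}


def inventory_commands(readme_text: str) -> list[str]:
    """Collect command lines from shell-labeled fenced blocks in a README."""
    commands: list[str] = []
    in_block = False
    keep = False
    for line in readme_text.splitlines():
        match = FENCE_RE.match(line.strip())
        if match:
            if in_block:
                in_block = False  # closing fence
            else:
                in_block = True  # opening fence; decide whether to keep its lines
                keep = match.group(1).lower() in SHELL_LANGS
            continue
        if in_block and keep:
            cmd = line.strip()
            if cmd.startswith("$ "):
                cmd = cmd[2:]  # drop a copied shell prompt
            if cmd and not cmd.startswith("#"):
                commands.append(cmd)
    return commands


def record_environment() -> dict[str, str]:
    """Snapshot basic facts about the machine running the reproduction."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "executable": sys.executable,
    }
```

Commands found this way are only candidates for `COMMANDS.md`. Before running each one, confirm that it is documented for the chosen target.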
Output Files
- `repro_outputs/SUMMARY.md`
- `repro_outputs/COMMANDS.md`
- `repro_outputs/LOG.md`
- `repro_outputs/PATCHES.md`
- `repro_outputs/status.json`
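A minimal sketch for creating the five evidence files. The authoritative format lives in `references/output-contracts.md`, which is not reproduced here. The `status.json` fields (`target`, `status`) and the allowed status values are therefore hypothetical placeholders. Existing Markdown files are never overwritten, so earlier evidence survives a rerun.

```python
import json
from pathlib import Path

# Hypothetical status values; the real schema in
# references/output-contracts.md may differ.
ALLOWED_STATUSES = {"not_started", "in_progress", "succeeded", "failed", "blocked"}

MARKDOWN_FILES = {
    "SUMMARY.md": "Summary",
    "COMMANDS.md": "Commands",
    "LOG.md": "Log",
    "PATCHES.md": "Patches",
}


def write_repro_outputs(repo_root: Path, target: str, status: str) -> Path:
    """Create repro_outputs/ with four Markdown stubs and a status.json."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    out = repo_root / "repro_outputs"
    out.mkdir(parents=True, exist_ok=True)
    for name, title in MARKDOWN_FILES.items():
        path = out / name
        if not path.exists():  # never overwrite evidence that is already there
            path.write_text(f"# {title}\n\n", encoding="utf-8")
    # Hypothetical minimal fields: the reproduction target and its outcome.
    status_doc = {"target": target, "status": status}
    (out / "status.json").write_text(json.dumps(status_doc, indent=2) + "\n", encoding="utf-8")
    return out
```

Only `status.json` is rewritten on each call. This keeps the machine-readable outcome current while the human-written logs accumulate.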
Boundaries
- Do not silently change datasets, splits, metrics, checkpoints, or evaluation code.
- Do not convert a failed reproduction into an exploratory refactor without naming the transition.
- Do not claim SOTA or correctness from a smoke test.
- Keep trusted reproduction separate from `explore_outputs/`.
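One way to make "do not silently change" checkable is to fingerprint the protected files before and after a run. This assumes the datasets, split files, checkpoints, and evaluation code can be listed as paths. The sketch uses SHA-256 digests from Python's `hashlib`. The helper names and the choice of protected paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets or checkpoints fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprint(protected: list[Path]) -> dict[str, str]:
    """Map every file under the protected paths to its SHA-256 digest."""
    digests: dict[str, str] = {}
    for root in protected:
        files = [root] if root.is_file() else sorted(p for p in root.rglob("*") if p.is_file())
        for f in files:
            digests[str(f)] = sha256_of(f)
    return digests


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified between snapshots."""
    return sorted(k for k in set(before) | set(after) if before.get(k) != after.get(k))
```

A non-empty result from `changed_files` should either be reverted or recorded as a named patch. It should never be left silent.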
Repository Cleanup
For poorly maintained academic repos, prioritize:
- package structure
- clear CLI/scripts
- configs
- environment lock or setup docs
- tests for structural invariants
- reproducibility commands
- data provenance
- paper/proposal placement
- wiki/source summaries
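A test for structural invariants can start small: a check that the pieces listed above actually exist. The required paths and environment files below are hypothetical placeholders. Replace them with whatever layout the cleanup settles on.

```python
from pathlib import Path

# Hypothetical invariants; replace with the layout the cleaned-up repo adopts.
REQUIRED_PATHS = ["README.md", "configs"]
# At least one of these should pin or document the environment.
ENV_FILES = ["pyproject.toml", "requirements.txt", "environment.yml", "uv.lock", "poetry.lock"]


def structural_problems(root: Path) -> list[str]:
    """Return human-readable descriptions of missing structural pieces."""
    problems = [f"missing {name}" for name in REQUIRED_PATHS if not (root / name).exists()]
    if not any((root / name).exists() for name in ENV_FILES):
        problems.append("no environment lock or setup file")
    return problems
```

In the repo's own test suite this becomes a one-line pytest test, for example `def test_structure(): assert structural_problems(Path(".")) == []`. That form assumes pytest runs from the repository root.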