repo-migration
Repo Migration
Migrate messy research repositories without losing provenance. The first pass should make
the repository understandable before attempting to improve its scientific quality.
Read First
- `references/repository-contract.md`
- `references/repo-migration-policy.md`
- `references/output-contracts.md`
Workflow
- Snapshot the current state with `git status --short` and a file inventory.
- Classify files as sources, data, code, notebooks, outputs, reports, configs, credentials, generated caches, or unknown.
- Move immutable research sources into `sources/`, `reports/`, or `data/raw/`.
- Move repeatable code into `src/` and thin entrypoints into `scripts/`.
- Keep notebooks in `notebooks/` only when they are exploratory or narrative.
- Separate trusted, exploratory, training, analysis, and debug outputs.
- Add source ledger rows and wiki pages for important papers, proposals, and datasets.
- Record unresolved questions in `wiki/open_questions.md` instead of guessing.
- Run repository structure checks and tests.
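The snapshot and classification steps can be sketched in Python. This is a minimal sketch, not part of the skill itself: the rules in `CREDENTIAL_NAMES`, `CACHE_DIRS`, and `SUFFIX_RULES` are hypothetical heuristics to be tuned per repository, and anything they do not recognize falls through to `unknown` so it is reviewed rather than guessed. `parse_status_short` reads the two-character status code that `git status --short` prints before each path.

```python
import subprocess
from pathlib import Path, PurePosixPath

# Hypothetical heuristics; tune per repository. Order matters: first match wins.
CREDENTIAL_NAMES = {".env", "credentials.json", "id_rsa"}
CACHE_DIRS = {"__pycache__", ".ipynb_checkpoints", ".pytest_cache", ".mypy_cache"}
SUFFIX_RULES = [
    ({".ipynb"}, "notebooks"),
    ({".py", ".r", ".jl", ".sh"}, "code"),
    ({".yaml", ".yml", ".toml", ".ini", ".cfg"}, "configs"),
    ({".csv", ".parquet", ".h5", ".npz", ".json"}, "data"),
    ({".pdf", ".bib"}, "sources"),
    ({".md", ".docx", ".tex"}, "reports"),
    ({".png", ".svg", ".log", ".pt", ".ckpt"}, "outputs"),
]


def classify(path: str) -> str:
    """Map a repository-relative path to one of the workflow's file categories."""
    p = PurePosixPath(path)
    suffix = p.suffix.lower()
    if p.name.lower() in CREDENTIAL_NAMES or suffix in {".pem", ".key"}:
        return "credentials"  # checked first so secrets are never filed as data
    if CACHE_DIRS & {part.lower() for part in p.parts} or suffix == ".pyc":
        return "generated caches"
    for suffixes, category in SUFFIX_RULES:
        if suffix in suffixes:
            return category
    return "unknown"  # goes to review instead of being guessed


def parse_status_short(text: str) -> list[tuple[str, str]]:
    """Split `git status --short` output into (XY status, path) pairs.

    Simplification: quoted paths (names with spaces or special characters) are not unquoted.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        status, path = line[:2], line[3:]
        if " -> " in path:  # renames are printed as "old -> new"; keep the new path
            path = path.split(" -> ", 1)[1]
        entries.append((status, path))
    return entries


def snapshot(root: str = ".") -> tuple[list[tuple[str, str]], dict[str, list[str]]]:
    """Capture git status plus a categorized inventory of every file outside .git/."""
    status = subprocess.run(
        ["git", "status", "--short"], cwd=root, capture_output=True, text=True, check=True
    ).stdout
    base = Path(root)
    inventory: dict[str, list[str]] = {}
    for f in sorted(base.rglob("*")):
        rel = f.relative_to(base)
        if f.is_file() and ".git" not in rel.parts:
            inventory.setdefault(classify(rel.as_posix()), []).append(rel.as_posix())
    return parse_status_short(status), inventory
```

Running `snapshot()` from the repository root before moving anything, and saving its output with the migration notes, keeps the pre-migration state on record.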
Preserve
- original filenames in ledger notes when renaming
- archive paths and extraction paths
- proposal versions
- command history if present
- data provenance and access constraints
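One way to keep these details together is to write a ledger row for every rename. The sketch below assumes a CSV ledger; the `LedgerRow` columns, the `record_move` helper, and the file name `ledger.csv` are hypothetical, not a schema defined by the reference documents.

```python
import csv
import io
from dataclasses import asdict, dataclass, field, fields
from datetime import date


@dataclass
class LedgerRow:
    """Hypothetical ledger entry; columns mirror the Preserve list, not a fixed schema."""
    original_path: str  # filename before renaming, kept verbatim
    new_path: str       # location after the migration
    kind: str           # e.g. paper, proposal version, dataset
    provenance: str     # archive path, extraction path, or other origin
    access: str = ""    # data access constraints, if any
    migrated_on: str = field(default_factory=lambda: date.today().isoformat())


def record_move(out, row: LedgerRow) -> None:
    """Append one row to an open CSV ledger, writing the header if the ledger is empty."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(LedgerRow)])
    if out.tell() == 0:  # empty file or fresh buffer: emit the header first
        writer.writeheader()
    writer.writerow(asdict(row))


if __name__ == "__main__":
    # Illustrative values only; in practice pass open("ledger.csv", "a", newline="").
    buf = io.StringIO()
    record_move(buf, LedgerRow("final_v3 (copy).pdf", "sources/proposal_v3.pdf",
                               "proposal", "drafts.zip/final_v3 (copy).pdf"))
    print(buf.getvalue())
```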
Do Not
- Rewrite scientific logic while doing structural migration unless required to keep tests runnable.
- Delete suspicious files without user approval.
- Promote notebook-only results to final claims without reproduction.
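These rules can also be enforced in whatever script performs the moves. A minimal sketch, assuming each path has already been assigned one of the workflow's categories: it plans moves into target directories, never schedules a deletion, sends `credentials`, `unknown`, and destination collisions to a review list for the user, and refuses to overwrite existing files. The `DESTINATIONS` mapping and the function names are hypothetical; in a git repository, `git mv` could replace `shutil.move` so the rename is also staged.

```python
import shutil
from pathlib import Path, PurePosixPath

# Hypothetical category -> target directory mapping, drawn from the workflow steps.
DESTINATIONS = {
    "sources": "sources",
    "reports": "reports",
    "data": "data/raw",
    "code": "src",             # thin entrypoints may belong in scripts/ instead
    "notebooks": "notebooks",  # only exploratory or narrative notebooks stay here
}
NEEDS_REVIEW = {"credentials", "unknown"}


def target_for(path: str, category: str) -> str:
    return f"{DESTINATIONS[category]}/{PurePosixPath(path).name}"


def plan_moves(categories: dict[str, str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (moves, review). Nothing is deleted; other categories stay where they are."""
    moves: list[tuple[str, str]] = []
    review: list[str] = []
    # Files already at their target location occupy that name up front.
    taken = {p for p, c in categories.items() if c in DESTINATIONS and target_for(p, c) == p}
    for path, category in sorted(categories.items()):
        if category in NEEDS_REVIEW:
            review.append(path)  # ask the user; never delete or guess
            continue
        if category not in DESTINATIONS:
            continue  # e.g. configs, outputs, caches: untouched in this pass
        dst = target_for(path, category)
        if dst == path:
            continue  # already in place
        if dst in taken:
            review.append(path)  # name collision: a human should choose the new name
            continue
        taken.add(dst)
        moves.append((path, dst))
    return moves, review


def apply_moves(root: str, moves: list[tuple[str, str]]) -> None:
    """Perform planned moves under root, refusing to overwrite existing files."""
    base = Path(root)
    for src, dst in moves:
        target = base / dst
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(base / src), str(target))
```

One possible order is to show `review` to the user, record each planned move in the ledger, and only then call `apply_moves`.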