repo-migration

Repo Migration

Migrate messy research repos without losing provenance. The first pass should make the repository understandable before trying to improve scientific quality.

Read First

  • references/repository-contract.md
  • references/repo-migration-policy.md
  • references/output-contracts.md

Workflow

  1. Snapshot current state with `git status --short` and a file inventory.
  2. Classify files as sources, data, code, notebooks, outputs, reports, configs, credentials, generated caches, or unknown.
  3. Move immutable research sources into `sources/`, `reports/`, or `data/raw/`.
  4. Move repeatable code into `src/` and thin entrypoints into `scripts/`.
  5. Keep notebooks in `notebooks/` only when they are exploratory or narrative.
  6. Separate trusted, exploratory, training, analysis, and debug outputs.
  7. Add source ledger rows and wiki pages for important papers, proposals, and datasets.
  8. Record unresolved questions in `wiki/open_questions.md` instead of guessing.
  9. Run repository structure checks and tests.
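
The inventory and classification in steps 1 and 2 could be sketched roughly as follows. This is a minimal illustration, not the policy itself: the extension and filename rules (`BY_SUFFIX`, `CREDENTIAL_NAMES`, `CACHE_DIRS`) and the function names are assumptions made up for the example. Sources and reports are deliberately not inferred from file extensions, so anything the rules do not cover stays `unknown` for a person to resolve rather than being guessed.

```python
from collections import defaultdict
from pathlib import Path, PurePosixPath

# Hypothetical hints for illustration only; adjust to the repository at hand.
CREDENTIAL_NAMES = {".env", "credentials.json", "id_rsa"}
CACHE_DIRS = {"__pycache__", ".ipynb_checkpoints", ".pytest_cache"}
BY_SUFFIX = {
    ".py": "code",
    ".ipynb": "notebooks",
    ".csv": "data",
    ".parquet": "data",
    ".yaml": "configs",
    ".yml": "configs",
    ".toml": "configs",
}


def classify(path: str) -> str:
    """Return a workflow category for a repo-relative path, or 'unknown'."""
    p = PurePosixPath(path)
    # Credentials are checked first so they are never filed under another category.
    if p.name in CREDENTIAL_NAMES or p.suffix in {".pem", ".key"}:
        return "credentials"
    if any(part in CACHE_DIRS for part in p.parts) or p.suffix == ".pyc":
        return "generated caches"
    # Anything without a rule is left as 'unknown' instead of being guessed.
    return BY_SUFFIX.get(p.suffix.lower(), "unknown")


def inventory(root: Path) -> dict[str, list[str]]:
    """Group every file under root (excluding .git) by its category."""
    groups: dict[str, list[str]] = defaultdict(list)
    for f in sorted(root.rglob("*")):
        rel = f.relative_to(root)
        if f.is_file() and ".git" not in rel.parts:
            groups[classify(rel.as_posix())].append(rel.as_posix())
    return dict(groups)
```

Calling `inventory(Path("."))` from the repository root returns a dictionary keyed by category. Saving that result next to the `git status --short` output gives a before-migration record to compare against later.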

Preserve

  • original filenames in ledger notes when renaming
  • archive paths and extraction paths
  • proposal versions
  • command history if present
  • data provenance and access constraints
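
One way to keep original filenames when renaming is to write the ledger row in the same step as the move. The sketch below assumes a CSV ledger named `source_ledger.csv` with the columns in `LEDGER_FIELDS`. Both are hypothetical, since the actual ledger format is defined in the referenced contracts, and `move_with_ledger` is an illustrative helper, not an existing tool.

```python
import csv
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical ledger layout; the real columns belong to the repository contract.
LEDGER_FIELDS = ["moved_at", "original_path", "new_path", "sha256", "note"]


def move_with_ledger(root: Path, src: str, dst: str,
                     ledger: str = "source_ledger.csv", note: str = "") -> dict:
    """Move root/src to root/dst and append a ledger row with the original path."""
    src_path, dst_path, ledger_path = root / src, root / dst, root / ledger
    if dst_path.exists():
        # Never overwrite silently: an existing file may be a different version.
        raise FileExistsError(f"refusing to overwrite {dst}")
    # Hash before moving so the row describes the file exactly as it was found.
    digest = hashlib.sha256(src_path.read_bytes()).hexdigest()
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src_path), str(dst_path))
    row = {
        "moved_at": datetime.now(timezone.utc).isoformat(),
        "original_path": src,
        "new_path": dst,
        "sha256": digest,
        "note": note,
    }
    write_header = not ledger_path.exists()
    with ledger_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=LEDGER_FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row
```

For example, `move_with_ledger(Path("."), "Final_v3 (1).pdf", "sources/proposal_v3.pdf", note="proposal version 3")` would relocate the file and leave the original name recoverable from the ledger. For files already tracked by git, `git mv` may be preferable to `shutil.move` so that history follows the rename.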

Do Not

  • Rewrite scientific logic while doing structural migration unless required to keep tests runnable.
  • Delete suspicious files without user approval.
  • Promote notebook-only results to final claims without reproduction.