split-pr
Split PR by CODEOWNERS Groups
Split a large pull request into multiple smaller PRs, where each PR touches
the fewest possible CODEOWNERS reviewer groups. The goal is to reduce review
burden: a PR that only touches `megatron/core/` needs only the core reviewers,
while a PR that also touches `examples/`, `tools/`, and `megatron/training/`
pulls in many additional groups.
Workflow
1. Analyze the PR
- Fetch the PR details: `gh pr view <number> --repo NVIDIA/Megatron-LM --json title,body,headRefName,author` and `gh pr diff <number> --repo NVIDIA/Megatron-LM --stat`. Also determine the current GitHub user with `gh api user --jq .login`.
- Parse `.github/CODEOWNERS` to build a mapping from file path patterns to owner groups.
- For each changed file in the PR, determine which CODEOWNERS groups would be required to review it.
- Build a summary table grouped by CODEOWNERS group, showing which files pull in which groups.
- Count the total number of distinct reviewer groups the PR currently requires.
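The analysis steps above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the function names and the sample rules are hypothetical, and the matcher is deliberately simplified (patterns are treated as rooted at the repository top level, and `**` is not handled). It does follow GitHub's documented rule that the last matching pattern in the file wins. The list of changed paths is assumed to come from something like `gh pr diff <number> --name-only`.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def parse_codeowners(text):
    """Return (pattern, owners) rules in file order, skipping comments and blanks."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, tuple(owners)))
    return rules


def matches(pattern, path):
    """Simplified matching: rooted directory prefixes and basic globs only."""
    pat = pattern.lstrip("/")
    if pat.endswith("/"):  # directory rule owns everything underneath it
        return path.startswith(pat)
    return fnmatchcase(path, pat) or path.startswith(pat + "/")


def owners_for(path, rules):
    """Owners of a path; later rules override earlier ones."""
    owners = ()
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners
    return owners


def files_by_group(paths, rules):
    """Summary table: group -> files that pull that group in."""
    table = defaultdict(list)
    for path in paths:
        for group in owners_for(path, rules):
            table[group].append(path)
    return dict(table)


# Hypothetical rules and paths, not the real Megatron-LM CODEOWNERS file.
SAMPLE = """\
# default owners
*                   @example/core
megatron/training/  @example/training
examples/           @example/examples
"""
changed = ["megatron/core/a.py", "megatron/training/b.py", "examples/run.sh"]
rules = parse_codeowners(SAMPLE)
summary = files_by_group(changed, rules)
for group, files in sorted(summary.items()):
    print(group, files)
print("distinct groups:", len(summary))
```

For real use, a full gitignore-style matcher would be needed to reproduce GitHub's behavior exactly; the sketch only shows the shape of the mapping and the summary table.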
2. Propose a split that minimizes reviewer groups per PR
The primary optimization goal: minimize the number of CODEOWNERS reviewer groups required for each resulting PR.
Strategy:
- Cluster files by their CODEOWNERS groups. Files owned by the same set of groups naturally belong together.
- Identify the largest cluster — this becomes the first (and usually largest) PR.
- Remaining files form one or more additional PRs, each ideally requiring only one or two reviewer groups.
- If a split creates a dependency (e.g., PR B uses symbols renamed in PR A), the dependent PR must be merged after the first. Note this explicitly.
- Each PR must be independently mergeable to main — no broken imports, no missing symbols. Backward-compatible aliases and re-export stubs in the first PR can make this possible.
Present the proposed split as a table:
- PR name/description
- Files included
- CODEOWNERS groups required
- Dependencies on other PRs (if any)
Wait for user approval before proceeding.
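The clustering strategy above might look like the following sketch. The helper names and the sample data are hypothetical. Files are grouped by their exact set of owner groups, and the largest cluster is listed first. Dependencies between PRs cannot be inferred from ownership alone, so that column is left empty for a human to fill in.

```python
from collections import defaultdict


def cluster_by_groups(file_groups):
    """file_groups maps path -> frozenset of groups. Largest cluster first."""
    clusters = defaultdict(list)
    for path, groups in file_groups.items():
        clusters[groups].append(path)
    # Sort by size (descending), then by group names for a stable order.
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0])))


def propose_split(file_groups):
    """Build one table row per proposed PR."""
    rows = []
    for i, (groups, files) in enumerate(cluster_by_groups(file_groups), start=1):
        rows.append({
            "pr": f"PR {i}",
            "files": sorted(files),
            "groups": sorted(groups),
            "depends_on": [],  # filled in manually after reviewing the changes
        })
    return rows


# Hypothetical ownership data for illustration only.
file_groups = {
    "megatron/core/a.py": frozenset({"@example/core"}),
    "megatron/core/b.py": frozenset({"@example/core"}),
    "examples/run.sh": frozenset({"@example/examples"}),
}
rows = propose_split(file_groups)
for row in rows:
    print(row["pr"], row["groups"], row["files"], row["depends_on"])
```

Keying clusters on the full set of groups, rather than on each group separately, keeps a file that needs two groups out of a PR that would otherwise need only one.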
3. Execute the split (after user approval)
For each new PR:
- Create a new branch from the appropriate base (`main`, or a dependency PR's branch).
- Extract the relevant changes: `git diff upstream/main..<source-branch> -- <file paths> | git apply`.
- Stage, commit with a clear message, and push to the user's fork.
- Create the PR as a draft (per repo contributing guidelines).
- If the original PR needs to be narrowed in scope, confirm with the user before force-pushing.
- Report all PR URLs when done.
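As a rough sketch of the steps above, the function below builds the command sequence for one split PR as plain strings so it can be shown to the user before anything runs. The function name, its parameters, and the assumption that the fork is the `origin` remote and upstream is the `upstream` remote are all illustrative, not part of the original workflow. Nothing here executes git or gh.

```python
import shlex


def split_commands(branch, source_branch, files, title, body, fork_owner,
                   base="upstream/main", fork_remote="origin",
                   repo="NVIDIA/Megatron-LM"):
    """Return shell command strings for creating one split PR (dry run only)."""
    diff = ["git", "diff", f"{base}..{source_branch}", "--", *files]
    return [
        # New branch from the chosen base (main or a dependency PR's branch).
        shlex.join(["git", "switch", "-c", branch, base]),
        # Extract only the selected files' changes from the source branch.
        shlex.join(diff) + " | git apply",
        shlex.join(["git", "add", "--", *files]),
        # A single clean commit per split PR.
        shlex.join(["git", "commit", "-m", title]),
        # Push to the user's fork, never to upstream.
        shlex.join(["git", "push", "-u", fork_remote, branch]),
        # Open the PR as a draft against the upstream repository.
        shlex.join(["gh", "pr", "create", "--draft", "--repo", repo,
                    "--base", "main", "--head", f"{fork_owner}:{branch}",
                    "--title", title, "--body", body]),
    ]


# Hypothetical values for illustration.
cmds = split_commands(
    branch="split/core-only",
    source_branch="feature/big-change",
    files=["megatron/core/a.py", "megatron/core/b.py"],
    title="Core changes split from larger PR",
    body="Part 1 of 2.",
    fork_owner="example-user",
)
for cmd in cmds:
    print(cmd)
```

Returning strings instead of running them fits the approval step: the plan can be reviewed in full, and arguments are quoted with `shlex.join` so titles containing spaces stay intact.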
Important guidelines
- Always create PRs as drafts and push to the user's fork, never directly to upstream.
- Backward-compatible changes (aliases, re-exports, deprecation shims) should go in the first PR so subsequent PRs can depend on them.
- Test files should go with the production code they test, not in a separate PR.
- Prefer a single clean commit per split PR over replaying the original commit history.
- If a file is hard to categorize (e.g., it touches two groups), ask the user which PR it should go in.
- If the current GitHub user is not the author of the original PR, each new PR's description must explicitly credit the original author (e.g., "Original changes by @<author> in #<number>").
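The attribution rule in the last guideline could be applied when composing each PR body, roughly as follows. The function name and arguments are hypothetical; the credit wording follows the example given in the guideline. Logins are compared case-insensitively because GitHub usernames are not case-sensitive.

```python
def pr_description(summary, original_author, current_user, original_number):
    """Append a credit line when someone other than the author does the split."""
    if current_user.lower() == original_author.lower():
        return summary
    credit = f"Original changes by @{original_author} in #{original_number}"
    return f"{summary}\n\n{credit}"


# Hypothetical logins and PR number for illustration.
own = pr_description("Core changes.", "alice", "Alice", 123)
other = pr_description("Core changes.", "alice", "bob", 123)
print(other)
```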