mcore-split-pr
Split PR by CODEOWNERS Groups
Split a large pull request into multiple smaller PRs, where each PR touches
the fewest possible CODEOWNERS reviewer groups. The goal is to reduce review
burden: a PR that only touches `megatron/core/` needs only the core reviewers,
while a PR that also touches `examples/`, `tools/`, and `megatron/training/`
pulls in many additional groups.
Answer-First Constraints
For split-planning questions, lead with these constraints before the full
workflow:
- Minimize CODEOWNERS reviewer groups per PR, but each resulting PR must still be independently mergeable and reviewable.
- Tests travel with the production code they validate; do not split tests into a separate PR just to reduce reviewer groups.
- If PR B depends on symbols renamed in PR A, call out the dependency and put backward-compatible aliases, re-exports, or shims in PR A when needed.
- Wait for user approval before execution.
- Execution creates draft PRs from the right base, applies file-scoped diffs
with `git diff upstream/main..<source-branch> -- <paths> | git apply`, pushes to the user's fork, and never pushes directly to upstream.
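Here is a minimal sketch of the backward-compatible alias idea. It assumes PR A renames a hypothetical function `old_helper` to `new_helper`. PR A can then keep the old name as a thin wrapper that emits a `DeprecationWarning`, so code that still uses the old name keeps importing and running until it migrates.

```python
import functools
import warnings


def new_helper(value):
    """Hypothetical function that PR A renames from `old_helper`."""
    return value * 2


def deprecated_alias(new_func, old_name):
    """Wrap new_func so calls through old_name still work but emit a warning."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)
    return wrapper


# Kept in PR A so code that still calls `old_helper` keeps working.
old_helper = deprecated_alias(new_helper, "old_helper")
```

A plain re-export (`old_helper = new_helper`) is simpler but gives callers no signal to migrate.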
Workflow
1. Analyze the PR
- Fetch the PR details: `gh pr view <number> --repo NVIDIA/Megatron-LM --json title,body,headRefName,author` and `gh pr diff <number> --repo NVIDIA/Megatron-LM --stat`. Also determine the current GitHub user with `gh api user --jq .login`.
- Parse `.github/CODEOWNERS` to build a mapping from file path patterns to owner groups.
- For each changed file in the PR, determine which CODEOWNERS groups would be required to review it.
- Build a summary table grouped by CODEOWNERS group, showing which files pull in which groups.
- Count the total number of distinct reviewer groups the PR currently requires.
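The parsing and per-file lookup steps above can be sketched as follows. This is an approximation, not GitHub's exact matcher. The main simplification is that `fnmatch`'s `*` also crosses `/`, and escape syntax is ignored. It does follow GitHub's rule that the last matching pattern in the file wins. The team names in the usage example are hypothetical.

```python
import fnmatch
from collections import defaultdict


def parse_codeowners(text):
    """Parse CODEOWNERS text into (pattern, owners) rules, in file order."""
    rules = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # blank line or comment line
        owners = []
        for tok in tokens[1:]:
            if tok.startswith("#"):
                break  # trailing comment
            owners.append(tok)
        rules.append((tokens[0], tuple(owners)))
    return rules


def matches(pattern, path):
    """Simplified gitignore-style match (an approximation of GitHub's rules)."""
    dir_only = pattern.endswith("/")
    core = pattern.strip("/")
    if not core:
        return False
    parts = path.split("/")
    if pattern.startswith("/") or "/" in core:
        # Anchored at the repo root: test the path and each parent directory.
        candidates = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    else:
        # Bare name such as `*.py` or `docs`: test every path component.
        candidates = list(parts)
    if dir_only:
        candidates = candidates[:-1]  # a trailing `/` only matches directories
    return any(fnmatch.fnmatchcase(c, core) for c in candidates)


def owners_for(path, rules):
    """Return the owners of the last matching rule (GitHub's precedence)."""
    owners = ()
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners
    return owners


def groups_summary(paths, rules):
    """Map each reviewer group to the changed files that pull it in."""
    summary = defaultdict(list)
    for path in paths:
        for group in owners_for(path, rules):
            summary[group].append(path)
    return dict(summary)
```

`len(groups_summary(changed_paths, rules))` gives the number of distinct reviewer groups the PR currently requires.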
2. Propose a split that minimizes reviewer groups per PR
The primary optimization goal: minimize the number of CODEOWNERS reviewer groups required for each resulting PR.
Strategy:
- Cluster files by their CODEOWNERS groups. Files owned by the same set of groups naturally belong together.
- Identify the largest cluster: it becomes the first (and usually the largest) PR.
- Remaining files form one or more additional PRs, each ideally requiring only one or two reviewer groups.
- If a split creates a dependency (e.g., PR B uses symbols renamed in PR A), the dependent PR must be merged after the PR it depends on. Note this explicitly.
- Each PR must be independently mergeable to `main`, with no broken imports and no missing symbols. Backward-compatible aliases and re-export stubs in the first PR can make this possible.
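The clustering step of the strategy above can be sketched as grouping files by the exact set of owner groups they require. It takes as input a mapping from each changed path to its owner groups, such as the output of a per-file CODEOWNERS lookup. Clusters are then ordered largest first, with ties broken by fewer groups. This is only a starting point.

```python
from collections import defaultdict


def cluster_by_owner_set(file_owners):
    """Group files that require exactly the same set of reviewer groups.

    file_owners maps each changed path to the owner groups it needs.
    Returns (owner_set, sorted_files) pairs, largest cluster first.
    """
    clusters = defaultdict(list)
    for path, owners in file_owners.items():
        clusters[frozenset(owners)].append(path)
    return sorted(
        ((owners, sorted(files)) for owners, files in clusters.items()),
        # Largest cluster first; on ties prefer fewer groups, then a stable order.
        key=lambda item: (-len(item[1]), len(item[0]), sorted(item[0])),
    )
```

Per the guidelines, test files still travel with the code they validate, and a file that spans two groups is a question for the user. Expect to adjust these raw clusters by hand.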
Present the proposed split as a table:
- PR name/description
- Files included
- CODEOWNERS groups required
- Dependencies on other PRs (if any)
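One way to render that table is as Markdown. The `ProposedPR` record and its field names below are hypothetical, chosen to mirror the four columns listed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProposedPR:
    """One row of a proposed split (field names are illustrative)."""
    name: str
    files: List[str]
    groups: List[str]
    depends_on: List[str] = field(default_factory=list)


def render_plan(prs):
    """Render the proposed split as a Markdown table for user review."""
    rows = [
        "| PR | Files | CODEOWNERS groups | Depends on |",
        "| --- | --- | --- | --- |",
    ]
    for pr in prs:
        rows.append(
            "| {} | {} | {} | {} |".format(
                pr.name,
                "<br>".join(pr.files),  # one file per line inside the cell
                ", ".join(sorted(pr.groups)),
                ", ".join(pr.depends_on) or "none",
            )
        )
    return "\n".join(rows)
```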
Wait for user approval before proceeding.
3. Execute the split (after user approval)
For each new PR:
- Create a new branch from the appropriate base (`main`, or a dependency PR's branch).
- Extract the relevant changes: `git diff upstream/main..<source-branch> -- <file paths> | git apply`.
- Stage, commit with a clear message, and push to the user's fork.
- Create the PR as a draft (per repo contributing guidelines).
- If the original PR needs to be narrowed in scope, confirm with the user before force-pushing.
- Report all PR URLs when done.
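A minimal sketch of the per-PR execution steps above follows. It assumes `upstream` points at NVIDIA/Megatron-LM and that the fork remote name, branch names, and commit message are supplied by the caller (for example `origin`). The dry-run mode only returns the command lines, for showing the plan to the user before anything runs.

```python
import subprocess


def split_commands(branch, base, source_branch, paths, fork_remote, message):
    """Plan the git steps for one split PR (all names are caller-supplied)."""
    return [
        ["git", "checkout", "-b", branch, base],
        # File-scoped diff; run_split pipes its output into `git apply`.
        ["git", "diff", f"upstream/main..{source_branch}", "--", *paths],
        ["git", "add", "--", *paths],
        ["git", "commit", "-m", message],
        ["git", "push", "-u", fork_remote, branch],  # the fork, never upstream
    ]


def run_split(commands, dry_run=True):
    """Return the planned command lines, or run them when dry_run is False."""
    if dry_run:
        # Display only: arguments are joined without shell quoting.
        return [" ".join(cmd) for cmd in commands]
    checkout, diff, *rest = commands
    subprocess.run(checkout, check=True)
    patch = subprocess.run(diff, check=True, capture_output=True).stdout
    subprocess.run(["git", "apply"], input=patch, check=True)
    for cmd in rest:
        subprocess.run(cmd, check=True)
    return []


def draft_pr_command(repo, base, fork_owner, branch, title, body):
    """Build a `gh pr create` call that opens a draft PR from the fork branch."""
    return [
        "gh", "pr", "create", "--draft",
        "--repo", repo, "--base", base,
        "--head", f"{fork_owner}:{branch}",
        "--title", title, "--body", body,
    ]
```

Keeping the plan as argument lists (rather than shell strings) avoids quoting problems with commit messages and PR bodies.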
Important guidelines
- Always create PRs as drafts and push to the user's fork, never directly to upstream.
- Backward-compatible changes (aliases, re-exports, deprecation shims) should go in the first PR so subsequent PRs can depend on them.
- Test files should go with the production code they test, not in a separate PR.
- Prefer a single clean commit per split PR over replaying the original commit history.
- If a file is hard to categorize (e.g., it touches two groups), ask the user which PR it should go in.
- If the current GitHub user is not the author of the original PR, each new PR's description must explicitly credit the original author (e.g., "Original changes by @<author> in #<number>").
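The attribution rule can be sketched as a small helper that builds each split PR's description. The current user would come from `gh api user --jq .login` and the original author from the `author` field of `gh pr view --json`. The function name and the dependency wording are illustrative.

```python
def split_pr_body(summary, original_author, original_number, current_user,
                  depends_on=None):
    """Compose a description for one split PR (wording is illustrative)."""
    lines = [summary, ""]
    if depends_on is not None:
        lines.append(f"Depends on #{depends_on}; merge that PR first.")
    # GitHub logins are case-insensitive, so compare them case-folded.
    if current_user.lower() != original_author.lower():
        lines.append(f"Original changes by @{original_author} in #{original_number}")
    return "\n".join(lines).rstrip() + "\n"
```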