megatron-change-analyzer

Megatron Change Analyzer

Turn a raw Megatron change set into structured feature evolution findings that are ready for downstream implementation work. Focus on extracting the real upstream feature shape, not on writing MindSpeed code yet.

Core Rules

  • Assume input is a normalized Megatron change-set, not an arbitrary pile of commits.
  • Group multiple low-level commits into higher-level change events when they serve one feature.
  • Prioritize new features, API shape changes, config or CLI schema changes, and behavior-affecting refactors.
  • Tag each event with migration relevance, but do not edit MindSpeed here.
  • Treat the upstream commit itself as implementation evidence, not just as a summary source.
  • When a feature is represented by a specific commit or tightly related commit cluster, preserve that grouping so downstream steps can port the implementation with the commit as the primary reference.
  • Treat megatron main findings as exploratory unless the user has supplied a strict target mapping.
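
The grouping rule above can be sketched as follows. This is a minimal illustration, not the analyzer's actual logic: the per-commit keys `feature`, `files`, and `is_primary` are hypothetical, and in practice deciding which commits serve one feature is a judgment made from titles, touched files, and diffs.

```python
from collections import defaultdict


def group_commits(commits):
    """Group commits that share a feature key into one change event.

    Assumption: each commit is a dict with the hypothetical keys
    'sha', 'feature', 'files', and (optionally) 'is_primary'.
    """
    grouped = defaultdict(list)
    for commit in commits:
        grouped[commit["feature"]].append(commit)

    events = []
    for feature, members in grouped.items():
        # Prefer an explicitly marked primary commit; otherwise fall back
        # to the last commit in the cluster.
        primary = next((c for c in members if c.get("is_primary")), members[-1])
        # Union of touched files across the cluster, de-duplicated and sorted.
        files = sorted({path for c in members for path in c["files"]})
        events.append(
            {
                "title": feature,
                "commits": [c["sha"] for c in members],
                "primary_commit": primary["sha"],
                "upstream_changed_files": files,
            }
        )
    return events


commits = [
    {"sha": "sha1", "feature": "Add feature X", "files": ["megatron/path_a.py"]},
    {"sha": "sha2", "feature": "Add feature X", "files": ["megatron/path_b.py"], "is_primary": True},
    {"sha": "sha3", "feature": "Rename flag Y", "files": ["arguments.py"]},
]
events = group_commits(commits)
```

Here the two commits for "Add feature X" collapse into one event whose primary commit stays visible, while the unrelated commit becomes its own event.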

Output Shape

Produce a structured report with one event per feature or migration-relevant change:
json
{
  "branch": "core_v0.15.3",
  "events": [
    {
      "title": "Add feature X",
      "kind": "new_feature",
      "commits": ["sha1", "sha2"],
      "primary_commit": "sha2",
      "areas": ["training", "config"],
      "breaking_risk": "medium",
      "migration_relevance": "high",
      "notes": "Short factual summary",
      "evidence": ["file/path.py", "arguments.py"],
      "upstream_changed_files": ["megatron/path_a.py", "megatron/path_b.py"],
      "implementation_units": [
        {
          "name": "Expose new config flag",
          "kind": "config_surface",
          "upstream_files": ["megatron/training/config/training_config.py"],
          "summary": "What this unit changes in upstream code"
        },
        {
          "name": "Add runtime behavior",
          "kind": "runtime_logic",
          "upstream_files": ["megatron/training/global_vars.py"],
          "summary": "What runtime path must be ported downstream"
        }
      ],
      "porting_notes": [
        "Facts the downstream mapper should preserve when porting the feature"
      ]
    }
  ]
}
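
A report in this shape could be sanity-checked before hand-off with something like the sketch below. The required key sets are read off the example above; the function name `check_report` and the rule that `primary_commit` must appear in `commits` are assumptions for illustration (the example is consistent with that rule, but the document does not state it).

```python
# Keys taken from the example report above.
REQUIRED_EVENT_KEYS = {
    "title", "kind", "commits", "primary_commit", "areas", "breaking_risk",
    "migration_relevance", "notes", "evidence", "upstream_changed_files",
    "implementation_units", "porting_notes",
}
REQUIRED_UNIT_KEYS = {"name", "kind", "upstream_files", "summary"}


def check_report(report):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if "branch" not in report:
        problems.append("missing 'branch'")
    for i, event in enumerate(report.get("events", [])):
        missing = REQUIRED_EVENT_KEYS - event.keys()
        if missing:
            problems.append(f"events[{i}] missing keys: {sorted(missing)}")
        # Assumed rule: the primary commit is one of the grouped commits.
        if event.get("primary_commit") not in event.get("commits", []):
            problems.append(f"events[{i}] primary_commit not in commits")
        for j, unit in enumerate(event.get("implementation_units", [])):
            missing_unit = REQUIRED_UNIT_KEYS - unit.keys()
            if missing_unit:
                problems.append(
                    f"events[{i}].implementation_units[{j}] missing keys: "
                    f"{sorted(missing_unit)}"
                )
    return problems


report = {
    "branch": "core_v0.15.3",
    "events": [
        {
            "title": "Add feature X",
            "kind": "new_feature",
            "commits": ["sha1", "sha2"],
            "primary_commit": "sha2",
            "areas": ["training", "config"],
            "breaking_risk": "medium",
            "migration_relevance": "high",
            "notes": "Short factual summary",
            "evidence": ["file/path.py"],
            "upstream_changed_files": ["megatron/path_a.py"],
            "implementation_units": [
                {
                    "name": "Expose new config flag",
                    "kind": "config_surface",
                    "upstream_files": ["megatron/path_a.py"],
                    "summary": "What this unit changes in upstream code",
                }
            ],
            "porting_notes": [],
        }
    ],
}
problems = check_report(report)
```

The check only covers structure; whether the labels and summaries are accurate still has to be verified against the upstream commits.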

Workflow

  1. Read the normalized change-set.
  2. Inspect commit titles, touched files, and diffs at a feature level.
  3. Collapse related commits into one event when appropriate, but keep the primary implementation commit visible.
  4. Label each event using the taxonomy in change-taxonomy.md.
  5. Break each relevant event into implementation units: config and CLI exposure, runtime logic, wrappers or adaptors, tests and examples, lifecycle or cleanup behavior.
  6. Separate events into: relevant for MindSpeed, probably already covered, not currently worth adaptation.
  7. Hand off only the relevant subset to the impact mapper, including commit references, upstream changed files, and implementation units.
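
Steps 6 and 7 can be sketched as a small triage-and-hand-off helper. This is an illustration under stated assumptions: the `triage` field and its three label strings are hypothetical names for the buckets in step 6, and the payload format expected by the impact mapper is not specified in this document.

```python
# Fields that step 7 says must accompany each handed-off event.
HANDOFF_FIELDS = (
    "title",
    "commits",
    "primary_commit",
    "upstream_changed_files",
    "implementation_units",
)

# Hypothetical labels for the three buckets named in step 6.
BUCKETS = ("relevant", "probably_covered", "not_worth_adapting")


def build_handoff(events):
    """Split events into triage buckets and build the hand-off payload."""
    buckets = {name: [] for name in BUCKETS}
    for event in events:
        buckets[event["triage"]].append(event)
    # Only the relevant subset is handed off, trimmed to the required fields.
    handoff = [
        {field: event[field] for field in HANDOFF_FIELDS}
        for event in buckets["relevant"]
    ]
    return buckets, handoff


triaged_events = [
    {
        "title": "Add feature X",
        "triage": "relevant",
        "commits": ["sha1", "sha2"],
        "primary_commit": "sha2",
        "upstream_changed_files": ["megatron/path_a.py"],
        "implementation_units": [{"name": "Expose new config flag"}],
    },
    {
        "title": "Internal cleanup",
        "triage": "not_worth_adapting",
        "commits": ["sha3"],
        "primary_commit": "sha3",
        "upstream_changed_files": ["megatron/path_b.py"],
        "implementation_units": [],
    },
]
buckets, handoff = build_handoff(triaged_events)
```

Keeping the non-relevant buckets around, rather than discarding them, leaves a record of what was deliberately not handed off.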

What To Highlight

  • Newly exposed features or workflows
  • Public API additions or signature changes
  • CLI/config additions, removals, or renames
  • Checkpoint or state format changes
  • Parallelism and distributed execution changes
  • Data pipeline or runtime behavior changes

What To Avoid

  • Do not produce code patches here.
  • Do not claim MindSpeed compatibility from Megatron evidence alone.
  • Do not treat internal refactors as high-value migration work unless they change integration surfaces.
  • Do not collapse a large feature into a one-line note if the upstream commit actually changes multiple integration surfaces. Preserve enough structure for downstream implementation.
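
The last rule can be backed by a rough lint that flags events which look under-structured. The threshold used here (more than one upstream changed file but fewer than two implementation units) is an assumed heuristic for illustration only; a flagged event is a prompt to re-read the commit, not proof that it was collapsed too far.

```python
def flag_thin_events(events):
    """Return titles of events that touch several files but carry
    little structure. Heuristic only; thresholds are assumptions."""
    flagged = []
    for event in events:
        files = event.get("upstream_changed_files", [])
        units = event.get("implementation_units", [])
        if len(files) > 1 and len(units) < 2:
            flagged.append(event["title"])
    return flagged


sample_events = [
    {
        "title": "Add feature X",
        "upstream_changed_files": ["megatron/path_a.py", "megatron/path_b.py"],
        "implementation_units": [],
    },
    {
        "title": "Rename flag Y",
        "upstream_changed_files": ["arguments.py"],
        "implementation_units": [{"name": "Rename CLI flag"}],
    },
]
thin = flag_thin_events(sample_events)
```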

References

  • Read change-taxonomy.md before labeling events.
  • Run build_feature_events.py when you need a deterministic first-pass event file that preserves primary commits, upstream changed files, and implementation units.
  • Hand migration-relevant events to $megatron-impact-mapper.