auto-bug-fixer

Bug Localization and Fix Skill

Overview

Through a structured process of error analysis and root cause localization, this skill automatically generates minimal diff-format fix patches and regression-preventive test cases, automating the path from error symptom to complete fix end to end.

Application Scenarios

  • Analyzing error symptoms such as program crashes, thrown exceptions, and test failures
  • Locating the root cause of a bug (especially one introduced by a code change)
  • Generating code fix patches that comply with engineering standards
  • Creating executable regression-preventive test cases
  • Producing a complete bug analysis report (root cause, fix, tests, and prevention suggestions)

Quick Reference

| Scenario | Localization Method | Applicable Conditions |
| --- | --- | --- |
| General bug analysis | Top-down tracing method | No clear code change/PR information |
| Bug introduced by code change | Git bisect method | code_change_info is provided; the specific PR/commit must be located |

| Error Code | Meaning | Handling Method |
| --- | --- | --- |
| E001 | Input information is missing or cannot be parsed | Supply the required inputs (error_phenomenon/reproduce_steps) |
| E002 | Root cause is ambiguous and cannot be accurately located | Provide more context or code change information |
| E003 | Patch generation failed | Confirm the root cause is locked; check the code context |
| E004 | Test case generation failed | Confirm the fix patch is valid; specify the test framework |
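
The scenario table can be read as a simple dispatch rule on whether code_change_info was supplied. The sketch below is only an illustration of that rule: the function name and the returned labels are assumptions, not part of the skill's actual interface.

```python
from typing import Optional


def select_localization_method(code_change_info: Optional[str]) -> str:
    """Pick a localization method following the scenario table.

    The labels "git_bisect" and "top_down_tracing" are hypothetical names
    chosen for this sketch.
    """
    # Treat None or whitespace-only input as "no code change/PR information".
    if code_change_info and code_change_info.strip():
        return "git_bisect"
    return "top_down_tracing"


print(select_localization_method(None))                    # top_down_tracing
print(select_localization_method("suspect commit range"))  # git_bisect
```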

Core Capabilities (Action + Result, No Overlap, Nothing Out of Scope)

  1. Parse error symptoms across multiple languages and verify that the required inputs are complete and valid
  2. Locate the root cause with the top-down tracing method in general scenarios and the Git bisect method in code change/PR scenarios, then verify that the root cause is reproducible
  3. Generate a minimal diff-format fix patch for the root cause, verify its syntax, and follow engineering development standards
  4. Create directly executable regression-preventive test cases for mainstream test frameworks, covering the core normal/boundary/exception scenarios
  5. Output a structured Markdown bug analysis report containing the error symptom description, reproduction steps, root cause analysis, fix patch, test cases, and prevention suggestions

Workflow (Steps + Verification + Interruption + Feedback, Executable by Agent)

Step 1: Information Collection and Verification

Checkpoint: Verify that the required inputs (error_phenomenon/reproduce_steps) are complete and recognizable.
❌ Interrupt Condition: A required input is missing, or the error symptom cannot be parsed (e.g., garbled or incomplete text) → throw error code E001 and stop.
📝 Feedback: Output "Information collection completed, can proceed to root cause analysis stage" or "Missing required inputs, error code E001".
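
A minimal sketch of the Step 1 checkpoint might look like the following. It only checks that both required inputs are non-empty strings; detecting garbled or incomplete error text is left out. The `ValidationResult` type and the `validate_inputs` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Required input names taken from the checkpoint description.
REQUIRED_INPUTS = ("error_phenomenon", "reproduce_steps")


@dataclass
class ValidationResult:
    ok: bool
    error_code: Optional[str]
    message: str


def _is_blank(value: object) -> bool:
    """True when the value is not a string or contains only whitespace."""
    return not isinstance(value, str) or not value.strip()


def validate_inputs(inputs: Mapping[str, object]) -> ValidationResult:
    """Return E001 when any required input is missing or blank."""
    missing = [name for name in REQUIRED_INPUTS if _is_blank(inputs.get(name))]
    if missing:
        return ValidationResult(
            ok=False,
            error_code="E001",
            message="Missing required inputs ({}), error code E001".format(
                ", ".join(missing)
            ),
        )
    return ValidationResult(
        ok=True,
        error_code=None,
        message="Information collection completed, "
        "can proceed to root cause analysis stage",
    )


print(validate_inputs({"error_phenomenon": "ZeroDivisionError in ratio()"}).message)
```

Returning a result object instead of raising keeps the feedback message and the error code together, which matches the "interrupt + feedback" shape of each workflow step.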

Step 2: Root Cause Analysis and Confirmation

Checkpoint: The root cause is reproducible and pinned to a specific code line/file/PR/commit.
❌ Interrupt Condition: Insufficient information leaves the root cause ambiguous and impossible to locate accurately → throw error code E002 and stop.
📝 Feedback: Output "Root cause located: XXX (associated PR/commit: XXX)" or "Ambiguous root cause, error code E002".

Localization Methods (selected automatically by the Agent based on the input; Git bisect is emphasized for locating the PR/commit)

Method 1: Top-down Tracing Method (general scenario, no clear code change/PR information)
  1. Analyze the error stack trace to locate the specific code line/file where the exception is thrown or the error is triggered
  2. Check whether the input parameters, data flow, and dependency calls at that location match business expectations
  3. If an input or dependency is abnormal, trace upstream, level by level, to the data source or caller
  4. Repeat the steps above until the root cause is found (the location where the data or logic first deviates from expectations)
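
As a rough illustration of steps 1 and 3, the sketch below assumes the failure is a Python exception and uses the standard `traceback` module to list frames starting at the raise site and moving up to its callers. The functions `load_ratio` and `compute_ratio` are hypothetical stand-ins for application code.

```python
import traceback
from typing import List, Tuple


def frames_from_failure_upward(exc: BaseException) -> List[Tuple[str, int, str]]:
    """Return (filename, line number, function name), raise site first."""
    frames = traceback.extract_tb(exc.__traceback__)
    # extract_tb lists the outermost call first; reverse it so that index 0
    # is where the exception was raised and later entries are its callers.
    return [(f.filename, f.lineno, f.name) for f in reversed(frames)]


def compute_ratio(total, count):
    return total / count  # raise site when count == 0


def load_ratio(config):
    # Upstream caller: the bad value originates in the config passed here.
    return compute_ratio(config["total"], config["count"])


try:
    load_ratio({"total": 10, "count": 0})
except ZeroDivisionError as exc:
    for filename, lineno, name in frames_from_failure_upward(exc):
        print(f"{name} ({filename}:{lineno})")
```

Walking this list from the top corresponds to checking each level's inputs in turn: here the division fails in `compute_ratio`, but the zero comes from the data handed to `load_ratio`.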
Method 2: Git Bisect Localization (bug introduced by a code change/PR; the goal is to locate the problematic PR/commit)
  1. Based on the input code_change_info, identify the last good version/PR/commit (where the bug does not occur) and the first known bad version/PR/commit (where the bug occurs)
  2. Run the Git bisect commands in sequence:
    git bisect start
    git bisect bad [bad version/commit hash]
    git bisect good [good version/commit hash]
  3. Test each commit that Git checks out and mark it with git bisect good or git bisect bad; Git narrows the range at each step until it reports the single commit (and its associated PR) that introduced the bug. Run git bisect reset when finished
  4. Analyze the code changes in that commit/PR, pin down the root cause line, and verify it: revert the commit/PR and confirm that the bug disappears
  5. If multiple PRs/commits are involved, rank them by impact and provide a way to verify each one in turn
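
The narrowing that Git performs in step 3 is a binary search over the commit history. The sketch below imitates that search in plain Python to show why it needs only about log2(n) checks. It does not call Git, and it assumes that every commit after the first bad one is also bad. The commit labels and the `is_bad` predicate are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    commits is ordered oldest to newest; commits[0] must be known good and
    commits[-1] known bad, mirroring `git bisect good` / `git bisect bad`.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the bug was introduced at mid or earlier
        else:
            good = mid  # the bug was introduced after mid
    return commits[bad]


history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
broken_from = history.index("c5")  # pretend c5 introduced the bug
print(first_bad_commit(history, lambda c: history.index(c) >= broken_from))  # c5
```

In a real repository the `is_bad` check is the reproduction test itself; `git bisect run <script>` automates the same loop by using the script's exit code as the good/bad verdict.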

Step 3: Minimized Diff-format Patch Generation and Verification

Checkpoint: The patch is in standard diff format, changes as little as possible, has no syntax errors, and fixes only the root cause.
❌ Interrupt Condition: The root cause is not locked, or there is no valid code context → throw error code E003 and stop.
📝 Feedback: Output "Diff-format fix patch generated, total X changes" or "Patch generation failed, error code E003".

Patch Generation Principles (Mandatory)

  • Minimal modification: Change only code related to the root cause; no unrelated logic or formatting changes
  • Root cause fix: Address the underlying problem rather than working around its symptoms
  • Defensive programming: Add input validation, boundary checks, and exception handling where necessary
  • Engineering standards: Comply with the project's coding conventions and introduce no new syntax or logic problems
  • Traceability: Add a single-line comment at each key fix point explaining the reason for the fix and the associated root cause
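
To make "standard diff format" and "total X changes" concrete, here is a small sketch that builds a unified diff with Python's standard `difflib` and counts the changed lines. The file path, the `ratio` function in the sample text, and the helper names are all hypothetical. Unlike `git diff`, `difflib` does not emit the `diff --git` or `index` header lines.

```python
import difflib


def make_patch(path: str, before: str, after: str, context: int = 3) -> str:
    """Build a unified diff between two versions of one file."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        n=context,
    )
    return "".join(diff)


def count_changed_lines(patch: str) -> int:
    """Count added and removed lines in a single-file unified diff."""
    # The first two lines are the ---/+++ file headers, so skip them.
    body = patch.splitlines()[2:]
    return sum(1 for line in body if line.startswith(("+", "-")))


before = "def ratio(total, count):\n    return total / count\n"
after = (
    "def ratio(total, count):\n"
    "    if count == 0:  # fix: guard against division by zero (root cause)\n"
    "        return 0.0\n"
    "    return total / count\n"
)

patch = make_patch("stats/ratio.py", before, after)
print(patch)
print("changed lines:", count_changed_lines(patch))  # changed lines: 2
```

A small changed-line count relative to the file size is one cheap signal that the patch respects the minimal-modification principle.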

Step 4: Regression-preventive Test Case Generation and Verification

Checkpoint: The test cases are directly executable, cover the core scenarios related to the root cause, and match the test framework the project actually uses.
❌ Interrupt Condition: There is no valid fix patch, or no test framework is specified → throw error code E004 and stop.
📝 Feedback: Output "Regression-preventive test cases generated, total X (adapted to XXX framework)" or "Test case generation failed, error code E004".

Core Requirements for Test Cases

  • Reproducible: Accurately reproduces the original bug (fails before the fix)
  • Verifiable: Passes after the fix, confirming that the patch works
  • Full coverage: Covers the normal, boundary, and exception scenarios related to the root cause
  • Executable: Free of syntax errors and adapted to a mainstream test framework (gtest/pytest/JUnit/GoTest/Jest/Vue Test Utils)
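
The requirements above might translate into a test module like the sketch below, written so that pytest can collect it by function name. The function `ratio` is a hypothetical example of already-fixed code with a guard against a zero divisor; before that fix, the first test would fail with ZeroDivisionError.

```python
def ratio(total, count):
    """Hypothetical fixed function: returns 0.0 instead of dividing by zero."""
    if count == 0:  # fix: guard against division by zero (root cause)
        return 0.0
    return total / count


def test_zero_count_regression():
    """Reproduces the original bug: raised ZeroDivisionError before the fix."""
    assert ratio(10, 0) == 0.0


def test_normal_case():
    """Normal scenario: ordinary inputs still divide correctly."""
    assert ratio(10, 4) == 2.5


def test_boundary_cases():
    """Boundary scenarios around zero and one."""
    assert ratio(0, 5) == 0.0
    assert ratio(1, 1) == 1.0


def test_non_numeric_total_raises_type_error():
    """Exception scenario: invalid input types are still rejected."""
    # In a pytest project, `with pytest.raises(TypeError):` is the more
    # idiomatic form; plain try/except keeps this sketch dependency-free.
    try:
        ratio("10", 2)
    except TypeError:
        return
    raise AssertionError("expected TypeError for a non-numeric total")
```

Keeping one dedicated regression test per root cause makes it obvious which test guards which fix when the suite fails later.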

Output Format (Structured Markdown, Friendly to Both Machines and Humans; Fixed Template, Do Not Modify Arbitrarily)

Bug Analysis Report

1. Error Symptoms

  1. Error Description: [Clearly describe the error type, error message, impact scope, and associated business scenario]
  2. Reproduction Steps: [Step-by-step reproduction method, including runtime environment, test data, operation steps, and associated version]
  3. Input Validation: [Pass/Fail; on failure, give the error code and the specific reason]
  4. Code Change Information: [PR/commit/branch information extracted from the input, highlighting what is needed for bisect localization]

2. Root Cause Analysis

  1. Root Cause Localization: [Specific root cause, such as a division-by-zero error, null pointer exception, or logic error; if associated with a PR/commit, give its number/hash]
  2. Localization Method: [Top-down tracing method / Git bisect method (associated PR: XXX, commit hash: XXX)]
  3. Tracing Process: [Briefly describe how the data flow, call chain, or Git bisect was followed, highlighting the key steps that locked the root cause]
  4. Impact Scope: [Affected code modules/PRs/branches/business scenarios/user groups]

3. Minimized Diff-format Fix Patch

[Standard diff format, including file path, line numbers, and modified content, with comments at key fix points]

```diff
diff --git a/.github/workflows/nightly_benchmarks.yaml b/.github/workflows/nightly_benchmarks.yaml
index 123456..789abc 100644
--- a/.github/workflows/nightly_benchmarks.yaml
+++ b/.github/workflows/nightly_benchmarks.yaml
@@ -15,3 +15,6 @@
     - name: Install dependencies
       run: |
         pip install -e .
+        # Install msprobe-related dependencies (root cause fix: the msprobe feature introduced by PR#4241 requires them)
+        pip install mindstudio-probe==8.3.0 || echo "msprobe installation skipped"
+        pip install tb_graph_ascend || echo "tb_graph_ascend installation skipped"
```

4. Regression-preventive Test Cases

[Executable test code, adapted to the test framework the project actually uses, covering normal/boundary/exception scenarios]

```python
# Example: pytest framework test cases

def test_bug_fix_reproduction():
    """Reproduce the original bug (fails before fix)"""
    # Test code...
    assert result == expected

def test_bug_fix_validation():
    """Verify fix effect (passes after fix)"""
    # Test code...
    assert fixed_result == expected

def test_edge_cases():
    """Boundary scenario test"""
    # Test code...
    pass
```

5. Prevention Suggestions

  1. Code Level: [Code improvements targeting the root cause, such as added parameter validation or exception handling]
  2. Process Level: [Development process improvements, such as code review checkpoints or test coverage requirements]
  3. Tool Level: [Tooling or monitoring suggestions, such as static analysis or richer logging]
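
Because the template is fixed, a generator can assemble the report from a constant list of section titles and refuse to emit it when a section is missing. The sketch below is one possible way to do that. The `#`/`##` heading levels and the `render_report` helper are assumptions, since the template itself does not state them.

```python
from typing import Mapping

# Section titles copied from the fixed report template, in order.
SECTIONS = (
    "1. Error Symptoms",
    "2. Root Cause Analysis",
    "3. Minimized Diff-format Fix Patch",
    "4. Regression-preventive Test Cases",
    "5. Prevention Suggestions",
)


def render_report(contents: Mapping[str, str]) -> str:
    """Render the five fixed sections as Markdown, in template order."""
    missing = [title for title in SECTIONS if title not in contents]
    if missing:
        raise ValueError("missing report sections: " + ", ".join(missing))
    parts = ["# Bug Analysis Report"]
    for title in SECTIONS:
        parts.append(f"## {title}")
        parts.append(contents[title].strip())
    return "\n\n".join(parts) + "\n"


demo = {title: "[to be filled in]" for title in SECTIONS}
print(render_report(demo))
```

Driving the output from a fixed tuple keeps section order and naming stable, which is what makes the report easy to parse by downstream tooling.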