pt2-bug-basher
PT2 Bug Basher
Debug test failures and runtime errors in the PyTorch 2 compiler stack (Dynamo, Inductor, AOTAutograd, FX graphs).
Workflow Summary
- Reproduce -- Get a consistent reproduction of the failure
- Minimize -- Reduce the repro to the smallest possible standalone case. Strip away unrelated model logic, use minimal tensor shapes, and isolate the specific op or pattern that triggers the bug.
- Add a unit test -- Do this BEFORE diving into code search or root cause investigation. Add a failing test to the codebase that captures the bug. Place it in a specific, topic-appropriate test file (e.g., `test/dynamo/test_repros.py`, `test/inductor/test_torchinductor.py`, `test/export/test_export.py`). Avoid `test/dynamo/test_misc.py` -- it is already oversized; find a more specific test file that matches the area of the bug. Use `torch.testing._internal.common_utils.TestCase` and `run_tests`. The test must fail before the fix and pass after. Having the test first keeps you grounded -- you know exactly what "fixed" looks like before you start exploring the codebase.
- Gather logs -- Run with appropriate `TORCH_LOGS` settings
- Classify -- Use the Error Triage table to identify the category
- Inspect artifacts -- Check FX graphs, IR, and generated code via `TORCH_COMPILE_DEBUG=1`
- Identify root cause -- Trace from the error back through the compilation pipeline
- Fix -- Apply the fix
- Verify -- Run the new unit test AND nearby related existing tests (e.g., if you changed how `is_exporting` works, also run the existing `test_is_exporting` export test). Use `pytest -k` to quickly run related tests by name. The task is not complete until all pass.
- Self-review -- Use the `/pr-review` skill to review your own changes before presenting them. Fix any issues it flags.
- Celebrate -- Summarize the changes: explain the root cause, what was changed and why, and which tests were added/verified. Then tell the user the bug is squashed. Include a fun, varied motivational message or easter egg to keep spirits high (e.g., a pun, a quote, an ASCII art bug getting squashed). Keep it short and different each time.
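The test-first step above can be sketched with stdlib `unittest` (in the PyTorch tree you would subclass `torch.testing._internal.common_utils.TestCase` and call `run_tests`, as the list says); `op_under_repair` here is a hypothetical stand-in for the code path being fixed:

```python
import unittest

def op_under_repair(x):
    """Hypothetical stand-in for the PT2 code path being fixed."""
    return x + 1

class TestReproBug(unittest.TestCase):
    def test_minimal_repro(self):
        # Must fail before the fix and pass after -- this pins down
        # exactly what "fixed" looks like before any code exploration.
        self.assertEqual(op_under_repair(1), 2)

if __name__ == "__main__":
    unittest.main(exit=False)
```

The point of the skeleton is the contract, not the content: the single assertion encodes the expected-correct behavior, so "done" is unambiguous.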
Investigation Strategy
Prefer direct tools over `meta_codesearch`
Use `Grep`, `Glob`, and `Read` directly for code exploration. Do not spawn `meta_codesearch` agents -- they are slow and expensive. The Architectural Knowledge and Key Source Files sections below should give you enough context to know where to look. A targeted `Grep` for a function name is always faster.
Know which compilation mode you're in
Before reading implementation code, determine the compilation mode. These share code but diverge in important ways:
- `torch.compile` -- Dynamo + Inductor. `tx.export=False`, no `_compiling_state_context()`.
- `torch.export` (strict) -- `tx.export=True`, `_compiling_state_context()` active.
- `torch.export` (non-strict, the default) -- Uses Dynamo via `fullgraph_capture` but `tx.export` may differ from strict. `_compiling_state_context()` active. Check `torch._export.config.use_new_tracer_experimental` -- it changes which code path is used.
Distinguish trace-time vs runtime
Many PT2 bugs come from confusing these two:
- Trace-time: Inside Dynamo's symbolic interpreter. Dynamo intercepts function calls and may constant-fold them (e.g., `is_exporting()` → `ConstantVariable(True)`).
- Runtime: Real tensors, real Python calls, module-level flags like `torch.compiler._is_exporting_flag`.

When debugging, add temporary `print()` statements directly in the source file rather than monkey-patching from outside -- dispatch chains make monkey-patching unreliable.
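A toy illustration (plain Python, no PyTorch) of why outside-in monkey-patching misfires once a function object has been captured by reference -- the same failure mode that dispatch chains create:

```python
import types

# A "library" module whose function gets captured by reference.
lib = types.ModuleType("lib")
exec("def is_active(): return False", lib.__dict__)

# A "dispatcher" that grabbed the function object at import time.
captured = lib.is_active

# Monkey-patching the module attribute from outside...
lib.is_active = lambda: True

# ...does not affect the already-captured reference:
print(lib.is_active())  # True  (patched attribute lookup)
print(captured())       # False (stale captured reference)
```

An in-source `print()` sidesteps this entirely: it runs wherever the real code runs, no matter which reference was dispatched to.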
Gathering Information
Pick the right diagnostic tool based on the error category:
- Quick overview: `TORCH_LOGS="+dynamo,graph_breaks,recompiles" python your_script.py`
- Full debug artifacts: `TORCH_COMPILE_DEBUG=1 python your_script.py` -- creates `torch_compile_debug/` with FX graphs, Inductor IR, and generated code
- Generated code only: `TORCH_LOGS="output_code" python your_script.py`
- Structured tracing: `TORCH_TRACE=/path/to/trace python your_script.py` then `tlparse /path/to/trace`
- Single-threaded (for pdb): `TORCHINDUCTOR_COMPILE_THREADS=1 python your_script.py`
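The same log artifacts can also be enabled in-process via `torch._logging.set_logs`, which is handy when you cannot control the launch environment (a sketch; pass only the artifacts you need):

```python
import logging
import torch

# In-process equivalent of TORCH_LOGS="+dynamo,graph_breaks,recompiles":
torch._logging.set_logs(dynamo=logging.DEBUG, graph_breaks=True, recompiles=True)

# In-process equivalent of TORCH_LOGS="output_code":
torch._logging.set_logs(output_code=True)
```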
Error Triage
Classify the failure using the error message and traceback:
| Error Pattern | Category | Jump To |
|---|---|---|
| Graph break warning in logs | Graph break | Graph Breaks |
| `BackendCompilerFailed` | Inductor/backend crash | Backend Failures |
| Excessive recompiles / recompile limit hit | Recompilation | Recompilation |
| Accuracy mismatch / wrong numerical output | Accuracy | Accuracy |
| `InternalTorchDynamoError` | Dynamo bug | Internal Errors |
| Segfault or CUDA IMA | Runtime crash | Runtime Crashes |
| Triton assertion / index out of bounds | Triton kernel bug | Triton Failures |
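The triage table can be mechanized as a first-pass filter over the error text. The substrings below are illustrative, not an exhaustive or official list:

```python
# Order matters: match the most specific patterns first.
TRIAGE_RULES = [
    ("BackendCompilerFailed", "Backend Failures"),
    ("InternalTorchDynamoError", "Internal Errors"),
    ("recompile_limit", "Recompilation"),
    ("Graph break", "Graph Breaks"),
    ("illegal memory access", "Runtime Crashes"),
    ("device-side assert", "Triton Failures"),
]

def triage(error_text: str) -> str:
    """Map an error message or traceback to a debugging category."""
    for pattern, category in TRIAGE_RULES:
        if pattern in error_text:
            return category
    return "Unclassified"

print(triage("torch._dynamo.exc.BackendCompilerFailed: ..."))  # Backend Failures
```

Anything that lands in `Unclassified` needs manual reading of the traceback; the table's categories still tell you which section to jump to once you recognize the shape of the error.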
Debugging by Category
Graph Breaks
Graph breaks split the compiled graph into smaller subgraphs, often causing performance regressions or unexpected behavior.
Diagnosis:
```bash
TORCH_LOGS="graph_breaks" python your_script.py
```

Key files:
- `torch/_dynamo/exc.py` -- `Unsupported` exception class
- `torch/_dynamo/variables/` -- where most graph break decisions happen
Common causes:
- Unsupported Python constructs (data-dependent control flow, unsupported builtins)
- Tensor operations that can't be traced (in-place ops on inputs, unsupported dtypes)
- Calls to non-traceable functions
Fix approach:
- Read the graph break message to identify the unsupported operation
- Check if there's a decomposition or supported alternative
- If the operation genuinely can't be traced, consider `torch._dynamo.allow_in_graph` or restructuring user code
Backend Compiler Failures
These failures surface as a `BackendCompilerFailed` exception.
Diagnosis:

```bash
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=2 python your_script.py
```

This generates `minifier_launcher.py`, which isolates the minimal failing graph.
Key files:
- `torch/_dynamo/repro/after_aot.py` -- repro/minifier for post-AOT failures
- `torch/_inductor/` -- the backend itself
Fix approach:
- Run the minifier to get a minimal reproduction
- Inspect the FX graph (`TORCH_COMPILE_DEBUG=1`) to understand what ops are involved
- Check if it's a lowering issue (`torch/_inductor/lowering.py`), a scheduling issue, or a codegen issue
- Look at the generated output code if the error is in codegen
Recompilation Issues
Excessive recompilation happens when guards are too specific, causing cache misses.
Diagnosis:
```bash
TORCH_LOGS="recompiles,recompiles_verbose,guards" python your_script.py
```

Key config:
- `torch._dynamo.config.recompile_limit` (default: 8)
- `torch._dynamo.config.fail_on_recompile_limit_hit` -- set to `True` to get a hard error
Common causes:
- Changing tensor shapes without marking them dynamic
- Python scalar values that change between calls
- Global state mutations between calls
Fix approach:
- Read the recompilation reason from logs
- Identify the failing guard
- Either mark the relevant dimension as dynamic with `torch._dynamo.mark_dynamic()` or fix the source of guard instability
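A plain-Python analogy (no PyTorch) for why overly specific guards cause cache misses: a cache keyed on the exact shape "recompiles" once per shape, while a cache keyed only on rank compiles once. `torch._dynamo.mark_dynamic()` is the real mechanism; this sketch only models the caching behavior:

```python
compiles = 0
cache = {}

def compiled_call(shape, guard):
    """Toy compile cache keyed by whatever the guard extracts from the input."""
    global compiles
    key = guard(shape)
    if key not in cache:
        compiles += 1      # cache miss -> "recompile"
        cache[key] = True
    return key

# Exact-shape guard: every new shape is a cache miss.
for s in [(2, 8), (3, 8), (4, 8)]:
    compiled_call(s, guard=lambda s: s)
print(compiles)  # 3

# Rank-only guard (roughly what marking the dim dynamic buys you): one compile.
cache.clear(); compiles = 0
for s in [(2, 8), (3, 8), (4, 8)]:
    compiled_call(s, guard=lambda s: len(s))
print(compiles)  # 1
```

The `recompiles` log tells you which real guard played the role of the exact-shape key here.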
Accuracy Issues
The compiled model produces different numerical results than eager mode.
Diagnosis:
```bash
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=4 python your_script.py
```

This compares compiled vs. eager with an fp64 reference and dumps a repro if accuracy fails.
Key utilities:
- `torch/_dynamo/debug_utils.py` -- `same_two_models()`, `backend_accuracy_fails()`, `cast_to_fp64()`
- `torch._dynamo.config.repro_tolerance` (default: 1e-3)
Fix approach:
- Get the minimal failing graph from the minifier
- Compare eager vs. compiled output at fp64 precision
- Binary search through ops to find the diverging operation
- Check for known numerical issues (reduction order, fused kernels, dtype promotions)
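The binary-search step can be sketched abstractly (plain Python): given per-op outputs captured from the eager and compiled runs, bisect to the first index where they diverge beyond tolerance. The names and scalar outputs here are illustrative:

```python
def first_divergence(eager_outs, compiled_outs, tol=1e-3):
    """Binary search for the first op whose outputs diverge beyond tol.

    Assumes divergence is monotone: once an op diverges, everything
    downstream diverges too (true when later ops consume earlier results).
    """
    lo, hi = 0, len(eager_outs) - 1
    ans = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if abs(eager_outs[mid] - compiled_outs[mid]) > tol:
            ans = mid
            hi = mid - 1   # diverges here: look earlier
        else:
            lo = mid + 1   # still matches: look later
    return ans

eager    = [1.0, 2.0, 3.0, 4.0, 5.0]
compiled = [1.0, 2.0, 3.1, 4.1, 5.1]  # diverges starting at index 2
print(first_divergence(eager, compiled))  # 2
```

In practice the "outputs" are tensors compared with `torch.allclose` against the fp64 reference, but the bisection logic is the same.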
Internal Dynamo Errors
These crashes surface as `InternalTorchDynamoError`.
Diagnosis:

```bash
TORCHDYNAMO_VERBOSE=1 python your_script.py
```

or equivalently:

```bash
TORCH_LOGS="+dynamo" python your_script.py
```

Key files:
- `torch/_dynamo/symbolic_convert.py` -- bytecode interpreter
- `torch/_dynamo/variables/` -- variable tracking system
- `torch/_dynamo/guards.py` -- guard generation

Fix approach:
1. Get the full stack trace with `TORCHDYNAMO_VERBOSE=1`
2. Identify which bytecode instruction or variable type caused the crash
3. Create a minimal repro (the error message often includes a minifier path)
4. Debug with `TORCHINDUCTOR_COMPILE_THREADS=1` and pdb if needed
Runtime Crashes
Segfaults and CUDA illegal memory access errors during execution of compiled code.
Diagnosis (make crash deterministic):
```bash
PYTORCH_NO_CUDA_MEMORY_CACHING=1 CUDA_LAUNCH_BLOCKING=1 python your_script.py
```

For CUDA IMA, add NaN checks:

```bash
TORCHINDUCTOR_NAN_ASSERTS=1 python your_script.py
```

For Inductor-level sync debugging:

```python
torch._inductor.config.triton.debug_sync_kernel = True  # sync after every kernel
torch._inductor.config.triton.debug_sync_graph = True   # sync before/after graph
```

Fix approach:
- Make the crash deterministic with `PYTORCH_NO_CUDA_MEMORY_CACHING=1 CUDA_LAUNCH_BLOCKING=1`
- Check if it's an input mismatch (shapes, devices, dtypes)
- Inspect the generated kernel code with `TORCH_LOGS="output_code"`
- Use `TORCHINDUCTOR_NAN_ASSERTS=1` to find the first kernel producing bad values
- Check for dynamic shapes issues (historically a common source of IMA)
Triton Kernel Failures
Triton assertion failures or index-out-of-bounds in generated kernels.
Diagnosis:
```bash
TORCH_LOGS="output_code,schedule" python your_script.py
```

Key files:
- `torch/_inductor/codegen/triton.py` -- Triton codegen
- `torch/_inductor/scheduler.py` -- kernel fusion decisions
Fix approach:
- Get the generated Triton kernel from the `output_code` logs
- Check index computations for off-by-one or wrong stride calculations
- Look at the IR (`TORCH_COMPILE_DEBUG=1`) to trace back to the FX op
- Check if fusion decisions created invalid index combinations
Key Source Files
| File | Purpose |
|---|---|
| `torch/_dynamo/exc.py` | Exception hierarchy and error formatting |
| `torch/_dynamo/debug_utils.py` | Minifier support, accuracy checking, input serialization |
| `torch/_dynamo/repro/after_dynamo.py` | Repro/minifier for Dynamo-stage failures |
| `torch/_dynamo/repro/after_aot.py` | Repro/minifier for post-AOTAutograd failures |
| | Repro/minifier for AOTI failures |
| `torch/_dynamo/config.py` | Dynamo config (repro levels, recompile limits) |
| `torch/_dynamo/variables/torch.py` | Torch function handling, tracing state functions |
| `torch/_dynamo/variables/higher_order_ops.py` | HOP tracing (cond, map, etc.) |
| `torch/_dynamo/symbolic_convert.py` | Bytecode interpreter, InstructionTranslator |
| `torch/_dynamo/convert_frame.py` | Frame compilation |
| | New export tracer |
| | Export pipeline |
| `torch/_inductor/config.py` | Inductor config (debug flags, trace settings) |
| `torch/_inductor/debug.py` | DebugContext, graph visualization, IR logging |
| `torch/_logging/_registrations.py` | All registered log aliases and artifacts |
Using the Minifier
The minifier reduces a failing graph to the smallest reproduction:
Step 1: Generate the minifier launcher

```bash
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=2 python your_script.py
```

Step 2: Run the minifier

```bash
python minifier_launcher.py minify
```

Step 3: Run the minimized repro

```bash
python minifier_launcher.py run
```

For accuracy issues, use level 4:

```bash
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=4 python your_script.py
```