pt2-bug-basher
PT2 Bug Basher
Debug test failures and runtime errors in the PyTorch 2 compiler stack (Dynamo, Inductor, AOTAutograd, FX graphs).
Workflow Summary
- Reproduce -- Get a consistent reproduction of the failure
- Minimize -- Reduce the repro to the smallest possible standalone case. Strip away unrelated model logic, use minimal tensor shapes, and isolate the specific op or pattern that triggers the bug.
- Add a unit test -- Do this BEFORE diving into code search or root cause investigation. Add a failing test to the codebase that captures the bug. Place it in a specific, topic-appropriate test file (e.g., `test/dynamo/test_repros.py`, `test/inductor/test_torchinductor.py`, `test/export/test_export.py`). Avoid `test/dynamo/test_misc.py` -- it is already oversized; find a more specific test file that matches the area of the bug. Use `torch.testing._internal.common_utils.TestCase` and `run_tests`. The test must fail before the fix and pass after. Having the test first keeps you grounded -- you know exactly what "fixed" looks like before you start exploring the codebase.
- Gather logs -- Run with appropriate `TORCH_LOGS` settings
- Classify -- Use the Error Triage table to identify the category
- Inspect artifacts -- Check FX graphs, IR, and generated code via `TORCH_COMPILE_DEBUG=1`
- Identify root cause -- Trace from the error back through the compilation pipeline
- Fix -- Apply the fix
- Verify -- Run the new unit test AND nearby related existing tests (e.g., if you changed how `is_exporting` works, also run the existing `test_is_exporting` export test). Use `pytest -k` to quickly run related tests by name. The task is not complete until all pass.
- Self-review -- Use the `/pr-review` skill to review your own changes before presenting them. Fix any issues it flags.
- Celebrate -- Summarize the changes: explain the root cause, what was changed and why, and which tests were added/verified. Then tell the user the bug is squashed. Include a fun, varied motivational message or easter egg to keep spirits high (e.g., a pun, a quote, an ASCII art bug getting squashed). Keep it short and different each time.
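The test-first step above can be sketched with stdlib `unittest` (in the PyTorch tree you would subclass `torch.testing._internal.common_utils.TestCase` and call `run_tests`, as the list says); `op_under_repair` here is a hypothetical stand-in for the code path being fixed:

```python
import unittest

def op_under_repair(x):
    """Hypothetical stand-in for the PT2 code path being fixed."""
    return x + 1

class TestReproBug(unittest.TestCase):
    def test_minimal_repro(self):
        # Must fail before the fix and pass after -- this pins down
        # exactly what "fixed" looks like before any code exploration.
        self.assertEqual(op_under_repair(1), 2)

if __name__ == "__main__":
    unittest.main(exit=False)
```

The point of the skeleton is the contract, not the content: the single assertion encodes the expected-correct behavior, so "done" is unambiguous.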
Investigation Strategy
Prefer direct tools over `meta_codesearch`
Use `Grep`, `Glob`, and `Read` directly for code exploration. Do not spawn `meta_codesearch` agents -- they are slow and expensive. The Architectural Knowledge and Key Source Files sections below should give you enough context to know where to look. A targeted `Grep` for a function name is always faster.
Know which compilation mode you're in
Before reading implementation code, determine the compilation mode. These share code but diverge in important ways:
- `torch.compile` -- Dynamo + Inductor. `tx.export=False`, no `_compiling_state_context()`.
- `torch.export` (strict) -- `tx.export=True`, `_compiling_state_context()` active.
- `torch.export` (non-strict, the default) -- Uses Dynamo via `fullgraph_capture` but `tx.export` may differ from strict. `_compiling_state_context()` active. Check `torch._export.config.use_new_tracer_experimental` -- it changes which code path is used.
Distinguish trace-time vs runtime
Many PT2 bugs come from confusing these two:
- Trace-time: Inside Dynamo's symbolic interpreter. Dynamo intercepts function calls and may constant-fold them (e.g., `is_exporting()` → `ConstantVariable(True)`).
- Runtime: Real tensors, real Python calls, module-level flags like `torch.compiler._is_exporting_flag`.

When debugging, add temporary `print()` statements directly in the source file rather than monkey-patching from outside -- dispatch chains make monkey-patching unreliable.
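A toy illustration (plain Python, no PyTorch) of why outside-in monkey-patching misfires once a function object has been captured by reference -- the same failure mode that dispatch chains create:

```python
import types

# A "library" module whose function gets captured by reference.
lib = types.ModuleType("lib")
exec("def is_active(): return False", lib.__dict__)

# A "dispatcher" that grabbed the function object at import time.
captured = lib.is_active

# Monkey-patching the module attribute from outside...
lib.is_active = lambda: True

# ...does not affect the already-captured reference:
print(lib.is_active())  # True  (patched attribute lookup)
print(captured())       # False (stale captured reference)
```

An in-source `print()` sidesteps this entirely: it runs wherever the real code runs, no matter which reference was dispatched to.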
Gathering Information
Pick the right diagnostic tool based on the error category:
- Quick overview: `TORCH_LOGS="+dynamo,graph_breaks,recompiles" python your_script.py`
- Full debug artifacts: `TORCH_COMPILE_DEBUG=1 python your_script.py` -- creates `torch_compile_debug/` with FX graphs, Inductor IR, and generated code
- Generated code only: `TORCH_LOGS="output_code" python your_script.py`
- Structured tracing: `TORCH_TRACE=/path/to/trace python your_script.py` then `tlparse /path/to/trace`
- Single-threaded (for pdb): `TORCHINDUCTOR_COMPILE_THREADS=1 python your_script.py`
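The same log artifacts can also be enabled in-process via `torch._logging.set_logs`, which is handy when you cannot control the launch environment (a sketch; pass only the artifacts you need):

```python
import logging
import torch

# In-process equivalent of TORCH_LOGS="+dynamo,graph_breaks,recompiles":
torch._logging.set_logs(dynamo=logging.DEBUG, graph_breaks=True, recompiles=True)

# In-process equivalent of TORCH_LOGS="output_code":
torch._logging.set_logs(output_code=True)
```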
Error Triage
Classify the failure using the error message and traceback:
| Error Pattern | Category | Jump To |
|---|---|---|
| Graph break warning in logs | Graph break | Graph Breaks |
| `BackendCompilerFailed` | Inductor/backend crash | Backend Failures |
| Excessive recompiles / recompile limit hit | Recompilation | Recompilation |
| Accuracy mismatch / wrong numerical output | Accuracy | Accuracy |
| `InternalTorchDynamoError` | Dynamo bug | Internal Errors |
| Segfault or CUDA IMA | Runtime crash | Runtime Crashes |
| Triton assertion / index out of bounds | Triton kernel bug | Triton Failures |
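The triage table can be mechanized as a first-pass filter over the error text. The substrings below are illustrative, not an exhaustive or official list:

```python
# Order matters: match the most specific patterns first.
TRIAGE_RULES = [
    ("BackendCompilerFailed", "Backend Failures"),
    ("InternalTorchDynamoError", "Internal Errors"),
    ("recompile_limit", "Recompilation"),
    ("Graph break", "Graph Breaks"),
    ("illegal memory access", "Runtime Crashes"),
    ("device-side assert", "Triton Failures"),
]

def triage(error_text: str) -> str:
    """Map an error message or traceback to a debugging category."""
    for pattern, category in TRIAGE_RULES:
        if pattern in error_text:
            return category
    return "Unclassified"

print(triage("torch._dynamo.exc.BackendCompilerFailed: ..."))  # Backend Failures
```

Anything that lands in `Unclassified` needs manual reading of the traceback; the table's categories still tell you which section to jump to once you recognize the shape of the error.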
Debugging by Category
Graph Breaks
Graph breaks split the compiled graph into smaller subgraphs, often causing performance regressions or unexpected behavior.
Diagnosis:
```bash
TORCH_LOGS="graph_breaks" python your_script.py
```

Key files:
- `torch/_dynamo/exc.py` -- `Unsupported` exception class
- `torch/_dynamo/variables/` -- where most graph break decisions happen
Common causes:
- Unsupported Python constructs (data-dependent control flow, unsupported builtins)
- Tensor operations that can't be traced (in-place ops on inputs, unsupported dtypes)
- Calls to non-traceable functions
Fix approach:
- Read the graph break message to identify the unsupported operation
- Check if there's a decomposition or supported alternative
- If the operation genuinely can't be traced, consider `torch._dynamo.allow_in_graph` or restructuring user code
Backend Compiler Failures
These failures surface as a `BackendCompilerFailed` exception.
Diagnosis:

```bash
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=2 python your_script.py
```

This generates `minifier_launcher.py`, which isolates the minimal failing graph.
Key files:
- `torch/_dynamo/repro/after_aot.py` -- repro/minifier for post-AOT failures
- `torch/_inductor/` -- the backend itself
Fix approach:
- Run the minifier to get a minimal reproduction
- Inspect the FX graph (`TORCH_COMPILE_DEBUG=1`) to understand what ops are involved
- Check if it's a lowering issue (`torch/_inductor/lowering.py`), a scheduling issue, or a codegen issue
- Look at the generated output code if the error is in codegen
Recompilation Issues
Excessive recompilation happens when guards are too specific, causing cache misses.
Diagnosis:
```bash
TORCH_LOGS="recompiles,recompiles_verbose,guards" python your_script.py
```

Key config:
- `torch._dynamo.config.recompile_limit` (default: 8)
- `torch._dynamo.config.fail_on_recompile_limit_hit` -- set to `True` to get a hard error
Common causes:
- Changing tensor shapes without marking them dynamic
- Python scalar values that change between calls
- Global state mutations between calls
Fix approach:
- Read the recompilation reason from logs
- Identify the failing guard
- Either mark the relevant dimension as dynamic with `torch._dynamo.mark_dynamic()` or fix the source of guard instability
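A plain-Python analogy (no PyTorch) for why overly specific guards cause cache misses: a cache keyed on the exact shape "recompiles" once per shape, while a cache keyed only on rank compiles once. `torch._dynamo.mark_dynamic()` is the real mechanism; this sketch only models the caching behavior:

```python
compiles = 0
cache = {}

def compiled_call(shape, guard):
    """Toy compile cache keyed by whatever the guard extracts from the input."""
    global compiles
    key = guard(shape)
    if key not in cache:
        compiles += 1      # cache miss -> "recompile"
        cache[key] = True
    return key

# Exact-shape guard: every new shape is a cache miss.
for s in [(2, 8), (3, 8), (4, 8)]:
    compiled_call(s, guard=lambda s: s)
print(compiles)  # 3

# Rank-only guard (roughly what marking the dim dynamic buys you): one compile.
cache.clear(); compiles = 0
for s in [(2, 8), (3, 8), (4, 8)]:
    compiled_call(s, guard=lambda s: len(s))
print(compiles)  # 1
```

The `recompiles` log tells you which real guard played the role of the exact-shape key here.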
Accuracy Issues
The compiled model produces different numerical results than eager mode.
Diagnosis:
```bash
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=4 python your_script.py
```

This compares compiled vs. eager with an fp64 reference and dumps a repro if accuracy fails.
Key utilities:
- `torch/_dynamo/debug_utils.py` -- `same_two_models()`, `backend_accuracy_fails()`, `cast_to_fp64()`
- `torch._dynamo.config.repro_tolerance` (default: 1e-3)
Fix approach:
- Get the minimal failing graph from the minifier
- Compare eager vs. compiled output at fp64 precision
- Binary search through ops to find the diverging operation
- Check for known numerical issues (reduction order, fused kernels, dtype promotions)
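The binary-search step can be sketched abstractly (plain Python): given per-op outputs captured from the eager and compiled runs, bisect to the first index where they diverge beyond tolerance. The names and scalar outputs here are illustrative:

```python
def first_divergence(eager_outs, compiled_outs, tol=1e-3):
    """Binary search for the first op whose outputs diverge beyond tol.

    Assumes divergence is monotone: once an op diverges, everything
    downstream diverges too (true when later ops consume earlier results).
    """
    lo, hi = 0, len(eager_outs) - 1
    ans = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if abs(eager_outs[mid] - compiled_outs[mid]) > tol:
            ans = mid
            hi = mid - 1   # diverges here: look earlier
        else:
            lo = mid + 1   # still matches: look later
    return ans

eager    = [1.0, 2.0, 3.0, 4.0, 5.0]
compiled = [1.0, 2.0, 3.1, 4.1, 5.1]  # diverges starting at index 2
print(first_divergence(eager, compiled))  # 2
```

In practice the "outputs" are tensors compared with `torch.allclose` against the fp64 reference, but the bisection logic is the same.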
Internal Dynamo Errors
These crashes surface as `InternalTorchDynamoError`.
Diagnosis:

```bash
TORCHDYNAMO_VERBOSE=1 python your_script.py
```

or equivalently:

```bash
TORCH_LOGS="+dynamo" python your_script.py
```

Key files:
- `torch/_dynamo/symbolic_convert.py` -- bytecode interpreter
- `torch/_dynamo/variables/` -- variable tracking system
- `torch/_dynamo/guards.py` -- guard generation

Fix approach:
1. Get the full stack trace with `TORCHDYNAMO_VERBOSE=1`
2. Identify which bytecode instruction or variable type caused the crash
3. Create a minimal repro (the error message often includes a minifier path)
4. Debug with `TORCHINDUCTOR_COMPILE_THREADS=1` and pdb if needed
Runtime Crashes
Segfaults and CUDA illegal memory access errors during execution of compiled code.
Diagnosis (make crash deterministic):
```bash
PYTORCH_NO_CUDA_MEMORY_CACHING=1 CUDA_LAUNCH_BLOCKING=1 python your_script.py
```

For CUDA IMA, add NaN checks:

```bash
TORCHINDUCTOR_NAN_ASSERTS=1 python your_script.py
```

For Inductor-level sync debugging:

```python
torch._inductor.config.triton.debug_sync_kernel = True  # sync after every kernel
torch._inductor.config.triton.debug_sync_graph = True   # sync before/after graph
```

Fix approach:
- Make the crash deterministic with `PYTORCH_NO_CUDA_MEMORY_CACHING=1 CUDA_LAUNCH_BLOCKING=1`
- Check if it's an input mismatch (shapes, devices, dtypes)
- Inspect the generated kernel code with `TORCH_LOGS="output_code"`
- Use `TORCHINDUCTOR_NAN_ASSERTS=1` to find the first kernel producing bad values
- Check for dynamic shapes issues (historically a common source of IMA)
Triton Kernel Failures
Triton assertion failures or index-out-of-bounds in generated kernels.
Diagnosis:
```bash
TORCH_LOGS="output_code,schedule" python your_script.py
```

Key files:
- `torch/_inductor/codegen/triton.py` -- Triton codegen
- `torch/_inductor/scheduler.py` -- kernel fusion decisions
Fix approach:
- Get the generated Triton kernel from the `output_code` logs
- Check index computations for off-by-one or wrong stride calculations
- Look at the IR (`TORCH_COMPILE_DEBUG=1`) to trace back to the FX op
- Check if fusion decisions created invalid index combinations
Key Source Files
| File | Purpose |
|---|---|
| `torch/_dynamo/exc.py` | Exception hierarchy and error formatting |
| `torch/_dynamo/debug_utils.py` | Minifier support, accuracy checking, input serialization |
| `torch/_dynamo/repro/after_dynamo.py` | Repro/minifier for Dynamo-stage failures |
| `torch/_dynamo/repro/after_aot.py` | Repro/minifier for post-AOTAutograd failures |
| | Repro/minifier for AOTI failures |
| `torch/_dynamo/config.py` | Dynamo config (repro levels, recompile limits) |
| `torch/_dynamo/variables/torch.py` | Torch function handling, tracing state functions |
| `torch/_dynamo/variables/higher_order_ops.py` | HOP tracing (cond, map, etc.) |
| `torch/_dynamo/symbolic_convert.py` | Bytecode interpreter, InstructionTranslator |
| `torch/_dynamo/convert_frame.py` | Frame compilation |
| | New export tracer |
| | Export pipeline |
| `torch/_inductor/config.py` | Inductor config (debug flags, trace settings) |
| `torch/_inductor/debug.py` | DebugContext, graph visualization, IR logging |
| `torch/_logging/_registrations.py` | All registered log aliases and artifacts |
Using the Minifier
The minifier reduces a failing graph to the smallest reproduction:
Step 1: Generate the minifier launcher

```bash
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=2 python your_script.py
```

Step 2: Run the minifier

```bash
python minifier_launcher.py minify
```

Step 3: Run the minimized repro

```bash
python minifier_launcher.py run
```

For accuracy issues, use level 4:

```bash
TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=4 python your_script.py
```