root-cause-tracing

Root Cause Tracing

Overview

Bugs often manifest deep in the call stack (git init in wrong directory, file created in wrong location, database opened with wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.
Core principle: Trace backward through the call chain until you find the original trigger, then fix at the source.

When to Use

Use when:
  • Error happens deep in execution (not at entry point)
  • Stack trace shows long call chain
  • Unclear where invalid data originated
  • Need to find which test/code triggers the problem

The Tracing Process

1. Observe the Symptom

Error: git init failed in ~/project/packages/core

2. Find Immediate Cause

What code directly causes this?
typescript
await execFileAsync('git', ['init'], { cwd: projectDir });

3. Ask: What Called This?

typescript
WorktreeManager.createSessionWorktree(projectDir, sessionId)
  → called by Session.initializeWorkspace()
  → called by Session.create()
  → called by the test, via Project.create()

4. Keep Tracing Up

What value was passed?
  • projectDir = '' (empty string!)
  • Empty string as cwd resolves to process.cwd()
  • That's the source code directory!
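
To make the danger of an empty path concrete, here is a minimal sketch built on Node's `path.resolve`, which returns the current working directory when given a zero-length string. `effectiveDir` is a hypothetical helper, not part of the codebase described here; it only shows which directory an empty `projectDir` ends up pointing at.

```typescript
import * as path from 'node:path';

// Hypothetical helper: report the directory a possibly-empty value points at.
function effectiveDir(dir: string): string {
  // path.resolve('') returns the current working directory.
  return path.resolve(dir);
}

const projectDir = '';
console.error('DEBUG effective dir:', {
  projectDir: JSON.stringify(projectDir),
  effective: effectiveDir(projectDir),
  isProcessCwd: effectiveDir(projectDir) === process.cwd(),
});
```

Logging the resolved directory next to the raw value makes an empty string visible at a glance, where printing the raw value alone would show nothing.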

5. Find Original Trigger

Where did the empty string come from?
typescript
const context = setupCoreTest(); // Returns { tempDir: '' }
Project.create('name', context.tempDir); // Accessed before beforeEach!
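
The timing problem can be reproduced without a test runner. The sketch below simulates `beforeEach` with a plain array of hooks. `hooks`, `fakeBeforeEach`, and the body of `setupCoreTest` are assumptions made for illustration, since the real helper's implementation is not shown here.

```typescript
// Simulated hook registry (a stand-in for a real test runner's beforeEach).
const hooks: Array<() => void> = [];
function fakeBeforeEach(fn: () => void): void {
  hooks.push(fn);
}

// Assumed shape of setupCoreTest: tempDir starts empty and is filled in by the hook.
function setupCoreTest(): { tempDir: string } {
  const context = { tempDir: '' };
  fakeBeforeEach(() => {
    context.tempDir = '/tmp/core-test-example'; // placeholder path, never touched on disk
  });
  return context;
}

const context = setupCoreTest();

// Read at module top level: the hook has not run yet, so this captures ''.
const capturedTooEarly = context.tempDir;

// A runner would execute the hooks right before each test body.
hooks.forEach((hook) => hook());

// Read inside the "test": the hook has run, so the value is populated.
const readInsideTest = context.tempDir;

console.error('DEBUG tempDir timing:', { capturedTooEarly, readInsideTest });
```

The same property yields two different values depending on when it is read, which is why the empty string only shows up several calls later.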

Adding Stack Traces

When you can't trace manually, add instrumentation:
typescript
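import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

// Assumed definition of execFileAsync: the promisified form of execFile
const execFileAsync = promisify(execFile);

// Log the inputs and the call stack before the problematic operation runs
async function gitInit(directory: string) {
  const stack = new Error().stack;
  console.error('DEBUG git init:', {
    directory,
    cwd: process.cwd(),
    nodeEnv: process.env.NODE_ENV,
    stack,
  });

  await execFileAsync('git', ['init'], { cwd: directory });
}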
Critical: Use console.error() in tests (not logger - may not show)
Run and capture:
bash
bun test 2>&1 | grep 'DEBUG git init'
Analyze stack traces:
  • Look for test file names
  • Find the line number triggering the call
  • Identify the pattern (same test? same parameter?)
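
When the captured output is long, the test-file frames can be filtered out programmatically. This is a minimal sketch: `findTestFrames` is a hypothetical helper, the file-name pattern (`.test.` or `.spec.`) is an assumed naming convention, and the sample stack is made up for the demonstration.

```typescript
// Hypothetical helper: keep only the stack frames that point into test files.
function findTestFrames(stack: string): string[] {
  return stack
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => /\.(test|spec)\.[cm]?[jt]sx?/.test(line));
}

// A made-up stack, shaped like V8 output, used only to demonstrate the filter.
const sampleStack = [
  'Error',
  '    at gitInit (/repo/src/git.ts:12:17)',
  '    at WorktreeManager.createSessionWorktree (/repo/src/worktree.ts:40:11)',
  '    at Session.create (/repo/src/session.ts:22:5)',
  '    at /repo/src/project.test.ts:8:3',
].join('\n');

console.error('DEBUG test frames:', findTestFrames(sampleStack));
```

Each surviving frame carries the test file name and line number, which covers the first two bullets above. Comparing frames across several captured stacks reveals the pattern.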

Finding Which Test Causes Pollution

If something appears during tests but you don't know which test causes it, use the bisection script to run tests one-by-one:
bash
# Example: find which test creates .git in wrong place
bun test --run --bail 2>&1 | tee test-output.log
Runs tests one-by-one, stops at first polluter.
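
The one-by-one search can also be sketched in code. The function below is a hypothetical illustration, not the script referred to above: `findPolluter`, `runTestFile`, and `pollutionExists` are assumed names, and the callbacks are injected so the sketch does not depend on any particular test runner.

```typescript
// Hypothetical sketch: run test files one at a time and report the first one
// after which the unwanted artifact (for example a stray .git directory) exists.
function findPolluter(
  testFiles: string[],
  runTestFile: (file: string) => void,
  pollutionExists: () => boolean,
): string | null {
  if (pollutionExists()) {
    throw new Error('Pollution already present; clean up before searching');
  }
  for (const file of testFiles) {
    runTestFile(file);
    if (pollutionExists()) {
      return file; // first polluter found, stop here
    }
  }
  return null; // no single file reproduced the pollution
}

// Demonstration with fake callbacks instead of a real test runner.
let polluted = false;
const polluter = findPolluter(
  ['a.test.ts', 'b.test.ts', 'c.test.ts'],
  (file) => {
    if (file === 'b.test.ts') polluted = true; // pretend this file creates .git
  },
  () => polluted,
);
console.error('DEBUG first polluter:', polluter);
```

In a real run, `runTestFile` would invoke the test runner on a single file and `pollutionExists` would check the filesystem, for instance with `fs.existsSync('.git')`.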

Real Example: Empty projectDir

Symptom: .git created in packages/core/ (source code)
Trace chain:
  1. git init runs in process.cwd() ← empty cwd parameter
  2. WorktreeManager called with empty projectDir
  3. Session.create() passed empty string
  4. Test accessed context.tempDir before beforeEach
  5. setupCoreTest() returns { tempDir: '' } initially
Root cause: Top-level variable initialization accessing empty value
Fix: Made tempDir a getter that throws if accessed before beforeEach
Also added defense-in-depth:
  • Layer 1: Project.create() validates directory
  • Layer 2: WorkspaceManager validates not empty
  • Layer 3: NODE_ENV guard refuses git init outside tmpdir
  • Layer 4: Stack trace logging before git init
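
The getter fix and the validation layers might look roughly like the sketch below. It is written under assumptions, since the real implementation is not shown: `createTestContext`, `setTempDir`, and `assertSafeGitDir` are hypothetical names, and the guard treats only subdirectories of `os.tmpdir()` as safe.

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Assumed fix: tempDir is a getter that throws until the per-test hook has set it.
function createTestContext() {
  let tempDir: string | undefined;
  return {
    get tempDir(): string {
      if (!tempDir) {
        throw new Error('tempDir accessed before beforeEach');
      }
      return tempDir;
    },
    // Would be called from beforeEach in a real suite.
    setTempDir(dir: string): void {
      tempDir = dir;
    },
  };
}

// Hypothetical guard combining the validation layers: reject empty paths and,
// when nodeEnv is 'test', anything that is not a subdirectory of the OS temp dir.
function assertSafeGitDir(directory: string, nodeEnv = process.env.NODE_ENV): string {
  if (directory.trim() === '') {
    throw new Error('Refusing git init: directory is empty');
  }
  const resolved = path.resolve(directory);
  const relative = path.relative(path.resolve(os.tmpdir()), resolved);
  const insideTmp =
    relative !== '' &&
    relative.split(path.sep)[0] !== '..' &&
    !path.isAbsolute(relative);
  if (nodeEnv === 'test' && !insideTmp) {
    throw new Error(`Refusing git init outside tmpdir: ${resolved}`);
  }
  return resolved;
}
```

The getter turns a silent empty string into an immediate error at the line that reads it too early, while the guard catches any bad value that still reaches the dangerous operation.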

Key Principle

NEVER fix just where the error appears. Trace back to find the original trigger.

Stack Trace Tips

  • In tests: Use console.error(), not logger; logger may be suppressed
  • Before operation: Log before the dangerous operation, not after it fails
  • Include context: Directory, cwd, environment variables, timestamps
  • Capture stack: new Error().stack shows complete call chain

Real-World Impact

From debugging session:
  • Found root cause through 5-level trace
  • Fixed at source (getter validation)
  • Added 4 layers of defense
  • 1847 tests passed, zero pollution