debugging-tools

When this skill is activated, always start your first response with the 🧢 emoji.

Debugging Tools

Systematic debugging is a discipline, not a guessing game. This skill covers the principal tools used to diagnose bugs across the full stack - browser front-ends, Node.js servers, native binaries, and the network between them. The underlying mindset is consistent: form a hypothesis, isolate the variable, confirm or refute, then move inward. Tools are instruments; systematic thinking is the method.

When to use this skill

Trigger this skill when the user:

- Opens Chrome DevTools to investigate a performance, network, or memory problem
- Wants to set breakpoints, step through code, or inspect call stacks
- Needs to debug a Node.js process with `--inspect` or `--inspect-brk`
- Is tracing system calls with `strace` or `ltrace` on Linux (on macOS, `dtruss` is the closest equivalent)
- Needs to find a memory leak using heap snapshots or the Memory tab
- Is capturing or replaying network traffic with `curl`, `tcpdump`, or Wireshark
- Is analyzing a core dump or a crash from a native application with `lldb`/`gdb`
- Wants to use conditional breakpoints or logpoints instead of `console.log` spam

Do NOT trigger this skill for:

- General code review or refactoring (use clean-code or refactoring-patterns)
- CI/CD pipeline failures that are config errors, not runtime bugs

Key principles

1. **Reproduce before debugging** - A bug you cannot reproduce reliably cannot be debugged reliably. Before touching any tool, find the minimal set of steps that trigger the problem every time. A flaky reproduction is a second bug to solve.
2. **Binary search the problem space** - Never start debugging from line 1. Bisect: is the bug in the frontend or backend? In the request or the response? In the query or the result processing? Each question cuts the search space in half. `git bisect` applies this directly to commit history.
3. **Read the error message twice** - The first read captures what you expect to see. The second read captures what it actually says. Most debugging time is lost chasing the wrong problem because the error message was skimmed. Copy the exact message. Look up exact error codes.
4. **Check the obvious first** - Before reaching for `strace` or heap profilers, verify: Is the service running? Are environment variables set? Is the right binary being executed? Is the config pointing to the right database? Exotic tools are for exotic problems.
5. **Automate reproduction** - Once you can reproduce a bug manually, write a script or test that reproduces it. This prevents regression, speeds up iteration, and becomes the fix's test case. A bug with an automated reproduction is already halfway fixed.
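
The binary-search principle can be sketched in a few lines. This is an illustration of the idea only, not how `git bisect` is implemented: `findFirstBad`, the `isBad` predicate, and the commit names are hypothetical, and the sketch assumes the history is ordered oldest to newest, the last entry is known bad, and every commit after the first bad one is also bad.

```javascript
// Minimal sketch: bisect an ordered history for the first bad entry.
// Assumption: once a commit is bad, every later commit is bad too.
function findFirstBad(commits, isBad) {
  let lo = 0;                   // lowest index that could be the first bad commit
  let hi = commits.length - 1;  // index of a commit known to be bad
  let checks = 0;               // how many commits had to be tested
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    checks += 1;
    if (isBad(commits[mid])) {
      hi = mid;      // the first bad commit is at mid or earlier
    } else {
      lo = mid + 1;  // the first bad commit is after mid
    }
  }
  return { firstBad: commits[lo], checks };
}

// Hypothetical history of 16 commits where the bug was introduced in c11.
const commits = Array.from({ length: 16 }, (_, i) => `c${i}`);
const result = findFirstBad(commits, (c) => Number(c.slice(1)) >= 11);
console.log(result); // { firstBad: 'c11', checks: 4 }
```

Sixteen candidates are resolved in four checks, which is why bisecting usually beats reading the history from the start.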

Core concepts

Breakpoints vs logging

`console.log` debugging is slow and noisy. Breakpoints pause execution at a precise point and let you inspect the entire state. Use logging when you need a history of state over time (e.g., a value changing across many requests). Use breakpoints when you need to inspect a single moment in detail.

Logpoints (Chrome DevTools, VS Code) are a middle ground: they log a value at a line without pausing execution and without modifying source code. Prefer logpoints over adding and removing `console.log` statements.

Call stacks

A call stack is a snapshot of how execution reached the current point. It reads bottom-to-top (oldest frame at bottom). When debugging, always read the full stack, not just the top frame. The top frame is where the error surfaced; the root cause is often several frames down, at the point where your code made an incorrect assumption.
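
As a small illustration of reading past the top frame, the sketch below uses three hypothetical functions (`loadConfig`, `startServer`, `parsePort`). The bad value is created in the oldest frame, but the exception surfaces in the newest one. It assumes a V8-based runtime such as Node.js, where `err.stack` lists the most recent frame first.

```javascript
// The config is built without a port in loadConfig (root cause),
// but the exception only surfaces two calls later in parsePort.
function parsePort(config) {
  return config.port.toString(); // throws: config.port is undefined
}
function startServer(config) {
  return parsePort(config);
}
function loadConfig() {
  return startServer({}); // root cause: no port was ever set
}

let frames = [];
try {
  loadConfig();
} catch (err) {
  // Keep only the "at fn (file:line:col)" lines and extract the function names.
  frames = err.stack
    .split('\n')
    .filter((line) => line.trim().startsWith('at '))
    .map((line) => line.trim().split(' ')[1]);
  console.log(frames.slice(0, 3)); // [ 'parsePort', 'startServer', 'loadConfig' ]
}
```

The top frame (`parsePort`) shows only where the failure surfaced; the frame that explains it (`loadConfig`) is two entries further down.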

Heap vs stack memory

The stack holds function call frames and local variables. It is fast, bounded, and automatically managed. Stack overflows (infinite recursion) are immediately fatal. The heap holds all dynamically allocated objects. Heap memory leaks are slow and insidious - the process grows until it crashes or becomes unresponsive. Heap profiling tools (DevTools Memory tab, `valgrind`, `heaptrack`) identify objects that accumulate without being freed.

Syscalls

Every interaction between a process and the OS kernel is a syscall: file reads, network connections, process creation, memory allocation. `strace` captures these calls with arguments and return values. When a program hangs or fails with a cryptic error, `strace` often shows exactly which syscall failed and why (e.g., `ENOENT: no such file or directory` on a missing config path).

Network layers

Network bugs live at different layers. HTTP-level bugs (wrong status codes, missing headers, bad JSON) are visible with `curl -v` or the browser DevTools Network tab. TCP-level bugs (connections refused, timeouts, RST packets) require `tcpdump` or Wireshark. DNS bugs (resolving the wrong IP, NXDOMAIN) are diagnosed with `dig` and `nslookup`.

Common tasks

Profile a slow page with Chrome DevTools Performance tab

1. Open DevTools (`F12`) > Performance tab
2. Click Record, perform the slow action, click Stop
3. In the Flame Chart, find the widest bars - these are the most expensive calls
4. Look for Long Tasks (red corner flags, >50ms on the main thread)
5. Identify the function consuming the most self-time vs total-time

Self time  = time spent in the function itself
Total time = self time + time in all functions it called
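
The relationship between the two numbers can be worked through on a toy profile. The tree below is hypothetical (function names and millisecond values are made up for illustration, and this is not the DevTools trace format). Each node records its total time, and self time is what remains after subtracting the total time of its direct children.

```javascript
// Hypothetical call tree: `total` is the time in ms spent in the function
// including everything it called.
const profile = {
  name: 'renderPage', total: 120,
  children: [
    { name: 'fetchRows', total: 30, children: [] },
    { name: 'buildTable', total: 80, children: [
      { name: 'formatCell', total: 65, children: [] },
    ] },
  ],
};

// Self time = total time minus the total time of the direct children.
function selfTimes(node, out = {}) {
  const childTotal = node.children.reduce((sum, c) => sum + c.total, 0);
  out[node.name] = node.total - childTotal;
  node.children.forEach((c) => selfTimes(c, out));
  return out;
}

const times = selfTimes(profile);
console.log(times);
// { renderPage: 10, fetchRows: 30, buildTable: 15, formatCell: 65 }
```

Here `renderPage` has the largest total time but almost no self time; `formatCell` is where the work actually happens, so it is the function worth optimizing.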

Key areas to check:

- Scripting (yellow) - JS execution, event handlers
- Rendering (purple) - style recalc, layout (reflow)
- Painting (green) - compositing, rasterization

Rule: layout thrashing occurs when JS interleaves DOM geometry reads and writes in a loop, so each read after a write forces a synchronous layout. Fix by batching all reads before all writes, or by scheduling the writes in `requestAnimationFrame`.
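
The effect of batching can be shown with a small simulation. Everything here is a stand-in rather than a real DOM API: `makeBox`, `setWidth`, and the `forcedLayouts` counter only mimic the browser rule that a geometry read after a pending write forces a layout, so the sketch runs outside a browser.

```javascript
// Simulated layout engine: a write marks the layout dirty, and the next
// geometry read has to recompute it (counted as one forced layout).
let layoutDirty = false;
let forcedLayouts = 0;

function makeBox(width) {
  return {
    get offsetWidth() {
      if (layoutDirty) { forcedLayouts += 1; layoutDirty = false; }
      return width;
    },
    setWidth(value) { width = value; layoutDirty = true; },
  };
}

// Interleaved: every iteration reads geometry right after the previous write.
function thrashing(boxes) {
  for (const box of boxes) {
    box.setWidth(box.offsetWidth / 2);
  }
}

// Batched: all reads first, then all writes.
function batched(boxes) {
  const widths = boxes.map((box) => box.offsetWidth);
  boxes.forEach((box, i) => box.setWidth(widths[i] / 2));
}

function countLayouts(fn) {
  layoutDirty = false;
  forcedLayouts = 0;
  fn(Array.from({ length: 100 }, () => makeBox(200)));
  return forcedLayouts;
}

const thrashCount = countLayouts(thrashing);
const batchedCount = countLayouts(batched);
console.log({ thrashCount, batchedCount }); // { thrashCount: 99, batchedCount: 0 }
```

In a real page the same pattern appears in the Performance tab as a run of forced reflow warnings inside a single long task.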

Find memory leaks with the Memory tab

1. Open DevTools > Memory tab
2. Take a Heap Snapshot (baseline)
3. Perform the action suspected of leaking (e.g., open and close a modal 10x)
4. Force GC (trash can icon), then take a second snapshot
5. In the second snapshot, select Comparison view
6. Sort by # Delta descending - objects with a growing positive delta are leaking

Common leak sources:
- Event listeners added but never removed
- Closures capturing DOM nodes that were removed
- Global variables holding references to large objects
- setInterval / setTimeout callbacks referencing stale state
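
The first leak source in the list can be reproduced in a few lines. The modal functions below are hypothetical, and a plain `EventTarget` stands in for `window` so the sketch also runs in Node.js. Each open registers a listener whose closure retains a large array; only the fixed variant removes it on close, here via an `AbortController` signal.

```javascript
// Leaky: the listener is never removed, so `rows` stays reachable after close().
function openLeakyModal(bus, onResize) {
  const rows = new Array(10000).fill('row');
  bus.addEventListener('resize', () => onResize(rows.length));
  return { close() { /* nothing removed */ } };
}

// Fixed: aborting the controller removes the listener and releases `rows`.
function openFixedModal(bus, onResize) {
  const rows = new Array(10000).fill('row');
  const controller = new AbortController();
  bus.addEventListener('resize', () => onResize(rows.length), { signal: controller.signal });
  return { close() { controller.abort(); } };
}

// Open and close the modal 10x, then count how many listeners still fire.
function survivingListeners(openModal) {
  const bus = new EventTarget();
  let calls = 0;
  for (let i = 0; i < 10; i += 1) {
    openModal(bus, () => { calls += 1; }).close();
  }
  bus.dispatchEvent(new Event('resize'));
  return calls;
}

const leaked = survivingListeners(openLeakyModal);
const fixed = survivingListeners(openFixedModal);
console.log({ leaked, fixed }); // { leaked: 10, fixed: 0 }
```

In a heap snapshot comparison, the leaky variant would show up as ten extra closures and ten retained arrays after the modal has been closed every time.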

Debug Node.js with the inspector protocol

```bash
# Start with inspector (connects DevTools or VS Code)
node --inspect server.js

# Break immediately on start (useful when the bug is at startup)
node --inspect-brk server.js

# Attach to a running process by PID (SIGUSR1 activates the inspector)
kill -USR1 <pid>
```

Then open `chrome://inspect` in Chrome and click **inspect** under Remote Target.
Full Chrome DevTools is now connected to the Node process. Set breakpoints in the
Sources panel, use the Console to evaluate expressions in any stack frame.

For production processes, prefer `--inspect=127.0.0.1:9229` to avoid exposing the
debug port publicly.

Trace syscalls with strace / ltrace

```bash
# Trace all syscalls of a new process
strace ./myapp

# Attach to a running process
strace -p <pid>

# Filter to specific syscalls (file operations)
strace -e trace=openat,read,write,close ./myapp

# Timestamp each call and show duration
strace -T -tt ./myapp

# Write output to file (avoids mixing with stderr)
strace -o /tmp/trace.log ./myapp

# ltrace: trace library calls instead of syscalls
ltrace ./myapp
```

**Reading strace output:**

openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)

Format: `syscall(args) = return_value [error]`. A negative return value with an
error name is a failure. This line shows the app tried to open a config file that
does not exist.
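
When a trace file is long, the line format described above can be picked apart mechanically. The sketch below is a rough helper, not part of strace: `parseStraceLine` and its regular expression are assumptions that cover only the simple single-line form shown here, not unfinished/resumed calls or lines prefixed with a PID or timestamp.

```javascript
// Matches: syscall(args) = return_value [ERRNO (message)]
const STRACE_LINE = /^(\w+)\((.*)\)\s+=\s+(0x[0-9a-f]+|-?\d+|\?)(?:\s+(E[A-Z0-9]+)\s+\((.*)\))?/;

function parseStraceLine(line) {
  const match = STRACE_LINE.exec(line);
  if (!match) return null; // not a plain syscall line
  const [, syscall, args, ret, errno, message] = match;
  return {
    syscall,
    args,                        // raw argument string, left unparsed
    ret,                         // kept as a string: may be decimal, hex, or "?"
    failed: errno !== undefined, // an errno name after the return value means failure
    errno: errno ?? null,
    message: message ?? null,
  };
}

const parsed = parseStraceLine(
  'openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = -1 ENOENT (No such file or directory)'
);
console.log(parsed.syscall, parsed.ret, parsed.errno); // openat -1 ENOENT
```

Filtering a whole log for entries where `failed` is true is often the quickest way to find the one syscall that explains a cryptic error.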

Debug network issues with curl / tcpdump / Wireshark

```bash
# Verbose HTTP request - shows headers, TLS handshake info
curl -v https://api.example.com/users

# Show only HTTP response headers
curl -sI https://api.example.com/users

# Time each phase of the request
# curl-format.txt: time_namelookup, time_connect, time_appconnect, time_total
curl -w "@curl-format.txt" -o /dev/null -s https://api.example.com/users

# Capture all traffic on port 443 to a file for Wireshark
tcpdump -i eth0 -w capture.pcap port 443

# Capture HTTP traffic and print to stdout
tcpdump -i eth0 -A port 80

# DNS resolution chain
dig +trace api.example.com
```

For Wireshark analysis:
- Filter by `http` or `http2` for application layer
- Use `tcp.analysis.retransmission` to find packet loss
- Use `tcp.flags.reset == 1` to find unexpected connection resets

Debug crashes with core dumps

```bash
# Enable core dumps (Linux - set in /etc/security/limits.conf for persistence)
ulimit -c unlimited

# Run the crashing program
./myapp # produces core or core.<pid>

# Open with lldb (macOS / modern Linux); the core file is passed with -c
lldb ./myapp -c core

# Open with gdb (Linux)
gdb ./myapp core

# Inside lldb/gdb: key commands
(lldb) bt              # print backtrace (call stack at crash)
(lldb) frame select 3  # switch to frame 3 (gdb: frame 3)
(lldb) print ptr       # print value of variable 'ptr'
(lldb) frame variable  # show all local variables in current frame (gdb: info locals)
(lldb) list            # show source around current line
```

A crash in a null dereference will show the offending frame in `bt`. Navigate to
the frame with `frame select N`, then inspect variables to find which pointer was
null and why it was never initialized.

Use conditional breakpoints and logpoints

Conditional breakpoint - pauses only when an expression is true:

In Chrome DevTools: right-click a line number > Add conditional breakpoint

```javascript
// Only pause when userId is the problematic one
userId === 'abc-123'
```

In VS Code: right-click the editor gutter > Add Conditional Breakpoint, then enter the expression

```javascript
i > 100 && items[i] === null
```

Logpoint - logs a message without pausing (non-intrusive, no source changes):

In Chrome DevTools: right-click a line number > Add logpoint

User {userId} called checkout with {items.length} items

In VS Code: right-click breakpoint > Edit Breakpoint > select Log Message

Use conditional breakpoints when iterating over large collections and the bug only manifests for a specific element. Use logpoints when you need time-series data across many invocations.

Anti-patterns / common mistakes

| Mistake | Why it's wrong | What to do instead |
|---|---|---|
| `console.log` driven development | Clutters output, requires code changes, leaves logs in production | Use logpoints or structured logging with debug levels |
| Debugging on production | Modifying production state to understand a bug risks data corruption and outages | Reproduce locally or in staging; use read-only observation tools (`strace -p`) |
| Fixing without understanding | Changing code until tests pass without knowing the root cause leads to the same bug resurfacing in a different form | State the hypothesis in writing before making any change |
| Ignoring the call stack | Looking only at the top frame of an exception misses the call path that created the bad state | Always read the full stack; the root cause is usually 3-5 frames down |
| Heap snapshot without baseline | A single snapshot gives no signal - you cannot tell what grew | Always take a baseline snapshot before the action under test |
| Running strace on production without `-o` | strace output mixes with the program's stderr and ends up interleaved in logs | Always use `strace -o /tmp/trace.log` to isolate output |
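
For the "structured logging with debug levels" alternative in the first row, a minimal sketch might look like the following. `createLogger` and the level names are illustrative assumptions, not the API of any particular logging library. The logger emits one JSON object per line and drops anything below the configured threshold.

```javascript
// Numeric ranks make the threshold comparison trivial.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

// `write` is injectable so output can go to stdout, a file, or a test buffer.
function createLogger(minLevel, write = (line) => console.log(line)) {
  const log = (level, msg, fields = {}) => {
    if (LEVELS[level] < LEVELS[minLevel]) return; // below threshold: drop
    write(JSON.stringify({ level, msg, ...fields }));
  };
  return {
    debug: (msg, fields) => log('debug', msg, fields),
    info: (msg, fields) => log('info', msg, fields),
    warn: (msg, fields) => log('warn', msg, fields),
    error: (msg, fields) => log('error', msg, fields),
  };
}

const lines = [];
const logger = createLogger('info', (line) => lines.push(line));
logger.debug('cache lookup', { key: 'user:42' }); // suppressed at 'info'
logger.info('checkout started', { userId: 'abc-123', items: 3 });
console.log(lines);
// [ '{"level":"info","msg":"checkout started","userId":"abc-123","items":3}' ]
```

Debug statements can then stay in the code permanently and be switched on by lowering the threshold, instead of being added and removed for each investigation.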

References

For detailed command references, read the relevant file from `references/`:

- `references/tool-guide.md` - Quick reference for each debugging tool with key commands

Only load the references file when you need the full command reference for a specific tool and the task at hand requires precise flag-level detail.
