Debugging Skill

Systematic approaches to investigating and diagnosing bugs.

Core Principle

Understand before fixing. A proper diagnosis leads to a proper fix.

Name

han-core:debug - Investigate and diagnose issues without necessarily fixing them

Synopsis

/debug [arguments]

Debug vs Fix

Use `/debug` when:

Investigating an issue to understand it
Need to gather information before fixing
Want to identify root cause without implementing solution
Triaging to determine severity/priority
Research phase before fix

Use `/fix` when:

Ready to implement the solution
Debugging AND fixing in one go
Issue is understood, just needs fixing

The Scientific Method for Debugging

1. Observe

Gather all the facts:

What's the symptom? (What's happening that shouldn't?)
When does it happen? (Always, sometimes, specific conditions?)
Who's affected? (All users, some users, specific scenarios?)
Error messages? (Exact text, stack traces, error codes?)
Recent changes? (What changed before this started?)

Evidence to collect:

Error messages and stack traces
Application logs
User reports
Reproduction steps
Environment details (browser, OS, versions)
Network requests/responses
Database query logs

2. Form Hypothesis

Based on symptoms, what could cause this?

Common categories:

Logic error: Code does the wrong thing
State management: State gets out of sync
Async/timing: Race condition, callback hell
Data issue: Unexpected input format
Integration: API change, service down
Environment: Config, permissions, network
Resource: Memory leak, connection pool exhausted

Prioritize hypotheses:

Most likely causes first
Easiest to test first (when equal likelihood)
Most impactful if true
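
The first two ordering rules can be sketched as a sort. This is a minimal illustration, not part of the skill itself: the `Hypothesis` shape, its field names, and the sample entries are assumptions made for the example, and impact is left out for brevity.

```typescript
// Hypothetical shape for tracking candidate causes (field names are illustrative).
interface Hypothesis {
  description: string
  likelihood: number // 0..1, your own estimate
  testCost: number   // rough minutes needed to test it
}

// Most likely first; when likelihood is equal, the cheapest test goes first.
function prioritize(hypotheses: Hypothesis[]): Hypothesis[] {
  return [...hypotheses].sort(
    (a, b) => b.likelihood - a.likelihood || a.testCost - b.testCost
  )
}

const ranked = prioritize([
  { description: 'Database query timeout', likelihood: 0.3, testCost: 10 },
  { description: 'Network latency', likelihood: 0.5, testCost: 5 },
  { description: 'Race condition in cache', likelihood: 0.3, testCost: 60 },
])

console.log(ranked.map(h => h.description))
// [ 'Network latency', 'Database query timeout', 'Race condition in cache' ]
```

The estimates are guesses by design; the point is to write them down so the order of investigation is a decision rather than an accident.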

3. Test Hypothesis

Design an experiment to prove or disprove the hypothesis:

Add logging to see values
Add breakpoints to pause execution
Modify input to isolate variable
Disable feature to rule out
Compare with working version

Keep notes:

```markdown
**Hypothesis:** Database query timeout
**Test:** Add query timing logs
**Result:** Query completes in 50ms
**Conclusion:** Not the database

**Hypothesis:** Network latency
**Test:** Check network tab, add timing
**Result:** API call takes 5 seconds
**Conclusion:** Found the issue
```

4. Analyze Results

What did you learn?

Hypothesis confirmed or rejected?
New questions raised?
Unexpected findings?
Root cause identified?

5. Repeat or Conclude

If root cause found:

Document findings
Estimate impact
Plan fix

If not found:

Form new hypothesis
Repeat cycle

Debugging Strategies

Strategy 1: Add Logging

Most universally useful technique:

```typescript
// Strategic console.log placement
function processOrder(order) {
  console.log('processOrder START:', { orderId: order.id })

  const items = order.items
  console.log('items:', items.length)

  const validated = validate(items)
  console.log('validation result:', validated)

  if (!validated.success) {
    console.log('validation failed:', validated.errors)
    throw new Error('Invalid order')
  }

  const total = calculateTotal(items)
  console.log('total calculated:', total)

  console.log('processOrder END')
  return total
}
```

Logging guidelines:

Log function entry/exit
Log branching decisions
Log external calls (API, database)
Log unexpected values
Include context (IDs, user info)
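
One way to follow the "include context" guideline without repeating IDs in every call is a small scoped logger. This is a sketch only: `createLogger` and the sample order ID are hypothetical names for illustration, not an existing API.

```typescript
type LogContext = Record<string, unknown>

// Returns a log function that prefixes every message with a scope
// and merges fixed context (IDs, user info) into each line.
function createLogger(scope: string, base: LogContext = {}) {
  return (message: string, extra: LogContext = {}): string => {
    const line = `[${scope}] ${message} ${JSON.stringify({ ...base, ...extra })}`
    console.log(line)
    return line
  }
}

const log = createLogger('processOrder', { orderId: 'order-123' })
log('START')
log('validation failed', { errors: ['empty cart'] })
// [processOrder] validation failed {"orderId":"order-123","errors":["empty cart"]}
```

Because every line carries the same scope and ID, the output can be filtered with a single search when many orders are logged at once.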

Strategy 2: Use Debugger

Interactive debugging:

```typescript
// Browser
function buggyFunction(input) {
  debugger;  // Execution pauses here (when DevTools is open)
  const result = transform(input)
  debugger;  // And here
  return result
}

// Node.js: start the process with the inspector enabled,
// then open chrome://inspect in Chrome
//   node --inspect app.js
```

**Debugger features:**

- Step over (next line)
- Step into (into function call)
- Step out (back to caller)
- Watch expressions
- Call stack inspection
- Variable inspection


Strategy 3: Binary Search

Isolate the problem area:

```typescript
// 100 lines of code, bug somewhere

// Comment out lines 51-100
// Bug still happens? It's in lines 1-50

// Comment out lines 26-50
// Bug disappears? It's in lines 26-50

// Restore lines 26-37, comment out only lines 38-50
// Bug still happens? It's in lines 26-37

// Continue until isolated to specific lines
```
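
The same halving idea works on any ordered list of changes (commits, config revisions, deploys) where everything before the culprit is good and everything after is bad. A minimal sketch follows; `findFirstBad` and the commit names are made up for the example.

```typescript
// Returns the index of the first "bad" item, assuming all items before it
// are good and all items from it onward are bad. Returns -1 if none are bad.
function findFirstBad<T>(items: T[], isBad: (item: T) => boolean): number {
  let lo = 0
  let hi = items.length // the first bad index lies in [lo, hi]
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2)
    if (isBad(items[mid])) {
      hi = mid      // mid is bad: the first bad item is at mid or earlier
    } else {
      lo = mid + 1  // mid is good: the first bad item is after mid
    }
  }
  return lo === items.length ? -1 : lo
}

const commits = ['a1', 'b2', 'c3', 'd4', 'e5', 'f6', 'g7', 'h8']
let checks = 0
const firstBad = findFirstBad(commits, commit => {
  checks++
  return commits.indexOf(commit) >= 5 // pretend the bug was introduced in 'f6'
})

console.log(commits[firstBad], `found in ${checks} checks`)
// f6 found in 3 checks
```

Eight candidates need three checks instead of up to eight, and the gap widens quickly as the list grows.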

Strategy 4: Rubber Duck Debugging

Explain the problem out loud:

"This function is supposed to calculate shipping cost"
"It takes the weight and destination"
"First it... wait, it's using price instead of weight!"
(Bug found)

Why this works: Forces you to examine assumptions.

Strategy 5: Compare Working vs Broken

What's different?

Version comparison:

```bash
# Find which commit broke it
git bisect start
git bisect bad HEAD
git bisect good v1.0.0

# Git checks out middle commit
npm test
git bisect good   # or: git bisect bad, depending on the test result

# Repeat until found
```


**Environment comparison:**

- Works locally but not production?
- Works for some users but not others?
- Worked yesterday but not today?

**What changed?**


Strategy 6: Simplify

Reduce to minimal reproduction:

```typescript
// Complex case with bug
processUserOrderWithDiscountsAndShipping(user, cart, promo, address)

// Simplify inputs one at a time
processUserOrderWithDiscountsAndShipping(user, [], null, null)
// Still breaks? Not the cart, promo, or address

processUserOrderWithDiscountsAndShipping(null, [], null, null)
// Works now? It's the user object

// What about the user object causes it?
```
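
When there are many inputs, the one-at-a-time substitution can be automated. The sketch below is an assumption-heavy illustration: `findSuspectArgs`, the `fails` predicate, and the sample data are all hypothetical stand-ins for the real function under investigation.

```typescript
// Replaces one argument at a time with a trivial value and reports which
// replacements make the failure disappear. Those arguments are the suspects.
function findSuspectArgs(
  args: unknown[],
  trivial: unknown[],
  fails: (args: unknown[]) => boolean
): number[] {
  const suspects: number[] = []
  args.forEach((_, i) => {
    const simplified = [...args]
    simplified[i] = trivial[i]
    if (!fails(simplified)) suspects.push(i) // failure gone: argument i matters
  })
  return suspects
}

// Hypothetical failure condition: breaks when the user object has no email.
const fails = ([user]: unknown[]): boolean =>
  typeof user === 'object' && user !== null && !('email' in user)

const suspects = findSuspectArgs(
  [{ name: 'Ada' }, ['item'], 'PROMO10', '1 Main St'],
  [{ name: 'Ada', email: 'ada@example.com' }, [], null, null],
  fails
)

console.log(suspects) // [ 0 ] -> only changing the user makes the bug go away
```

This only finds single-argument causes; a bug that needs two inputs combined would show no suspects and calls for testing pairs.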

Strategy 7: Check Assumptions

Question everything:

```typescript
// Assumption: API returns array
const users = await api.getUsers()
users.forEach(user => console.log(user.name))  // Crashes

// Check assumption
console.log(typeof users)  // "undefined"
console.log(users)          // undefined

// Assumption was wrong!
```

Common wrong assumptions:

Function returns expected type
Variable is defined
Array is not empty
API will always respond
Async operation has completed
State is up to date
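
Several of these assumptions can be checked at runtime instead of trusted. A minimal sketch using a type guard, assuming a hypothetical `User` shape (the real response shape depends on your API):

```typescript
// Hypothetical response shape for the example.
interface User {
  id: number
  name: string
}

// Type guard: verifies the shape at runtime rather than trusting the declared type.
function isUserArray(value: unknown): value is User[] {
  return (
    Array.isArray(value) &&
    value.every(
      u =>
        typeof u === 'object' &&
        u !== null &&
        typeof (u as User).id === 'number' &&
        typeof (u as User).name === 'string'
    )
  )
}

// Fails loudly at the boundary, with the actual value in the message.
function assertUsers(value: unknown): User[] {
  if (!isUserArray(value)) {
    throw new TypeError(`Expected User[], got ${JSON.stringify(value)}`)
  }
  return value
}

console.log(isUserArray([{ id: 1, name: 'Ada' }])) // true
console.log(isUserArray(undefined))                 // false
```

Placing `assertUsers` right where the data enters moves the crash from a distant `forEach` to the point where the assumption actually broke.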

Debugging by Symptom

"Intermittent failure"

Likely causes:

Race condition (timing-dependent)
Data-dependent (certain inputs trigger it)
Resource leak (happens after N operations)
External service flakiness

Investigation:

Add extensive logging
Look for async operations
Check timing between operations
Look for shared state
Run many times to see pattern
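
"Run many times" is easier to act on with a small harness that records which runs failed. A sketch, assuming a synchronous operation; `repeat` and the flaky operation are invented for the example (an async operation would need an `async` version that awaits each run).

```typescript
// Runs an operation repeatedly and records which runs failed and why.
function repeat(times: number, operation: (run: number) => void) {
  const failures: { run: number; error: string }[] = []
  for (let run = 1; run <= times; run++) {
    try {
      operation(run)
    } catch (e) {
      failures.push({ run, error: e instanceof Error ? e.message : String(e) })
    }
  }
  return { times, failures }
}

// Hypothetical flaky operation: fails on every 7th call.
let calls = 0
const report = repeat(50, () => {
  calls++
  if (calls % 7 === 0) throw new Error(`failed on call ${calls}`)
})

console.log(`${report.failures.length}/${report.times} runs failed`)
console.log(report.failures.map(f => f.run))
// 7/50 runs failed
// [ 7, 14, 21, 28, 35, 42, 49 ]
```

A regular spacing like this points toward a count-dependent cause such as a resource leak; an irregular one points toward timing or data.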

"Works locally, fails in production"

Check differences:

Environment variables
Data (production has different/more data)
Network (CORS, SSL, proxies)
Dependencies (versions, OS)
Resources (memory, connections)

"Slow performance"

Don't guess - profile:

Frontend:

Chrome DevTools > Performance tab
Look for long tasks (> 50ms)
Check for layout thrashing
Look for memory leaks

Backend:

Add timing logs around operations
Check database query time (EXPLAIN ANALYZE)
Check external API call time
Profile with APM tool
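
Timing logs around an operation can be added with a small wrapper so the measured code stays unchanged. A minimal sketch for synchronous code; `timed` is a hypothetical helper, not a library function.

```typescript
// Runs fn, logs how long it took, and returns its result.
// The finally block ensures the duration is logged even if fn throws.
function timed<T>(label: string, fn: () => T): T {
  const start = Date.now()
  try {
    return fn()
  } finally {
    console.log(`${label} took ${Date.now() - start}ms`)
  }
}

const total = timed('sum', () => {
  let sum = 0
  for (let i = 0; i < 1000000; i++) sum += i
  return sum
})

console.log(total) // 499999500000
```

Wrap one suspect at a time (the query, the external call, the rendering step) and compare the numbers before deciding where to dig.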

"Memory leak"

Investigation:

```typescript
// Take heap snapshot
// Do operation that leaks
// Take another heap snapshot
// Compare - what increased?
```

Common causes:

Event listeners not removed
Closures holding references
Global variables accumulating
Intervals not cleared
Cache growing unbounded
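
To make the last cause concrete: an unbounded `Map` used as a cache keeps every entry alive forever. The sketch below shows one possible bounded variant for comparison; `BoundedCache` is an illustrative class written for this example, not a recommendation of a specific eviction policy.

```typescript
// A Map-based cache that evicts the oldest entry once maxSize is exceeded.
class BoundedCache<K, V> {
  private entries = new Map<K, V>()
  private maxSize: number

  constructor(maxSize: number) {
    this.maxSize = maxSize
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key) // refresh insertion order
    this.entries.set(key, value)
    if (this.entries.size > this.maxSize) {
      // Map iterates in insertion order, so the first key is the oldest
      const oldest = this.entries.keys().next().value as K
      this.entries.delete(oldest)
    }
  }

  get(key: K): V | undefined {
    return this.entries.get(key)
  }

  get size(): number {
    return this.entries.size
  }
}

const cache = new BoundedCache<string, number>(3)
for (let i = 0; i < 1000; i++) cache.set(`key-${i}`, i)

console.log(cache.size) // 3, where a plain Map would hold 1000 entries
```

In a heap snapshot comparison, the unbounded version shows up as a `Map` whose retained size grows with every operation.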

"Crash/Exception"

Read the stack trace:

Error: Cannot read property 'map' of undefined
    at processUsers (app.js:42:15)
    at handleRequest (app.js:23:3)
    at Server.<anonymous> (server.js:12:5)

The stack trace tells you:

app.js line 42: Where it crashed (processUsers)
app.js line 23: Where it was called from (handleRequest)
server.js line 12: Origin of the request

Then:

Go to line 42
Check what's undefined
Trace back why it's undefined
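
When traces arrive in bulk (logs, error reports), pulling out the frames programmatically helps to group them. A sketch that handles only the `at name (file:line:column)` form shown above; `parseStack` is a hypothetical helper and skips frames in any other format.

```typescript
interface Frame {
  fn: string
  file: string
  line: number
  column: number
}

// Parses V8-style frames such as "at processUsers (app.js:42:15)".
// Lines that do not match (the message line, frames without parentheses) are skipped.
function parseStack(stack: string): Frame[] {
  const frameRe = /^\s*at (.+?) \((.+):(\d+):(\d+)\)$/
  const frames: Frame[] = []
  for (const text of stack.split('\n')) {
    const m = frameRe.exec(text)
    if (m) {
      frames.push({ fn: m[1], file: m[2], line: Number(m[3]), column: Number(m[4]) })
    }
  }
  return frames
}

const stack = [
  "Error: Cannot read property 'map' of undefined",
  '    at processUsers (app.js:42:15)',
  '    at handleRequest (app.js:23:3)',
  '    at Server.<anonymous> (server.js:12:5)',
].join('\n')

const frames = parseStack(stack)
console.log(frames[0]) // { fn: 'processUsers', file: 'app.js', line: 42, column: 15 }
```

The first frame is where the crash surfaced; the frames after it are the path to trace back through.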

"It works sometimes"

Race condition?
Timing issue?
Data-dependent?
Check for async issues

Common Bug Patterns

Null/Undefined

```typescript
// Bug
function process(user) {
  return user.name.toUpperCase()  // Crashes if user is null or undefined
}

// Investigation
console.log('user:', user)  // undefined - why?
// Trace back to where user comes from
```

Off-by-One

```typescript
// Bug
for (let i = 0; i <= array.length; i++) {  // <= instead of <
  process(array[i])  // Crashes on last iteration
}

// Investigation: log inside the loop, where i is in scope
for (let i = 0; i <= array.length; i++) {
  console.log('i:', i, 'length:', array.length)
  process(array[i])
}
// Notice i === array.length makes array[i] === undefined
```

Async Timing

```typescript
// Bug
let data
fetchData().then(result => {
  data = result
})
console.log(data)  // undefined - async not complete

// Investigation
console.log('1. Before fetch')
fetchData().then(result => {
  console.log('3. Got result')
  data = result
})
console.log('2. After fetch call')
// Output: 1, 2, 3 - async completes later
```

State Mutation

```typescript
// Bug
function addItem(cart, item) {
  cart.items.push(item)  // Mutates input!
  return cart
}

// Investigation: log the input before and after the call.
// JSON.stringify captures the value at that moment; browser consoles
// may otherwise show the object's later state when it is expanded.
const originalCart = { items: [] }
console.log('before:', JSON.stringify(originalCart))
const newCart = addItem(originalCart, item)
console.log('after:', JSON.stringify(originalCart))  // Changed!
// originalCart was modified - unexpected!
```
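
Another way to catch the mutation, shown here as a sketch, is to freeze the input before the call so the offending line throws instead of silently changing state. The `Cart` type and the sample item are assumptions for the example.

```typescript
interface Cart {
  items: string[]
}

function addItem(cart: Cart, item: string): Cart {
  cart.items.push(item) // mutates its input
  return cart
}

const originalCart: Cart = { items: [] }
Object.freeze(originalCart.items) // any attempt to push now throws a TypeError

let caught = ''
try {
  addItem(originalCart, 'book')
} catch (e) {
  caught = e instanceof TypeError ? 'TypeError' : 'other'
  console.log('mutation detected:', (e as Error).message)
}

console.log(originalCart.items.length) // 0, the input was left untouched
```

The stack trace of the thrown error points directly at the line that mutates, which is often faster than comparing before and after logs. `Object.freeze` is shallow, so nested objects need to be frozen separately.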

Scope Issues

```typescript
// Bug
for (var i = 0; i < 3; i++) {
  setTimeout(() => console.log(i), 100)
}
// Prints: 3, 3, 3 (expected 0, 1, 2)

// Investigation
// var is function-scoped, so all callbacks share one i
// By the time the timeouts fire, the loop is done and i === 3

// Fix: Use let (block-scoped) or capture i
```
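
The difference can be observed without timers by collecting the closures and calling them after the loop. A minimal sketch:

```typescript
// With var, every closure shares the same function-scoped i.
const withVar: Array<() => number> = []
for (var i = 0; i < 3; i++) {
  withVar.push(() => i)
}

// With let, each iteration gets its own block-scoped j.
const withLet: Array<() => number> = []
for (let j = 0; j < 3; j++) {
  withLet.push(() => j)
}

console.log(withVar.map(f => f())) // [ 3, 3, 3 ]
console.log(withLet.map(f => f())) // [ 0, 1, 2 ]
```

Removing the timer from the reproduction shows the cause is the shared variable, not the delay.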

Investigation Report Format

```markdown
# Investigation: [Issue description]

## Symptoms

[What's happening that's wrong?]

## Evidence

- Error message: [exact text]
- When it happens: [conditions]
- Frequency: [always/sometimes/rarely]
- Affected users: [all/some/specific group]

## Reproduction Steps

1. [Step 1]
2. [Step 2]
3. [Observe error]

## Investigation Timeline

### Hypothesis 1: [What I thought might be wrong]

- Tested by: [What I did to test]
- Result: [What I found]
- Conclusion: [Ruled out / Confirmed]

### Hypothesis 2: [Next theory]

- Tested by: [What I did]
- Result: [What I found]
- Conclusion: [Ruled out / Confirmed]

## Root Cause

[What's actually causing the issue]

Evidence:

- [Log showing the problem]
- [Stack trace pointing to source]
- [Data showing the pattern]

## Impact

- Severity: [Critical/High/Medium/Low]
- Scope: [How many users/scenarios affected]
- Workaround: [Any temporary solutions]

## Next Steps

1. [What should be done to fix]
2. [Any additional investigation needed]
3. [Related issues to check]
```

Debugging Tools

Browser Developer Tools

Console:

- `console.log()` - Print values
- `console.table()` - Display arrays/objects as a table
- `console.trace()` - Print stack trace
- `console.time()` / `console.timeEnd()` - Measure duration

Debugger:

Set breakpoints
Step through code
Inspect variables
Watch expressions
Call stack

Network:

View all requests
See request/response headers and bodies
Measure timing
Replay requests

Performance:

Record profile
See function call tree
Identify bottlenecks
Check memory usage

Command Line Tools

```bash
# Search for text in files
grep -r "error" logs/

# Follow log file
tail -f logs/app.log

# Search with context
grep -B 5 -A 5 "ERROR" logs/app.log

# Check disk space
df -h

# Check memory (Linux)
free -m

# Check running processes
ps aux | grep node
```

Database Debugging

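```sql
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

-- Show slow queries (requires the pg_stat_statements extension;
-- the column is total_exec_time in PostgreSQL 13+, total_time in older versions)
SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;

-- Check table size
SELECT pg_size_pretty(pg_total_relation_size('users'));

-- Check indexes (psql meta-command)
\d users
```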

Debugging Checklist

Before Starting

During Investigation

After Finding Root Cause

Anti-Patterns

Random Code Changes

BAD: "Maybe if I change this... nope, try this... nope, try this..."
GOOD: "Hypothesis: X causes Y. Test: Change X. Result: Y still happens.
       Conclusion: X is not the cause."

Assuming Without Verifying

BAD: "The API must be returning valid data"
GOOD: "Let me log the API response to see what it actually returns"

Stopping at Symptoms

BAD: "The page is blank. Fixed by adding a null check."
GOOD: "The page is blank because user is null. User is null because
       authentication token expired. Root cause: token not being refreshed."

Debugging in Production

BAD: "Let me add console.log to production to see..."
GOOD: "Let me reproduce locally and debug there, or use proper logging"

No Reproduction Steps

BAD: "It crashed once, let me guess why"
GOOD: "Let me find a reliable way to reproduce it first"

Examples

When the user says:

"Why is this page loading slowly?"
"Investigate this intermittent test failure"
"Figure out why users are seeing this error"
"Debug the memory leak in production"
"What's causing the database timeouts?"

Integration with Other Skills

Use the proof-of-work skill to document evidence
Use the test-driven-development skill to add a regression test after the fix
Use the explain skill when explaining the bug to others
Use the boy-scout-rule skill while fixing (improve surrounding code)

Notes

Use TaskCreate to track investigation steps
Document findings even if not fixing immediately
Create minimal reproduction case
Consider using /fix once root cause is found
Add logging/metrics to prevent future issues

Remember

Reproduce first - If you can't reproduce, you can't debug
Gather evidence - Don't guess, look at data
Form hypothesis - What do you think is wrong?
Test systematically - Prove or disprove hypothesis
Find root cause - Not just symptoms
Document - Help future you and others

Debugging is detective work. Be methodical, not random.
