debugging
Debugging Skill
Provides comprehensive debugging capabilities with integrated extended thinking for complex scenarios.
When to Use This Skill
Activate this skill when working with:
- Error troubleshooting
- Log analysis
- Performance debugging
- Distributed system debugging
- Memory and resource issues
- Complex, multi-layered bugs requiring deep reasoning
Extended Thinking for Complex Debugging
When to Enable Extended Thinking
Use extended thinking (Claude's deeper reasoning mode) for debugging when:
- Root Cause Unknown: Multiple possible causes, unclear failure patterns
- Intermittent Issues: Race conditions, timing issues, non-deterministic failures
- Multi-System Failures: Distributed system bugs spanning multiple services
- Performance Mysteries: Unexpected slowdowns without obvious bottlenecks
- Complex State Issues: Bugs involving intricate state transitions or side effects
- Security Vulnerabilities: Subtle security issues requiring careful analysis
How to Activate Extended Thinking
In your debugging prompt:
Claude, please use extended thinking to help debug this issue:
[Describe the problem with symptoms, context, and what you've tried]
Extended thinking will provide:
- Systematic hypothesis generation
- Multi-path investigation strategies
- Deeper pattern recognition
- Cross-domain insights (e.g., network + application + infrastructure)
Hypothesis-Driven Debugging Framework
Use this structured approach for complex bugs:
1. Observation Phase
What happened?
- Error message/stack trace
- Frequency (always/intermittent)
- When it started
- Environmental context
- Recent changes
2. Hypothesis Generation
Generate 3-5 plausible hypotheses:
H1: [Most likely cause based on symptoms]
Evidence for: [...]
Evidence against: [...]
Test: [How to validate/invalidate]
H2: [Alternative explanation]
Evidence for: [...]
Evidence against: [...]
Test: [How to validate/invalidate]
H3: [Edge case or rare scenario]
Evidence for: [...]
Evidence against: [...]
Test: [How to validate/invalidate]
3. Systematic Testing
Priority order (high to low confidence):
1. Test H1 → Result: [Pass/Fail/Inconclusive]
2. Test H2 → Result: [Pass/Fail/Inconclusive]
3. Test H3 → Result: [Pass/Fail/Inconclusive]
New evidence discovered:
- [Finding 1]
- [Finding 2]
Revised hypotheses if needed:
- [...]
4. Root Cause Identification
Confirmed root cause: [...]
Contributing factors: [...]
Why it wasn't caught earlier: [...]
5. Fix + Validation
Fix implemented: [...]
Tests added: [...]
Validation: [...]
Prevention: [...]
Structured Debugging Templates
Template 1: MECE Bug Analysis (Mutually Exclusive, Collectively Exhaustive)
Bug: [Title]
Problem Statement
- What: [Precise description]
- Where: [System/component]
- When: [Conditions/triggers]
- Impact: [Severity/scope]
MECE Hypothesis Tree
Layer 1: System Boundaries
- Frontend issue
- Backend API issue
- Database issue
- Infrastructure/network issue
- External dependency issue
Layer 2: Component-Specific (based on Layer 1 finding)
- [Sub-component A]
- [Sub-component B]
- [Sub-component C]
Layer 3: Code-Level (based on Layer 2 finding)
- Logic error
- State management
- Resource handling
- Configuration
Investigation Log
| Time | Action | Result | Next Step |
|---|---|---|---|
| [HH:MM] | [What you tested] | [Finding] | [Decision] |
Root Cause
[Final determination with evidence]
Fix
[Solution with rationale]
Template 2: 5 Whys Analysis
Issue: [Brief description]
Symptom: [Observable problem]
Why 1: Why did this happen?
→ [Answer]
Why 2: Why did [answer from Why 1] occur?
→ [Answer]
Why 3: Why did [answer from Why 2] occur?
→ [Answer]
Why 4: Why did [answer from Why 3] occur?
→ [Answer]
Why 5: Why did [answer from Why 4] occur?
→ [Root cause]
Fix: [Addresses root cause]
Prevention: [Process/check to prevent recurrence]
Template 3: Timeline Reconstruction
Incident Timeline: [Event]
Goal: Reconstruct exact sequence leading to failure
| Time | Event | System State | Evidence |
|---|---|---|---|
| T-5min | [Normal operation] | [State] | [Logs] |
| T-2min | [Trigger event] | [State change] | [Logs/metrics] |
| T-30s | [Cascade starts] | [Degraded] | [Alerts] |
| T-0 | [Failure] | [Failed state] | [Error logs] |
| T+5min | [Recovery action] | [Recovering] | [Actions taken] |
Critical Path: [Sequence of events that led to failure]
Alternative Scenarios: [What could have prevented it at each step]
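The relative labels in the timeline table (T-5min, T-0, T+5min) can be derived from absolute log timestamps. The sketch below is one way to do it, assuming events arrive as `(datetime, description)` pairs; `relative_label` and `build_timeline` are hypothetical helper names, not part of the template.

```python
from datetime import datetime


def relative_label(event_time, failure_time):
    """Format an event time relative to the failure, e.g. 'T-5min' or 'T+30s'."""
    delta = int((event_time - failure_time).total_seconds())
    if delta == 0:
        return "T-0"
    sign = "+" if delta > 0 else "-"
    seconds = abs(delta)
    if seconds % 60 == 0:
        return f"T{sign}{seconds // 60}min"
    return f"T{sign}{seconds}s"


def build_timeline(events, failure_time):
    """Sort (timestamp, description) pairs and attach relative labels."""
    ordered = sorted(events, key=lambda event: event[0])
    return [(relative_label(ts, failure_time), text) for ts, text in ordered]


failure = datetime(2024, 1, 15, 10, 5, 0)
events = [
    (datetime(2024, 1, 15, 10, 10, 0), "Recovery action"),
    (datetime(2024, 1, 15, 10, 0, 0), "Normal operation"),
    (datetime(2024, 1, 15, 10, 5, 0), "Failure"),
    (datetime(2024, 1, 15, 10, 4, 30), "Cascade starts"),
]
for label, text in build_timeline(events, failure):
    print(f"{label:>8}  {text}")
```

Sorting before labelling matters when logs from several services are merged, since they rarely arrive in chronological order.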
Python Debugging Patterns
Hypothesis-Driven Python Debugging Example
```python
"""
Bug: API endpoint returns 500 error intermittently
Symptoms: 1 in 10 requests fail, always with same user IDs
Hypothesis: Race condition in user data caching
"""

# H1: Cache key collision between users
# Test: Add detailed logging around cache operations

import logging

logging.basicConfig(level=logging.DEBUG)

# `cache`, `db`, and `User` come from the application being debugged.
def get_user(user_id):
    cache_key = f"user:{user_id}"
    logging.debug(f"Fetching cache key: {cache_key} for user {user_id}")

    cached = cache.get(cache_key)
    if cached:
        logging.debug(f"Cache hit: {cache_key} -> {cached}")
        return cached

    user = db.query(User).filter_by(id=user_id).first()
    logging.debug(f"DB fetch for user {user_id}: {user}")

    cache.set(cache_key, user, timeout=300)
    logging.debug(f"Cache set: {cache_key} -> {user}")

    return user

# Result: Discovered cache_key had different format in different code paths
# Root cause: String formatting inconsistency (f"user:{id}" vs f"user_{id}")
```
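One possible follow-up to the root cause above is to route every code path through a single key-building helper and pin its format with a regression test. This is a sketch under that assumption: `user_cache_key` and the test function are hypothetical names, and the original does not state which fix was actually applied.

```python
def user_cache_key(user_id):
    """Single source of truth for user cache keys (hypothetical helper)."""
    return f"user:{user_id}"


def test_cache_key_format_is_stable():
    """Regression test: the key format must not drift between code paths."""
    # The format chosen here is the colon form from the example above.
    assert user_cache_key(42) == "user:42"
    # Integer and string IDs must map to the same key.
    assert user_cache_key("42") == user_cache_key(42)
    # Distinct users must never share a key.
    assert user_cache_key(1) != user_cache_key(11)


test_cache_key_format_is_stable()
```

With one helper in place, a search for hand-built `"user_` or `"user:` strings in the codebase becomes a quick way to find any remaining inconsistent call sites.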
Advanced Debugging with Context Managers
```python
import logging
import time
from contextlib import contextmanager

@contextmanager
def debug_timer(operation_name):
    """Time operations and log if slow"""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration = time.perf_counter() - start
        if duration > 1.0:  # Slow operation threshold
            logging.warning(
                f"{operation_name} took {duration:.2f}s",
                extra={'operation': operation_name, 'duration': duration}
            )

# Usage
with debug_timer("database_query"):
    results = db.query(User).filter(...).all()

# `capture_state` and `compare_states` are placeholders for project-specific helpers.
@contextmanager
def hypothesis_test(hypothesis_name, expected_outcome):
    """Test and validate debugging hypotheses"""
    print(f"\n=== Testing: {hypothesis_name} ===")
    print(f"Expected: {expected_outcome}")
    start_state = capture_state()
    try:
        yield
    finally:
        end_state = capture_state()
        outcome = compare_states(start_state, end_state)
        print(f"Actual: {outcome}")
        print(f"Hypothesis {'CONFIRMED' if outcome == expected_outcome else 'REJECTED'}")

# Usage
with hypothesis_test(
    "H1: Database connection pool exhaustion",
    expected_outcome="pool_size increases during load"
):
    # Run load test
    for i in range(100):
        api_call()
```
pdb Debugger with Advanced Techniques
```python
# Basic breakpoint
import pdb; pdb.set_trace()

# Python 3.7+
breakpoint()

# Conditional breakpoint
if user_id == 12345:
    breakpoint()

# Post-mortem debugging (debug after crash)
import pdb
try:
    risky_function()
except Exception:
    pdb.post_mortem()

# Common pdb commands
# n(ext)     - Execute next line
# s(tep)     - Step into function
# c(ontinue) - Continue execution
# p expr     - Print expression
# pp expr    - Pretty print
# l(ist)     - Show source code
# w(here)    - Show stack trace
# u(p)       - Move up stack frame
# d(own)     - Move down stack frame
# b(reak)    - Set breakpoint
# cl(ear)    - Clear breakpoint
# q(uit)     - Quit debugger

# Advanced: Programmatic debugging
import pdb
pdb.run('my_function()', globals(), locals())
```
Logging
```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('debug.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)
logger.debug("Debug message")
logger.info("Info message")
logger.warning("Warning message")
logger.error("Error message", exc_info=True)
```
Exception Handling
```python
import logging
import traceback

logger = logging.getLogger(__name__)

try:
    result = risky_operation()
except Exception as e:
    # Log full traceback
    logger.error(f"Operation failed: {e}")
    logger.error(traceback.format_exc())

    # Or get traceback as string
    tb = traceback.format_exception(type(e), e, e.__traceback__)
    error_details = ''.join(tb)
```
JavaScript/Node.js Debugging
Hypothesis-Driven JavaScript Debugging Example
```javascript
/**
 * Bug: Memory leak in websocket connections
 * Symptoms: Memory grows over time, eventually crashes
 * Hypothesis: Event listeners not cleaned up on disconnect
 */

// H1: Event listeners accumulating
// Test: Track listener counts
class WebSocketManager {
  constructor() {
    this.connections = new Map();
    this.debugListenerCounts = true;
  }

  addConnection(userId, socket) {
    console.debug(`[H1 Test] Adding connection for user ${userId}`);

    if (this.debugListenerCounts) {
      console.debug(`[H1] Listener count before: ${socket.listenerCount('message')}`);
    }

    // handleMessage is assumed to be defined elsewhere in the application
    socket.on('message', (data) => this.handleMessage(userId, data));
    socket.on('close', () => this.removeConnection(userId));

    if (this.debugListenerCounts) {
      console.debug(`[H1] Listener count after: ${socket.listenerCount('message')}`);
    }

    this.connections.set(userId, socket);
  }

  removeConnection(userId) {
    console.debug(`[H1 Test] Removing connection for user ${userId}`);
    const socket = this.connections.get(userId);

    if (socket) {
      const messageListenerCount = socket.listenerCount('message');
      console.debug(`[H1] Listeners still attached: ${messageListenerCount}`);

      // Result: Found 3+ listeners on same event!
      // Root cause: Not removing listeners on reconnect

      socket.removeAllListeners();
      this.connections.delete(userId);
    }
  }
}
```
Advanced Console Debugging
```javascript
// Basic logging
console.log('Basic log');
console.error('Error message');
console.warn('Warning');
// Object inspection with depth
console.dir(object, { depth: null, colors: true });
console.table(array);
// Performance timing
console.time('operation');
// ... code ...
console.timeEnd('operation');
// Memory usage
console.memory; // Chrome only
// Stack trace
console.trace('Trace point');
// Grouping for organized logs
console.group('User Authentication Flow');
console.log('Step 1: Validate credentials');
console.log('Step 2: Generate token');
console.groupEnd();
// Conditional logging
const debug = (label, data) => {
if (process.env.DEBUG) {
console.log(`[DEBUG] ${label}:`, JSON.stringify(data, null, 2));
}
};
// Hypothesis testing helper
function testHypothesis(name, test, expected) {
console.group(`Testing: ${name}`);
console.log(`Expected: ${expected}`);
const actual = test();
console.log(`Actual: ${actual}`);
console.log(`Result: ${actual === expected ? 'PASS' : 'FAIL'}`);
console.groupEnd();
return actual === expected;
}
// Usage
testHypothesis(
'H1: Cache returns stale data',
() => cache.get('key').timestamp,
Date.now()
);
```
Debugging Async/Promise Issues
```javascript
// Track promise chains
const debugPromise = (label, promise) => {
console.log(`[${label}] Started`);
return promise
.then(result => {
console.log(`[${label}] Resolved:`, result);
return result;
})
.catch(error => {
console.error(`[${label}] Rejected:`, error);
throw error;
});
};
// Usage
await debugPromise('DB Query', db.users.findOne({ id: 123 }));
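// Debugging race conditions
// Helper used below: resolves after the given number of milliseconds
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));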
async function debugRaceCondition() {
const operations = [
{ name: 'Op1', fn: async () => { await delay(100); return 'A'; } },
{ name: 'Op2', fn: async () => { await delay(50); return 'B'; } },
{ name: 'Op3', fn: async () => { await delay(150); return 'C'; } }
];
const results = await Promise.allSettled(
operations.map(async op => {
const start = Date.now();
const result = await op.fn();
const duration = Date.now() - start;
console.log(`${op.name} completed in ${duration}ms: ${result}`);
return { op: op.name, result, duration };
})
);
console.table(results.map(r => r.value));
}
// Debugging memory leaks with weak references
class DebugMemoryLeaks {
constructor() {
this.weakMap = new WeakMap();
this.strongRefs = new Map();
}
trackObject(id, obj) {
// Weak reference - will be GC'd if no other references
this.weakMap.set(obj, { id, created: Date.now() });
// Strong reference - prevents GC (potential leak source)
this.strongRefs.set(id, obj);
console.log(`Tracking ${id}: Strong refs=${this.strongRefs.size}`);
}
release(id) {
this.strongRefs.delete(id);
console.log(`Released ${id}: Strong refs=${this.strongRefs.size}`);
}
checkLeaks() {
console.log(`Potential leaks: ${this.strongRefs.size} strong references`);
return Array.from(this.strongRefs.keys());
}
}
```
Node.js Inspector
```bash
# Start with inspector
node --inspect app.js
node --inspect-brk app.js  # Break on first line

# Debug with Chrome DevTools
# Open chrome://inspect
```
VS Code Debug Configuration
```json
{
"version": "0.2.0",
"configurations": [
{
"type": "node",
"request": "launch",
"name": "Debug Agent",
"program": "${workspaceFolder}/src/index.js",
"env": {
"NODE_ENV": "development"
}
}
]
}
```
Container Debugging
Docker
```bash
# View logs
docker logs <container> --tail=100 -f

# Execute shell
docker exec -it <container> /bin/sh

# Inspect container
docker inspect <container>

# Resource usage
docker stats <container>

# Debug running container
docker run -it --rm \
  --network=container:<target> \
  nicolaka/netshoot
```
Kubernetes
```bash
# Pod logs
kubectl logs <pod> -n agents -f
kubectl logs <pod> -n agents --previous  # Previous crash

# Execute in pod
kubectl exec -it <pod> -n agents -- /bin/sh

# Debug with ephemeral container
kubectl debug <pod> -n agents -it --image=busybox

# Port forward for local debugging
kubectl port-forward <pod> 8080:8080 -n agents

# Events
kubectl get events -n agents --sort-by='.lastTimestamp'

# Resource usage
kubectl top pods -n agents
```
Log Analysis
Pattern Matching
```bash
# Search logs for errors (-E enables alternation with |)
grep -iE "error|exception|failed" app.log

# Count occurrences
grep -c "ERROR" app.log

# Context around matches
grep -B 5 -A 5 "OutOfMemory" app.log

# Filter by time range
awk '/2024-01-15 10:00/,/2024-01-15 11:00/' app.log
```
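When the same filtering has to run inside a script or test rather than a shell, the context search can be approximated in Python. This is a minimal sketch of the `grep -B/-A` behaviour using only the standard library; `grep_context` is a hypothetical helper name, and unlike `grep` it does not print `--` separators between non-adjacent groups.

```python
import re


def grep_context(lines, pattern, before=0, after=0, ignore_case=False):
    """Return matching lines plus `before`/`after` lines of context, in order."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    keep = set()
    for index, line in enumerate(lines):
        if regex.search(line):
            first = max(0, index - before)
            last = min(len(lines), index + after + 1)
            keep.update(range(first, last))
    # Overlapping context windows are merged because indices live in a set.
    return [lines[i] for i in sorted(keep)]


log = [
    "10:00 INFO start",
    "10:01 INFO request received",
    "10:02 ERROR OutOfMemory in worker",
    "10:03 INFO restarting worker",
    "10:04 INFO ready",
]
for line in grep_context(log, r"error|exception|failed", before=1, after=1, ignore_case=True):
    print(line)
```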
JSON Logs
```bash
# Parse JSON logs with jq
cat app.log | jq 'select(.level == "error")'
cat app.log | jq 'select(.timestamp > "2024-01-15T10:00:00")'

# Extract specific fields
cat app.log | jq -r '[.timestamp, .level, .message] | @tsv'
```
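The same selections can be sketched in Python for environments where `jq` is not installed. This assumes one JSON object per line with `timestamp`, `level`, and `message` fields, as in the commands above; `parse_json_lines` and `select_records` are hypothetical helper names. As with the `jq` filter, timestamps are compared as strings, which only works when they share one ISO 8601 format.

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON, skipping blank or malformed lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise such as stack traces
        if isinstance(record, dict):
            records.append(record)
    return records


def select_records(records, level=None, since=None):
    """Keep records matching `level` and strictly newer than `since`."""
    selected = []
    for record in records:
        if level is not None and record.get("level") != level:
            continue
        if since is not None and record.get("timestamp", "") <= since:
            continue
        selected.append(record)
    return selected


raw = "\n".join([
    '{"timestamp": "2024-01-15T09:59:00", "level": "info", "message": "start"}',
    '{"timestamp": "2024-01-15T10:05:00", "level": "error", "message": "boom"}',
    "not json",
])
for record in select_records(parse_json_lines(raw), level="error"):
    print(record["timestamp"], record["level"], record["message"], sep="\t")
```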
Performance Debugging
Python Profiling
```python
# cProfile
import cProfile
cProfile.run('main()', 'output.prof')

# Line profiler (run with `kernprof -l -v script.py`, which provides `profile`)
@profile
def slow_function():
    pass

# Memory profiler
from memory_profiler import profile
@profile
def memory_heavy():
    pass
```
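To read profiling results without writing a `.prof` file first, `cProfile` can be paired with `pstats` in-process. The sketch below uses only the standard library; `profile_call` and `sum_of_squares` are hypothetical names introduced for illustration.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5, **kwargs):
    """Run `func` under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def sum_of_squares(n):
    """Small workload used only to demonstrate the helper."""
    return sum(i * i for i in range(n))


value, report = profile_call(sum_of_squares, 1000)
print(report)
```

Sorting by cumulative time tends to surface the slow call path; sorting by `"tottime"` instead highlights the individual functions where time is actually spent.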
Network Debugging
```bash
# Check connectivity
ping <host>
telnet <host> <port>
nc -zv <host> <port>

# DNS resolution
nslookup <host>
dig <host>

# HTTP debugging
curl -v http://localhost:8080/health
curl -X POST -d '{"test": true}' -H "Content-Type: application/json" http://localhost:8080/api
```
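In minimal containers where `nc` or `telnet` are missing, a TCP reachability check can be approximated with Python's standard library. This is a sketch of the same idea as `nc -zv`, not a replacement for it; `port_open` is a hypothetical helper name and the 2-second timeout is an arbitrary assumption.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    print("localhost:8080 reachable:", port_open("localhost", 8080))
```

A `False` result does not distinguish a closed port from a DNS failure; resolving the host separately with `socket.getaddrinfo` helps tell the two apart.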
Common Debug Checklist
- Check Logs: Application, system, container logs
- Verify Configuration: Environment variables, config files
- Test Connectivity: Network, database, external services
- Check Resources: CPU, memory, disk space
- Review Recent Changes: Git log, deployment history
- Reproduce Locally: Same environment, same data
- Binary Search: Isolate the problem scope
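The "Binary Search" item can be made concrete with a bisection over an ordered list of changes. The sketch below assumes every change before the faulty one is good and every change from it onward is bad (the same assumption `git bisect` makes); `first_bad` and the `is_bad` callback are hypothetical names.

```python
def first_bad(items, is_bad):
    """Return the index of the first bad item, or None if all are good.

    Assumes `items` is ordered so that all good items precede all bad ones.
    """
    low, high = 0, len(items)
    while low < high:
        mid = (low + high) // 2
        if is_bad(items[mid]):
            high = mid          # first bad item is at mid or earlier
        else:
            low = mid + 1       # first bad item is after mid
    return low if low < len(items) else None


# Example: ten deployments, the regression was introduced in deployment 6.
deployments = list(range(10))
print(first_bad(deployments, lambda d: d >= 6))
```

Each probe halves the remaining range, so even a long deployment history needs only a handful of reproduction attempts.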
Debugging Decision Tree
Use this decision tree to determine the right debugging approach:
START: What kind of bug?
│
├─ Known error message/stack trace
│ └─ Use: Direct log analysis + Stack trace walkthrough
│
├─ Intermittent/Race condition
│ └─ Use: Extended thinking + Timeline reconstruction + Hypothesis-driven
│
├─ Performance degradation
│ └─ Use: Profiling + Hypothesis-driven + MECE analysis
│
├─ Distributed system failure
│ └─ Use: Extended thinking + Timeline reconstruction + Multi-system tracing
│
├─ Complex state bug
│ └─ Use: Extended thinking + Hypothesis-driven + pdb/debugger
│
├─ Memory leak
│ └─ Use: Memory profiling + Hypothesis-driven + Weak reference analysis
│
└─ Unknown root cause
   └─ Use: Extended thinking + MECE analysis + 5 Whys
Best Practices for Complex Debugging
1. Document Your Investigation
Always maintain a debugging log:
Bug Investigation: [Title]
Start Time: 2024-01-15 10:00
Investigator: [Name]
Timeline
- 10:00 - Started investigation, checked logs
- 10:15 - Found error pattern in auth service
- 10:30 - Hypothesis: Cache expiration race condition
- 10:45 - Added debug logging, confirmed hypothesis
- 11:00 - Implemented fix, testing
Hypotheses Tested
- H1: Cache race condition (CONFIRMED)
- H2: Database connection pool (REJECTED)
- H3: Network timeout (NOT TESTED)
Root Cause
[Final determination]
Fix Applied
[Solution details]
Prevention
[How to prevent recurrence]
2. Use the Scientific Method
- Observe: Gather symptoms, error messages, logs
- Hypothesize: Generate 3-5 plausible explanations
- Predict: What would you see if hypothesis is true?
- Test: Design experiments to validate/invalidate
- Analyze: Compare predictions vs actual results
- Conclude: Confirm root cause with evidence
3. Leverage Extended Thinking
When to activate extended thinking:
- Complexity threshold: More than 3 interacting systems
- Uncertainty high: Multiple equally plausible causes
- Stakes high: Production outage, security issue, data loss
- Pattern unclear: No obvious error messages or logs
- Time-sensitive: Need systematic approach under pressure
4. Avoid Common Pitfalls
AVOID:
- ❌ Changing multiple things at once (can't isolate cause)
- ❌ Assuming first hypothesis is correct (confirmation bias)
- ❌ Debugging without logs/evidence (guessing)
- ❌ Not documenting what you tried (repeating failed attempts)
- ❌ Skipping reproduction step (fix might not work)
DO:
- ✅ Change one variable at a time
- ✅ Test multiple hypotheses systematically
- ✅ Add instrumentation before debugging
- ✅ Keep investigation log
- ✅ Write a regression test after the fix
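A regression test pins the exact input that triggered the bug, so the failure cannot return unnoticed. The sketch below assumes a hypothetical `mean_latency` function whose original version divided by `len(samples)` unconditionally; the function and the bug are invented for illustration.

```python
def mean_latency(samples):
    """Hypothetical fixed function. The original version divided by
    len(samples) unconditionally and crashed on an empty window."""
    if not samples:
        return 0.0
    return sum(samples) / len(samples)


def test_empty_window_does_not_crash():
    # Reproduces the exact input from the (hypothetical) bug report.
    assert mean_latency([]) == 0.0


def test_normal_window_unchanged():
    # Guards against the fix altering existing behaviour.
    assert mean_latency([10.0, 20.0]) == 15.0


# Plain assert-style tests: run under pytest, or call directly as below.
test_empty_window_does_not_crash()
test_normal_window_unchanged()
print("regression tests passed")
```

Pairing the reproduction case with one ordinary case checks both that the bug is gone and that the fix did not change normal behaviour.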
5. Debugging Instrumentation Patterns
```python
# Python: Comprehensive debugging decorator
import functools
import logging
import time

# Module-level logger used by the decorator below.
logger = logging.getLogger(__name__)


def debug_trace(func):
    """Decorator to trace function execution with timing and state."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        func_name = func.__qualname__
        logger.debug(f"→ Entering {func_name}")
        logger.debug(f"  Args: {args}")
        logger.debug(f"  Kwargs: {kwargs}")
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            duration = time.perf_counter() - start
            logger.debug(f"← Exiting {func_name} ({duration:.3f}s)")
            logger.debug(f"  Result: {result}")
            return result
        except Exception as e:
            # Log the failure with its timing, then re-raise unchanged.
            duration = time.perf_counter() - start
            logger.error(f"✗ Exception in {func_name} ({duration:.3f}s): {e}")
            raise
    return wrapper


# Usage
@debug_trace
def complex_operation(user_id, data):
    # Your code here
    pass
```

```javascript
// JavaScript: Comprehensive debugging wrapper
// Note: this method-decorator signature (target, propertyKey, descriptor) is the
// legacy form; it needs TypeScript's experimentalDecorators option or Babel's
// legacy decorators plugin.
function debugTrace(label) {
  return function(target, propertyKey, descriptor) {
    const originalMethod = descriptor.value;
    descriptor.value = async function(...args) {
      console.log(`→ Entering ${label || propertyKey}`);
      console.log(`  Args:`, args);
      const start = performance.now();
      try {
        const result = await originalMethod.apply(this, args);
        const duration = performance.now() - start;
        console.log(`← Exiting ${label || propertyKey} (${duration.toFixed(2)}ms)`);
        console.log(`  Result:`, result);
        return result;
      } catch (error) {
        const duration = performance.now() - start;
        console.error(`✗ Exception in ${label || propertyKey} (${duration.toFixed(2)}ms):`, error);
        throw error;
      }
    };
    return descriptor;
  };
}

// Usage
class UserService {
  @debugTrace('UserService.getUser')
  async getUser(userId) {
    // Your code here
  }
}
```
Cross-References and Related Skills
Related Skills
This debugging skill integrates with:
- extended-thinking (.claude/skills/extended-thinking/SKILL.md)
  - Use for: Complex bugs with unknown root causes
  - Activation: Add "use extended thinking" to your debugging prompt
  - Benefit: Deeper pattern recognition, systematic hypothesis generation
- complex-reasoning (.claude/skills/complex-reasoning/SKILL.md)
  - Use for: Multi-step debugging requiring logical chains
  - Patterns: Chain-of-thought, tree-of-thought for bug investigation
  - Benefit: Structured reasoning through complex bug scenarios
- deep-analysis (.claude/skills/deep-analysis/SKILL.md)
  - Use for: Post-mortem analysis, root cause investigation
  - Patterns: Comprehensive code review, architectural analysis
  - Benefit: Identifies systemic issues beyond surface bugs
- testing (.claude/skills/testing/SKILL.md)
  - Use for: Writing regression tests after a bug fix
  - Integration: Bug → Debug → Fix → Test → Validate
  - Benefit: Ensures the bug doesn't recur
- kubernetes (.claude/skills/kubernetes/SKILL.md)
  - Use for: Distributed system debugging in K8s
  - Tools: kubectl logs, exec, debug, events
  - Integration: Container debugging patterns
When to Combine Skills
| Scenario | Skills to Combine | Reasoning |
|---|---|---|
| Production outage | debugging + extended-thinking + kubernetes | Complex distributed system requires deep reasoning |
| Intermittent test failure | debugging + testing + complex-reasoning | Need systematic hypothesis testing |
| Performance regression | debugging + deep-analysis | Root cause may be architectural |
| Security vulnerability | debugging + extended-thinking + deep-analysis | Requires careful, thorough analysis |
| Memory leak | debugging + complex-reasoning | Multi-step investigation needed |
Integration Examples
Example 1: Complex Production Bug
Prompt combining skills
Claude, I have a complex production bug affecting multiple services.
Please use extended thinking and the debugging skill to help investigate.
Symptoms:
- API requests timeout intermittently (1 in 50 requests)
- Only affects authenticated users
- Started after recent deployment
- No obvious errors in logs
Please use:
- MECE analysis to categorize possible causes
- Hypothesis-driven debugging framework
- Timeline reconstruction of recent changes
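Timeline reconstruction, as requested in the prompt above, comes down to merging timestamped events from several sources into one ordered sequence. A minimal sketch, assuming each source already yields events sorted by time; the source names, timestamps, and messages are invented for illustration, and `build_timeline` is a hypothetical helper.

```python
import heapq
from datetime import datetime

# Hypothetical per-source event lists, each already sorted by timestamp.
deploys = [("2024-01-15T09:40:00", "deploy", "auth-service v2 rolled out")]
api_logs = [
    ("2024-01-15T09:55:12", "api", "request timeout for authenticated user"),
    ("2024-01-15T10:02:47", "api", "request timeout for authenticated user"),
]
cache_logs = [("2024-01-15T09:55:11", "cache", "session key expired")]


def build_timeline(*sources):
    """Merge sorted event streams into a single chronological list."""
    merged = heapq.merge(*sources, key=lambda e: datetime.fromisoformat(e[0]))
    return list(merged)


timeline = build_timeline(deploys, api_logs, cache_logs)
for ts, source, message in timeline:
    print(f"{ts}  [{source:<6}] {message}")
```

Reading the merged output top to bottom shows which events immediately precede each failure, which is often enough to rank hypotheses before any code is changed.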
Example 2: Memory Leak Investigation
Prompt combining skills
Claude, use complex reasoning and debugging skills to investigate a memory leak.
Context:
- Node.js service memory grows from 200MB to 2GB over 6 hours
- No errors logged
- Happens only in production, not staging
Apply:
- Hypothesis-driven framework (generate 5 hypotheses)
- Memory leak detection patterns (weak references)
- Extended thinking for pattern recognition across codebase
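The weak-reference pattern named in the prompt can be sketched as a probe: hold only a weak reference to a suspect object, drop every strong reference you own, and check whether the object is actually freed. The example is in Python for brevity (JavaScript offers a comparable `WeakRef`). `Session`, `leaky_registry`, and `is_collected` are hypothetical names.

```python
import gc
import weakref


class Session:
    """Stand-in for any object suspected of leaking (hypothetical)."""


leaky_registry = []  # simulates a module-level list that is never pruned


def is_collected(make_object, retain=None):
    """Create an object, drop our reference, and report whether it was freed."""
    obj = make_object()
    if retain is not None:
        retain(obj)  # e.g. a cache or listener list that keeps a strong reference
    probe = weakref.ref(obj)
    del obj
    gc.collect()
    return probe() is None


print(is_collected(Session))                         # freed: no leak
print(is_collected(Session, leaky_registry.append))  # still alive: leak suspect
```

A probe that stays alive does not say who holds the object, only that someone does; a heap dump or `gc.get_referrers` is the usual next step.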
Quick Reference Card
Debugging Workflow Summary
1. OBSERVE
- Collect error messages, logs, metrics
- Identify patterns (frequency, conditions, scope)
- Document symptoms
2. HYPOTHESIZE (use extended thinking if complex)
- Generate 3-5 plausible hypotheses
- Rank by likelihood
- Design tests for each
3. TEST
- Change one variable at a time
- Add instrumentation (logging, tracing)
- Collect evidence
4. ANALYZE
- Compare predictions vs results
- Eliminate invalidated hypotheses
- Refine remaining hypotheses
5. FIX
- Implement solution
- Add regression test
- Document root cause
6. VALIDATE
- Verify fix in affected environment
- Monitor metrics
- Update documentation
Tool Selection Guide
| Problem Type | Primary Tool | Secondary Tools |
|---|---|---|
| Logic error | pdb/debugger | Logging, unit tests |
| Performance | Profiler | Hypothesis testing, metrics |
| Memory leak | Memory profiler | Weak references, heap dumps |
| Async/timing | Timeline reconstruction | Extended thinking, logging |
| Distributed | Tracing (logs) | Kubernetes tools, MECE analysis |
| Unknown cause | Extended thinking | MECE, 5 Whys, hypothesis-driven |
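For the "Performance" row, a profiler run is usually the quickest way to replace guesses with measurements. A minimal sketch using Python's standard `cProfile` and `pstats` modules; `slow_sum` and `handler` are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately wasteful: builds a list just to sum it."""
    return sum([i * i for i in range(n)])


def handler():
    """Hypothetical request handler with one dominant hotspot."""
    return slow_sum(200_000)


profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time surfaces the call path that dominates the run; sorting by `"tottime"` instead highlights the individual functions doing the work.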
Skill version: 2.0 (Enhanced with extended thinking integration)
Last updated: 2024-01-15
Maintained by: Golden Armada AI Agent Fleet