debugging

Debugging Skill

Provides comprehensive debugging capabilities with integrated extended thinking for complex scenarios.

When to Use This Skill

Activate this skill when working with:

- Error troubleshooting
- Log analysis
- Performance debugging
- Distributed system debugging
- Memory and resource issues
- Complex, multi-layered bugs requiring deep reasoning

Extended Thinking for Complex Debugging

When to Enable Extended Thinking

Use extended thinking (Claude's deeper reasoning mode) for debugging when:

- Root Cause Unknown: Multiple possible causes, unclear failure patterns
- Intermittent Issues: Race conditions, timing issues, non-deterministic failures
- Multi-System Failures: Distributed system bugs spanning multiple services
- Performance Mysteries: Unexpected slowdowns without obvious bottlenecks
- Complex State Issues: Bugs involving intricate state transitions or side effects
- Security Vulnerabilities: Subtle security issues requiring careful analysis

How to Activate Extended Thinking

In your debugging prompt:

Claude, please use extended thinking to help debug this issue:

[Describe the problem with symptoms, context, and what you've tried]

Extended thinking will provide:
- Systematic hypothesis generation
- Multi-path investigation strategies
- Deeper pattern recognition
- Cross-domain insights (e.g., network + application + infrastructure)
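The prompt template above can be filled in programmatically so that symptoms, context, and prior attempts are never left out. The following is a minimal sketch; `build_debug_prompt` and its parameter names are hypothetical helpers invented for illustration, not part of any Claude API.

```python
def build_debug_prompt(symptoms, context, attempts):
    """Assemble a debugging prompt that asks for extended thinking.

    Each argument is a string or a list of strings supplied by the person
    reporting the bug (hypothetical helper, not part of any API).
    """
    def as_bullets(items):
        if isinstance(items, str):
            items = [items]
        return "\n".join(f"- {item}" for item in items)

    return (
        "Claude, please use extended thinking to help debug this issue:\n\n"
        f"Symptoms:\n{as_bullets(symptoms)}\n\n"
        f"Context:\n{as_bullets(context)}\n\n"
        f"What I've tried:\n{as_bullets(attempts)}\n"
    )


prompt = build_debug_prompt(
    symptoms=["HTTP 500 on roughly 1 in 10 requests", "Always the same user IDs"],
    context="Started after the last cache configuration change",
    attempts=["Restarted the API workers", "Checked database connectivity"],
)
print(prompt)
```

Keeping the three sections separate makes it harder to send a prompt that describes the symptom but omits what has already been ruled out.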

Hypothesis-Driven Debugging Framework

Use this structured approach for complex bugs:

1. Observation Phase

What happened?
- Error message/stack trace
- Frequency (always/intermittent)
- When it started
- Environmental context
- Recent changes

2. Hypothesis Generation

Generate 3-5 plausible hypotheses:

H1: [Most likely cause based on symptoms]
   Evidence for: [...]
   Evidence against: [...]
   Test: [How to validate/invalidate]

H2: [Alternative explanation]
   Evidence for: [...]
   Evidence against: [...]
   Test: [How to validate/invalidate]

H3: [Edge case or rare scenario]
   Evidence for: [...]
   Evidence against: [...]
   Test: [How to validate/invalidate]

3. Systematic Testing

Priority order (high to low confidence):
1. Test H1 → Result: [Pass/Fail/Inconclusive]
2. Test H2 → Result: [Pass/Fail/Inconclusive]
3. Test H3 → Result: [Pass/Fail/Inconclusive]

New evidence discovered:
- [Finding 1]
- [Finding 2]

Revised hypotheses if needed:
- [...]

4. Root Cause Identification

Confirmed root cause: [...]
Contributing factors: [...]
Why it wasn't caught earlier: [...]

5. Fix + Validation

Fix implemented: [...]
Tests added: [...]
Validation: [...]
Prevention: [...]

Structured Debugging Templates

Template 1: MECE Bug Analysis (Mutually Exclusive, Collectively Exhaustive)

Bug: [Title]

Problem Statement

What: [Precise description]
Where: [System/component]
When: [Conditions/triggers]
Impact: [Severity/scope]

MECE Hypothesis Tree

Investigation Log

| Time | Action | Result | Next Step |
|------|--------|--------|-----------|
| [HH:MM] | [What you tested] | [Finding] | [Decision] |

Root Cause

[Final determination with evidence]

Fix

[Solution with rationale]

Template 2: 5 Whys Analysis

Issue: [Brief description]

Symptom: [Observable problem]

Why 1: Why did this happen? → [Answer]

Why 2: Why did [answer from Why 1] occur? → [Answer]

Why 3: Why did [answer from Why 2] occur? → [Answer]

Why 4: Why did [answer from Why 3] occur? → [Answer]

Why 5: Why did [answer from Why 4] occur? → [Root cause]

Fix: [Addresses root cause]
Prevention: [Process/check to prevent recurrence]

Template 3: Timeline Reconstruction

Incident Timeline: [Event]

Goal: Reconstruct exact sequence leading to failure

| Time | Event | System State | Evidence |
|------|-------|--------------|----------|
| T-5min | [Normal operation] | [State] | [Logs] |
| T-2min | [Trigger event] | [State change] | [Logs/metrics] |
| T-30s | [Cascade starts] | [Degraded] | [Alerts] |
| T-0 | [Failure] | [Failed state] | [Error logs] |
| T+5min | [Recovery action] | [Recovering] | [Actions taken] |

Critical Path: [Sequence of events that led to failure]
Alternative Scenarios: [What could have prevented it at each step]
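When events come from several sources (deploy logs, alerts, application logs), the timeline template can be filled in by merging them and expressing each timestamp as an offset from the failure. The sketch below assumes events are `(timestamp, source, description)` tuples; that layout and the sample events are illustrative assumptions only.

```python
from datetime import datetime


def build_timeline(events, failure_time):
    """Sort events from all sources and label each with its offset from T-0.

    `events` is a list of (timestamp, source, description) tuples; this
    field layout is an assumption made for the sketch.
    """
    rows = []
    for ts, source, description in sorted(events, key=lambda e: e[0]):
        offset = int((ts - failure_time).total_seconds())
        label = "T-0" if offset == 0 else f"T{offset:+d}s"
        rows.append((label, source, description))
    return rows


failure = datetime(2024, 1, 15, 10, 30, 0)
events = [
    (datetime(2024, 1, 15, 10, 30, 0), "api", "Requests start failing"),
    (datetime(2024, 1, 15, 10, 28, 0), "deploy", "Config change rolled out"),
    (datetime(2024, 1, 15, 10, 29, 30), "alerts", "Latency alert fires"),
    (datetime(2024, 1, 15, 10, 35, 0), "ops", "Rollback started"),
]

for label, source, description in build_timeline(events, failure):
    print(f"{label:>8}  {source:<7} {description}")
```

Sorting by timestamp only gives a trustworthy order if the sources share a synchronized clock, so clock skew is worth checking before drawing conclusions from closely spaced events.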

Python Debugging Patterns

Hypothesis-Driven Python Debugging Example

```python
"""
Bug: API endpoint returns 500 error intermittently
Symptoms: 1 in 10 requests fail, always with same user IDs
Hypothesis: Race condition in user data caching
"""

# H1: Cache key collision between users
# Test: Add detailed logging around cache operations

import logging

logging.basicConfig(level=logging.DEBUG)


def get_user(user_id):
    cache_key = f"user:{user_id}"
    logging.debug(f"Fetching cache key: {cache_key} for user {user_id}")

    cached = cache.get(cache_key)
    if cached:
        logging.debug(f"Cache hit: {cache_key} -> {cached}")
        return cached

    user = db.query(User).filter_by(id=user_id).first()
    logging.debug(f"DB fetch for user {user_id}: {user}")

    cache.set(cache_key, user, timeout=300)
    logging.debug(f"Cache set: {cache_key} -> {user}")

    return user

# Result: Discovered cache_key had different format in different code paths
# Root cause: String formatting inconsistency (f"user:{id}" vs f"user_{id}")
```
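One way to prevent the key-format inconsistency found above is to build the key in a single function that every code path calls. This is a sketch of that idea, not the fix used in the original incident: `user_cache_key`, `DictCache`, `store_user`, and `load_user` are hypothetical names, and the in-memory cache stands in for whatever cache client the application uses.

```python
def user_cache_key(user_id):
    """Single source of truth for the user cache key format."""
    return f"user:{user_id}"


class DictCache:
    """Tiny in-memory stand-in for the real cache client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


cache = DictCache()


def store_user(user_id, user):
    # The write path builds its key through the shared helper ...
    cache.set(user_cache_key(user_id), user)


def load_user(user_id):
    # ... and so does the read path, so the two formats cannot drift apart.
    return cache.get(user_cache_key(user_id))


store_user(42, {"id": 42, "name": "example"})
print(load_user(42))
```

A regression test that writes through one path and reads through the other would catch a reintroduced mismatch.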

Advanced Debugging with Context Managers

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def debug_timer(operation_name):
    """Time operations and log if slow"""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration = time.perf_counter() - start
        if duration > 1.0:  # Slow operation threshold
            logging.warning(
                f"{operation_name} took {duration:.2f}s",
                extra={'operation': operation_name, 'duration': duration}
            )


# Usage
with debug_timer("database_query"):
    results = db.query(User).filter(...).all()


@contextmanager
def hypothesis_test(hypothesis_name, expected_outcome):
    """Test and validate debugging hypotheses"""
    # capture_state() and compare_states() are application-specific helpers
    print(f"\n=== Testing: {hypothesis_name} ===")
    print(f"Expected: {expected_outcome}")
    start_state = capture_state()
    try:
        yield
    finally:
        end_state = capture_state()
        outcome = compare_states(start_state, end_state)
        print(f"Actual: {outcome}")
        print(f"Hypothesis {'CONFIRMED' if outcome == expected_outcome else 'REJECTED'}")


# Usage
with hypothesis_test(
    "H1: Database connection pool exhaustion",
    expected_outcome="pool_size increases during load"
):
    # Run load test
    for i in range(100):
        api_call()
```

pdb Debugger with Advanced Techniques

```python
# Basic breakpoint
import pdb; pdb.set_trace()

# Python 3.7+
breakpoint()

# Conditional breakpoint
if user_id == 12345:
    breakpoint()

# Post-mortem debugging (debug after crash)
import pdb
try:
    risky_function()
except Exception:
    pdb.post_mortem()

# Common pdb commands
# n(ext)     - Execute next line
# s(tep)     - Step into function
# c(ontinue) - Continue execution
# p expr     - Print expression
# pp expr    - Pretty print
# l(ist)     - Show source code
# w(here)    - Show stack trace
# u(p)       - Move up stack frame
# d(own)     - Move down stack frame
# b(reak)    - Set breakpoint
# cl(ear)    - Clear breakpoint
# q(uit)     - Quit debugger

# Advanced: Programmatic debugging
import pdb
pdb.run('my_function()', globals(), locals())
```

Logging

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('debug.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)

logger.debug("Debug message")
logger.info("Info message")
logger.warning("Warning message")
logger.error("Error message", exc_info=True)
```

Exception Handling

```python
import traceback

try:
    result = risky_operation()
except Exception as e:
    # Log full traceback
    logger.error(f"Operation failed: {e}")
    logger.error(traceback.format_exc())

    # Or get traceback as string
    tb = traceback.format_exception(type(e), e, e.__traceback__)
    error_details = ''.join(tb)
```
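The standard library also offers `Logger.exception`, which logs at ERROR level and appends the active traceback in one call, so the message and the traceback cannot end up in separate log records. A minimal sketch follows; the logger name `exception_demo` and the failing `risky_operation` are placeholders for illustration, and output is captured in memory only so it can be inspected.

```python
import io
import logging

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
demo_logger = logging.getLogger("exception_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.addHandler(handler)
demo_logger.propagate = False


def risky_operation():
    # Hypothetical failing operation used only for this demonstration.
    raise ValueError("bad input")


try:
    risky_operation()
except Exception:
    # Logs at ERROR level and appends the active traceback automatically.
    demo_logger.exception("Operation failed")

captured = stream.getvalue()
print(captured)
```

`Logger.exception` is meant to be called from inside an `except` block, where there is an active exception to report.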

JavaScript/Node.js Debugging

Hypothesis-Driven JavaScript Debugging Example

```javascript
/**
 * Bug: Memory leak in websocket connections
 * Symptoms: Memory grows over time, eventually crashes
 * Hypothesis: Event listeners not cleaned up on disconnect
 */

// H1: Event listeners accumulating
// Test: Track listener counts
class WebSocketManager {
  constructor() {
    this.connections = new Map();
    this.debugListenerCounts = true;
  }

  addConnection(userId, socket) {
    console.debug(`[H1 Test] Adding connection for user ${userId}`);

    if (this.debugListenerCounts) {
      console.debug(`[H1] Listener count before: ${socket.listenerCount('message')}`);
    }

    socket.on('message', (data) => this.handleMessage(userId, data));
    socket.on('close', () => this.removeConnection(userId));

    if (this.debugListenerCounts) {
      console.debug(`[H1] Listener count after: ${socket.listenerCount('message')}`);
    }

    this.connections.set(userId, socket);
  }

  removeConnection(userId) {
    console.debug(`[H1 Test] Removing connection for user ${userId}`);

    const socket = this.connections.get(userId);
    if (socket) {
      const messageListenerCount = socket.listenerCount('message');
      console.debug(`[H1] Listeners still attached: ${messageListenerCount}`);

      // Result: Found 3+ listeners on same event!
      // Root cause: Not removing listeners on reconnect
      socket.removeAllListeners();
      this.connections.delete(userId);
    }
  }
}
```

Advanced Console Debugging

```javascript
// Basic logging
console.log('Basic log');
console.error('Error message');
console.warn('Warning');

// Object inspection with depth
console.dir(object, { depth: null, colors: true });
console.table(array);

// Performance timing
console.time('operation');
// ... code ...
console.timeEnd('operation');

// Memory usage
console.memory; // Chrome only

// Stack trace
console.trace('Trace point');

// Grouping for organized logs
console.group('User Authentication Flow');
console.log('Step 1: Validate credentials');
console.log('Step 2: Generate token');
console.groupEnd();

// Conditional logging
const debug = (label, data) => {
  if (process.env.DEBUG) {
    console.log(`[DEBUG] ${label}:`, JSON.stringify(data, null, 2));
  }
};

// Hypothesis testing helper
function testHypothesis(name, test, expected) {
  console.group(`Testing: ${name}`);
  console.log(`Expected: ${expected}`);
  const actual = test();
  console.log(`Actual: ${actual}`);
  console.log(`Result: ${actual === expected ? 'PASS' : 'FAIL'}`);
  console.groupEnd();
  return actual === expected;
}

// Usage
testHypothesis(
  'H1: Cache returns stale data',
  () => cache.get('key').timestamp,
  Date.now()
);
```

Debugging Async/Promise Issues

```javascript
// Track promise chains
const debugPromise = (label, promise) => {
  console.log(`[${label}] Started`);
  return promise
    .then(result => {
      console.log(`[${label}] Resolved:`, result);
      return result;
    })
    .catch(error => {
      console.error(`[${label}] Rejected:`, error);
      throw error;
    });
};

// Usage
await debugPromise('DB Query', db.users.findOne({ id: 123 }));

// Helper used below: resolve after the given number of milliseconds
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));

// Debugging race conditions
async function debugRaceCondition() {
  const operations = [
    { name: 'Op1', fn: async () => { await delay(100); return 'A'; } },
    { name: 'Op2', fn: async () => { await delay(50); return 'B'; } },
    { name: 'Op3', fn: async () => { await delay(150); return 'C'; } }
  ];

  const results = await Promise.allSettled(
    operations.map(async op => {
      const start = Date.now();
      const result = await op.fn();
      const duration = Date.now() - start;
      console.log(`${op.name} completed in ${duration}ms: ${result}`);
      return { op: op.name, result, duration };
    })
  );

  console.table(results.map(r => r.value));
}

// Debugging memory leaks with weak references
class DebugMemoryLeaks {
  constructor() {
    this.weakMap = new WeakMap();
    this.strongRefs = new Map();
  }

  trackObject(id, obj) {
    // Weak reference - will be GC'd if no other references
    this.weakMap.set(obj, { id, created: Date.now() });

    // Strong reference - prevents GC (potential leak source)
    this.strongRefs.set(id, obj);

    console.log(`Tracking ${id}: Strong refs=${this.strongRefs.size}`);
  }

  release(id) {
    this.strongRefs.delete(id);
    console.log(`Released ${id}: Strong refs=${this.strongRefs.size}`);
  }

  checkLeaks() {
    console.log(`Potential leaks: ${this.strongRefs.size} strong references`);
    return Array.from(this.strongRefs.keys());
  }
}
```

Node.js Inspector

```bash
# Start with inspector
node --inspect app.js
node --inspect-brk app.js  # Break on first line

# Debug with Chrome DevTools
# Open chrome://inspect
```

VS Code Debug Configuration

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "node",
      "request": "launch",
      "name": "Debug Agent",
      "program": "${workspaceFolder}/src/index.js",
      "env": {
        "NODE_ENV": "development"
      }
    }
  ]
}
```

Container Debugging

Docker

```bash
# View logs
docker logs <container> --tail=100 -f

# Execute shell
docker exec -it <container> /bin/sh

# Inspect container
docker inspect <container>

# Resource usage
docker stats <container>

# Debug running container
docker run -it --rm \
  --network=container:<target> \
  nicolaka/netshoot
```

Kubernetes

```bash
# Pod logs
kubectl logs <pod> -n agents -f
kubectl logs <pod> -n agents --previous  # Previous crash

# Execute in pod
kubectl exec -it <pod> -n agents -- /bin/sh

# Debug with ephemeral container
kubectl debug <pod> -n agents -it --image=busybox

# Port forward for local debugging
kubectl port-forward <pod> 8080:8080 -n agents

# Events
kubectl get events -n agents --sort-by='.lastTimestamp'

# Resource usage
kubectl top pods -n agents
```

Log Analysis

Pattern Matching

```bash
# Search logs for errors (-E enables alternation with |)
grep -iE "error|exception|failed" app.log

# Count occurrences
grep -c "ERROR" app.log

# Context around matches
grep -B 5 -A 5 "OutOfMemory" app.log

# Filter by time range
awk '/2024-01-15 10:00/,/2024-01-15 11:00/' app.log
```

JSON Logs

```bash
# Parse JSON logs with jq
jq 'select(.level == "error")' app.log
jq 'select(.timestamp > "2024-01-15T10:00:00")' app.log

# Extract specific fields
jq -r '[.timestamp, .level, .message] | @tsv' app.log
```
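When jq is not available, or the filter needs to live inside a test or script, the same selection can be done with the Python standard library. This sketch assumes one JSON object per line with `timestamp`, `level`, and `message` fields, as the jq examples imply; the function name `filter_json_logs` and the sample lines are illustrative.

```python
import json


def filter_json_logs(lines, level=None, since=None):
    """Yield parsed records matching a level and a minimum timestamp.

    Assumes one JSON object per line with "timestamp" (ISO 8601), "level",
    and "message" fields; lines that are not valid JSON objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        if level is not None and record.get("level") != level:
            continue
        # ISO 8601 timestamps in one format and time zone compare correctly as strings.
        if since is not None and record.get("timestamp", "") <= since:
            continue
        yield record


sample = [
    '{"timestamp": "2024-01-15T09:59:00", "level": "error", "message": "early failure"}',
    '{"timestamp": "2024-01-15T10:05:00", "level": "info", "message": "request ok"}',
    'not json at all',
    '{"timestamp": "2024-01-15T10:07:00", "level": "error", "message": "db timeout"}',
]

matches = list(filter_json_logs(sample, level="error", since="2024-01-15T10:00:00"))
for record in matches:
    print(record["timestamp"], record["level"], record["message"], sep="\t")
```

Skipping unparseable lines keeps the filter usable on logs where plain-text lines (for example, startup banners) are mixed in with JSON records.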

Performance Debugging

Python Profiling

```python
# cProfile
import cProfile
cProfile.run('main()', 'output.prof')

# Line profiler (run with: kernprof -l -v script.py; kernprof injects @profile)
@profile
def slow_function():
    pass

# Memory profiler
from memory_profiler import profile

@profile
def memory_heavy():
    pass
```
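A profile is only useful once it is read. The standard-library `pstats` module can sort and print the collected statistics; the sketch below profiles a deliberately slow function in-process and writes the report to a string. The functions `slow_sum` and `main` are made up for the demonstration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive loop so it shows up in the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return [slow_sum(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # at most 5 entries, by cumulative time
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time highlights the call paths that dominate the run; sorting by `"tottime"` instead highlights the individual functions where the time is actually spent.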

Network Debugging

```bash
# Check connectivity
ping <host>
telnet <host> <port>
nc -zv <host> <port>

# DNS resolution
nslookup <host>
dig <host>

# HTTP debugging
curl -v http://localhost:8080/health
curl -X POST -d '{"test": true}' -H "Content-Type: application/json" http://localhost:8080/api
```

Common Debug Checklist

- Check Logs: Application, system, container logs
- Verify Configuration: Environment variables, config files
- Test Connectivity: Network, database, external services
- Check Resources: CPU, memory, disk space
- Review Recent Changes: Git log, deployment history
- Reproduce Locally: Same environment, same data
- Binary Search: Isolate the problem scope
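The binary-search item in the checklist can be made concrete: given an ordered list of changes where the oldest is good and the newest is bad, halving the range finds the first bad change in about log2(n) checks. The sketch below assumes that once a change is bad every later one is bad too; the commit IDs and the `is_bad` predicate are hypothetical stand-ins for a real reproduction step. For Git history, `git bisect` automates the same search.

```python
def first_bad(candidates, is_bad):
    """Return the index of the first bad candidate using binary search.

    Assumes `candidates` is ordered oldest to newest, the last one is known
    to be bad, and once a change is bad every later one is bad too.
    """
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid          # failure is at mid or earlier
        else:
            lo = mid + 1      # failure was introduced after mid
    return lo


commits = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"]  # hypothetical IDs
checked = []


def is_bad(commit):
    # Stand-in for "deploy this commit and run the reproduction".
    checked.append(commit)
    return commits.index(commit) >= 5


culprit = commits[first_bad(commits, is_bad)]
print(f"First bad commit: {culprit} (found in {len(checked)} checks)")
```

The same halving strategy applies to configuration flags, input data, or feature toggles, provided the failure is reproducible at each step.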

Debugging Decision Tree

Use this decision tree to determine the right debugging approach:

START: What kind of bug?
│
├─ Known error message/stack trace
│  └─ Use: Direct log analysis + Stack trace walkthrough
│
├─ Intermittent/Race condition
│  └─ Use: Extended thinking + Timeline reconstruction + Hypothesis-driven
│
├─ Performance degradation
│  └─ Use: Profiling + Hypothesis-driven + MECE analysis
│
├─ Distributed system failure
│  └─ Use: Extended thinking + Timeline reconstruction + Multi-system tracing
│
├─ Complex state bug
│  └─ Use: Extended thinking + Hypothesis-driven + pdb/debugger
│
├─ Memory leak
│  └─ Use: Memory profiling + Hypothesis-driven + Weak reference analysis
│
└─ Unknown root cause
   └─ Use: Extended thinking + MECE analysis + 5 Whys
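The decision tree above can also be encoded as a lookup table, which is convenient when triage is scripted (for example, to tag incoming bug reports with a suggested approach). The dictionary keys below are invented labels for this sketch; the technique lists are copied from the tree.

```python
DEBUG_APPROACHES = {
    "known_error": ["Direct log analysis", "Stack trace walkthrough"],
    "intermittent": ["Extended thinking", "Timeline reconstruction", "Hypothesis-driven"],
    "performance": ["Profiling", "Hypothesis-driven", "MECE analysis"],
    "distributed": ["Extended thinking", "Timeline reconstruction", "Multi-system tracing"],
    "complex_state": ["Extended thinking", "Hypothesis-driven", "pdb/debugger"],
    "memory_leak": ["Memory profiling", "Hypothesis-driven", "Weak reference analysis"],
    "unknown": ["Extended thinking", "MECE analysis", "5 Whys"],
}


def choose_approach(bug_kind):
    """Look up the suggested techniques; unrecognized kinds fall back to 'unknown'."""
    return DEBUG_APPROACHES.get(bug_kind, DEBUG_APPROACHES["unknown"])


print(choose_approach("memory_leak"))
print(choose_approach("something_else"))
```

Falling back to the "unknown root cause" branch mirrors the tree's last entry: when a bug does not fit a known category, the broadest techniques are the starting point.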

Best Practices for Complex Debugging

1. Document Your Investigation

Always maintain a debugging log:

Bug Investigation: [Title]

Start Time: 2024-01-15 10:00
Investigator: [Name]

Timeline

- 10:00 - Started investigation, checked logs
- 10:15 - Found error pattern in auth service
- 10:30 - Hypothesis: Cache expiration race condition
- 10:45 - Added debug logging, confirmed hypothesis
- 11:00 - Implemented fix, testing

Hypotheses Tested

- H1: Cache race condition (CONFIRMED)
- H2: Database connection pool (REJECTED)
- H3: Network timeout (NOT TESTED)

Root Cause

[Final determination]

Fix Applied

[Solution details]

Prevention

[How to prevent recurrence]

2. Use the Scientific Method

1. Observe: Gather symptoms, error messages, logs
2. Hypothesize: Generate 3-5 plausible explanations
3. Predict: What would you see if the hypothesis is true?
4. Test: Design experiments to validate/invalidate
5. Analyze: Compare predictions vs actual results
6. Conclude: Confirm root cause with evidence

3. Leverage Extended Thinking

When to activate extended thinking:

- Complexity threshold: More than 3 interacting systems
- Uncertainty high: Multiple equally plausible causes
- Stakes high: Production outage, security issue, data loss
- Pattern unclear: No obvious error messages or logs
- Time-sensitive: Need systematic approach under pressure

4. Avoid Common Pitfalls

AVOID:
- ❌ Changing multiple things at once (can't isolate cause)
- ❌ Assuming first hypothesis is correct (confirmation bias)
- ❌ Debugging without logs/evidence (guessing)
- ❌ Not documenting what you tried (repeating failed attempts)
- ❌ Skipping reproduction step (fix might not work)

DO:
- ✅ Change one variable at a time
- ✅ Test multiple hypotheses systematically
- ✅ Add instrumentation before debugging
- ✅ Keep investigation log
- ✅ Write regression test after fix

5. Debugging Instrumentation Patterns

```python
# Python: Comprehensive debugging decorator
import functools
import logging
import time

logger = logging.getLogger(__name__)


def debug_trace(func):
    """Decorator to trace function execution with timing and state"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        func_name = func.__qualname__
        logger.debug(f"→ Entering {func_name}")
        logger.debug(f"  Args: {args}")
        logger.debug(f"  Kwargs: {kwargs}")

        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            duration = time.perf_counter() - start
            logger.debug(f"← Exiting {func_name} ({duration:.3f}s)")
            logger.debug(f"  Result: {result}")
            return result
        except Exception as e:
            duration = time.perf_counter() - start
            logger.error(f"✗ Exception in {func_name} ({duration:.3f}s): {e}")
            raise

    return wrapper


# Usage
@debug_trace
def complex_operation(user_id, data):
    # Your code here
    pass
```

```javascript
// JavaScript: Comprehensive debugging wrapper
// Note: the @decorator syntax requires decorator support
// (e.g. TypeScript's experimentalDecorators or a Babel plugin).
function debugTrace(label) {
  return function(target, propertyKey, descriptor) {
    const originalMethod = descriptor.value;

    descriptor.value = async function(...args) {
      console.log(`→ Entering ${label || propertyKey}`);
      console.log(`  Args:`, args);

      const start = performance.now();
      try {
        const result = await originalMethod.apply(this, args);
        const duration = performance.now() - start;
        console.log(`← Exiting ${label || propertyKey} (${duration.toFixed(2)}ms)`);
        console.log(`  Result:`, result);
        return result;
      } catch (error) {
        const duration = performance.now() - start;
        console.error(`✗ Exception in ${label || propertyKey} (${duration.toFixed(2)}ms):`, error);
        throw error;
      }
    };

    return descriptor;
  };
}

// Usage
class UserService {
  @debugTrace('UserService.getUser')
  async getUser(userId) {
    // Your code here
  }
}
```

Cross-References and Related Skills

Related Skills

When to Combine Skills

| Scenario | Skills to Combine | Reasoning |
| --- | --- | --- |
| Production outage | debugging + extended-thinking + kubernetes | Complex distributed system requires deep reasoning |
| Intermittent test failure | debugging + testing + complex-reasoning | Need systematic hypothesis testing |
| Performance regression | debugging + deep-analysis | Root cause may be architectural |
| Security vulnerability | debugging + extended-thinking + deep-analysis | Requires careful, thorough analysis |
| Memory leak | debugging + complex-reasoning | Multi-step investigation needed |

Integration Examples

Example 1: Complex Production Bug

Prompt combining skills

Claude, I have a complex production bug affecting multiple services. Please use extended thinking and the debugging skill to help investigate.

Symptoms:

- API requests time out intermittently (1 in 50 requests)
- Only affects authenticated users
- Started after recent deployment
- No obvious errors in logs

Please use:

- MECE analysis to categorize possible causes
- Hypothesis-driven debugging framework
- Timeline reconstruction of recent changes
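
With a failure rate of about 1 in 50, a single clean run says very little about whether the bug is gone. As a rough sketch, assuming each request fails independently with probability p, the number of requests needed to observe at least one failure with a given confidence is the smallest n with 1 - (1 - p)^n >= confidence. The function name `requests_needed` is hypothetical.

```python
import math


def requests_needed(failure_rate, confidence=0.95):
    """Smallest n such that P(at least one failure in n requests) >= confidence.

    Assumes failures are independent, which real intermittent bugs
    (for example, ones tied to load or cache state) may violate.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be between 0 and 1 (exclusive)")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


if __name__ == "__main__":
    print(requests_needed(1 / 50))        # 149
    print(requests_needed(1 / 50, 0.99))  # 228
```

Under that independence assumption, a candidate fix would need roughly 149 consecutive clean requests before a 1-in-50 failure can be considered unlikely at the 95% level.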

Example 2: Memory Leak Investigation

Prompt combining skills

Claude, use complex reasoning and debugging skills to investigate a memory leak.

Context:

- Node.js service memory grows from 200 MB to 2 GB over 6 hours
- No errors logged
- Happens only in production, not staging

Apply:

- Hypothesis-driven framework (generate 5 hypotheses)
- Memory leak detection patterns (weak references)
- Extended thinking for pattern recognition across codebase
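
The weak-reference pattern named in the prompt can be illustrated as follows. The service in this example is Node.js; the sketch below uses Python's standard `weakref` and `gc` modules to show the same idea. `LeakTracker`, `Session`, `handle_request` and `_cache` are hypothetical names invented for the illustration.

```python
import gc
import weakref


class LeakTracker:
    """Track objects weakly and report which are still alive after GC."""

    def __init__(self):
        self._refs = {}  # label -> weakref.ref

    def track(self, label, obj):
        # A weak reference does not keep obj alive by itself.
        self._refs[label] = weakref.ref(obj)

    def survivors(self):
        gc.collect()  # also clears unreachable reference cycles
        return sorted(label for label, ref in self._refs.items() if ref() is not None)


class Session:
    """Stand-in for a per-request object that should be short-lived."""


_cache = []  # simulated leak: a module-level list that is never cleared


def handle_request(tracker, request_id, leak):
    session = Session()
    tracker.track(f"session-{request_id}", session)
    if leak:
        _cache.append(session)  # a strong reference outlives the request


if __name__ == "__main__":
    tracker = LeakTracker()
    handle_request(tracker, 1, leak=False)
    handle_request(tracker, 2, leak=True)
    print(tracker.survivors())  # ['session-2']
```

Objects that survive collection after their request has finished are candidates for a leak; the tracker itself holds only weak references, so it does not distort the result.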

Quick Reference Card

Debugging Workflow Summary

1. OBSERVE
   - Collect error messages, logs, metrics
   - Identify patterns (frequency, conditions, scope)
   - Document symptoms

2. HYPOTHESIZE (use extended thinking if complex)
   - Generate 3-5 plausible hypotheses
   - Rank by likelihood
   - Design tests for each

3. TEST
   - Change one variable at a time
   - Add instrumentation (logging, tracing)
   - Collect evidence

4. ANALYZE
   - Compare predictions vs results
   - Eliminate invalidated hypotheses
   - Refine remaining hypotheses

5. FIX
   - Implement solution
   - Add regression test
   - Document root cause

6. VALIDATE
   - Verify fix in affected environment
   - Monitor metrics
   - Update documentation
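
Steps 2 and 4 amount to keeping a ranked list of hypotheses and pruning it as evidence arrives. A minimal sketch of such a log follows; the names `Hypothesis` and `HypothesisLog`, the sample hypotheses, and the likelihood values are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    description: str
    likelihood: float          # subjective estimate in [0, 1]
    test: str                  # how the hypothesis will be checked
    status: str = "open"       # "open" or "eliminated"
    evidence: list = field(default_factory=list)


class HypothesisLog:
    def __init__(self):
        self._items = []

    def add(self, description, likelihood, test):
        hypothesis = Hypothesis(description, likelihood, test)
        self._items.append(hypothesis)
        return hypothesis

    def record(self, hypothesis, evidence, supports):
        """ANALYZE step: attach evidence and eliminate on contradiction."""
        hypothesis.evidence.append(evidence)
        if not supports:
            hypothesis.status = "eliminated"

    def ranked_open(self):
        """Open hypotheses, most likely first (the order to TEST them in)."""
        remaining = [h for h in self._items if h.status == "open"]
        return sorted(remaining, key=lambda h: h.likelihood, reverse=True)


if __name__ == "__main__":
    log = HypothesisLog()
    pool = log.add("Connection pool exhausted", 0.5, "Log pool size per request")
    log.add("Auth token cache miss", 0.3, "Trace cache hit rate")
    log.add("DNS lookup stalls", 0.2, "Time DNS resolution separately")

    # Evidence contradicts the first hypothesis, so it drops out of the ranking.
    log.record(pool, "Pool never exceeded 40% utilisation", supports=False)
    for h in log.ranked_open():
        print(f"{h.likelihood:.1f}  {h.description}  ->  {h.test}")
```

Keeping the eliminated entries, with their evidence, also gives the "Document root cause" step a record of what was ruled out and why.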

Tool Selection Guide

| Problem Type | Primary Tool | Secondary Tools |
| --- | --- | --- |
| Logic error | pdb/debugger | Logging, unit tests |
| Performance | Profiler | Hypothesis testing, metrics |
| Memory leak | Memory profiler | Weak references, heap dumps |
| Async/timing | Timeline reconstruction | Extended thinking, logging |
| Distributed | Tracing (logs) | Kubernetes tools, MECE analysis |
| Unknown cause | Extended thinking | MECE, 5 Whys, hypothesis-driven |
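
Timeline reconstruction, listed above for async and timing problems, usually comes down to merging per-service events into one time-ordered stream. A minimal sketch, assuming each service's log lines start with an ISO 8601 timestamp followed by a space and are already in time order within each service. The function names, service names, and sample log lines are invented for illustration.

```python
import heapq
from datetime import datetime


def parse_line(service, line):
    """Split 'TIMESTAMP message' into (datetime, service, message)."""
    stamp, _, message = line.partition(" ")
    return datetime.fromisoformat(stamp), service, message


def merge_timelines(logs):
    """Merge per-service logs (each already in time order) into one timeline."""
    streams = [
        [parse_line(service, line) for line in lines]
        for service, lines in logs.items()
    ]
    # heapq.merge performs a k-way merge of already-sorted inputs.
    return list(heapq.merge(*streams))


if __name__ == "__main__":
    logs = {
        "api": [
            "2024-03-01T10:00:00.120 request received",
            "2024-03-01T10:00:05.130 request timed out",
        ],
        "auth": [
            "2024-03-01T10:00:00.150 token lookup started",
            "2024-03-01T10:00:05.100 token lookup finished",
        ],
    }
    for stamp, service, message in merge_timelines(logs):
        print(stamp.isoformat(timespec="milliseconds"), f"[{service}]", message)
```

In this invented sample the merged view shows the token lookup finishing only 30 ms before the API timeout, a relationship that neither log reveals on its own. The sketch trusts the timestamps, so clock skew between hosts would need to be corrected first.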

Skill version: 2.0 (Enhanced with extended thinking integration)

Last updated: 2024-01-15

Maintained by: Golden Armada AI Agent Fleet
