error-handling-patterns

Error Handling Patterns

Design error handling strategies that make failures explicit, recoverable, and debuggable. The central skill is matching error handling style to error semantics: not all errors are equal, and treating them equally produces systems that are equally bad at handling all of them.

When to Use

✅ Use for:
  • Choosing between exceptions, Result types, or error codes for a domain
  • Designing typed error hierarchies in TypeScript or Python
  • Implementing retry logic with backoff, jitter, and circuit breaking
  • Building React error boundaries and graceful degradation
  • Structuring error information for both users and developers
  • Python exception chaining and `__cause__` / `__context__` semantics
❌ NOT for:
  • Debugging a specific runtime error (use debugger or domain skill)
  • Logging pipeline infrastructure (use observability skill)
  • APM/monitoring configuration (use site-reliability-engineer skill)
  • Writing tests for error paths (use vitest-testing-patterns skill)

Core Decision: Exception vs Result Type vs Error Code

mermaid
flowchart TD
    Q1{Is this a programming error\nor contract violation?} -->|Yes| EX[Throw exception\nlet it crash]
    Q1 -->|No| Q2{Is the error part of\nnormal control flow?}
    Q2 -->|Yes| Q3{What is the call site context?}
    Q2 -->|No| Q4{Do callers need to\ndistinguish error types?}
    Q3 -->|Functional / monad-friendly| RT[Result or Either type]
    Q3 -->|Simple script or CLI| EC[Error code + message]
    Q4 -->|Yes| EH[Typed exception hierarchy]
    Q4 -->|No| GE[Generic exception\nwith structured message]
    EX --> NOTE1[Never catch at boundary —\nlet process restart]
    RT --> NOTE2[Compose with map/flatMap;\ncheck references/error-hierarchy-examples.md]
    EH --> NOTE3[See hierarchy design rules below]
Rules of thumb:
  • Library code: prefer Result types — never force callers to handle your exceptions
  • Application code: typed exception hierarchies work well; errors are exceptional
  • CLI / scripts: error codes are fine; the user is the error boundary
  • Async workers: Result types or structured error objects with retry metadata

Error Classification

Classify every error along two axes before deciding how to handle it:
| | Transient (retry may succeed) | Permanent (retry won't help) |
|---|---|---|
| User-actionable | Rate limit, quota exceeded | Invalid input, unauthorized |
| System-actionable | Network timeout, DB connection | Data corruption, schema mismatch |
This classification determines:
  • Whether to retry (transient only)
  • What to show the user (user-actionable → message; system → generic error + tracking ID)
  • Whether to alert on-call (system permanent → page; transient spikes → alert)
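
The three decisions above can be derived mechanically once an error has been placed on the two axes. A minimal sketch, assuming hypothetical names (`ErrorClass`, `HandlingPolicy`, `policyFor`) and assuming that permanent user-actionable errors never alert:

```typescript
// All names here are illustrative and not part of any library.
type Durability = "transient" | "permanent";
type Actor = "user" | "system";

interface ErrorClass {
  durability: Durability;
  actor: Actor;
}

interface HandlingPolicy {
  retry: boolean;               // only transient errors are retried
  showSpecificMessage: boolean; // user-actionable errors get a specific message
  alert: "none" | "alert-on-spike" | "page";
}

function policyFor(c: ErrorClass): HandlingPolicy {
  let alert: HandlingPolicy["alert"] = "none";
  if (c.durability === "transient") {
    alert = "alert-on-spike"; // single transient failures are expected; spikes are not
  } else if (c.actor === "system") {
    alert = "page"; // permanent system errors need a human
  }
  return {
    retry: c.durability === "transient",
    showSpecificMessage: c.actor === "user",
    alert,
  };
}

// Example: a network timeout is transient and system-actionable.
const timeoutPolicy = policyFor({ durability: "transient", actor: "system" });
```

When `showSpecificMessage` is false, the caller would show a generic error plus a tracking ID, as described in the list above.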

Should This Error Be Retried?

mermaid
flowchart TD
    E[Error occurs] --> C1{Is error transient?\nTimeout, 429, 503, connection reset}
    C1 -->|No| FAIL[Fail immediately\nReturn error to caller]
    C1 -->|Yes| C2{Have we exceeded\nmax retry attempts?}
    C2 -->|Yes| DLQ[Send to dead letter queue\nor return final failure]
    C2 -->|No| C3{Is circuit breaker OPEN?}
    C3 -->|Yes| CB[Return circuit-open error\nDo not attempt request]
    C3 -->|No| WAIT[Wait: exponential backoff\n+ full jitter]
    WAIT --> RETRY[Retry request]
    RETRY --> C1
    CB --> PROBE{After timeout:\nsend probe request}
    PROBE -->|Success| CLOSE[Close circuit\nResume normal traffic]
    PROBE -->|Fail| CB
Consult `references/retry-patterns.md` for backoff formulas, jitter strategies, and circuit breaker implementation.
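
As a rough illustration of the retry loop in the flowchart, the sketch below covers the transient check, the attempt budget, and exponential backoff with full jitter. It leaves out the circuit breaker and dead letter queue. `fullJitterDelayMs`, `retryWithBackoff`, and the default base and cap values are assumptions for this example, not the contents of the referenced file:

```typescript
// Full jitter: a uniform random delay in [0, min(cap, base * 2^attempt)).
function fullJitterDelayMs(
  attempt: number, // 0 for the first retry
  baseMs = 100,
  capMs = 10000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

interface RetryOptions {
  maxAttempts: number;                      // total tries, including the first
  isTransient: (error: unknown) => boolean; // caller-supplied classification
  sleep?: (ms: number) => Promise<void>;    // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  { maxAttempts, isTransient, sleep = defaultSleep }: RetryOptions,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const attemptsUsed = attempt + 1;
      // Permanent errors fail immediately; an exhausted budget returns the final failure.
      if (!isTransient(error) || attemptsUsed >= maxAttempts) throw error;
      await sleep(fullJitterDelayMs(attempt));
    }
  }
}
```

Injecting `sleep` and `random` keeps the delay logic deterministic under test. A production version would also consult the circuit breaker before each attempt.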

TypeScript: Error Hierarchy Design

typescript
// Base application error — all domain errors extend this
class AppError extends Error {
  readonly code: string;
  readonly statusCode: number;
  readonly isOperational: boolean; // false = programmer error, crash process

  constructor(message: string, code: string, statusCode: number, isOperational = true) {
    super(message);
    this.name = this.constructor.name;
    this.code = code;
    this.statusCode = statusCode;
    this.isOperational = isOperational;
    // captureStackTrace is V8-specific (Node, Chrome); the guard keeps other runtimes working
    Error.captureStackTrace?.(this, this.constructor);
  }
}

// Domain-specific errors
class ValidationError extends AppError {
  readonly fields: Record<string, string[]>;
  constructor(fields: Record<string, string[]>) {
    super('Validation failed', 'VALIDATION_ERROR', 422);
    this.fields = fields;
  }
}

class NotFoundError extends AppError {
  constructor(resource: string, id: string) {
    super(`${resource} ${id} not found`, 'NOT_FOUND', 404);
  }
}

class RateLimitError extends AppError {
  readonly retryAfterMs: number;
  constructor(retryAfterMs: number) {
    super('Rate limit exceeded', 'RATE_LIMIT', 429);
    this.retryAfterMs = retryAfterMs;
  }
}
Consult `references/error-hierarchy-examples.md` for Python equivalents, Result type implementations, and full hierarchy patterns.
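
One way the hierarchy might be consumed at a request boundary is sketched below. The classes are condensed re-declarations so the block stands alone, and `toErrorResponse` with its `INTERNAL_ERROR` code is a hypothetical helper, not something the source defines:

```typescript
class AppError extends Error {
  readonly code: string;
  readonly statusCode: number;
  readonly isOperational: boolean;

  constructor(message: string, code: string, statusCode: number, isOperational = true) {
    super(message);
    this.name = new.target.name;
    this.code = code;
    this.statusCode = statusCode;
    this.isOperational = isOperational;
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class NotFoundError extends AppError {
  constructor(resource: string, id: string) {
    super(`${resource} ${id} not found`, "NOT_FOUND", 404);
  }
}

interface ErrorResponse {
  status: number;
  body: { code: string };
}

// Operational errors map to their own status and code.
// Anything else is treated as a programmer error and reported as a generic 500.
function toErrorResponse(error: unknown): ErrorResponse {
  if (error instanceof AppError && error.isOperational) {
    return { status: error.statusCode, body: { code: error.code } };
  }
  return { status: 500, body: { code: "INTERNAL_ERROR" } };
}

const notFound = toErrorResponse(new NotFoundError("User", "42"));
const unexpected = toErrorResponse(new TypeError("x is not a function"));
```

The response body carries only the machine-readable code, so internal messages stay out of user-facing output. A real service would likely also log the non-operational case and let the process restart.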

Result Type Pattern (TypeScript)

When errors are expected outcomes of operations (parsing, API calls, DB queries), use Result instead of throw:
typescript
type Result<T, E = AppError> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Helpers
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Usage — caller is forced to handle both cases
async function fetchUser(id: string): Promise<Result<User, NotFoundError | NetworkError>> {
  try {
    const user = await db.users.findById(id);
    if (!user) return err(new NotFoundError('User', id));
    return ok(user);
  } catch (e) {
    return err(new NetworkError('DB unavailable', { cause: e }));
  }
}

// At call site — no silent failures
const result = await fetchUser(userId);
if (!result.ok) {
  if (result.error instanceof NotFoundError) return res.status(404).json(...);
  return res.status(500).json(...);
}
const user = result.value; // typed, safe
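
The decision flowchart mentions composing Results with map/flatMap. A minimal sketch of those two helpers follows, with the `Result` type repeated so the block is self-contained. `map`, `flatMap`, `parseNumber`, and `checkPort` are illustrative names, and string errors are used only to keep the example short:

```typescript
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// map: transform the success value; errors pass through untouched.
function map<T, U, E>(result: Result<T, E>, fn: (value: T) => U): Result<U, E> {
  return result.ok ? ok(fn(result.value)) : result;
}

// flatMap: chain a step that can itself fail; the first error short-circuits.
function flatMap<T, U, E, F>(
  result: Result<T, E>,
  fn: (value: T) => Result<U, F>,
): Result<U, E | F> {
  return result.ok ? fn(result.value) : result;
}

const parseNumber = (raw: string): Result<number, string> => {
  const n = Number(raw);
  return Number.isNaN(n) ? err(`not a number: ${raw}`) : ok(n);
};

const checkPort = (n: number): Result<number, string> =>
  Number.isInteger(n) && n > 0 && n < 65536 ? ok(n) : err(`out of range: ${n}`);

const port = flatMap(parseNumber("8080"), checkPort); // success: 8080
const bad = flatMap(parseNumber("http"), checkPort);  // failure from the first step
const doubled = map(port, (p) => p * 2);
```

Each step only runs if the previous one succeeded, so the call site handles a single Result at the end instead of checking after every step.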

React Error Boundaries

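Error boundaries catch exceptions thrown while rendering, in lifecycle methods, and in constructors of the tree below them. They do NOT catch errors in event handlers or in asynchronous code (fetch failures, setTimeout callbacks).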
typescript
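// Assumed shapes; ErrorFallback and logger are expected to be defined elsewhere
type Props = { children: React.ReactNode };
type State = { hasError: boolean; error: Error | null };

class RouteErrorBoundary extends React.Component<Props, State> {
  state: State = { hasError: false, error: null };

  static getDerivedStateFromError(error: Error): State {
    return { hasError: true, error };
  }

  componentDidCatch(error: Error, info: React.ErrorInfo) {
    // Log to error tracking, not console.error in production
    logger.error('Render error', { error, componentStack: info.componentStack });
  }

  // Clear the error state so the subtree renders again on retry
  reset = () => this.setState({ hasError: false, error: null });

  render() {
    if (this.state.hasError) {
      return <ErrorFallback error={this.state.error} onRetry={this.reset} />;
    }
    return this.props.children;
  }
}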
Place boundaries at route level (one per page) and around isolated expensive subtrees (charts, rich editors). Do not wrap every component: boundaries that are too granular fragment the fallback UI and lose the benefit of a single recovery point.

Python: Exception Chaining

Python's `raise X from Y` syntax preserves the causal chain. Always use it when re-raising:
python
class AppError(Exception):
    """Base error. All domain errors subclass this."""
    def __init__(self, message: str, code: str, status: int = 500):
        super().__init__(message)
        self.code = code
        self.status = status

class DatabaseError(AppError):
    def __init__(self, operation: str, cause: Exception):
        super().__init__(f"DB error during {operation}", "DB_ERROR", 503)
        self.__cause__ = cause  # explicit chain

# In application code
try:
    result = db.execute(query)
except psycopg2.OperationalError as e:
    raise DatabaseError("user_fetch", e) from e  # preserves full traceback

Structured Error Logging

Log errors with enough context to diagnose without reading code:
typescript
// Good: structured, queryable, developer-oriented
logger.error('Payment processing failed', {
  error: {
    code: error.code,
    message: error.message,
    stack: error.stack,
  },
  context: {
    userId,
    orderId,
    amount,
    paymentProvider,
    attempt: retryCount,
  },
  correlation: { requestId, traceId },
});

// Then surface a sanitized message to the user
// NEVER leak error.message to users — it may contain internals
return res.status(500).json({
  error: 'Payment could not be processed. Please try again.',
  errorId: requestId, // so support can look it up
});
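
The `error` field in the log call above can be produced by a small serializer so that wrapped causes are not lost. A sketch under the assumption that errors may carry a string `code` and a `cause`; `serializeError` and `SerializedError` are hypothetical names:

```typescript
interface SerializedError {
  name: string;
  message: string;
  code?: string;
  stack?: string;
  cause?: SerializedError;
}

// Flattens an unknown thrown value into a plain object,
// following `cause` links up to maxDepth levels to avoid runaway chains.
function serializeError(error: unknown, maxDepth = 5): SerializedError {
  if (!(error instanceof Error)) {
    // Non-Error values can be thrown too (strings, numbers, plain objects).
    return { name: "NonError", message: String(error) };
  }
  const { code, cause } = error as Error & { code?: unknown; cause?: unknown };
  const out: SerializedError = {
    name: error.name,
    message: error.message,
    stack: error.stack,
  };
  if (typeof code === "string") out.code = code;
  if (cause !== undefined && maxDepth > 0) {
    out.cause = serializeError(cause, maxDepth - 1);
  }
  return out;
}

// Example: a wrapped low-level failure keeps its root cause in the log payload.
const root = new Error("connection reset");
const wrapped = Object.assign(new Error("DB unavailable"), { code: "DB_ERROR", cause: root });
const payload = serializeError(wrapped);
```

The serialized object is meant for the developer-facing log only. The user-facing response should still use a sanitized message plus the error ID.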

Anti-Patterns

Anti-Pattern: Pokemon Exception Handling

Novice: "Wrap everything in `try/catch` and log the error. At least it won't crash."
Expert: Catching all exceptions unconditionally ("gotta catch 'em all") hides programmer errors, masks resource leaks, and converts loud failures into silent corruption. The system appears healthy while data is being silently dropped.
typescript
// Wrong — swallows everything including programming errors
try {
  await processOrder(order);
} catch (e) {
  console.error('something went wrong', e); // lost forever
}

// Right — catch only what you can handle, let the rest propagate
try {
  await processOrder(order);
} catch (e) {
  if (e instanceof RateLimitError) {
    await queue.requeue(order, { delay: e.retryAfterMs });
    return;
  }
  // programming errors, unexpected DB errors — let them crash
  throw e;
}
Detection: `catch (e) { }`, `catch (e) { log(e) }` with no rethrow, `except Exception as e: pass` in Python. Any catch block with no condition and no rethrow.
Timeline: This has always been wrong. It gained renewed urgency in the async/await era (2017+) because swallowed promise rejections are even harder to detect than swallowed sync exceptions.

Anti-Pattern: Stringly-Typed Errors

Novice: "I'll put the error type in the message string: `throw new Error('NOT_FOUND: User 123')`"
Expert: String-based error types force callers to parse strings, break under refactoring, provide no IDE support, and make exhaustive matching impossible. Callers pattern-match on strings that drift as the codebase evolves.
typescript
// Wrong — caller must parse strings, breaks silently on rename
throw new Error(`RATE_LIMIT: retry after ${ms}ms`);
// Caller: if (error.message.startsWith('RATE_LIMIT')) { ... }

// Right — typed, refactor-safe, IDE-navigable
throw new RateLimitError(ms);
// Caller: if (error instanceof RateLimitError) { ... error.retryAfterMs ... }
Python equivalent:
python
# Wrong
raise Exception(f"rate_limit:{retry_after}")

# Right
raise RateLimitError(retry_after_ms=retry_after)
LLM mistake: LLMs trained on StackOverflow examples frequently generate stringly-typed errors because SO answers prioritize brevity over correctness. Error codes as strings look concise in tutorials.
Detection: `instanceof Error` checks everywhere, string `.startsWith()` or `.includes()` in catch blocks, error codes stored in the `message` field rather than a dedicated property.
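
Typed errors also make the exhaustive matching mentioned above checkable by the compiler. A minimal sketch, assuming each error class carries a literal `code` discriminant; `OrderError` and `nextStep` are hypothetical names, and the classes are simplified stand-ins for the hierarchy shown earlier:

```typescript
class RateLimitError extends Error {
  readonly code = "RATE_LIMIT" as const;
  readonly retryAfterMs: number;
  constructor(retryAfterMs: number) {
    super("Rate limit exceeded");
    this.retryAfterMs = retryAfterMs;
  }
}

class NotFoundError extends Error {
  readonly code = "NOT_FOUND" as const;
  constructor(resource: string, id: string) {
    super(`${resource} ${id} not found`);
  }
}

type OrderError = RateLimitError | NotFoundError;

function nextStep(error: OrderError): string {
  switch (error.code) {
    case "RATE_LIMIT":
      return `retry in ${error.retryAfterMs}ms`; // narrowed to RateLimitError
    case "NOT_FOUND":
      return "give up";
    default: {
      // If a new member is added to OrderError without a case above,
      // this assignment stops compiling.
      const unreachable: never = error;
      return unreachable;
    }
  }
}

const step = nextStep(new RateLimitError(250));
```

With string-encoded types, adding a new error kind silently falls through every `startsWith` check. Here the compiler points at every switch that needs updating.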

References

  • `references/retry-patterns.md`: Consult when implementing retry logic: exponential backoff formulas, full vs equal jitter, circuit breaker state machine, dead letter queues
  • `references/error-hierarchy-examples.md`: Consult for complete TypeScript and Python typed error class examples, Result monad implementations, and error boundary patterns