cc-defensive-programming

Skill: cc-defensive-programming

STOP - Never Skip

| Check | Why Critical |
|-------|--------------|
| No executable code in assertions | Code disappears in production builds |
| No empty catch blocks | Silently swallows bugs that compound |
| External input validated | Security vulnerabilities, data corruption |

CRISIS TRIAGE (2 minutes)

Production down? Use this prioritized subset:

Immediate (30 seconds each)

  1. Is external input validated at entry point? If no → add validation NOW
  2. Any empty catch blocks hiding the real error? If yes → add logging, find root cause
  3. Any assertions with side effects? If yes → extract to separate statement

Before Deploying Fix (60 seconds)

  1. Does the fix match the architectural error strategy? (return code vs exception vs shut down)
  2. Are exceptions raised at the right abstraction level? (e.g., GetEmployee() should not surface EOFException)
Why triage works: These 5 items catch 80% of defensive programming bugs. Full checklist (21 items) is for non-emergency review.
Cutting corners in a crisis creates the NEXT crisis. The "quick fix" empty catch block you add today becomes tomorrow's 3 AM page.

Key Definitions

External Input

Any data not provably controlled by current code path:
  • User input (keyboard, forms, CLI args)
  • Files (config, data, uploads)
  • Network (APIs, databases, inter-service calls)
  • Environment variables, system properties
  • Data from ANY other service, even internal ones
"Internal team API" is still external. If it crosses a network boundary or process boundary, validate it.

Assertion

Code used during development that allows a program to check itself as it runs. When true = operating as expected. When false = detected an unexpected error (bug). Use for conditions that should never occur.

Barricade

A damage-containment strategy. Interfaces designated as boundaries to "safe" areas. Data crossing these boundaries is checked for validity.
Limitation: Barricades reduce redundant validation but do NOT replace defense-in-depth for security-critical operations. If barricade validation has a bug, what happens?

Correctness vs Robustness

  • Correctness: Never returning an inaccurate result; no result is better than wrong result (safety-critical)
  • Robustness: Always trying to keep software operating, even if results are sometimes inaccurate (consumer apps)
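As a rough sketch of the difference, the same unreadable input can be handled under either philosophy. The `parseReading` function, its `favor` option, and the `lastGood` fallback are hypothetical names chosen for illustration.

```javascript
// Hypothetical sensor-reading parser showing both philosophies side by side.
function parseReading(raw, { favor, lastGood = null }) {
  const value = Number.parseFloat(raw);
  if (Number.isFinite(value)) return { ok: true, value };

  if (favor === 'correctness') {
    // Correctness: no result is better than a wrong result.
    throw new Error(`Unreadable sensor value: ${JSON.stringify(raw)}`);
  }
  // Robustness: keep running with the last good value, flagged as a fallback.
  return { ok: false, value: lastGood };
}
```

A safety-critical caller would pass `favor: 'correctness'` and let the error stop processing; a consumer UI might pass `favor: 'robustness'` and render the stale value with a warning.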

Preconditions / Postconditions

  • Preconditions: Properties client code promises are true BEFORE calling routine
  • Postconditions: Properties routine promises are true AFTER executing
  • [XREF: Meyer 1997, "Design by Contract"]
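One way to picture a contract is to check the precondition on entry and the postcondition before returning. The sketch below assumes a small hand-written `assertThat` helper and an `indexOfMax` routine; both are illustrative and not part of the skill.

```javascript
// Hypothetical assertion helper; a real project might use its runtime's
// or test framework's assert instead.
function assertThat(condition, message) {
  if (!condition) throw new Error(`Assertion failed: ${message}`);
}

// Precondition: the caller promises a non-empty array.
// Postcondition: the routine promises the index of the largest element.
function indexOfMax(values) {
  assertThat(Array.isArray(values) && values.length > 0, 'values must be a non-empty array');

  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }

  assertThat(values.every((v) => v <= values[best]), 'result must index the largest value');
  return best;
}
```

Because both checks are assertions, they document bugs in the caller or in the routine itself; anticipated bad input at a public boundary would still need ordinary error handling.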

When NOT to Use

  • Pure functions without side effects - Immutable data flows don't need defensive mutation protection (but still validate inputs at system boundaries)
  • Prototype/spike code - Time-boxed exploration (max 1 week) before committing to error strategy
  • Test code - Test doubles intentionally violate production patterns
  • Performance-critical inner loops - Where assertion overhead matters (must show profiling data proving >5% overhead)
SECURITY EXCEPTION: Security-critical code (authentication, authorization, cryptographic operations, PII handling) is NEVER exempt from defensive programming regardless of other factors. When in doubt, validate.

Crisis Invariants - NEVER SKIP

These checks are almost always required. Exceptions need explicit justification and documentation:
| Check | Time | Why Critical |
|-------|------|--------------|
| No executable code in assertions | 15 sec | Code disappears in production builds |
| No empty catch blocks | 15 sec | Silently swallows bugs that compound |
| External input validated | 30 sec | Security vulnerabilities, data corruption |
| Assertions for bugs only | 15 sec | Assertions disabled in production; anticipated errors need handling |
Why these four? Violations create silent failures that are nearly impossible to debug later. They don't crash loudly - they corrupt data and hide bugs.
Rationalizing "I'll add proper error handling later"? You likely won't. Error handling added later is often incomplete because edge cases are forgotten. However, if you genuinely must defer, create a tracked ticket with specific scope.

Pattern Reuse Gate

BEFORE implementing any error handling, search the codebase:
| Search For | Why |
|------------|-----|
| Same error type elsewhere | How is it handled? Log? Throw? Return code? |
| Same module's error handling | What's the established pattern here? |
| Barricade/validation patterns | Where are the trust boundaries? |
| Exception hierarchy | What custom exceptions exist? |
Questions to answer:
  1. How does this codebase handle this type of error elsewhere?
  2. Is there an established error-handling strategy (exceptions vs return codes vs Result types)?
  3. What logging/monitoring patterns are used?
  4. Are there existing custom exception classes I should use?
If pattern found: Follow it. Consistency in error handling is critical for debugging.
If no pattern found: You're establishing one. Document your decision. Consider if this should become the pattern.
See: pattern-reuse-gate.md for full gate protocol.

Modes

CHECKER

Purpose: Execute checklists for defensive programming, assertions, exceptions, and error handling
Triggers:
  • "review my error handling"
  • "check for defensive programming"
  • "review assertions"
  • "check exception handling"
  • "audit my validation code"
Non-Triggers:
  • "design my error handling strategy" -> APPLIER
  • "review my control flow" -> cc-control-flow-quality
  • "review my class design" -> cc-routine-and-class-design
Checklist: See checklists.md
Output Format:
| Item | Status | Evidence | Location |
|------|--------|----------|----------|
Severity:
  • VIOLATION: Fails checklist item (missing validation, empty catch blocks, assertions with side effects)
  • WARNING: Partial compliance (inconsistent error handling, missing documentation)
  • PASS: Meets requirement
APPLIER

Purpose: Apply defensive programming techniques when implementing error handling and validation
Triggers:
  • "how should I handle this error"
  • "should I use assertion or error handling"
  • "help me design barricades"
  • "set up defensive programming"
  • "implement input validation"
Non-Triggers:
  • "review my existing error handling" -> CHECKER
  • "optimize my error handling performance" -> performance skill
Produces:
  • Assertion placement recommendations
  • Error-handling strategy decisions
  • Barricade architecture designs
  • Input validation implementations
Constraints:
  • Use assertions for bugs that should never occur (p.191)
  • Use error handling for anticipated error conditions (p.194)
  • Design barricades at external interfaces (p.203)
  • Favor development debugging aids over production constraints (p.205)
Decision Flowcharts

Assertion vs Error Handling

dot
digraph assertion_vs_error {
    rankdir=TB;

    START [label="Handling a potentially\nbad condition" shape=doublecircle];
    never [label="Should this NEVER happen?\n(programmer bug)" shape=diamond];
    assertion [label="Use ASSERTION" shape=box style=filled fillcolor=lightblue];
    anticipated [label="Is it anticipated\nbad input?" shape=diamond];
    errorhandling [label="Use ERROR HANDLING" shape=box style=filled fillcolor=lightgreen];
    robust [label="Building highly\nrobust system?" shape=diamond];
    both [label="Use BOTH\n(assert + handle)" shape=box style=filled fillcolor=lightyellow];
    reconsider [label="Clarify requirements:\nIs it a bug or expected?\nAsk domain expert." shape=box style=filled fillcolor=lightcoral];

    START -> never;
    never -> robust [label="yes"];
    never -> anticipated [label="no"];
    robust -> both [label="yes"];
    robust -> assertion [label="no"];
    anticipated -> errorhandling [label="yes"];
    anticipated -> reconsider [label="no/unsure"];
}

Correctness vs Robustness

dot
digraph correctness_vs_robustness {
    rankdir=TB;

    START [label="Choose error\nhandling philosophy" shape=doublecircle];
    safety [label="Safety-critical?\n(medical, aviation, nuclear)" shape=diamond];
    correctness [label="Favor CORRECTNESS\nShut down > wrong result" shape=box style=filled fillcolor=lightcoral];
    consumer [label="Consumer app?\n(games, word processors)" shape=diamond];
    robustness [label="Favor ROBUSTNESS\nKeep running > perfect" shape=box style=filled fillcolor=lightgreen];
    analyze [label="ANALYZE DOMAIN\n(see guidance below)" shape=box style=filled fillcolor=lightyellow];

    START -> safety;
    safety -> correctness [label="yes"];
    safety -> consumer [label="no"];
    consumer -> robustness [label="yes"];
    consumer -> analyze [label="no"];
}
Domain Analysis Guidance (for "Analyze domain" path):
| Domain Type | Lean Toward | Key Question |
|-------------|-------------|--------------|
| Enterprise/B2B | Correctness | "Would wrong data cause business decisions based on false info?" |
| SaaS platforms | Balanced | "What's the blast radius of a wrong answer vs unavailability?" |
| Internal tools | Robustness | "Is user technical enough to recover from a crash?" |
| Data pipelines | Correctness | "Does downstream processing assume data integrity?" |
| Real-time systems | Context-dependent | "Is stale data better or worse than no data?" |

Keep Debug Code in Production?

dot
digraph debug_code_production {
    rankdir=TB;

    START [label="Should this debug\ncode stay in production?" shape=doublecircle];
    important [label="Checks important errors?\n(calculations, data integrity)" shape=diamond];
    keep1 [label="KEEP" shape=box style=filled fillcolor=lightgreen];
    crash [label="Causes hard crash?\n(no save opportunity)" shape=diamond];
    remove1 [label="REMOVE" shape=box style=filled fillcolor=lightcoral];
    diagnose [label="Helps remote diagnosis?\n(logging, state dumps)" shape=diamond];
    keep2 [label="KEEP as silent log" shape=box style=filled fillcolor=lightgreen];
    remove2 [label="REMOVE or make\nunobtrusive" shape=box style=filled fillcolor=lightyellow];

    START -> important;
    important -> keep1 [label="yes"];
    important -> crash [label="no"];
    crash -> remove1 [label="yes"];
    crash -> diagnose [label="no"];
    diagnose -> keep2 [label="yes"];
    diagnose -> remove2 [label="no"];
}

Decision Framework: Assertions vs Error Handling

| Condition Type | Use Assertion | Use Error Handling | Guidance |
|----------------|---------------|--------------------|----------|
| Should never occur (bug) | Yes | No | Assert documents the impossibility |
| Can occur at runtime | No | Yes | Handle gracefully |
| External input | No | Yes | Always validate external data |
| Internal interface (same module) | Yes | No | Assert for contract violations |
| Internal interface (cross-module) | Yes | Yes, if crossing trust boundary | Validate at module boundaries |
| Precondition violation | Yes | Yes, if public API | Public APIs need graceful errors |
| Security-critical | Both | Both | Defense in depth |
| Highly robust systems | Both | Both | Belt and suspenders |

Barricade Design (p.203-204)

MUST be performed in order:
  1. Identify external interfaces (user input, files, network, APIs, inter-service calls)
  2. Place validation at barricade boundary - all external data checked here
  3. Inside barricade: use assertions for internal bugs (data assumed validated)
  4. Strategy: External = error handling; Internal = assertions
Class-level barricade: Public methods validate and sanitize; private methods within that class can assume data is safe.
Critical caveat: "Trust inside barricade" means reduced redundant validation, NOT zero validation. For security-critical paths (auth, crypto, PII), validate again even inside the barricade. Bugs in barricade validation happen.
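A class-level barricade might look roughly like the sketch below: the public method validates and sanitizes with error handling, while the private method only asserts. The `UserDirectory` class, its field names, and the 64-character limit are assumptions made for illustration.

```javascript
// Hypothetical class-level barricade.
class UserDirectory {
  #users = new Map();

  // Barricade: external data is checked here with error handling.
  addUser(input) {
    if (input === null || typeof input !== 'object') {
      throw new TypeError('user must be an object');
    }
    const name = typeof input.name === 'string' ? input.name.trim() : '';
    if (name.length === 0 || name.length > 64) {
      throw new RangeError('name must be 1-64 characters');
    }
    this.#store(name);
    return name;
  }

  // Inside the barricade: data is assumed valid, so a violation is a bug.
  #store(name) {
    console.assert(typeof name === 'string' && name.length > 0, 'unvalidated name reached #store');
    this.#users.set(name.toLowerCase(), { name });
  }

  has(name) {
    return this.#users.has(String(name).toLowerCase());
  }
}
```

Note that `console.assert` only logs; for a security-critical path the private method would repeat a real check rather than rely on the barricade alone.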

Async and Modern Patterns

Traditional exception propagation assumes synchronous call stacks. Modern patterns need different approaches:

Promises/Async-Await

javascript
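// BAD: No status check, and callers receive raw low-level errors.
// If no caller catches the rejection, Node.js terminates by default.
async function fetchUser(id) {
  const response = await fetch(`/api/users/${id}`); // Can reject on network failure
  return response.json(); // Parses error bodies as if they were users
}

// GOOD: Explicit error handling with domain-level exceptions
// (UserNotFoundError and UserServiceError are illustrative names)
class UserServiceError extends Error {}
class UserNotFoundError extends UserServiceError {}

async function fetchUser(id) {
  try {
    const response = await fetch(`/api/users/${encodeURIComponent(id)}`);
    if (response.status === 404) {
      throw new UserNotFoundError(`User ${id} not found`);
    }
    if (!response.ok) {
      throw new UserServiceError(`Unexpected status ${response.status}`);
    }
    return await response.json(); // await so JSON parse failures are caught below
  } catch (e) {
    if (e instanceof UserServiceError) throw e; // already domain-level
    throw new UserServiceError('Failed to fetch user', { cause: e });
  }
}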

Callbacks

  • Errors don't propagate through callbacks - must be explicitly passed
  • Use error-first callback pattern: callback(error, result)
  • Wrap callback APIs in promises for better error handling
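The wrapping step can be sketched as follows. `loadConfig` is a hypothetical error-first callback API standing in for a legacy library call; `loadConfigAsync` is the promise wrapper around it.

```javascript
// Hypothetical error-first callback API.
function loadConfig(name, callback) {
  setTimeout(() => {
    if (name !== 'app') return callback(new Error(`Unknown config: ${name}`));
    callback(null, { name, retries: 3 });
  }, 0);
}

// Wrap it once so callers can use async/await with try/catch.
function loadConfigAsync(name) {
  return new Promise((resolve, reject) => {
    loadConfig(name, (error, result) => {
      if (error) reject(error); // the error is passed on explicitly
      else resolve(result);
    });
  });
}
```

In Node.js, `util.promisify` performs the same wrapping for functions that follow the error-first convention.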

Promise.all() Error Aggregation

javascript
// BAD: First rejection loses other results
const results = await Promise.all(promises);

// GOOD: Collect all results including failures
const results = await Promise.allSettled(promises);
const failures = results.filter(r => r.status === 'rejected');
if (failures.length > 0) {
  logErrors(failures);
  // Decide: fail entirely or continue with partial results?
}
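The "fail entirely or continue" decision can be made explicit in code. The sketch below is one possible policy, not a prescribed one; `settleAll`, `allWithTolerance`, and the `maxFailures` parameter are hypothetical names.

```javascript
// Hypothetical helper: split settled outcomes into values and errors.
async function settleAll(promises) {
  const results = await Promise.allSettled(promises);
  return {
    values: results.filter((r) => r.status === 'fulfilled').map((r) => r.value),
    errors: results.filter((r) => r.status === 'rejected').map((r) => r.reason),
  };
}

// Hypothetical policy: tolerate up to `maxFailures` rejections, else fail as a whole.
async function allWithTolerance(promises, maxFailures = 0) {
  const { values, errors } = await settleAll(promises);
  if (errors.length > maxFailures) {
    throw new AggregateError(errors, `${errors.length} of ${promises.length} operations failed`);
  }
  return values;
}
```

Throwing an `AggregateError` keeps every underlying failure available to the caller instead of reporting only the first one.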

Event Handlers / React Error Boundaries

  • Exceptions thrown in event handlers are not caught by React Error Boundaries; they usually don't take down the app, so the failure is easy to miss
  • Use React Error Boundaries to catch rendering errors
  • Catch and log errors in event handlers explicitly
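Explicit logging in handlers can be centralized with a small wrapper. This is a framework-agnostic sketch; `withErrorLogging` and its `report` parameter are hypothetical names.

```javascript
// Hypothetical wrapper: report any exception thrown by an event handler,
// then rethrow so the failure is not hidden.
function withErrorLogging(handler, report = console.error) {
  return function wrappedHandler(...args) {
    try {
      return handler.apply(this, args);
    } catch (error) {
      report(error);
      throw error;
    }
  };
}

// Usage (browser): button.addEventListener('click', withErrorLogging(onClick));
```

The wrapper only covers synchronous exceptions; an async handler would additionally need its returned promise's rejection handled.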

Offensive Programming (p.206)

Make errors painful during development so they're found and fixed:
| Technique | Purpose |
|-----------|---------|
| Make asserts abort | Don't let programmers bypass known problems |
| Fill allocated memory | Detect memory allocation errors immediately |
| Fill files/streams completely | Flush out file-format errors early |
| Default/else clauses fail hard | Impossible to overlook unexpected cases |
| Fill objects with junk before deletion | Detect use-after-free immediately |
| Email error logs to yourself | Get notified of errors in the field |
Paradox: During development, make errors noticeable and obnoxious. During production, make errors unobtrusive with graceful recovery.
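The "default/else clauses fail hard" technique, for instance, could be sketched like this. The `statusLabel` function and its status names are assumptions for illustration.

```javascript
// Hypothetical order-status mapper.
function statusLabel(status) {
  switch (status) {
    case 'pending':
      return 'Waiting for payment';
    case 'shipped':
      return 'On its way';
    case 'delivered':
      return 'Delivered';
    default:
      // Fail hard: an unknown status means a case was added elsewhere
      // without updating this switch.
      throw new Error(`Unhandled order status: ${String(status)}`);
  }
}
```

During development the thrown error makes the missing case impossible to overlook; a production build might route the same condition to a logged fallback instead.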

Production Transition (p.209-210)

| Debug Code Type | Action | Rationale |
|-----------------|--------|-----------|
| Checks important errors (calculations, data) | KEEP | Tax calculation errors matter; messy screens don't |
| Checks trivial errors (screen updates) | REMOVE or log silently | Penalty is cosmetic only |
| Causes hard crashes | REMOVE | Users need chance to save work |
| Enables graceful crash with diagnostics | KEEP | Mars Pathfinder diagnosed issues remotely |
| Logging for tech support | KEEP | Convert assertions from halt to log |
| Exposes info to attackers | REMOVE | Error messages shouldn't help attackers |
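Converting assertions from halt to log can be done with a single switch. The sketch below assumes a hypothetical `createAssert` factory with a `production` flag; how that flag is set (environment variable, build step) is left open.

```javascript
// Hypothetical factory: assertions halt in development and log in production.
function createAssert({ production, log = console.error }) {
  return function check(condition, message) {
    if (condition) return true;
    if (production) {
      log(`Assertion failed: ${message}`); // unobtrusive, keeps diagnostics
      return false;
    }
    throw new Error(`Assertion failed: ${message}`); // loud during development
  };
}
```

Returning `false` in production lets the caller choose a recovery path while the log still records the violated assumption.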

Red Flags - STOP and Reconsider

If you find yourself thinking any of these, you are about to violate the skill:
Skipping Validation:
  • "This input comes from trusted source"
  • "The caller will validate before passing"
  • "We control the data, it can't be bad"
  • "It's an internal API, only our team calls it"
Assertion Misuse:
  • "I'll just put the action in the assert - it's convenient"
  • "Assertions slow things down"
  • "I'll use an assert for this user input check"
Exception Abuse:
  • "Empty catch is fine, I don't care about this exception"
  • "I'll catch Exception and figure it out later"
  • "Throwing EOFException from GetEmployee() is fine"
Deadline Pressure:
  • "I'll add proper error handling after we ship"
  • "Garbage in, garbage out is acceptable for now"
  • "We can clean up the exception handling later"
Success Streak / Overconfidence:
  • "We've never had a problem with X"
  • "This has been working fine for months"
  • "I've done this pattern 10 times without issues"
  • "Our error handling is already pretty good"
Sunk Cost:
  • "But it works / passes all tests"
  • "I already spent 4 hours on this approach"
  • "Refactoring would take too long"
All of these mean: Apply the checklists anyway. Neither deadline pressure nor past success exempts you from validation.

Rationalization Counters

| Excuse | Reality |
|--------|---------|
| "Garbage in, garbage out is fine" | For production software, it's the mark of a sloppy, nonsecure program (p.188) |
| "Assertions slow down my code" | Compile them out for production; trade speed for safety during development (p.205) |
| "I'll just put the action in the assertion" | Code won't execute when assertions are disabled in production (p.191) |
| "Empty catch blocks are fine" | Either the try or catch is wrong; find and fix the root cause (p.201) |
| "I'll add defensive code everywhere" | Too much defensive programming adds complexity and defects (p.210) |
| "One error strategy is enough" | For highly robust code, use both assertions AND error handling (p.193) |
| "I can handle it locally with an exception" | If you can handle locally, don't throw - handle locally (p.199) |
| "EOFException from GetTaxId() is fine" | Exceptions must match routine's abstraction level (p.200) |
| "I'll validate later" | Later never comes; edge cases are forgotten |
| "This is internal code, no validation needed" | Use assertions for internal bugs - but you still need SOMETHING |
| "We've never had a problem with X" | Survivorship bias: you only see the failures that surfaced. Silent failures go unreported. |
| "But it works / passes tests" | Working now ≠ maintainable later. Tests verify behavior, not design quality. |
| "I already spent N hours on this" | Sunk cost fallacy. Time spent is gone regardless. Question: fix now (2 hours) or debug later (20 hours)? |
| "This has been running fine for months" | Past success doesn't predict future safety. Each change is a new risk. The Mars Climate Orbiter worked for 9 months before a unit conversion bug destroyed it. |

Pressure Testing Scenarios

Scenario 1: Deadline Crunch

Situation: Feature due tomorrow. You're tempted to skip input validation for the new API endpoint.
Test: Is external data entering the system?
REQUIRED Response: Yes. Validate it. Buffer overflows and SQL injection don't care about your deadline.
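Endpoint validation does not have to be elaborate to be useful. The following is a minimal sketch for a hypothetical "create user" request body; the field names, the loose email pattern, and the limits are all assumptions for illustration.

```javascript
// Hypothetical validator for a "create user" request body.
function validateCreateUser(body) {
  if (body === null || typeof body !== 'object' || Array.isArray(body)) {
    return { ok: false, errors: ['body must be a JSON object'] };
  }
  const errors = [];
  if (typeof body.email !== 'string' || body.email.length > 254 || !/^[^\s@]+@[^\s@]+$/.test(body.email)) {
    errors.push('email is missing or malformed');
  }
  if (!Number.isInteger(body.age) || body.age < 0 || body.age > 150) {
    errors.push('age must be an integer between 0 and 150');
  }
  if (errors.length > 0) return { ok: false, errors };
  // Return only the checked fields, so unexpected properties are dropped.
  return { ok: true, value: { email: body.email, age: body.age } };
}
```

A handler would call this first and respond with a client error when `ok` is false, so nothing unvalidated reaches the code behind the endpoint.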

Scenario 2: Quick Fix

Situation: Exception crashes prod. Quickest fix is empty catch block.
Test: Does the exception represent a real error condition?
REQUIRED Response: Yes. Either the try block is wrong (raises exception it shouldn't) or catch is wrong (not handling). Find root cause. At minimum, log it.
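The "at minimum, log it" fallback might look like the sketch below, which handles only the anticipated failure and lets everything else surface. `parseSettings` and its `log` parameter are hypothetical names.

```javascript
// BAD (for contrast): try { return JSON.parse(text); } catch (e) {}

// Hypothetical settings loader: log the failure and handle only what is anticipated.
function parseSettings(text, log = console.warn) {
  try {
    return JSON.parse(text);
  } catch (error) {
    if (error instanceof SyntaxError) {
      log(`Settings are not valid JSON, using defaults: ${error.message}`);
      return {};
    }
    throw error; // anything else is unexpected, so do not hide it
  }
}
```

The log line is what turns a silent failure into something a root-cause investigation can start from.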

Scenario 3: Convenient Assertion

Situation: You need to initialize a subsystem. Putting InitSubsystem() inside Assert(InitSubsystem()) is convenient.
Test: Will this code be needed in production?
REQUIRED Response: Yes. Don't put it in an assertion. Production builds disable assertions.
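The failure mode can be modeled in JavaScript with a lazy assertion whose check is skipped when assertions are off. This is only a model of compilers and build tools that strip assertion calls; `devAssert`, `assertionsEnabled`, and `initSubsystem` are hypothetical names.

```javascript
let assertionsEnabled = true; // a production build would set this to false

// Hypothetical lazy assertion: the check runs only when assertions are enabled.
function devAssert(check, message) {
  if (assertionsEnabled && !check()) throw new Error(`Assertion failed: ${message}`);
}

let initialized = false;
function initSubsystem() {
  initialized = true;
  return true;
}

// BAD: devAssert(() => initSubsystem(), 'init failed');
// With assertions disabled, initSubsystem() never runs.

// GOOD: do the work unconditionally, then assert on the result.
function start() {
  const ok = initSubsystem();
  devAssert(() => ok, 'init failed');
  return ok;
}
```

With `assertionsEnabled` set to false, the BAD form leaves `initialized` false, while `start()` still initializes the subsystem.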

Scenario 4: Trusted Source

Situation: "This data comes from our own database, not user input."
Test: Could the database ever contain bad data? (Migration errors, bugs, corruption)
REQUIRED Response: Yes. Use barricade design - validate at boundaries. Or use assertions if it's truly internal and indicates a bug.

Scenario 5: Exception Abstraction

Situation: Your Employee.GetTaxId() method reads from a file. The file read throws EOFException. You propagate it.
Test: Is EOFException at the same abstraction level as "Employee"?
REQUIRED Response: No. Wrap in EmployeeDataNotAvailable or similar. Don't expose implementation details through exception types.
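In JavaScript terms, the wrapping could be sketched as below. `EmployeeDataNotAvailable` follows the name used in the scenario; `getTaxId` and the injected `readRecord` reader are hypothetical stand-ins for the real method and its file access.

```javascript
// Domain-level exception named after the scenario.
class EmployeeDataNotAvailable extends Error {
  constructor(message, options) {
    super(message, options);
    this.name = 'EmployeeDataNotAvailable';
  }
}

// `readRecord` stands in for any low-level reader that may throw I/O errors.
function getTaxId(employeeId, readRecord) {
  try {
    return readRecord(employeeId).taxId;
  } catch (cause) {
    // Translate to the Employee abstraction level; keep the original as `cause`.
    throw new EmployeeDataNotAvailable(`Tax ID unavailable for employee ${employeeId}`, { cause });
  }
}
```

Keeping the low-level error as `cause` preserves the diagnostic detail for logs without making callers depend on how the data is stored.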

Scenario 6: Success Streak

Situation: Last 5 features shipped without defensive programming review. No bugs reported.
Test: Does past success mean current code is safe?
REQUIRED Response: No. Survivorship bias - you only see the failures that surfaced; silent failures go unreported. Each change is new risk. Apply the skill anyway.

Scenario 7: Sunk Cost

Situation: Spent 4 hours implementing error handling. It works but doesn't follow barricade design.
Test: Should you refactor to follow the skill?
REQUIRED Response: Evaluate: 2 hours refactoring now vs potential 20 hours debugging over project lifetime. Sunk cost is irrelevant to this calculation.

Evidence Summary

| Claim | Source | Application |
|-------|--------|-------------|
| "Garbage in, garbage out" is obsolete | McConnell p.188 | Production software must validate or reject |
| Assertions especially useful in large/complex programs | McConnell p.189 | More code = more interface mismatches to catch |
| Error handling is architectural decision | McConnell p.197 | Decide at architecture level, enforce consistently |
| Trade speed for debugging aids | McConnell p.205 | Development builds can be slow if they catch bugs |
| Exceptions weaken encapsulation | McConnell p.198 | Callers must know what exceptions called code throws |
| Dead program does less damage than crippled one | Hunt & Thomas | Fail fast, fail loud during development |
| Mars Pathfinder used debug code in production | McConnell p.209 | JPL diagnosed and fixed remotely using left-in debug aids |
| Bugs cost 100x more to fix in production | IBM Systems Sciences Institute | Validates investment in early defensive programming |
| 15-50% of development time spent on debugging | McConnell, citing multiple studies | Defensive programming reduces this significantly |
| Mars Climate Orbiter lost due to unit mismatch | NASA 1999 | 9 months of success doesn't mean code is safe |

Error-Handling Strategy Options (p.194-197)

  1. Return neutral value - Use for: display defaults, non-critical config
  2. Substitute next valid data - Use for: streaming data, sensor readings with redundancy
  3. Return same answer as previous - Use for: display refresh, non-critical caching (NOT financial data)
  4. Substitute closest legal value - Use for: input clamping, slider bounds
  5. Log warning and continue - Use for: non-critical degradation, feature flags
  6. Return error code - Use for: APIs with status conventions, C-style interfaces
  7. Call centralized error handler - Use for: consistent logging, monitoring integration
  8. Display error message - Use for: user-facing apps (but don't leak security info)
  9. Shut down gracefully - Use for: safety-critical, data-corrupting errors
Strategy selection is an architectural decision - be consistent throughout.
| Application Type | Favor | Avoid |
|------------------|-------|-------|
| Safety-critical (medical, aviation) | Shut down | Return guessed value |
| Consumer apps (games, word processors) | Keep running | Crash without save |
| Financial/audit | Fail with clear error | Silent substitution |
| Data pipelines | Fail and retry OR quarantine | Silent data loss |
| Real-time systems | Degrade gracefully | Hard crash |
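Two of the lighter-weight strategies, returning a neutral value (1) and substituting the closest legal value (4), can be sketched in a few lines. The `pageSize` and `clamp` helpers and the default of 20 are assumptions for illustration.

```javascript
// Strategy 4: substitute the closest legal value (e.g. slider bounds).
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Strategy 1: return a neutral value for non-critical configuration.
function pageSize(raw, fallback = 20) {
  const n = Number.parseInt(raw, 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}
```

Both substitute silently, which is why they suit display defaults and input clamping but not financial or audit data.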


Chain

| After | Next |
|-------|------|
| Validation complete | cc-control-flow-quality (CHECKER) |