cc-defensive-programming
Skill: cc-defensive-programming
STOP - Never Skip
| Check | Why Critical |
|---|---|
| No executable code in assertions | Code disappears in production builds |
| No empty catch blocks | Silently swallows bugs that compound |
| External input validated | Security vulnerabilities, data corruption |
CRISIS TRIAGE (2 minutes)
Production down? Use this prioritized subset:
Immediate (30 seconds each)
- Is external input validated at entry point? If no → add validation NOW
- Any empty catch blocks hiding the real error? If yes → add logging, find root cause
- Any assertions with side effects? If yes → extract to separate statement
Before Deploying Fix (60 seconds)
- Does fix match architectural error strategy? (return code vs exception vs shut down)
- Are you catching at the right abstraction level? (e.g., not `EOFException` from `GetEmployee()`)
Why triage works: These 5 items catch 80% of defensive programming bugs. Full checklist (21 items) is for non-emergency review.
Cutting corners in a crisis creates the NEXT crisis. The "quick fix" empty catch block you add today becomes tomorrow's 3 AM page.
Key Definitions
External Input
Any data not provably controlled by the current code path:
- User input (keyboard, forms, CLI args)
- Files (config, data, uploads)
- Network (APIs, databases, inter-service calls)
- Environment variables, system properties
- Data from ANY other service, even internal ones
"Internal team API" is still external. If it crosses a network boundary or process boundary, validate it.
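A minimal sketch of entry-point validation for one kind of external input, an environment variable. `parsePort` and the accepted range are illustrative assumptions; the point is that the raw string is checked once, at the boundary, and rejected loudly rather than guessed at.

```javascript
// Validate an external value (e.g. process.env.PORT) before the rest of the
// program sees it. Rejects instead of guessing: bad config should fail loudly.
function parsePort(raw) {
  if (raw === undefined || raw.trim() === '') {
    throw new Error('PORT is not set');
  }
  const text = raw.trim();
  // Require plain decimal digits so values like "80abc" or "1e3" are rejected.
  if (!/^\d+$/.test(text)) {
    throw new Error(`PORT must be an integer, got "${raw}"`);
  }
  const port = Number(text);
  if (port < 1 || port > 65535) {
    throw new RangeError(`PORT must be between 1 and 65535, got ${port}`);
  }
  return port;
}
```

At the entry point this would be called once, e.g. `const port = parsePort(process.env.PORT);`, and everything downstream can then treat `port` as a trusted number.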
Assertion
Code used during development that allows a program to check itself as it runs. When true = operating as expected. When false = detected an unexpected error (bug). Use for conditions that should never occur.
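A sketch of an assertion guarding a condition that can only fail because of a programmer bug. `invariant` and `insertSorted` are hypothetical names; Node's built-in `assert` module would serve the same purpose.

```javascript
// Throws when an "impossible" condition occurs, so the bug surfaces at once.
function invariant(condition, message) {
  if (!condition) {
    throw new Error(`Invariant violated: ${message}`);
  }
}

function isSorted(arr) {
  for (let i = 1; i < arr.length; i++) {
    if (arr[i - 1] > arr[i]) return false;
  }
  return true;
}

// Inserts value into an already-sorted array, keeping it sorted.
function insertSorted(arr, value) {
  let i = 0;
  while (i < arr.length && arr[i] < value) i++;
  arr.splice(i, 0, value);
  // If this fires, either insertSorted is buggy or a caller broke the
  // "sorted on entry" contract; both are programmer errors, not bad input.
  invariant(isSorted(arr), 'array must remain sorted after insert');
  return arr;
}
```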
Barricade
A damage-containment strategy. Interfaces designated as boundaries to "safe" areas. Data crossing these boundaries is checked for validity.
Limitation: Barricades reduce redundant validation but do NOT replace defense-in-depth for security-critical operations. If the barricade validation itself has a bug, everything inside the barricade is trusting bad data.
Correctness vs Robustness
- Correctness: Never returning an inaccurate result; no result is better than wrong result (safety-critical)
- Robustness: Always trying to keep software operating, even if results are sometimes inaccurate (consumer apps)
Preconditions / Postconditions
- Preconditions: Properties client code promises are true BEFORE calling routine
- Postconditions: Properties routine promises are true AFTER executing
- [XREF: Meyer 1997, "Design by Contract"]
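One way to express a contract in code, assuming money is held as integer cents and `withdraw` is a public entry point (so its preconditions raise errors rather than assert). The function name and error types are illustrative.

```javascript
// Preconditions are the caller's obligations; postconditions are this
// routine's promise back to the caller.
function withdraw(balance, amount) {
  // Preconditions, checked at entry
  if (!Number.isInteger(balance) || balance < 0) {
    throw new RangeError('balance must be a non-negative integer (cents)');
  }
  if (!Number.isInteger(amount) || amount <= 0) {
    throw new RangeError('amount must be a positive integer (cents)');
  }
  if (amount > balance) {
    throw new RangeError('insufficient funds');
  }

  const newBalance = balance - amount;

  // Postconditions, checked before returning; a failure is a bug in withdraw
  if (newBalance < 0 || newBalance + amount !== balance) {
    throw new Error('postcondition failed: balance arithmetic is wrong');
  }
  return newBalance;
}
```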
When NOT to Use
- Pure functions without side effects - Immutable data flows don't need defensive mutation protection (but still validate inputs at system boundaries)
- Prototype/spike code - Time-boxed exploration (max 1 week) before committing to error strategy
- Test code - Test doubles intentionally violate production patterns
- Performance-critical inner loops - Where assertion overhead matters (must show profiling data proving >5% overhead)
SECURITY EXCEPTION: Security-critical code (authentication, authorization, cryptographic operations, PII handling) is NEVER exempt from defensive programming regardless of other factors. When in doubt, validate.
Crisis Invariants - NEVER SKIP
These checks are almost always required. Exceptions need explicit justification and documentation:
| Check | Time | Why Critical |
|---|---|---|
| No executable code in assertions | 15 sec | Code disappears in production builds |
| No empty catch blocks | 15 sec | Silently swallows bugs that compound |
| External input validated | 30 sec | Security vulnerabilities, data corruption |
| Assertions for bugs only | 15 sec | Assertions disabled in production; anticipated errors need handling |
Why these four? Violations create silent failures that are nearly impossible to debug later. They don't crash loudly - they corrupt data and hide bugs.
Rationalizing "I'll add proper error handling later"? You likely won't. Error handling added later is often incomplete because edge cases are forgotten. However, if you genuinely must defer, create a tracked ticket with specific scope.
Pattern Reuse Gate
BEFORE implementing any error handling, search the codebase:
| Search For | Why |
|---|---|
| Same error type elsewhere | How is it handled? Log? Throw? Return code? |
| Same module's error handling | What's the established pattern here? |
| Barricade/validation patterns | Where are the trust boundaries? |
| Exception hierarchy | What custom exceptions exist? |
Questions to answer:
- How does this codebase handle this type of error elsewhere?
- Is there an established error-handling strategy (exceptions vs return codes vs Result types)?
- What logging/monitoring patterns are used?
- Are there existing custom exception classes I should use?
If pattern found: Follow it. Consistency in error handling is critical for debugging.
If no pattern found: You're establishing one. Document your decision. Consider if this should become the pattern.
See: pattern-reuse-gate.md for full gate protocol.
Modes
CHECKER
Purpose: Execute checklists for defensive programming, assertions, exceptions, and error handling
Triggers:
- "review my error handling"
- "check for defensive programming"
- "review assertions"
- "check exception handling"
- "audit my validation code"
Non-Triggers:
- "design my error handling strategy" -> APPLIER
- "review my control flow" -> cc-control-flow-quality
- "review my class design" -> cc-routine-and-class-design
Checklist: See checklists.md
Output Format:
| Item | Status | Evidence | Location |
|---|---|---|---|
Severity:
- VIOLATION: Fails checklist item (missing validation, empty catch blocks, assertions with side effects)
- WARNING: Partial compliance (inconsistent error handling, missing documentation)
- PASS: Meets requirement
APPLIER
Purpose: Apply defensive programming techniques when implementing error handling and validation
Triggers:
- "how should I handle this error"
- "should I use assertion or error handling"
- "help me design barricades"
- "set up defensive programming"
- "implement input validation"
Non-Triggers:
- "review my existing error handling" -> CHECKER
- "optimize my error handling performance" -> performance skill
Produces:
- Assertion placement recommendations
- Error-handling strategy decisions
- Barricade architecture designs
- Input validation implementations
Constraints:
- Use assertions for bugs that should never occur (p.191)
- Use error handling for anticipated error conditions (p.194)
- Design barricades at external interfaces (p.203)
- Favor development debugging aids over production constraints (p.205)
Decision Flowcharts
Assertion vs Error Handling
```dot
digraph assertion_vs_error {
  rankdir=TB;
  START [label="Handling a potentially\nbad condition" shape=doublecircle];
  never [label="Should this NEVER happen?\n(programmer bug)" shape=diamond];
  assertion [label="Use ASSERTION" shape=box style=filled fillcolor=lightblue];
  anticipated [label="Is it anticipated\nbad input?" shape=diamond];
  errorhandling [label="Use ERROR HANDLING" shape=box style=filled fillcolor=lightgreen];
  robust [label="Building highly\nrobust system?" shape=diamond];
  both [label="Use BOTH\n(assert + handle)" shape=box style=filled fillcolor=lightyellow];
  reconsider [label="Clarify requirements:\nIs it a bug or expected?\nAsk domain expert." shape=box style=filled fillcolor=lightcoral];
  START -> never;
  never -> robust [label="yes"];
  never -> anticipated [label="no"];
  robust -> both [label="yes"];
  robust -> assertion [label="no"];
  anticipated -> errorhandling [label="yes"];
  anticipated -> reconsider [label="no/unsure"];
}
```
Correctness vs Robustness
```dot
digraph correctness_vs_robustness {
  rankdir=TB;
  START [label="Choose error\nhandling philosophy" shape=doublecircle];
  safety [label="Safety-critical?\n(medical, aviation, nuclear)" shape=diamond];
  correctness [label="Favor CORRECTNESS\nShut down > wrong result" shape=box style=filled fillcolor=lightcoral];
  consumer [label="Consumer app?\n(games, word processors)" shape=diamond];
  robustness [label="Favor ROBUSTNESS\nKeep running > perfect" shape=box style=filled fillcolor=lightgreen];
  analyze [label="ANALYZE DOMAIN\n(see guidance below)" shape=box style=filled fillcolor=lightyellow];
  START -> safety;
  safety -> correctness [label="yes"];
  safety -> consumer [label="no"];
  consumer -> robustness [label="yes"];
  consumer -> analyze [label="no"];
}
```
Domain Analysis Guidance (for "Analyze domain" path):
| Domain Type | Lean Toward | Key Question |
|---|---|---|
| Enterprise/B2B | Correctness | "Would wrong data cause business decisions based on false info?" |
| SaaS platforms | Balanced | "What's the blast radius of a wrong answer vs unavailability?" |
| Internal tools | Robustness | "Is user technical enough to recover from a crash?" |
| Data pipelines | Correctness | "Does downstream processing assume data integrity?" |
| Real-time systems | Context-dependent | "Is stale data better or worse than no data?" |
Keep Debug Code in Production?
```dot
digraph debug_code_production {
  rankdir=TB;
  START [label="Should this debug\ncode stay in production?" shape=doublecircle];
  important [label="Checks important errors?\n(calculations, data integrity)" shape=diamond];
  keep1 [label="KEEP" shape=box style=filled fillcolor=lightgreen];
  crash [label="Causes hard crash?\n(no save opportunity)" shape=diamond];
  remove1 [label="REMOVE" shape=box style=filled fillcolor=lightcoral];
  diagnose [label="Helps remote diagnosis?\n(logging, state dumps)" shape=diamond];
  keep2 [label="KEEP as silent log" shape=box style=filled fillcolor=lightgreen];
  remove2 [label="REMOVE or make\nunobtrusive" shape=box style=filled fillcolor=lightyellow];
  START -> important;
  important -> keep1 [label="yes"];
  important -> crash [label="no"];
  crash -> remove1 [label="yes"];
  crash -> diagnose [label="no"];
  diagnose -> keep2 [label="yes"];
  diagnose -> remove2 [label="no"];
}
```
Decision Framework: Assertions vs Error Handling
| Condition Type | Use Assertion | Use Error Handling | Guidance |
|---|---|---|---|
| Should never occur (bug) | Yes | No | Assert documents the impossibility |
| Can occur at runtime | No | Yes | Handle gracefully |
| External input | No | Yes | Always validate external data |
| Internal interface (same module) | Yes | No | Assert for contract violations |
| Internal interface (cross-module) | Yes | Yes, if crossing trust boundary | Validate at module boundaries |
| Precondition violation | Yes | Yes, if public API | Public APIs need graceful errors |
| Security-critical | Yes | Yes | Defense in depth |
| Highly robust systems | Yes | Yes | Belt and suspenders |
Barricade Design (p.203-204)
MUST be performed in order:
- Identify external interfaces (user input, files, network, APIs, inter-service calls)
- Place validation at barricade boundary - all external data checked here
- Inside barricade: use assertions for internal bugs (data assumed validated)
- Strategy: External = error handling; Internal = assertions
Class-level barricade: Public methods validate and sanitize; private methods within that class can assume data is safe.
Critical caveat: "Trust inside barricade" means reduced redundant validation, NOT zero validation. For security-critical paths (auth, crypto, PII), validate again even inside the barricade. Bugs in barricade validation happen.
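A sketch of a class-level barricade, assuming a simple in-memory registry keyed by email. The class, method, and field names are hypothetical and the email regex is deliberately simplistic; what matters is where validation lives (the public `register`) versus where data is trusted and only asserted (the private `#store`).

```javascript
class UserRegistry {
  #users = new Map();

  // Public: sits on the barricade, so bad input is an anticipated error.
  register(raw) {
    if (raw === null || typeof raw !== 'object') {
      throw new TypeError('user must be an object');
    }
    const email = typeof raw.email === 'string' ? raw.email.trim().toLowerCase() : '';
    if (!/^[^@\s]+@[^@\s]+$/.test(email)) {
      throw new TypeError('user.email is invalid');
    }
    const name = typeof raw.name === 'string' ? raw.name.trim() : '';
    if (name.length === 0 || name.length > 100) {
      throw new RangeError('user.name must be 1-100 characters');
    }
    this.#store({ email, name }); // only sanitized data crosses inward
    return email;
  }

  // Private: inside the barricade. A failure here means the barricade has a bug.
  #store(user) {
    if (!user.email.includes('@')) {
      throw new Error('Assertion failed: #store received an unvalidated email');
    }
    this.#users.set(user.email, user);
  }

  get(email) {
    return this.#users.get(email);
  }
}
```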
Async and Modern Patterns
Traditional exception propagation assumes synchronous call stacks. Modern patterns need different approaches:
Promises/Async-Await
```javascript
// Domain-level error types used below (illustrative names)
class UserNotFoundError extends Error {
  constructor(id) { super(`User ${id} not found`); this.name = 'UserNotFoundError'; }
}
class UserServiceError extends Error {
  constructor(message, options) { super(message, options); this.name = 'UserServiceError'; }
}

// BAD: HTTP error statuses (404, 500) are never checked, low-level fetch
// errors leak to callers, and if no caller handles the rejection, Node.js
// (v15+) terminates the process on the unhandled rejection
async function fetchUserUnchecked(id) {
  const response = await fetch(`/api/users/${id}`); // Can reject
  return response.json();
}

// GOOD: Explicit error handling, translated into domain-level exceptions
async function fetchUser(id) {
  try {
    const response = await fetch(`/api/users/${id}`);
    if (response.status === 404) {
      throw new UserNotFoundError(id); // Domain-level exception
    }
    if (!response.ok) {
      throw new UserServiceError(`Unexpected HTTP status ${response.status}`);
    }
    return await response.json(); // await so JSON parse failures reach the catch
  } catch (e) {
    if (e instanceof UserNotFoundError || e instanceof UserServiceError) throw e;
    throw new UserServiceError('Failed to fetch user', { cause: e });
  }
}
```
Callbacks
- Errors thrown inside an asynchronous callback don't propagate to the original caller's try/catch; they must be passed explicitly
- Use the error-first callback pattern: `callback(error, result)`
- Wrap callback APIs in promises for better error handling
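A sketch of wrapping an error-first callback API in a promise. `readConfigCallback` is a hypothetical stand-in for a real callback API; for standard Node-style functions, the built-in `util.promisify` performs the same conversion.

```javascript
// Hypothetical callback-style API using the error-first convention.
// The error is passed explicitly; throwing inside setTimeout would not
// reach the caller's try/catch.
function readConfigCallback(name, callback) {
  setTimeout(() => {
    if (name === 'missing') {
      callback(new Error(`config "${name}" not found`));
    } else {
      callback(null, { name, retries: 3 });
    }
  }, 0);
}

// Promise wrapper: callers can now use try/catch with async/await.
function readConfig(name) {
  return new Promise((resolve, reject) => {
    readConfigCallback(name, (error, result) => {
      if (error) reject(error);
      else resolve(result);
    });
  });
}
```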
Promise.all() Error Aggregation
```javascript
// BAD: Promise.all rejects on the first failure; the remaining results
// (and any later errors) are discarded
const allResults = await Promise.all(promises);

// GOOD: Collect all results including failures
const settled = await Promise.allSettled(promises);
const failures = settled.filter(r => r.status === 'rejected');
if (failures.length > 0) {
  logErrors(failures.map(f => f.reason)); // logErrors: your own logging helper
  // Decide: fail entirely or continue with partial results?
}
```
Event Handlers / React Error Boundaries
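- Exceptions thrown in event handlers don't crash the app, and React Error Boundaries do NOT catch them; unless handled explicitly, they only appear as uncaught errors in the console
- Use React Error Boundaries to catch errors thrown during rendering
- Log errors in event handlers explicitly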
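A framework-agnostic sketch of logging handler errors explicitly. `withErrorReporting` and the `report` callback are hypothetical; in a real app `report` would forward to your logging or monitoring service.

```javascript
// Wraps an event handler so both synchronous throws and async rejections
// are reported instead of surfacing only as uncaught console errors.
function withErrorReporting(handler, report) {
  return function wrappedHandler(...args) {
    try {
      const result = handler.apply(this, args);
      if (result && typeof result.then === 'function') {
        // Async handler: report its rejection too.
        return result.catch((error) => report(error));
      }
      return result;
    } catch (error) {
      report(error);
      return undefined;
    }
  };
}
```

In a browser this might be attached as `button.addEventListener('click', withErrorReporting(onSaveClick, report))`, where `onSaveClick` is your own handler.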
Offensive Programming (p.206)
Make errors painful during development so they're found and fixed:
| Technique | Purpose |
|---|---|
| Make asserts abort | Don't let programmers bypass known problems |
| Fill allocated memory | Detect memory allocation errors immediately |
| Fill files/streams completely | Flush out file-format errors early |
| Default/else clauses fail hard | Impossible to overlook unexpected cases |
| Fill objects with junk before deletion | Detect use-after-free immediately |
| Email error logs to yourself | Get notified of errors in the field |
Paradox: During development, make errors noticeable and obnoxious. During production, make errors unobtrusive with graceful recovery.
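A sketch of the "default/else clauses fail hard" technique. The shipping methods and prices are made up; the point is that an unanticipated value throws immediately instead of falling through to a plausible-looking default.

```javascript
function shippingCost(method, weightKg) {
  switch (method) {
    case 'standard':
      return 5 + 0.5 * weightKg;
    case 'express':
      return 15 + 1.0 * weightKg;
    default:
      // Reached only if a new method was added without updating this switch.
      throw new Error(`Unhandled shipping method: ${String(method)}`);
  }
}
```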
Production Transition (p.209-210)
| Debug Code Type | Action | Rationale |
|---|---|---|
| Checks important errors (calculations, data) | KEEP | Tax calculation errors matter; messy screens don't |
| Checks trivial errors (screen updates) | REMOVE or log silently | Penalty is cosmetic only |
| Causes hard crashes | REMOVE | Users need chance to save work |
| Enables graceful crash with diagnostics | KEEP | Mars Pathfinder diagnosed issues remotely |
| Logging for tech support | KEEP | Convert assertions from halt to log |
| Exposes info to attackers | REMOVE | Error messages shouldn't help attackers |
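A sketch of the "convert assertions from halt to log" row, assuming the build mode is known at startup. `makeAssert`, `mode`, and `log` are hypothetical; as the last row warns, production log messages should not expose details useful to an attacker.

```javascript
// Development: halt loudly. Production: log unobtrusively and let the
// caller continue or degrade gracefully.
function makeAssert({ mode, log }) {
  return function check(condition, message) {
    if (condition) return true;
    if (mode === 'development') {
      throw new Error(`Assertion failed: ${message}`);
    }
    log(`Assertion failed: ${message}`); // visible to tech support, not users
    return false;
  };
}
```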
Red Flags - STOP and Reconsider
If you find yourself thinking any of these, you are about to violate the skill:
Skipping Validation:
- "This input comes from trusted source"
- "The caller will validate before passing"
- "We control the data, it can't be bad"
- "It's an internal API, only our team calls it"
Assertion Misuse:
- "I'll just put the action in the assert - it's convenient"
- "Assertions slow things down"
- "I'll use an assert for this user input check"
Exception Abuse:
- "Empty catch is fine, I don't care about this exception"
- "I'll catch Exception and figure it out later"
- "Throwing EOFException from GetEmployee() is fine"
Deadline Pressure:
- "I'll add proper error handling after we ship"
- "Garbage in, garbage out is acceptable for now"
- "We can clean up the exception handling later"
Success Streak / Overconfidence:
- "We've never had a problem with X"
- "This has been working fine for months"
- "I've done this pattern 10 times without issues"
- "Our error handling is already pretty good"
Sunk Cost:
- "But it works / passes all tests"
- "I already spent 4 hours on this approach"
- "Refactoring would take too long"
All of these mean: Apply the checklists anyway. Neither deadline pressure nor past success exempts you from validation.
Rationalization Counters
| Excuse | Reality |
|---|---|
| "Garbage in, garbage out is fine" | For production software, it's the mark of a sloppy, nonsecure program (p.188) |
| "Assertions slow down my code" | Compile them out for production; trade speed for safety during development (p.205) |
| "I'll just put the action in the assertion" | Code won't execute when assertions are disabled in production (p.191) |
| "Empty catch blocks are fine" | Either the try or catch is wrong; find and fix the root cause (p.201) |
| "I'll add defensive code everywhere" | Too much defensive programming adds complexity and defects (p.210) |
| "One error strategy is enough" | For highly robust code, use both assertions AND error handling (p.193) |
| "I can handle it locally with an exception" | If you can handle locally, don't throw - handle locally (p.199) |
| "EOFException from GetTaxId() is fine" | Exceptions must match routine's abstraction level (p.200) |
| "I'll validate later" | Later never comes; edge cases are forgotten |
| "This is internal code, no validation needed" | Use assertions for internal bugs - but you still need SOMETHING |
| "We've never had a problem with X" | Survivorship bias. Failures your defensive code prevented are invisible by design, and so are latent bugs that haven't triggered yet. |
| "But it works / passes tests" | Working now ≠ maintainable later. Tests verify behavior, not design quality. |
| "I already spent N hours on this" | Sunk cost fallacy. Time spent is gone regardless. Question: fix now (2 hours) or debug later (20 hours)? |
| "This has been running fine for months" | Past success doesn't predict future safety. Each change is a new risk. The Mars Climate Orbiter flew for about 9 months before a units mismatch (pound-force seconds vs. newton-seconds) led to its loss in 1999. |
Pressure Testing Scenarios
Scenario 1: Deadline Crunch
Situation: Feature due tomorrow. You're tempted to skip input validation for the new API endpoint.
Test: Is external data entering the system?
REQUIRED Response: Yes. Validate it. Buffer overflows and SQL injection don't care about your deadline.
Scenario 2: Quick Fix
Situation: Exception crashes prod. Quickest fix is empty catch block.
Test: Does the exception represent a real error condition?
REQUIRED Response: Yes. Either the try block is wrong (raises exception it shouldn't) or catch is wrong (not handling). Find root cause. At minimum, log it.
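A sketch contrasting an empty catch with the minimum acceptable fix. `readText` and `log` are injected stand-ins for real file and logging calls, and falling back to an empty settings object is an assumed policy, not a general rule.

```javascript
// BAD: the empty catch hides why loading failed; callers get `undefined`
// and fail later, far from the real cause.
function loadSettingsSilently(readText) {
  try {
    return JSON.parse(readText());
  } catch (e) {}
}

// BETTER: handle only the anticipated error, and log everything.
function loadSettings(readText, log) {
  try {
    return JSON.parse(readText());
  } catch (e) {
    if (e instanceof SyntaxError) {
      // Anticipated: malformed settings. Log it and fall back to defaults.
      log(`settings are malformed, using defaults: ${e.message}`);
      return {};
    }
    // Anything else (e.g. the read itself failed) is not handled here:
    // log for context, then rethrow so the root cause stays visible.
    log(`failed to load settings: ${e.message}`);
    throw e;
  }
}
```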
Scenario 3: Convenient Assertion
Situation: You need to initialize a subsystem. Putting `InitSubsystem()` inside `Assert(InitSubsystem())` is convenient.
Test: Will this code be needed in production?
REQUIRED Response: Yes. Don't put it in an assertion. Production builds disable assertions.
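JavaScript has no built-in way to compile assertions out, so this sketch simulates one: the hypothetical `debugAssert` takes a predicate function and skips it entirely when the hypothetical `ASSERTIONS_ENABLED` flag is off. With an ordinary function-call assertion, JavaScript evaluates the argument eagerly, so the hazard appears when a build step removes the assertion call itself.

```javascript
let ASSERTIONS_ENABLED = true; // hypothetical build flag

// When disabled, the predicate is never evaluated, like a compiled-out assert.
function debugAssert(predicate, message) {
  if (ASSERTIONS_ENABLED && !predicate()) {
    throw new Error(`Assertion failed: ${message}`);
  }
}

let initCount = 0;
function initSubsystem() {
  initCount += 1; // the side effect production depends on
  return true;
}

// BAD: initialization lives inside the assertion and vanishes with it.
function startBad() {
  debugAssert(() => initSubsystem(), 'subsystem initialized');
}

// GOOD: the action always runs; only the check is debug-only.
function startGood() {
  const ok = initSubsystem();
  debugAssert(() => ok, 'subsystem initialized');
}
```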
Scenario 4: Trusted Source
Situation: "This data comes from our own database, not user input."
Test: Could the database ever contain bad data? (Migration errors, bugs, corruption)
REQUIRED Response: Yes. Use barricade design - validate at boundaries. Or use assertions if it's truly internal and indicates a bug.
Scenario 5: Exception Abstraction
Situation: Your `Employee.GetTaxId()` method reads from a file. The file read throws `EOFException`. You propagate it.
Test: Is `EOFException` at the same abstraction level as "Employee"?
REQUIRED Response: No. Wrap it in `EmployeeDataNotAvailable` or a similar domain exception. Don't expose implementation details through exception types.
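A sketch of translating a low-level exception into one at the `Employee` abstraction level. `EOFError`, `readRecord`, and the record shape are hypothetical stand-ins for the file-reading details in the scenario; passing `{ cause }` to the `Error` constructor (supported in modern JavaScript runtimes) keeps the original error available for debugging.

```javascript
// Low-level error from the storage layer (hypothetical).
class EOFError extends Error {
  constructor(message) { super(message); this.name = 'EOFError'; }
}

// Domain-level error that matches the Employee abstraction.
class EmployeeDataNotAvailable extends Error {
  constructor(message, options) { super(message, options); this.name = 'EmployeeDataNotAvailable'; }
}

class Employee {
  constructor(id, readRecord) {
    this.id = id;
    this.readRecord = readRecord; // injected storage access (hypothetical)
  }

  getTaxId() {
    try {
      return this.readRecord(this.id).taxId;
    } catch (e) {
      if (e instanceof EOFError) {
        // Translate the implementation detail; keep the original as `cause`.
        throw new EmployeeDataNotAvailable(`tax ID for employee ${this.id} unavailable`, { cause: e });
      }
      throw e; // unrelated errors are not this routine's to reinterpret
    }
  }
}
```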
Scenario 6: Success Streak
Situation: Last 5 features shipped without defensive programming review. No bugs reported.
Test: Does past success mean current code is safe?
REQUIRED Response: No. Survivorship bias - you don't see the bugs you prevented. Each change is new risk. Apply the skill anyway.
Scenario 7: Sunk Cost
Situation: Spent 4 hours implementing error handling. It works but doesn't follow barricade design.
Test: Should you refactor to follow the skill?
REQUIRED Response: Evaluate: 2 hours refactoring now vs potential 20 hours debugging over project lifetime. Sunk cost is irrelevant to this calculation.
Evidence Summary
| Claim | Source | Application |
|---|---|---|
| "Garbage in, garbage out" is obsolete | McConnell p.188 | Production software must validate or reject |
| Assertions especially useful in large/complex programs | McConnell p.189 | More code = more interface mismatches to catch |
| Error handling is architectural decision | McConnell p.197 | Decide at architecture level, enforce consistently |
| Trade speed for debugging aids | McConnell p.205 | Development builds can be slow if they catch bugs |
| Exceptions weaken encapsulation | McConnell p.198 | Callers must know what exceptions called code throws |
| Dead program does less damage than crippled one | Hunt & Thomas | Fail fast, fail loud during development |
| Mars Pathfinder used debug code in production | McConnell p.209 | JPL diagnosed and fixed remotely using left-in debug aids |
| Bugs cost 100x more to fix in production | IBM Systems Sciences Institute | Validates investment in early defensive programming |
| 15-50% of development time spent on debugging | McConnell, citing multiple studies | Defensive programming reduces this significantly |
| Mars Climate Orbiter lost due to unit mismatch | NASA 1999 | 9 months of success doesn't mean code is safe |
Error-Handling Strategy Options (p.194-197)
- Return neutral value - Use for: display defaults, non-critical config
- Substitute next valid data - Use for: streaming data, sensor readings with redundancy
- Return same answer as previous - Use for: display refresh, non-critical caching (NOT financial data)
- Substitute closest legal value - Use for: input clamping, slider bounds
- Log warning and continue - Use for: non-critical degradation, feature flags
- Return error code - Use for: APIs with status conventions, C-style interfaces
- Call centralized error handler - Use for: consistent logging, monitoring integration
- Display error message - Use for: user-facing apps (but don't leak security info)
- Shut down gracefully - Use for: safety-critical, data-corrupting errors
Strategy selection is an architectural decision - be consistent throughout.
| Application Type | Favor | Avoid |
|---|---|---|
| Safety-critical (medical, aviation) | Shut down | Return guessed value |
| Consumer apps (games, word processors) | Keep running | Crash without save |
| Financial/audit | Fail with clear error | Silent substitution |
| Data pipelines | Fail and retry OR quarantine | Silent data loss |
| Real-time systems | Degrade gracefully | Hard crash |
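A sketch of three of the strategies above side by side. Function names, the 0-100 range, and the `'Anonymous'` default are illustrative assumptions; each function deliberately commits to one strategy.

```javascript
// Substitute closest legal value: clamp a slider input into range.
function clampVolume(value) {
  // Non-numbers and NaN have no "closest" legal value; use a neutral 0 instead.
  if (typeof value !== 'number' || Number.isNaN(value)) return 0;
  return Math.min(100, Math.max(0, value));
}

// Return neutral value: a missing display preference falls back to a default.
function displayName(profile) {
  const name = profile && typeof profile.displayName === 'string' ? profile.displayName.trim() : '';
  return name === '' ? 'Anonymous' : name;
}

// Fail with a clear error: financial data must never be silently substituted.
function parseAmountCents(raw) {
  if (typeof raw !== 'string' || !/^-?\d+$/.test(raw)) {
    throw new Error(`invalid amount: ${JSON.stringify(raw)}`);
  }
  return Number(raw);
}
```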
Chain
| After | Next |
|---|---|
| Validation complete | cc-control-flow-quality (CHECKER) |