exhaustive-systems-analysis

Exhaustive Systems Analysis

Systematic audit methodology for rooting out latent issues in codebases, particularly agent-written code that needs verification before production use.

Core Principles

  1. Subsystem isolation — Analyze each subsystem separately to prevent context pollution
  2. Evidence-based findings — Every issue must cite specific code locations
  3. Severity-driven prioritization — Critical issues first, cosmetic issues last
  4. Assume all issues will be fixed — Don't hedge; be direct about what's wrong

Workflow

Phase 1: System Decomposition

Before analysis, map the system's subsystems. Auto-discover by:
  1. Read project structure — Identify major modules, packages, or directories
  2. Trace data flow — Follow how data enters, transforms, and exits
  3. Identify side effects — File I/O, network, database, IPC, state mutations
  4. Map dependencies — Which subsystems depend on which
Output a subsystem table:
```markdown
| # | Subsystem | Files | Side Effects | Priority |
|---|-----------|-------|--------------|----------|
| 1 | Lock System | lock.rs | FS: mkdir, rm | High |
| 2 | API Layer | api/*.rs | Network, DB | High |
| 3 | Config Parser | config.rs | FS: read | Medium |
```
Priority heuristics:
  • High: Side effects, state management, security, concurrency
  • Medium: Business logic, data transformation, validation
  • Low: Pure utilities, formatting, logging
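The first discovery step can be partly scripted. The sketch below assumes subsystems roughly map to top-level directories (many projects do not split this cleanly, so treat the result as a starting point). It groups source files by their first path component and renders the skeleton of the subsystem table, leaving Side Effects and Priority as "?" for the auditor to fill in after reading the code. `discover_subsystems`, `render_subsystem_table`, and the `.rs` default are illustrative choices, not part of any existing tool.

```python
from collections import defaultdict
from pathlib import Path


def discover_subsystems(root, extensions=(".rs",)):
    """Group source files under `root` by top-level directory (a heuristic).

    Files sitting directly in `root` are grouped by their own stem,
    e.g. lock.rs -> "lock".
    """
    root = Path(root)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        rel = path.relative_to(root)
        key = rel.parts[0] if len(rel.parts) > 1 else rel.stem
        groups[key].append(rel.as_posix())
    return dict(groups)


def render_subsystem_table(groups):
    """Render the Phase 1 table; side effects and priority need human judgment."""
    lines = [
        "| # | Subsystem | Files | Side Effects | Priority |",
        "|---|-----------|-------|--------------|----------|",
    ]
    for i, (name, files) in enumerate(sorted(groups.items()), start=1):
        lines.append(f"| {i} | {name} | {', '.join(files)} | ? | ? |")
    return "\n".join(lines)
```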

Phase 2: Sequential Analysis

Analyze subsystems in priority order. For large codebases (>5 subsystems or >3000 LOC per subsystem), prefer clearing context between subsystems to prevent analysis drift.
For each subsystem, apply the appropriate checklist based on subsystem type.
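The context-clearing rule of thumb can be made explicit. This sketch assumes "LOC" means non-blank lines (the methodology does not define it) and takes a mapping from subsystem name to line count; `count_loc` and `should_clear_context` are hypothetical helper names.

```python
def count_loc(paths):
    """Count non-blank lines across `paths` (one possible definition of LOC)."""
    total = 0
    for p in paths:
        with open(p, encoding="utf-8", errors="replace") as f:
            total += sum(1 for line in f if line.strip())
    return total


def should_clear_context(loc_by_subsystem, max_subsystems=5, max_loc=3000):
    """True if there are more than 5 subsystems or any subsystem exceeds 3000 LOC."""
    if len(loc_by_subsystem) > max_subsystems:
        return True
    return any(loc > max_loc for loc in loc_by_subsystem.values())
```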

Phase 3: Consolidation

After all subsystems are analyzed:
  1. Deduplicate cross-cutting findings
  2. Rank all issues by severity
  3. Produce final report with recommended action order
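Steps 1 and 2 can be mechanized once findings are collected as records. This sketch assumes two findings are duplicates when they share a location and type (a simplification; some cross-cutting issues still need human judgment to merge) and keeps the worst severity of the group. The `Finding` fields mirror the finding format; the class and `consolidate` are illustrative names.

```python
from dataclasses import dataclass, field

# Lower rank = more severe; used as the sort key.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class Finding:
    subsystem: str
    title: str
    severity: str
    kind: str  # Bug, Race condition, Security, ...
    location: str
    also_seen_in: list = field(default_factory=list)


def consolidate(findings):
    """Merge findings sharing (location, kind), keep the worst severity, sort by severity."""
    merged = {}
    for f in findings:
        key = (f.location, f.kind)
        kept = merged.get(key)
        if kept is None:
            # Copy so the caller's records are not mutated.
            merged[key] = Finding(f.subsystem, f.title, f.severity, f.kind, f.location)
            continue
        if SEVERITY_RANK[f.severity] < SEVERITY_RANK[kept.severity]:
            kept.severity = f.severity
        if f.subsystem != kept.subsystem and f.subsystem not in kept.also_seen_in:
            kept.also_seen_in.append(f.subsystem)
    return sorted(merged.values(), key=lambda f: (SEVERITY_RANK[f.severity], f.location))
```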

Analysis Checklists

Select a checklist based on the subsystem's characteristics, and apply more than one when several fit.

Stateful Systems (files, databases, caches, locks)

| Check | Question |
|-------|----------|
| Correctness | Does code do what documentation claims? |
| Atomicity | Can partial writes corrupt state? |
| Race conditions | Can concurrent access cause inconsistency? |
| Cleanup | Are resources released on all exit paths (success, error, panic)? |
| Error recovery | Do failures leave the system in a valid state? |
| Stale documentation | Do comments match actual behavior? |
| Dead code | Are there unused code paths that could confuse maintainers? |
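The atomicity and cleanup checks are easier to apply with a reference pattern in mind. A common remedy for partial writes is to write a temporary file in the same directory and rename it over the target; per the Python documentation, `os.replace` is atomic on POSIX systems, and it may fail if the two paths are on different filesystems. This is a sketch of that pattern, not a claim about how any audited code must look.

```python
import os
import tempfile


def atomic_write_text(path, data, encoding="utf-8"):
    """Write `data` to `path` so readers see the old or the new content, never a mix.

    The temp file is created in the target's directory so the rename stays on
    one filesystem. (Full durability can also require fsyncing the directory;
    that step is omitted here.)
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Cleanup on every failure path, including KeyboardInterrupt.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The original file is untouched until the final rename, so a crash or exception mid-write leaves the previous state valid, which is exactly what the Error recovery row asks about.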

APIs & Network (HTTP, gRPC, WebSocket, IPC)

| Check | Question |
|-------|----------|
| Input validation | Are all inputs validated before use? |
| Error responses | Do errors leak internal details? |
| Timeout handling | Are network operations bounded? |
| Retry safety | Are operations idempotent or properly guarded? |
| Authentication | Are auth checks applied consistently? |
| Rate limiting | Can the API be abused? |
| Serialization | Can malformed payloads cause panics? |
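For the timeout and retry-safety rows, a reviewer typically looks for two properties: every attempt is bounded, and only idempotent operations are retried automatically. A minimal sketch, assuming the caller supplies an operation that accepts a per-attempt timeout and raises on failure; `call_with_retries` and its parameters are hypothetical, and production clients usually also add jitter.

```python
import time


def call_with_retries(op, *, idempotent, attempts=3, timeout_s=5.0,
                      base_delay_s=0.1, retry_on=(TimeoutError, ConnectionError)):
    """Call op(timeout_s) with a bounded number of attempts and exponential backoff.

    Non-idempotent operations get a single attempt: repeating them could,
    for example, submit the same order twice.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    max_attempts = attempts if idempotent else 1
    for attempt in range(max_attempts):
        try:
            return op(timeout_s)  # op is responsible for enforcing timeout_s
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay_s * (2 ** attempt))
```

Errors outside `retry_on` (such as a validation error) propagate immediately, since retrying them cannot help.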

Concurrency (threads, async, channels, locks)

| Check | Question |
|-------|----------|
| Deadlock potential | Can lock acquisition order cause deadlock? |
| Data races | Is shared mutable state properly synchronized? |
| Starvation | Can any task be indefinitely blocked? |
| Cancellation | Are cancellation/shutdown paths clean? |
| Resource leaks | Are spawned tasks/threads joined or detached properly? |
| Panic propagation | Do panics in tasks crash the whole system? |
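The deadlock row often comes down to lock ordering: if two code paths take the same pair of locks in opposite orders, each can end up holding one lock while waiting for the other. One common fix is a single global acquisition order. This sketch assumes every lock can be tagged with a sortable rank (here a hypothetical `Account.id`); it illustrates one remedy, not the only one.

```python
import threading


class Account:
    def __init__(self, account_id, balance):
        self.id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move `amount` between accounts, always locking the lower id first.

    Without the ordering, transfer(a, b) and transfer(b, a) running
    concurrently could deadlock.
    """
    if src is dst:
        return
    first, second = sorted((src, dst), key=lambda acct: acct.id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount
```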

UI & Presentation (views, components, templates)

| Check | Question |
|-------|----------|
| State consistency | Can UI show stale or inconsistent state? |
| Error states | Are all error conditions rendered appropriately? |
| Loading states | Are async operations properly indicated? |
| Accessibility | Are interactions keyboard/screen-reader accessible? |
| Memory leaks | Are subscriptions/observers cleaned up? |
| Re-render efficiency | Are unnecessary re-renders avoided? |
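For the memory-leak row, the thing to verify is that every subscribe has a matching unsubscribe on the component's teardown path. The pattern is framework-independent; this Python sketch uses a hypothetical `Store` whose `subscribe` returns an unsubscribe callable (similar to a convention some UI libraries use, but not any specific library's API).

```python
class Store:
    """Minimal observable store; subscribe() returns a function that undoes it."""

    def __init__(self, state):
        self.state = state
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

        def unsubscribe():
            if listener in self._listeners:
                self._listeners.remove(listener)

        return unsubscribe

    def set_state(self, state):
        self.state = state
        for listener in list(self._listeners):  # copy: a listener may unsubscribe mid-loop
            listener(state)

    def listener_count(self):
        return len(self._listeners)


class CounterView:
    """A hypothetical component that subscribes on mount and unsubscribes on unmount."""

    def __init__(self, store):
        self.rendered = []
        self._unsubscribe = store.subscribe(self.render)

    def render(self, state):
        self.rendered.append(state)

    def unmount(self):
        self._unsubscribe()  # the step a leak audit checks for
```

If `unmount` omitted the unsubscribe call, the store would keep the view alive and keep re-rendering it after it left the screen.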

Data Processing (parsers, transformers, validators)

| Check | Question |
|-------|----------|
| Edge cases | Are empty, null, and boundary values handled? |
| Type coercion | Are implicit conversions safe? |
| Overflow/underflow | Are numeric operations bounded? |
| Encoding | Is text encoding handled consistently (UTF-8)? |
| Injection | Can untrusted input escape its context? |
| Invariants | Are data invariants enforced and documented? |
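The edge-case, coercion, and bounds rows can be checked against even a tiny parser. This hypothetical `parse_port` shows explicit handling of missing, empty, malformed, and out-of-range input instead of relying on implicit conversion; the 1 to 65535 range is used only as an example bound.

```python
def parse_port(raw):
    """Parse a port number from untrusted text, rejecting anything ambiguous."""
    if raw is None:
        raise ValueError("port is missing")
    text = raw.strip()
    if not text:
        raise ValueError("port is empty")
    # str.isdigit() also accepts non-ASCII digits such as '²', which int() rejects;
    # restrict to ASCII so the check and the conversion agree.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"port must be a decimal integer, got {raw!r}")
    value = int(text)
    if not 1 <= value <= 65535:
        raise ValueError(f"port out of range 1-65535: {value}")
    return value
```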

Configuration & Setup (config files, environment, initialization)

| Check | Question |
|-------|----------|
| Defaults | Are defaults safe and documented? |
| Validation | Are invalid configs rejected early with clear errors? |
| Secrets | Are secrets handled securely (not logged, not in VCS)? |
| Hot reload | If supported, is reload atomic and safe? |
| Compatibility | Are breaking changes versioned or migrated? |
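A sketch covering the defaults, validation, and secrets rows together, assuming configuration arrives as a plain dict (for example, parsed from a file or the environment). The `AppConfig` fields and limits are hypothetical; the point is that defaults are explicit, invalid values and unknown keys fail at load time with a clear message, and the secret is excluded from the generated repr so it does not end up in logs.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class AppConfig:
    host: str = "127.0.0.1"  # safe default: not exposed on all interfaces
    timeout_s: float = 10.0
    api_token: str = field(default="", repr=False)  # omitted from repr()

    def __post_init__(self):
        if not 0 < self.timeout_s <= 300:
            raise ValueError(f"timeout_s must be in (0, 300], got {self.timeout_s}")


def load_config(raw):
    """Build an AppConfig from a dict, rejecting unknown keys instead of ignoring them."""
    known = {f.name for f in fields(AppConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return AppConfig(**raw)
```

Rejecting unknown keys catches typos such as `timeot_s`, which would otherwise silently fall back to the default.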

Severity Classification

Classify every finding. Assume the user will fix all issues soon.

| Severity | Criteria | Examples |
|----------|----------|----------|
| Critical | Data loss, security vulnerability, crash in production | Unhandled panic, SQL injection, file corruption |
| High | Incorrect behavior users will notice | Wrong calculation, race causing wrong UI state, timeout too short |
| Medium | Technical debt that causes confusion or future bugs | Stale docs, misleading names, redundant code paths |
| Low | Cosmetic or minor improvements | Unused parameter, suboptimal algorithm (works correctly) |

Finding Format

Every finding must follow this structure:

```markdown
[SUBSYSTEM] Finding N: Brief Title

Severity: Critical | High | Medium | Low
Type: Bug | Race condition | Security | Stale docs | Dead code | Design flaw
Location: file.rs:line_range or file.rs:function_name
Problem: What's wrong and why it matters. Be specific.
Evidence: Code snippet or reasoning demonstrating the issue.
Recommendation: Specific fix. Include code if helpful.

---
```
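If findings are collected programmatically, rendering them through one function keeps every entry in the required structure and rejects misspelled severities or types. A minimal sketch; the field names follow the template, while `render_finding` itself is a hypothetical helper.

```python
SEVERITIES = ("Critical", "High", "Medium", "Low")
TYPES = ("Bug", "Race condition", "Security", "Stale docs", "Dead code", "Design flaw")


def render_finding(subsystem, n, title, severity, kind, location,
                   problem, evidence, recommendation):
    """Render one finding in the audit's finding format, validating the enum fields."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if kind not in TYPES:
        raise ValueError(f"unknown type: {kind!r}")
    return "\n".join([
        f"[{subsystem}] Finding {n}: {title}",
        "",
        f"Severity: {severity}",
        f"Type: {kind}",
        f"Location: {location}",
        f"Problem: {problem}",
        f"Evidence: {evidence}",
        f"Recommendation: {recommendation}",
        "",
        "---",
    ])
```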

Output Structure

Adapt the output to the project's organization. Common patterns:

Pattern A: Audit Directory (recommended for 5+ subsystems)

```
.claude/docs/audit/
├── 00-analysis-plan.md      # Subsystem table, priorities, methodology
├── 01-subsystem-name.md     # Individual analysis
├── 02-another-subsystem.md
└── SUMMARY.md               # Consolidated findings, action items
```
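A sketch for scaffolding Pattern A, assuming subsystem names arrive already in priority order so the numeric prefixes match the analysis order. The slug rule and `scaffold_audit_dir` are illustrative choices, not part of the methodology. Existing files are left untouched so a later session does not erase earlier progress.

```python
import re
from pathlib import Path


def slugify(name):
    """'Lock System' -> 'lock-system'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def scaffold_audit_dir(project_root, subsystems):
    """Create .claude/docs/audit/ with a plan, one file per subsystem, and a summary.

    Returns the names of files actually created.
    """
    audit = Path(project_root, ".claude", "docs", "audit")
    audit.mkdir(parents=True, exist_ok=True)
    names = ["00-analysis-plan.md"]
    names += [f"{i:02d}-{slugify(s)}.md" for i, s in enumerate(subsystems, start=1)]
    names.append("SUMMARY.md")
    created = []
    for name in names:
        path = audit / name
        if not path.exists():
            path.touch()
            created.append(name)
    return created
```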

Pattern B: Single Document (for smaller systems)

```
.claude/docs/audit/system-name-audit.md
```

Contains: plan, all findings, summary

Pattern C: Inline with Existing Docs

If the project already has a docs/ directory or similar, place audit artifacts there.
Always create a summary with:
  • Total findings by severity
  • Top 5 most critical issues
  • Recommended fix order
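The three summary items can be derived mechanically from the consolidated findings. This sketch assumes each finding is a `(severity, title)` pair and that the recommended fix order is simply severity order, with ties kept in their original order; in practice, dependencies between fixes may justify a different order. `summarize` is a hypothetical helper.

```python
from collections import Counter

SEVERITY_ORDER = ["Critical", "High", "Medium", "Low"]


def summarize(findings, top_n=5):
    """Return (counts by severity, top_n most severe titles, full fix order)."""
    counts = Counter(sev for sev, _ in findings)
    by_severity = {sev: counts.get(sev, 0) for sev in SEVERITY_ORDER}
    # sorted() is stable, so equal-severity findings keep their original order.
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f[0]))
    fix_order = [title for _, title in ordered]
    return by_severity, fix_order[:top_n], fix_order
```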

Session Management

For thorough analysis:
  • Small systems (<1000 LOC, <3 subsystems): Single session acceptable
  • Medium systems (1000-5000 LOC, 3-7 subsystems): Clear context between phases
  • Large systems (>5000 LOC, >7 subsystems): Separate sessions per subsystem
When clearing context, document progress in the analysis plan file so the next session can continue.
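The session tiers can be expressed as a small classifier. As written, the tiers leave mixed cases open (for example, 800 LOC spread over 5 subsystems); this sketch assumes the larger tier wins whenever either measure crosses a boundary, which is a conservative reading rather than a stated rule. `session_tier` is a hypothetical name.

```python
def session_tier(total_loc, subsystem_count):
    """Map a codebase to the small/medium/large session strategy."""
    if total_loc > 5000 or subsystem_count > 7:
        return "large"   # separate sessions per subsystem
    if total_loc >= 1000 or subsystem_count >= 3:
        return "medium"  # clear context between phases
    return "small"       # a single session is acceptable
```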

Pre-Analysis: Known Issues Sweep

Before deep analysis, scan for documented issues:
  1. Check CLAUDE.md / README for "gotchas" or "known issues"
  2. Search for TODO/FIXME/HACK comments
  3. Review recent commits for bug fixes (may indicate fragile areas)
  4. Check issue tracker if accessible
Treat these as starting hypotheses, then verify or refute each one during analysis.
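Step 2 of the sweep is easy to script so that each marker becomes a hypothesis with a concrete location. A minimal sketch that walks a directory and reports TODO/FIXME/HACK markers with file and line number; the extension filter and the word-boundary regex are assumptions to adjust per project, and `grep -rn` or ripgrep does the same job from the shell.

```python
import re
from pathlib import Path

MARKER = re.compile(r"\b(TODO|FIXME|HACK)\b")


def find_markers(root, extensions=(".rs", ".py", ".ts", ".go", ".md")):
    """Yield (relative_path, line_number, marker, line_text) for each marker found."""
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                match = MARKER.search(line)
                if match:
                    yield path.relative_to(root).as_posix(), lineno, match.group(1), line.strip()
```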

Anti-Patterns to Avoid

| Anti-Pattern | Why It's Bad | Instead |
|--------------|--------------|---------|
| Skimming code | Misses subtle bugs | Read every line in scope |
| Assuming correctness | Agent code often has edge case bugs | Verify each code path |
| Vague findings | "This looks wrong" isn't actionable | Cite specific lines, explain why |
| Over-scoping | Analysis paralysis | Strict subsystem boundaries |
| Ignoring tests | Tests reveal assumptions | Read tests to understand intent |

Completion Criteria

Analysis is complete when:
  1. ✅ All high-priority subsystems analyzed
  2. ✅ Every finding has severity, location, and recommendation
  3. ✅ Summary document exists with prioritized action items
  4. ✅ No "TBD" or "needs investigation" items remain
  5. ✅ Cross-references between related findings added
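Criteria 2 and 4 are mechanical enough to check automatically before declaring the audit done. This sketch assumes findings are dicts with `severity`, `location`, and `recommendation` keys (a hypothetical representation) and flags any finding that is missing a field or still contains placeholder text.

```python
import re

REQUIRED = ("severity", "location", "recommendation")
PLACEHOLDER = re.compile(r"\bTBD\b|needs investigation", re.IGNORECASE)


def incomplete_findings(findings):
    """Return (index, reason) pairs for findings that block completion."""
    problems = []
    for i, f in enumerate(findings):
        for key in REQUIRED:
            if not str(f.get(key, "")).strip():
                problems.append((i, f"missing {key}"))
        if any(PLACEHOLDER.search(str(v)) for v in f.values()):
            problems.append((i, "contains TBD / needs investigation"))
    return problems
```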