exhaustive-systems-analysis

Exhaustive Systems Analysis

Systematic audit methodology for rooting out latent issues in codebases, particularly agent-written code that needs verification before production use.

Core Principles

Subsystem isolation — Analyze each subsystem separately to prevent context pollution
Evidence-based findings — Every issue must cite specific code locations
Severity-driven prioritization — Critical issues first, cosmetic issues last
Assume all issues will be fixed — Don't hedge; be direct about what's wrong

Workflow

Phase 1: System Decomposition

Before analysis, map the system's subsystems. Discover them with these steps:

Read project structure — Identify major modules, packages, or directories
Trace data flow — Follow how data enters, transforms, and exits
Identify side effects — File I/O, network, database, IPC, state mutations
Map dependencies — Which subsystems depend on which

Output a subsystem table:

markdown

| # | Subsystem | Files | Side Effects | Priority |
|---|-----------|-------|--------------|----------|
| 1 | Lock System | lock.rs | FS: mkdir, rm | High |
| 2 | API Layer | api/*.rs | Network, DB | High |
| 3 | Config Parser | config.rs | FS: read | Medium |

Priority heuristics:

High: Side effects, state management, security, concurrency
Medium: Business logic, data transformation, validation
Low: Pure utilities, formatting, logging
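
The priority heuristics above can be sketched as a small lookup. This is only an illustration: the tag names (`side_effects`, `business_logic`, and so on) and the `Subsystem` type are hypothetical labels an auditor might assign during decomposition, not part of the methodology itself.

```python
from dataclasses import dataclass, field

# Hypothetical tags assigned while mapping the system.
HIGH_TAGS = {"side_effects", "state", "security", "concurrency"}
MEDIUM_TAGS = {"business_logic", "transformation", "validation"}
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Subsystem:
    name: str
    files: list[str]
    tags: set[str] = field(default_factory=set)


def priority(sub: Subsystem) -> str:
    """Return High, Medium, or Low; the highest matching tier wins."""
    if sub.tags & HIGH_TAGS:
        return "High"
    if sub.tags & MEDIUM_TAGS:
        return "Medium"
    return "Low"  # pure utilities, formatting, logging


def subsystem_table(subsystems: list[Subsystem]) -> str:
    """Render a markdown table with High-priority subsystems first."""
    ranked = sorted(subsystems, key=lambda s: PRIORITY_ORDER[priority(s)])
    rows = [
        "| # | Subsystem | Files | Priority |",
        "|---|-----------|-------|----------|",
    ]
    for i, sub in enumerate(ranked, start=1):
        rows.append(f"| {i} | {sub.name} | {', '.join(sub.files)} | {priority(sub)} |")
    return "\n".join(rows)
```

A subsystem with both a High and a Medium tag is treated as High here, which seems the safer reading of the heuristics.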

Phase 2: Sequential Analysis

Analyze subsystems in priority order. For large codebases (>5 subsystems or >3000 LOC per subsystem), prefer clearing context between subsystems to prevent analysis drift.

For each subsystem, apply the appropriate checklist based on subsystem type.

Phase 3: Consolidation

After all subsystems are analyzed:

Deduplicate cross-cutting findings
Rank all issues by severity
Produce final report with recommended action order
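
One possible way to mechanize the consolidation steps is sketched below. The `Finding` fields and the deduplication key (same location and same title, case-insensitive) are assumptions made for illustration; a real audit may need a looser notion of "duplicate".

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass(frozen=True)
class Finding:
    subsystem: str
    title: str
    severity: str
    location: str


def consolidate(findings: list[Finding]) -> list[Finding]:
    """Drop duplicate findings, then rank the rest by severity."""
    best: dict[tuple[str, str], Finding] = {}
    for finding in findings:
        key = (finding.location, finding.title.lower())  # assumed duplicate key
        kept = best.get(key)
        # When two subsystems report the same issue, keep the more severe rating.
        if kept is None or SEVERITY_ORDER[finding.severity] < SEVERITY_ORDER[kept.severity]:
            best[key] = finding
    return sorted(
        best.values(),
        key=lambda f: (SEVERITY_ORDER[f.severity], f.subsystem, f.location),
    )
```

The sorted result doubles as a first draft of the recommended action order.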

Analysis Checklists

Select a checklist based on subsystem characteristics. Apply several if more than one fits.

Stateful Systems (files, databases, caches, locks)

Check	Question
Correctness	Does code do what documentation claims?
Atomicity	Can partial writes corrupt state?
Race conditions	Can concurrent access cause inconsistency?
Cleanup	Are resources released on all exit paths (success, error, panic)?
Error recovery	Do failures leave the system in a valid state?
Stale documentation	Do comments match actual behavior?
Dead code	Are there unused code paths that could confuse maintainers?
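
To make the atomicity and cleanup checks concrete, here is a minimal sketch of the pattern an auditor would hope to find when code persists state to a file: write to a temporary file, then rename it into place. The function name `atomic_write` is hypothetical, and the sketch assumes the temporary file and the target live on the same filesystem.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` without exposing a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # readers see either the old or the new content
    except BaseException:
        # Cleanup on every failure path, including KeyboardInterrupt.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Code that instead opens the target with mode `"w"` and writes in place is a typical Atomicity finding: a crash mid-write leaves a truncated file.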

APIs & Network (HTTP, gRPC, WebSocket, IPC)

Check	Question
Input validation	Are all inputs validated before use?
Error responses	Do errors leak internal details?
Timeout handling	Are network operations bounded?
Retry safety	Are operations idempotent or properly guarded?
Authentication	Are auth checks applied consistently?
Rate limiting	Can the API be abused?
Serialization	Can malformed payloads cause panics?
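
As an illustration of the retry-safety check, the sketch below guards a non-idempotent operation with a client-supplied idempotency key. `IdempotentHandler` and its in-memory result store are hypothetical; the sketch is single-threaded and keeps results forever, both of which a production version would need to address.

```python
from typing import Any, Callable


class IdempotentHandler:
    """Run an operation at most once per idempotency key."""

    def __init__(self, operation: Callable[[Any], Any]) -> None:
        self._operation = operation
        self._results: dict[str, Any] = {}  # assumption: in-memory, never evicted

    def handle(self, key: str, payload: Any) -> Any:
        # A retried request with the same key gets the stored result
        # instead of triggering the side effect a second time.
        if key in self._results:
            return self._results[key]
        result = self._operation(payload)
        self._results[key] = result
        return result


charges: list[int] = []


def charge(amount: int) -> str:
    charges.append(amount)  # the side effect that must not be repeated
    return f"charged {amount}"


handler = IdempotentHandler(charge)
first = handler.handle("req-1", 500)
retry = handler.handle("req-1", 500)  # simulated client retry
```

When reviewing a handler without such a guard, the question to ask is what happens if the client times out and resends the same request.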

Concurrency (threads, async, channels, locks)

Check	Question
Deadlock potential	Can lock acquisition order cause deadlock?
Data races	Is shared mutable state properly synchronized?
Starvation	Can any task be indefinitely blocked?
Cancellation	Are cancellation/shutdown paths clean?
Resource leaks	Are spawned tasks/threads joined or detached properly?
Panic propagation	Do panics in tasks crash the whole system?
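
The deadlock check is easiest to reason about with an example. The sketch below shows one common mitigation, acquiring locks in a single global order, so that two callers passing the same locks in opposite order cannot block each other. The helper `acquire_in_order` and the `transfer` example are hypothetical; ordering by `id()` is an assumption that only holds within one process.

```python
import threading
from contextlib import ExitStack, contextmanager


@contextmanager
def acquire_in_order(*locks):
    """Hold all given locks, always acquiring them in the same global order."""
    unique = {id(lock): lock for lock in locks}  # tolerate the same lock passed twice
    with ExitStack() as stack:
        for key in sorted(unique):
            stack.enter_context(unique[key])
        yield  # locks are released in reverse order on exit, even on error


balances = {"a": 100, "b": 100}
locks = {"a": threading.Lock(), "b": threading.Lock()}


def transfer(src: str, dst: str, amount: int) -> None:
    # Without ordered acquisition, transfer("a", "b") racing transfer("b", "a")
    # could each hold one lock and wait forever for the other.
    with acquire_in_order(locks[src], locks[dst]):
        balances[src] -= amount
        balances[dst] += amount
```

In an audit, any function that takes two or more locks in caller-determined order is worth flagging unless an ordering rule like this is visible.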

UI & Presentation (views, components, templates)

Check	Question
State consistency	Can UI show stale or inconsistent state?
Error states	Are all error conditions rendered appropriately?
Loading states	Are async operations properly indicated?
Accessibility	Are interactions keyboard/screen-reader accessible?
Memory leaks	Are subscriptions/observers cleaned up?
Re-render efficiency	Are unnecessary re-renders avoided?

Data Processing (parsers, transformers, validators)

Check	Question
Edge cases	Are empty, null, and boundary values handled?
Type coercion	Are implicit conversions safe?
Overflow/underflow	Are numeric operations bounded?
Encoding	Is text encoding handled consistently (UTF-8)?
Injection	Can untrusted input escape its context?
Invariants	Are data invariants enforced and documented?
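
A short example of what passing these checks can look like for a small parser. `parse_port` is a hypothetical function chosen for illustration; it handles the empty, null, and boundary cases explicitly and rejects non-ASCII digits rather than relying on implicit conversion.

```python
from typing import Optional


def parse_port(raw: Optional[str]) -> int:
    """Parse a TCP port, enforcing the invariant 1 <= port <= 65535."""
    if raw is None:
        raise ValueError("port is missing")
    text = raw.strip()
    if not text:
        raise ValueError("port is empty")
    # str.isdigit() alone accepts characters such as fullwidth digits,
    # so require ASCII as well.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"port is not a plain decimal number: {raw!r}")
    value = int(text)
    if not 1 <= value <= 65535:
        raise ValueError(f"port out of range: {value}")
    return value
```

Each branch corresponds to a row in the checklist, which makes it straightforward to cite a specific line when one of them is missing in the code under audit.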

Configuration & Setup (config files, environment, initialization)

Check	Question
Defaults	Are defaults safe and documented?
Validation	Are invalid configs rejected early with clear errors?
Secrets	Are secrets handled securely (not logged, not in VCS)?
Hot reload	If supported, is reload atomic and safe?
Compatibility	Are breaking changes versioned or migrated?
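
The sketch below illustrates the Defaults, Validation, and Secrets rows together. The `Config` fields and the `load_config` function are hypothetical; the point is that invalid input fails at load time with a clear message, and that the secret is kept out of `repr()` so it does not end up in logs by accident.

```python
from dataclasses import dataclass, field

ALLOWED_KEYS = {"host", "timeout_seconds", "api_token"}


@dataclass(frozen=True)
class Config:
    host: str = "127.0.0.1"          # safe default: loopback only
    timeout_seconds: float = 5.0     # bounded by default
    api_token: str = field(default="", repr=False)  # excluded from repr()


def load_config(raw: dict) -> Config:
    """Validate a raw mapping and fail early on anything unexpected."""
    unknown = set(raw) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    cfg = Config(**raw)
    timeout = cfg.timeout_seconds
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout <= 0:
        raise ValueError("timeout_seconds must be a positive number")
    if not cfg.api_token:
        raise ValueError("api_token is required")
    return cfg
```

Rejecting unknown keys is a design choice: it catches typos such as `timeout_secs`, at the cost of making forward-compatible configs stricter.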

Severity Classification

Classify every finding. Assume the user will fix all issues soon.

Severity	Criteria	Examples
Critical	Data loss, security vulnerability, crash in production	Unhandled panic, SQL injection, file corruption
High	Incorrect behavior users will notice	Wrong calculation, race causing wrong UI state, timeout too short
Medium	Technical debt that causes confusion or future bugs	Stale docs, misleading names, redundant code paths
Low	Cosmetic or minor improvements	Unused parameter, suboptimal algorithm (works correctly)

Finding Format

Every finding must follow this structure:

markdown

[SUBSYSTEM] Finding N: Brief Title

Severity: Critical | High | Medium | Low

Location: file.rs:line_range or file.rs:function_name

Problem: What's wrong and why it matters. Be specific.

Evidence: Code snippet or reasoning demonstrating the issue.

Recommendation: Specific fix. Include code if helpful.

---
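
For audits that generate reports programmatically, the finding structure could be rendered by a helper along these lines. This is a sketch under assumptions: the `Finding` type is hypothetical, and the `Severity:` and `Location:` labels are chosen here to carry the severity and location that every finding must have.

```python
from dataclasses import dataclass, fields

SEVERITIES = ("Critical", "High", "Medium", "Low")


@dataclass
class Finding:
    subsystem: str
    number: int
    title: str
    severity: str
    location: str        # e.g. "file.rs:10-20" or "file.rs:function_name"
    problem: str
    evidence: str
    recommendation: str


def render_finding(f: Finding) -> str:
    """Render one finding, refusing incomplete or unclassified entries."""
    if f.severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {f.severity!r}")
    for spec in fields(f):
        if getattr(f, spec.name) in ("", None):
            raise ValueError(f"finding is missing {spec.name}")
    return "\n\n".join([
        f"[{f.subsystem}] Finding {f.number}: {f.title}",
        f"Severity: {f.severity}",
        f"Location: {f.location}",
        f"Problem: {f.problem}",
        f"Evidence: {f.evidence}",
        f"Recommendation: {f.recommendation}",
        "---",
    ])
```

Rejecting empty fields at render time is one way to keep vague or unlocated findings out of the final report.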

Output Structure

Adapt output to project organization. Common patterns:

Pattern A: Audit Directory (recommended for 5+ subsystems)

.claude/docs/audit/
├── 00-analysis-plan.md      # Subsystem table, priorities, methodology
├── 01-subsystem-name.md     # Individual analysis
├── 02-another-subsystem.md
└── SUMMARY.md               # Consolidated findings, action items

Pattern B: Single Document (for smaller systems)

.claude/docs/audit/system-name-audit.md

Contains: plan, all findings, summary

Pattern C: Inline with Existing Docs

If the project has an existing docs/ directory or similar, place audit artifacts there.

Always create a summary with:

Total findings by severity
Top 5 most critical issues
Recommended fix order
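
The three summary items can be produced from a flat list of findings. The sketch below assumes each finding is reduced to a `(severity, title)` pair and that the recommended fix order simply follows severity; a real audit may reorder fixes to account for dependencies between them.

```python
from collections import Counter

SEVERITIES = ["Critical", "High", "Medium", "Low"]


def build_summary(findings: list[tuple[str, str]]) -> str:
    """Build a plain-text summary from (severity, title) pairs."""
    counts = Counter(severity for severity, _ in findings)
    # sorted() is stable, so findings of equal severity keep their input order.
    ranked = sorted(findings, key=lambda f: SEVERITIES.index(f[0]))

    lines = ["Total findings by severity:"]
    lines += [f"  {sev}: {counts.get(sev, 0)}" for sev in SEVERITIES]
    lines.append("Top 5 most critical issues:")
    lines += [f"  {i}. [{sev}] {title}" for i, (sev, title) in enumerate(ranked[:5], 1)]
    lines.append("Recommended fix order:")
    lines += [f"  {i}. {title}" for i, (_, title) in enumerate(ranked, 1)]
    return "\n".join(lines)
```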

Session Management

For thorough analysis:

Small systems (<1000 LOC, <3 subsystems): Single session acceptable
Medium systems (1000-5000 LOC, 3-7 subsystems): Clear context between phases
Large systems (>5000 LOC, >7 subsystems): Separate sessions per subsystem

When clearing context, document progress in the analysis plan file so the next session can continue.
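
The size thresholds above can be read as a small decision function. The text does not say what to do when the two measures disagree (for example, few lines of code spread over many subsystems); the sketch assumes the stricter strategy wins in that case.

```python
def session_strategy(total_loc: int, subsystem_count: int) -> str:
    """Pick a session strategy; when LOC and subsystem count disagree,
    the larger size class wins (an assumption, not stated in the text)."""
    if total_loc > 5000 or subsystem_count > 7:
        return "separate session per subsystem"
    if total_loc >= 1000 or subsystem_count >= 3:
        return "clear context between phases"
    return "single session"
```

For example, `session_strategy(800, 5)` returns `"clear context between phases"` because five subsystems already put the project in the medium class.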

Pre-Analysis: Known Issues Sweep

Before deep analysis, scan for documented issues:

Check CLAUDE.md / README for "gotchas" or "known issues"
Search for TODO/FIXME/HACK comments
Review recent commits for bug fixes (may indicate fragile areas)
Check issue tracker if accessible

Add these as starting hypotheses—verify or refute during analysis.
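
The comment search in this sweep is easy to script. A minimal sketch follows; the function name `scan_markers` and the default file suffixes are assumptions, and the output is deliberately shaped as `file:line` locations so each hit can be cited directly as a starting hypothesis.

```python
import re
from pathlib import Path

MARKER = re.compile(r"\b(TODO|FIXME|HACK)\b[:\s]*(.*)")


def scan_markers(root: str, suffixes: tuple[str, ...] = (".rs", ".py", ".md")):
    """Return (location, marker, note) for every TODO/FIXME/HACK under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # errors="replace" keeps the scan going on files that are not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = MARKER.search(line)
            if match:
                hits.append((f"{path}:{lineno}", match.group(1), match.group(2).strip()))
    return hits
```

The search is case-sensitive on purpose, so ordinary prose containing the word "todo" is not reported.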

Anti-Patterns to Avoid

Anti-Pattern	Why It's Bad	Instead
Skimming code	Misses subtle bugs	Read every line in scope
Assuming correctness	Agent code often has edge case bugs	Verify each code path
Vague findings	"This looks wrong" isn't actionable	Cite specific lines, explain why
Over-scoping	Analysis paralysis	Strict subsystem boundaries
Ignoring tests	Tests reveal assumptions	Read tests to understand intent

Completion Criteria

Analysis is complete when:

✅ All high-priority subsystems analyzed
✅ Every finding has severity, location, and recommendation
✅ Summary document exists with prioritized action items
✅ No "TBD" or "needs investigation" items remain
✅ Cross-references between related findings added
