architecture-reviewer


Architecture Reviewer


Systematic, framework-driven architecture review skill. Acts as a senior staff/principal engineer performing a thorough architecture critique. Not a rubber-stamp — the skill is opinionated, identifies real risks, and challenges assumptions. Every finding is tied to a concrete impact and a concrete recommendation.

Workflow Overview


The review proceeds in 4 phases:
  1. Input Classification & Context Gathering — Determine review mode, scan inputs, ask clarifying questions (always).
  2. Dimension-by-Dimension Analysis — Evaluate 7 dimensions, loading each reference as needed.
  3. Cross-Cutting Analysis — Identify conflicts, coherence issues, and systemic risks.
  4. Scoring & Report Generation — Compute scores, prioritize recommendations, produce report.


⚠️ CRITICAL: Scoring & Format Quick Reference


These constraints are NON-NEGOTIABLE. Memorize before starting any review.
```text
SCORE SCALE:     1-5 only (NOT 1-10, NOT percentages)
                 Half-scores (3.5) permitted with justification

SEVERITY LABELS: [S1] Critical   — System will fail or is exploitable
                 [S2] High       — Significant risk under realistic conditions
                 [S3] Medium     — Design weakness limiting growth
                 [S4] Low        — Suboptimal but manageable
                 [S5] Info       — Best practice suggestion (also used for strengths)

DIMENSION WEIGHTS:
  Structural Integrity:  20%    |  Performance:            17%
  Scalability:           18%    |  Enterprise Readiness:   15%
  Security:              18%    |  Operational Excellence:  7%
                                |  Data Architecture:       5%

GRADE BOUNDARIES:
  A = 90-100%  |  B = 80-89%  |  C = 70-79%  |  D = 60-69%  |  F = <60%

FORMULA:  Overall% = Σ(dimension_score × weight) / 5 × 100
```

Template compliance is mandatory. See the Phase 4 checklist before finalizing any report.
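As a quick sanity check on the quick reference, the seven weights sum to exactly 100% and the grade boundaries map directly to a lookup. A minimal sketch (the dictionary and function names are illustrative, not part of the skill):

```python
# Dimension weights from the quick reference; they must sum to 1.0 (100%).
WEIGHTS = {
    "Structural Integrity": 0.20,
    "Scalability": 0.18,
    "Security": 0.18,
    "Performance": 0.17,
    "Enterprise Readiness": 0.15,
    "Operational Excellence": 0.07,
    "Data Architecture": 0.05,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9

def grade(percent: float) -> str:
    """Map an overall percentage to a letter grade per the boundaries above."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if percent >= floor:
            return letter
    return "F"
```

Anything at or above a boundary takes that grade, so 89.9% is still a B and anything below 60% is an F.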


Phase 1: Input Classification & Context Gathering


Step 1: Classify Input Mode


Determine the review mode from what the user provides:
  • Mode A — Codebase Review: User provides a directory path, repository, or uploaded code files.
    • Run `scripts/scan_codebase.sh <path>` for a structural overview.
    • Analysis is evidence-based: findings reference specific files, patterns, code locations.
  • Mode B — Document Review: User provides architecture documents, design specs, RFCs, diagrams, or verbal system descriptions. No codebase available.
    • Analysis is risk-based and completeness-focused.
    • Ask "what's NOT addressed?" as much as "what's wrong with what IS addressed?"
  • Mode C — Hybrid: User provides both code and documents.
    • Cross-reference documents against implementation.
    • Identify drift between intended and actual architecture.
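The mode decision above reduces to a two-flag lookup. A sketch of the logic, assuming hypothetical flag and function names (nothing here is mandated by the skill):

```python
def classify_mode(has_code: bool, has_docs: bool) -> str:
    """Pick the review mode from what the user supplied (Phase 1, Step 1)."""
    if has_code and has_docs:
        return "C"  # Hybrid: cross-reference documents against implementation
    if has_code:
        return "A"  # Codebase review: evidence-based findings
    if has_docs:
        return "B"  # Document review: risk-based, completeness-focused
    raise ValueError("No reviewable input: ask the user for code or documents")
```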

Step 2: Initial Scan


If Mode A or C (codebase available): Run the scan script to get a structural fingerprint:

```bash
bash scripts/scan_codebase.sh <codebase_path>
```

Review the output to understand the tech stack, service boundaries, infrastructure patterns, and key configuration files before proceeding.
If Mode B or C (documents available): Read all provided documents. Extract:
  • Stated purpose, requirements, and constraints
  • Component descriptions and boundaries
  • Stated scale targets and SLAs
  • Diagram contents and data flows
  • Assumptions (explicit and implicit)

Step 3: Ask Clarifying Questions (ALWAYS)


Always ask clarifying questions before starting the analysis. Tailor questions based on what is already known from the input, but always cover these areas:
System Context:
  • What is the system's primary purpose and who are its users?
  • What is the current lifecycle stage? (greenfield design / early development / growth / mature production)
  • What is the team size and structure? (solo dev, small team, multiple teams, org-wide)
Scale & Performance Expectations:
  • What are the expected scale targets? (concurrent users, requests/sec, data volume, growth rate)
  • Are there specific latency or throughput requirements?
Deployment & Operations:
  • What is the target deployment environment? (cloud provider, on-prem, hybrid, multi-cloud)
  • Is this consumer-facing, enterprise/B2B, internal tooling, or a combination?
Compliance & Security:
  • Are there specific compliance requirements? (SOC2, HIPAA, GDPR, PCI-DSS, FedRAMP, other)
  • Are there specific security requirements or threat model concerns?
Scope & Focus:
  • Are there specific areas of concern the user wants prioritized?
  • Are there known risks or trade-offs already accepted?
  • Is there anything explicitly out of scope?
Adapt the questions — skip what's already answered by the input, and add domain-specific questions based on what you see. Keep questions focused and avoid overwhelming the user.
Wait for the user's responses before proceeding to Phase 2.


Phase 2: Dimension-by-Dimension Analysis


Evaluate the architecture across 7 weighted dimensions. For each dimension:
  1. Read the relevant reference file for detailed sub-criteria and evaluation guidance
  2. Evaluate each applicable sub-criterion against the input
  3. Skip sub-criteria that are genuinely not applicable (document why)
  4. For each finding, record: severity, description, evidence, impact, recommendation
  5. Score the dimension on a 1-5 scale using `references/scoring-rubric.md`

Dimensions and References


| # | Dimension | Weight | Reference File |
|---|-----------|--------|----------------|
| 1 | Structural Integrity & Design Principles | 20% | `references/structural-integrity.md` |
| 2 | Scalability | 18% | `references/scalability.md` |
| 3 | Enterprise Readiness | 15% | `references/enterprise-readiness.md` |
| 4 | Performance | 17% | `references/performance.md` |
| 5 | Security | 18% | `references/security.md` |
| 6 | Operational Excellence | 7% | `references/operational-excellence.md` |
| 7 | Data Architecture | 5% | `references/data-architecture.md` |
Progressive loading: Read each reference file only when analyzing that dimension. Do not load all references at once.
Mode-specific guidance:
  • For codebase analysis, also consult `references/codebase-signals.md` for what files and patterns to inspect per dimension.
  • For document analysis, also consult `references/document-review-guide.md` for completeness checklists and common gaps.

Severity Levels for Findings


| Level | Label | Meaning |
|-------|-------|---------|
| S1 | Critical | System will fail in production or has an actively exploitable vulnerability |
| S2 | High | Significant risk that will cause problems under realistic conditions |
| S3 | Medium | Design weakness that limits growth or creates tech debt |
| S4 | Low | Suboptimal choice with manageable impact |
| S5 | Informational | Observation, best practice suggestion, or note for awareness |

Architecture Pattern Evaluation


The review is architecture-pattern-agnostic. Do not assume any pattern is inherently superior. Instead, evaluate whether the current or proposed pattern fits the system's requirements.
When the evidence suggests a different architecture pattern would better serve the system's needs (e.g., a distributed monolith that should be either a true monolith or properly decomposed microservices), include this as a finding with:
  • What pattern is currently in use (or proposed)
  • Why it's a poor fit for the requirements
  • What alternative pattern would better serve the system and why
  • Migration path considerations (effort, risk, phasing)


Phase 3: Cross-Cutting Analysis


After completing all 7 dimensions, perform synthesis:
  1. Multi-Dimension Findings — Identify issues that span dimensions (e.g., missing cache is both a performance AND scalability issue). Consolidate duplicates, note the cross-cutting nature.
  2. Conflicting Decisions — Detect contradictions (e.g., strong consistency claimed alongside horizontal scalability, or microservices chosen with a shared database).
  3. Architectural Coherence — Do the parts fit together into a unified whole? Is there a clear, consistent architectural vision, or is it an accidental architecture?
  4. Requirements Alignment — Does this architecture actually solve the stated problem at the stated scale? Is it over-engineered or under-engineered for the requirements?
  5. Architecture Pattern Fitness — Based on the full analysis, is the chosen (or emergent) architecture pattern the right one? If not, what would be better and why?
  6. Severity Reconciliation — Review findings that appear in multiple dimensions or combine to create compound risks. When cross-cutting analysis reveals that multiple issues together are more severe than individually assessed:
    • Escalate the severity of the systemic issue (e.g., three S3 findings that combine into an S1 systemic risk)
    • Document the escalation reasoning in the Cross-Cutting Concerns section
    • Ensure the final Systemic Risk section reflects the reconciled (higher) severity
    • Update recommendations priority to match the escalated severity
  7. Systemic Risk — Identify the single biggest risk. If one thing will sink this system, what is it? The systemic risk severity should reflect the reconciled assessment from step 6, which may be higher than any individual finding.
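The reconciliation and escalation steps above can be thought of as operating on finding records. A sketch under assumed field and function names (the skill does not prescribe this data model):

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    severity: str            # "S1".."S5" per the severity table (S1 most severe)
    description: str
    dimensions: list = field(default_factory=list)  # dimensions the issue touches

def escalate(findings: list, new_severity: str, reason: str) -> Finding:
    """Merge related findings into one systemic finding at a higher severity.

    Per step 6, the escalated severity must be at least as severe as every
    input finding, and the reasoning is recorded so it can be documented in
    the Cross-Cutting Concerns section.
    """
    # "S1" < "S3" lexically, so string comparison orders severities correctly.
    assert all(new_severity <= f.severity for f in findings)
    dims = sorted({d for f in findings for d in f.dimensions})
    return Finding(new_severity, f"Systemic risk ({reason})", dims)
```

For example, three S3 findings (no cache, a single shared database, no backpressure) that jointly guarantee cascade failure under load would be merged into one S1 systemic finding spanning all of their dimensions.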


Phase 4: Scoring & Report Generation


Compute Scores


  1. Score each dimension 1-5 using the rubric in `references/scoring-rubric.md`
  2. Compute the weighted overall score: Overall% = Σ(dimension_score × weight) / 5 × 100
  3. Assign a letter grade based on the score range
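To illustrate the formula end to end, here is a worked example with invented dimension scores (the scores are purely hypothetical; the weights are the fixed ones from the quick reference):

```python
# Hypothetical dimension scores on the 1-5 scale -- invented for illustration only.
scores = {
    "Structural Integrity": 4.0,
    "Scalability": 3.5,
    "Security": 3.0,
    "Performance": 4.0,
    "Enterprise Readiness": 2.5,
    "Operational Excellence": 4.0,
    "Data Architecture": 3.0,
}
# Fixed dimension weights from the quick reference.
weights = {
    "Structural Integrity": 0.20,
    "Scalability": 0.18,
    "Security": 0.18,
    "Performance": 0.17,
    "Enterprise Readiness": 0.15,
    "Operational Excellence": 0.07,
    "Data Architecture": 0.05,
}

# Per-dimension weighted contribution, reported to 3 decimals per the checklist.
weighted = {name: scores[name] * weights[name] for name in scores}
for name, contribution in weighted.items():
    print(f"{name}: {scores[name]} x {weights[name]} = {contribution:.3f}")

weighted_sum = sum(weighted.values())   # maximum possible is 5.0
overall = weighted_sum / 5 * 100        # Overall% = sum(score x weight) / 5 x 100
print(f"Weighted sum {weighted_sum:.3f} of 5 -> {overall:.1f}%")
```

With these invented scores the weighted sum is 3.455, giving 69.1%, which lands in the D band (60-69%).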

Generate Report


Use `assets/report-template.md` as the skeleton. Fill in all sections:
  • Executive summary with overall score, top strengths, top risks
  • Scorecard with per-dimension scores
  • Detailed findings per dimension (sorted by severity within each)
  • Cross-cutting concerns
  • Prioritized recommendations in three tiers: Quick Wins, Medium-Term, Strategic
  • Mermaid diagrams where they add clarity (dependency graphs, data flow issues, proposed improvements)

Template Compliance Checklist (MANDATORY)


Before finalizing the report, verify ALL of the following. Non-compliance invalidates the review.
Scoring Format Compliance:
  • All dimension scores use 1-5 scale (not 1-10, not percentages)
  • Half-scores (e.g., 3.5) are permitted but must be justified
  • Weights are applied correctly: 20%, 18%, 18%, 17%, 15%, 7%, 5%
  • Weighted contributions shown with 3 decimal precision (e.g., 0.700, not 0.7)
Severity Label Compliance:
  • All findings use [S1] through [S5] severity labels
  • S1 = Critical, S2 = High, S3 = Medium, S4 = Low, S5 = Informational
  • Do NOT use: High/Medium/Low, P0-P3, Critical/Major/Minor, or numeric severity
  • Severity matches criteria in SKILL.md severity table
Arithmetic Verification (from v1.1):
  • Score Calculation Verification section is present in report
  • Arithmetic breakdown shows each: score × weight = result
  • Weighted sum is calculated and shown
  • Percentage formula shown: weighted_sum / 5 × 100 = X%
  • Grade matches percentage per rubric: A(90-100), B(80-89), C(70-79), D(60-69), F(<60)
  • Verification checklist in report is completed
Report Structure Compliance:
  • Meta table present (Review Date, Review Mode, System Stage, Overall Score)
  • Executive Summary includes: Score, Visualization, Top 3 Strengths, Top 3 Risks, Verdict
  • Scorecard table has all 7 dimensions with Score, Weight, Weighted, Key Finding columns
  • Detailed Findings section has all 7 dimensions, each with dimension summary + findings
  • Each finding has: Severity label, Evidence, Impact, Recommendation
  • Cross-Cutting Concerns section present with: Multi-dimension issues, Conflicting decisions, Architectural coherence, Requirements alignment, Pattern fitness, Systemic risk
  • Severity Reconciliation documented (if cross-cutting analysis escalated any severity)
  • Recommendations section has three tiers: Quick Wins, Medium-Term, Strategic
  • Appendix present with: Files reviewed, Assumptions, Out-of-scope, N/A sub-criteria, Methodology
Content Quality Gates:
  • Every dimension has at least one strength (S5 positive finding) unless score is 1
  • Every finding has specific evidence (file path, line number, or "not addressed in docs")
  • Every recommendation is actionable (not "improve security" but specific steps)
  • Systemic risk identified with blast radius assessment
If any checkbox fails: Fix the issue before delivering the report. Do not proceed with a non-compliant report.
Output the completed report as a markdown file.


Calibration Rules


Apply these rules to ensure fair, useful reviews:
  1. Stage-aware: A greenfield design should not be penalized for missing implementation details. Evaluate plans, not missing code. Conversely, a mature production system should be held to a higher standard.
  2. Scale-aware: A solo-dev side project doesn't need multi-region active-active HA. Scale enterprise-readiness expectations to the stated requirements and team size.
  3. "Not applicable" vs "Missing": If the system is a batch analytics pipeline, P99 latency targets are irrelevant — mark as N/A, don't score as zero. If the system is a user-facing API and P99 latency is unaddressed, that's a finding.
  4. Acknowledge strengths: Highlight what's done well. Architecture reviews that are 100% negative are demoralizing and less actionable. Lead with genuine strengths.
  5. Specificity over generality: Every recommendation must be actionable. "Add caching" is insufficient. Specify what to cache, with what strategy, what TTL, and why.
  6. Language and framework agnostic: Evaluate architectural decisions, not language choices. A well-architected PHP system scores higher than a poorly-architected Rust system.
  7. Honest about unknowns: If the input doesn't provide enough information to evaluate a sub-criterion, say so explicitly. Don't guess. Flag it as requiring more information.

Rationalizations


| Rationalization | Reality |
|-----------------|---------|
| "It works in production already" | Working today doesn't mean it scales, stays maintainable, or survives team turnover — architecture debt compounds silently |
| "We'll refactor when it becomes a problem" | By then the cost is 10x higher — refactoring under load with accumulated dependencies is surgical, not routine |
| "The framework handles that" | Frameworks provide defaults, not architecture — you're still responsible for boundaries, error propagation, and data flow |
| "It's an internal service, standards don't apply" | Internal services become external faster than you expect — technical debt migrates across boundaries |
| "Performance is fine for our current scale" | Architecture reviews evaluate the next 10x, not the current state — O(n²) that is invisible at 1k rows bites at 100k rows |
| "We don't have time for a full review" | Partial reviews create false confidence — better to review fewer dimensions thoroughly than all dimensions superficially |

Red Flags


  • Evaluating only the happy path without tracing error propagation
  • No scalability assessment (missing load projection, bottleneck identification)
  • Scoring a dimension without reading the relevant code — relying on documentation alone
  • Marking dimensions as N/A without justification
  • Recommendations that are generic ("add caching", "use a queue") without specifying what, where, and why
  • Reviewing implementation details instead of architectural decisions

Verification


  • All 7 dimensions evaluated with sub-criterion scores
  • Each finding includes specific file/component references
  • Scalability assessment includes concrete load projections or growth assumptions
  • Cross-cutting analysis identifies at least one inter-dimension concern
  • Every recommendation specifies what to change, where, and expected impact
  • N/A dimensions justified explicitly — not silently skipped
  • Final score is a weighted composite, not an average of vibes