code-review
Value: Feedback and communication -- structured review catches defects that
the author cannot see, and separating review into stages prevents thoroughness
in one area from crowding out another.
Purpose
目的
Teaches a systematic three-stage code review that evaluates spec compliance,
code quality, and domain integrity as separate passes. Prevents combined
reviews from letting issues slip through by ensuring each dimension gets
focused attention.
Practices
实践方法
Three Stages, In Order
按顺序执行的三个阶段
Review code in three sequential stages. Do not combine them. Each stage has a
single focus. A failure in an earlier stage blocks later stages -- there is no
point reviewing code quality on code that does not meet the spec.
Stage 1: Spec Compliance. Does the code do what was asked? Not more, not
less.
For each acceptance criterion or requirement:
- Find the code that implements it
- Find the test that verifies it
- Confirm the implementation matches the spec exactly
Mark each criterion: PASS, FAIL (missing/incomplete/divergent), or CONCERN
(implemented but potentially incorrect). Flag anything built beyond
requirements as OVER-BUILT.
If any criterion is FAIL, stop. Return to implementation before continuing.
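The Stage 1 bookkeeping can be sketched as a small table mapping each acceptance criterion to its implementing code, its verifying test, and a verdict. The criterion and file names below are hypothetical, purely for illustration:

```python
# Hypothetical Stage 1 ledger: one entry per acceptance criterion.
criteria = {
    "AC-1: user can reset password": {
        "implementation": "src/auth/reset.py",
        "test": "tests/test_reset.py::test_reset_flow",
        "verdict": "PASS",
    },
    "AC-2: reset link expires after 1h": {
        "implementation": None,  # no implementing code found in the diff
        "test": None,
        "verdict": "FAIL",
    },
}

# Any FAIL blocks the review: return to implementation before Stage 2.
blocked = any(c["verdict"] == "FAIL" for c in criteria.values())
assert blocked
```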
Vertical Slice Layer Coverage
垂直切片层覆盖检查
For tasks that implement a vertical slice (adding user-observable behavior), perform the following checks in order:
- Entry-point wiring check (diff-based): Examine whether the changeset includes modifications to the application's entry point or its wiring/routing layer. If the slice claims to add new user-observable behavior but the diff does not touch any wiring or entry-point code, the review fails unless the author explicitly documents why existing wiring already routes to the new behavior.
- End-to-end traceability: Verify that a path can be traced from the application's external entry point, through any infrastructure or integration layer, to the new domain logic, and back to observable output. If any segment of this path is missing from the changeset and not already present in the codebase, flag the gap.
- Boundary-level test coverage: Confirm that at least one test exercises the new behavior through the application's external boundary (e.g., an HTTP request, a CLI invocation, a message on a queue) rather than calling internal functions directly. Where the application architecture makes automated boundary tests feasible, their absence is a review concern.
- Test-level smell check: If every test in the changeset is a unit test of isolated internal functions with no integration or acceptance-level test, flag this as a concern. The slice may be implementing domain logic without proving it is reachable through the running application.
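The difference between unit-only and boundary-level coverage can be sketched with a toy application; the function and CLI entry point here are hypothetical, not part of any real codebase:

```python
# Hypothetical minimal app: internal domain logic plus a CLI entry point.
def normalize_username(raw: str) -> str:
    """Internal domain logic."""
    return raw.strip().lower()

def main(argv: list[str]) -> str:
    """External boundary: the CLI entry point wired to the domain logic."""
    if not argv:
        return "usage: app <username>"
    return normalize_username(argv[0])

# Unit test of an isolated internal function -- necessary but not sufficient:
assert normalize_username("  Alice ") == "alice"

# Boundary-level test: exercises the behavior through the entry point,
# proving the new logic is actually reachable from outside the app.
assert main(["  Alice "]) == "alice"
```

If only the first assertion existed, the slice could pass review while `main` never routed to `normalize_username` at all, which is exactly the gap the wiring and smell checks above are meant to catch.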
Stage 2: Code Quality. Is the code clear, maintainable, and well-tested?
Review each changed file for:
- Clarity: Can you understand what the code does without extra context? Are names descriptive? Is the structure obvious?
- Domain types: Are semantic types used where primitives appear? You MUST follow
the domain-modeling skill for primitive obsession detection.
- Error handling: Are errors handled with typed errors? Are all paths covered?
- Test quality: Do tests verify behavior, not implementation? Is coverage adequate for the changed code?
- YAGNI: Is there unused code, speculative features, or premature abstraction?
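The domain-types and error-handling checks can be illustrated with a minimal parse-don't-validate sketch; the `EmailAddress` type is a hypothetical example, not drawn from any codebase under review:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailAddress:
    """Semantic type: validated at construction, so downstream code
    never needs to re-check (parse-don't-validate)."""
    value: str

    def __post_init__(self):
        if "@" not in self.value:
            raise ValueError(f"not an email address: {self.value!r}")

def send_welcome(to: EmailAddress) -> str:
    # A signature taking a raw `str` here would be flagged as primitive obsession.
    return f"welcome mail queued for {to.value}"

assert send_welcome(EmailAddress("a@example.com")).startswith("welcome")
```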
Categorize findings by severity:
- CRITICAL: Bug risk, likely to cause defects
- IMPORTANT: Maintainability concern, should fix before merge
- SUGGESTION: Style or minor improvement, optional
If any CRITICAL issue exists, stop. Return to implementation.
Stage 3: Domain Integrity. Final gate -- does the code respect domain
boundaries?
Check for:
- Compile-time enforcement opportunities: Are tests checking things the type system could enforce instead?
- Domain type consistency: Are semantic types used at all boundaries, or do primitives leak through?
- Validation placement: Is validation at construction (parse-don't-validate), not scattered through business logic?
- State representation: Can the types represent invalid states?
Flag issues but do not block on suggestions. Domain integrity flags are
strongly recommended but not required for merge.
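The gating rule across the three stages -- blocking failures stop the review, while Stage 3 flags do not -- can be sketched as follows (stage names and check functions are illustrative stand-ins):

```python
def run_review(stages):
    """stages: ordered list of (name, check_fn, blocking) tuples."""
    results = {}
    for name, check, blocking in stages:
        verdict = check()
        results[name] = verdict
        if blocking and verdict == "FAIL":
            break  # a failed blocking stage halts all later stages
    return results

results = run_review([
    ("spec-compliance", lambda: "FAIL", True),   # Stage 1: blocking
    ("code-quality", lambda: "PASS", True),      # Stage 2: blocking
    ("domain-integrity", lambda: "PASS", False), # Stage 3: advisory
])

# Stage 1 failed, so Stages 2 and 3 never ran.
assert results == {"spec-compliance": "FAIL"}
```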
Review Output
Produce a structured summary after all three stages:
REVIEW SUMMARY
Stage 1 (Spec Compliance): PASS/FAIL
Stage 2 (Code Quality): PASS/FAIL/PASS with suggestions
Stage 3 (Domain Integrity): PASS/FAIL/PASS with flags
Overall: APPROVED / CHANGES REQUIRED
If CHANGES REQUIRED:
1. [specific required change]
2. [specific required change]
Structured Review Evidence
After completing all three stages, produce a REVIEW_RESULT evidence packet
containing: per-stage verdicts {stage, verdict (PASS/FAIL), findings
[{severity, description, file, line?, required_change?}]}, overall_verdict,
required_changes_count, blocking_findings_count.
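One possible shape for the REVIEW_RESULT packet, following the field names listed above; the exact schema, file path, and finding text are assumptions for illustration:

```python
import json

# Hypothetical REVIEW_RESULT evidence packet.
review_result = {
    "stages": [
        {
            "stage": "spec-compliance",
            "verdict": "FAIL",
            "findings": [
                {
                    "severity": "CRITICAL",
                    "description": "AC-2 has no implementing code",
                    "file": "src/orders.py",
                    "line": None,              # optional field
                    "required_change": "Implement AC-2 before re-review",
                }
            ],
        }
    ],
    "overall_verdict": "CHANGES_REQUIRED",
    "required_changes_count": 1,
    "blocking_findings_count": 1,
}

print(json.dumps(review_result, indent=2))
```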
When pipeline-state is provided in context metadata, the code-review skill
operates in pipeline mode and stores the evidence to
.factory/audit-trail/slices/<slice-id>/review.json. When running
standalone, the evidence is informational only (not stored).
In factory mode, the full team reviews before the pipeline pushes code --
this is the quality checkpoint that replaces consensus-during-build. All
blocking review feedback must be addressed before push. See
references/mob-review.md for the factory mode review subsection.
Handling Disagreements
When your review finding conflicts with the implementation approach:
- State the concern with specific code references
- Explain the risk -- what could go wrong
- Propose an alternative
- If no agreement after one round, escalate to the user
You exist to catch what the author missed, not to block progress.
Business Value and UX Awareness
During Stage 1, also consider:
- Does this slice deliver visible user value?
- Are acceptance criteria specific and testable (not vague)?
- Does the user journey remain coherent after this change?
- Are edge cases and error states handled from the user's perspective?
These are not blocking concerns but should be noted when relevant.
Enforcement Note
This skill provides advisory guidance. It instructs the agent on correct
review procedure but cannot mechanically prevent skipping stages or merging
without review. When used with the tdd skill in automated mode, the
orchestrator can gate PR creation on review completion. In guided mode or
standalone, the agent follows these practices by convention. If you observe
stages being skipped, point it out.
Verification
After completing a review guided by this skill, verify:
- All three stages were performed separately, in order
- Every acceptance criterion was mapped to code and tests in Stage 1
- Each changed file was assessed for clarity and domain type usage in Stage 2
- Domain integrity was checked for compile-time enforcement opportunities in Stage 3
- A structured summary was produced with clear PASS/FAIL per stage
- Any CHANGES REQUIRED items list specific, actionable fixes
If any criterion is not met, revisit the relevant stage before finalizing.
Dependencies
This skill works standalone. For enhanced workflows, it integrates with:
- domain-modeling: Provides the primitive obsession and parse-don't-validate principles referenced in Stage 2 and Stage 3
- tdd: Reviews often follow a TDD cycle; this skill validates the output of that cycle
- mutation-testing: Can follow code review as an additional quality gate
Missing a dependency? Install with:
npx skills add jwilger/agent-skills --skill domain-modeling