honest-review
Honest Review
Research-driven code review. Every finding validated with evidence.
Dispatch
| $ARGUMENTS | Mode |
|---|---|
| Empty + changes in session (git diff) | Session review of changed files |
| Empty + no changes (first message) | Full codebase audit |
| File or directory path | Scoped review of that path |
| "audit" | Force full codebase audit |
| PR number/URL | Review PR changes (gh pr diff) |
| Git range (HEAD~3..HEAD) | Review changes in that range |
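The dispatch rules above can be sketched as a small classifier. This is a hypothetical illustration, not the skill's actual implementation; the mode names and the session-change flag are made up for the example:

```python
import re

def classify(arguments: str, has_session_changes: bool) -> str:
    """Map $ARGUMENTS to a review mode, following the dispatch table."""
    if not arguments:
        return "session-review" if has_session_changes else "full-audit"
    if arguments == "audit":
        return "full-audit"
    if re.fullmatch(r"\d+", arguments) or arguments.startswith("http"):
        return "pr-review"        # reviewed via gh pr diff
    if ".." in arguments:
        return "range-review"     # e.g. HEAD~3..HEAD
    return "scoped-review"        # file or directory path
```

Note the ordering: the PR check runs before the range check so a PR URL is never misread as a git range.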
Review Levels (Both Modes)
Every review covers three abstraction levels, each examining both
defects and unnecessary complexity:
Surface (lines, expressions, functions):
Correctness, error handling, security, readability.
Simplify: dead code, complex conditionals to early returns, hand-rolled to stdlib.
Structural (modules, classes, boundaries):
Test coverage, coupling, interface contracts, cognitive complexity.
Simplify: 1:1 wrappers, single-use abstractions, pass-through plumbing.
Algorithmic (algorithms, data structures, system design):
Complexity class, N+1, resource leaks, concurrency.
Simplify: O(n^2) to O(n), wrong data structure, unnecessary serialization.
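A minimal illustration of the algorithmic-level "wrong data structure" item: repeated membership tests against a list cost O(n²) overall, while hashing into a set makes each lookup O(1). Function names are hypothetical:

```python
def shared_items_slow(a, b):
    # O(len(a) * len(b)): each `in` test scans the list b from the start.
    return [x for x in a if x in b]

def shared_items_fast(a, b):
    # O(len(a) + len(b)): build a set once, then each lookup is O(1).
    b_set = set(b)
    return [x for x in a if x in b_set]
```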
Context-dependent: add security checks for auth/payment/user-data code.
Add observability checks for services/APIs.
Full checklists: read references/checklists.md
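As a sketch of the surface-level "complex conditionals to early returns" simplification, here is a hypothetical before/after pair with identical behavior:

```python
def ship_cost_nested(order):
    # Before: three levels of nesting obscure the three actual cases.
    if order is not None:
        if order.get("express"):
            return 15.0
        else:
            if order.get("total", 0) >= 100:
                return 0.0
            else:
                return 5.0
    else:
        raise ValueError("no order")

def ship_cost_flat(order):
    # After: guard clause first, then one decision per line.
    if order is None:
        raise ValueError("no order")
    if order.get("express"):
        return 15.0
    return 0.0 if order.get("total", 0) >= 100 else 5.0
```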
Research Validation
THIS IS THE CORE DIFFERENTIATOR. Do not report findings based solely
on LLM knowledge. For every non-trivial finding, validate with research:
Two-phase review per scope:
- Flag phase: Analyze code, generate hypotheses ("this API may be deprecated", "this SQL pattern may be injectable", "this dependency has a known CVE")
- Validate phase: For each flag, spawn research subagent(s) to confirm:
- Context7: look up current library docs for API correctness
- WebSearch: check current best practices, security advisories
- WebFetch: query package registries (npm, PyPI, crates.io)
- gh: check open issues, security advisories for dependencies
- Only report findings with evidence. Cite sources.
Research playbook: read references/research-playbook.md
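The flag-then-validate contract can be summarized as a finding record that only becomes reportable once research attaches evidence. This schema is a hypothetical sketch, not part of the skill's reference files:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    hypothesis: str                 # e.g. "this API may be deprecated"
    level: str                      # surface | structural | algorithmic
    severity: str                   # P0-P2 or S0-S2
    evidence: list = field(default_factory=list)  # citations from research

    def reportable(self) -> bool:
        # The core rule above: no evidence, no report.
        return bool(self.evidence)
```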
Mode 1: Session Review
Step 1: Identify Changes
Run `git diff --name-only HEAD` to capture both staged and unstaged changes.
Collect `git diff HEAD` for full context.
Identify original task intent from session history.
Step 2: Scale and Launch
| Scope | Strategy |
|---|---|
| 1-2 files | Inline review at all 3 levels. Spawn research subagents for flagged findings. |
| 3-5 files | Spawn 3 parallel reviewer subagents (Surface/Structural/Algorithmic). Each flags then researches within their level. |
| 6+ files or 3+ modules | Spawn a team. See below. |
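The thresholds in the scaling table translate directly into a dispatch rule (a sketch; the strategy names are illustrative):

```python
def review_strategy(n_files: int, n_modules: int = 1) -> str:
    """Pick a session-review strategy from the scaling table."""
    if n_files >= 6 or n_modules >= 3:
        return "team"                # full team structure
    if n_files >= 3:
        return "parallel-reviewers"  # Surface / Structural / Algorithmic
    return "inline"                  # all 3 levels reviewed inline
```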
Team structure for large session reviews (6+ files):
[Lead: reconcile findings, produce final report]
|-- Surface Reviewer
| Wave 1: subagents analyzing files (1 per file)
| Wave 2: subagents researching flagged findings
|-- Structural Reviewer
| Wave 1: subagents analyzing module boundaries
| Wave 2: subagents researching flagged findings
|-- Algorithmic Reviewer
| Wave 1: subagents analyzing performance/complexity
| Wave 2: subagents researching flagged findings
|-- Verification Runner
Wave 1: subagents running build, lint, tests
Wave 2: subagents spot-checking behavior
Each teammate operates independently. Each runs internal waves of
massively parallelized subagents. No overlapping file ownership.
Step 3: Reconcile (5 Steps)
- Question: For each finding, ask: (a) Is this actually broken or just unfamiliar? (b) Is there research evidence? (c) Would fixing this genuinely improve the code? Discard unvalidated findings.
- Deduplicate: Same issue at different levels — keep deepest root cause
- Resolve conflicts: When levels disagree, choose most net simplification
- Elevate: Surface patterns across files to structural/algorithmic root
- Prioritize: P0/S0 (must fix), P1/S1 (should fix), P2/S2 (report but do not implement)
Severity calibration: P0 = will cause production incident. Not "ugly code."
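A condensed sketch of steps 1, 2, and 5 (question, deduplicate, prioritize) over plain finding dicts; the field names are hypothetical:

```python
def reconcile(findings):
    depth = {"surface": 0, "structural": 1, "algorithmic": 2}
    rank = {"P0": 0, "S0": 0, "P1": 1, "S1": 1, "P2": 2, "S2": 2}
    # Step 1 (question): discard anything without research evidence.
    kept = [f for f in findings if f["evidence"]]
    # Step 2 (deduplicate): same root cause -> keep the deepest level.
    best = {}
    for f in kept:
        cause = f["root_cause"]
        if cause not in best or depth[f["level"]] > depth[best[cause]["level"]]:
            best[cause] = f
    # Step 5 (prioritize): P0/S0 first, then P1/S1, then P2/S2.
    return sorted(best.values(), key=lambda f: rank[f["severity"]])
```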
Step 4: Present and Execute
Present all P0/P1/S0/S1 findings with evidence and citations.
Ask: "Implement fixes? [all / select / skip]"
If approved: parallel subagents by file (no overlapping ownership).
Then verify: build/lint, tests, behavior spot-check.
Output format: read references/output-formats.md
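The approval gate maps the user's answer to a fix set before any subagent touches a file. A hypothetical sketch, with `id` and `selected_ids` invented for the example:

```python
def fixes_to_apply(findings, answer, selected_ids=()):
    """answer is one of 'all', 'select', 'skip' from the prompt above."""
    if answer == "all":
        return list(findings)
    if answer == "skip":
        return []
    # 'select': the user picked a subset of finding ids.
    return [f for f in findings if f["id"] in set(selected_ids)]
```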
Mode 2: Full Codebase Audit
Step 1: Discover
Explore: language(s), framework(s), build system, directory structure,
entry points, dependency manifest, approximate size.
For 500+ files: prioritize recently modified, entry points, public API,
high-complexity areas. State scope in report.
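For the 500+ file case, scoping reduces to an ordered merge: entry points first, then recently modified files, capped at a budget. A hypothetical helper (inputs would come from the manifest and git history):

```python
def audit_scope(entry_points, recently_modified, cap=500):
    """Deduplicated priority list of files for a large audit."""
    ordered, seen = [], set()
    for f in list(entry_points) + list(recently_modified):
        if len(ordered) >= cap:
            break
        if f not in seen:
            seen.add(f)
            ordered.append(f)
    return ordered
```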
Step 2: Design and Launch Team
Spawn a team with domain-based ownership. Each teammate runs all 3
review levels + research validation on their owned files.
[Lead: cross-domain analysis, reconciliation, final report]
|-- Domain A Reviewer — e.g., Backend
| Wave 1: parallel subagents scanning all owned files
| Wave 2: parallel subagents deep-diving flagged files
| Wave 3: parallel subagents researching flagged assumptions
|-- Domain B Reviewer — e.g., Frontend
| [same wave pattern]
|-- Domain C Reviewer — e.g., Tests/Infra
| [same wave pattern]
|-- Dependency and Security Researcher
Wave 1: subagents auditing each dependency (version, CVEs, license)
Wave 2: subagents checking security patterns against current docs
Wave 3: subagents verifying API usage against library docs (Context7)
Adapt team composition to project type.
Team archetypes + scaling: read references/team-templates.md
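Each domain reviewer's three waves reduce to a filter, map, validate pipeline. In this sketch the callables stand in for subagent dispatch; the function and its parameters are hypothetical:

```python
def run_domain_waves(owned_files, scan, deep_dive, research):
    flagged = [f for f in owned_files if scan(f)]   # Wave 1: scan all owned files
    findings = [deep_dive(f) for f in flagged]      # Wave 2: deep-dive flagged files
    validated = []                                  # Wave 3: research each assumption
    for finding in findings:
        evidence = research(finding)
        if evidence:
            validated.append((finding, evidence))
    return validated
```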
Step 3: Teammate Instructions
步骤3:团队成员指令
Each teammate receives: role, owned files, project context, all 3 review
levels, instruction to run two-phase (flag then research-validate), and
findings format. Full template: references/team-templates.md
Step 4: Cross-Domain Analysis (Lead)
步骤4:跨领域分析(负责人)
While teammates review, lead spawns parallel subagents for:
- Architecture: module boundaries, dependency graph
- Data flow: trace key paths end-to-end
- Error propagation: consistency across system
- Shared patterns: duplication vs. necessary abstraction
Step 5: Reconcile Across Domains
步骤5:跨领域结果整合
Same 5-step reconciliation. Cross-domain deduplication and elevation.
Step 6: Report
步骤6:生成报告
Output format: references/output-formats.md
Required sections: Critical, Significant, Cross-Domain, Health Summary,
Top 3 Recommendations. All findings include evidence + citations.
Step 7: Execute (If Approved)
步骤7:执行修复(若获得批准)
Ask: "Implement fixes? [all / select / skip]"
If approved: parallel subagents by file (no overlapping ownership).
Then verify: build/lint, tests, behavior spot-check.
Healthy Codebase
If no P0/P1 or S0 findings: state this explicitly. Acknowledge health.
Do not inflate minor issues. A short report is a good report.
Reference Files
参考文件
| File | When to Read |
|---|---|
| references/checklists.md | During analysis or building teammate prompts |
| references/research-playbook.md | When setting up research validation subagents |
| references/output-formats.md | When producing final output |
| references/team-templates.md | When designing teams (Mode 2 or large Mode 1) |
Critical Rules
核心规则
- Every non-trivial finding must have research evidence or be discarded
- Do not police style — follow the codebase's conventions
- Do not report phantom bugs requiring impossible conditions
- More than 15 findings means re-prioritize — 5 validated findings beat 50 speculative
- Never skip reconciliation
- Always present before implementing (approval gate)
- Always verify after implementing (build, tests, behavior)
- Never assign overlapping file ownership