code-ultrareview


Code ultrareview

代码超审查

<!-- canonical:label-hygiene:start -->
<!-- canonical:label-hygiene:start -->

Critical — Label hygiene

关键要求 — 标签规范

Internal planning labels are author coordinates, not reader coordinates. Strip them from every shipped artifact this skill emits — code, comments, commit subjects/bodies, PR titles/descriptions, release notes, doc paragraphs, non-trivial comments.
  • Workstream and task labels
    WS-N
    ,
    Phase-A
    ,
    Step-3
    , issue or ticket numbers, plan phase names from the source spec, issue body, or planning artifact. Translate to the domain noun (
    "Runs the battery script (WS-2)" → "Runs the battery script"
    ). <!-- noqa: internal-label -->
  • Process language — "the rebuild", "the prior
    <file>
    ", "carried verbatim from", "the cleanup pass", "the audit", "spec AC" standalone. Replace with the concrete fact (
    "carries the routing from the prior aggregation" → "routes via the merge keys in the synthesis module"
    ). <!-- noqa: internal-label -->
  • Plan-internal references — "as the brief says", "per the workstream", "from the forge artifact". Drop the reference; state the fact directly.
Carve-outs — literal
WS-N
is legitimate where the skill IS the format authority (forge templates, apex rule documentation). Reviewer-facing dev docs (e.g.
MIGRATION.md
under
tests/<skill>/
) may reference deleted artifacts by their author-time names.
<!-- canonical:label-hygiene:end -->
Eight-axis judgment code review. Five-phase pipeline: scope → tool battery → 8 parallel axis reviewers → Haiku validators → synthesis. Always runs at full strength. Distinct from Anthropic's remote
/ultrareview
— same goal, in-session on the user's subscription.
<!-- canonical:writing-rules:start -->
内部规划标签是作者的标记,而非面向读者的标识。在本技能输出的所有交付产物中都需要移除这些标签——包括代码、注释、提交标题/正文、PR标题/描述、发布说明、文档段落和重要注释。
  • 工作流与任务标签
    WS-N
    Phase-A
    Step-3
    、问题或工单编号、来自源规格、问题正文或规划产物的计划阶段名称。需转换为领域名词(例如
    "Runs the battery script (WS-2)" → "Runs the battery script"
    )。 <!-- noqa: internal-label -->
  • 流程术语 — "重建"、"先前的
    <file>
    "、"直接沿用"、"清理阶段"、"审计"、"独立的spec AC"。需替换为具体事实(例如 "carries the routing from the prior aggregation" → "routes via the merge keys in the synthesis module")。 <!-- noqa: internal-label -->
  • 内部规划引用 — "如 brief 所述"、"根据工作流"、"来自 forge 产物"。需移除引用,直接陈述事实。
例外情况——当本技能作为格式权威时(如forge模板、apex规则文档),字面意义的
WS-N
是合法的。面向审查者的开发文档(例如
tests/<skill>/
下的
MIGRATION.md
)可以按作者命名方式引用已删除的产物。
<!-- canonical:label-hygiene:end -->
八维度判定代码审查。五阶段流水线流程:范围确定→工具集→8个并行维度审查器→Haiku验证器→报告合成。始终以完整强度运行。与Anthropic的远程
/ultrareview
不同——目标一致,但在用户订阅的会话内运行。
<!-- canonical:writing-rules:start -->

Important — Writing rules

重要规则 — 写作规范

These rules govern every prose artifact this skill emits — READMEs, CHANGELOGs, commit messages, PR bodies, release notes, doc paragraphs, non-trivial comments. Apply them at draft time, verify before output.
  • Match the surrounding style — punctuation, capitalization, backtick conventions, em-dash vs parens, bullet style.
  • Every sentence changes the reader's understanding. Cut it otherwise.
  • Front-load the verb — "Creates", not "This helps you create".
  • Concrete over abstract. Lists for ≥3 enumerable items.
  • Assert positively. Reserve negation for real constraints (
    NEVER commit secrets
    ).
  • No marketing words: powerful, robust, seamlessly, leverage, unlock, comprehensive, delightful.
  • No AI tells: delve, tapestry, intricate, pivotal, testament, underscore, crucial, garner, showcase, additionally, moreover, furthermore, indeed.
  • After drafting English prose, invoke
    /humanize-en
    if installed.
<!-- canonical:writing-rules:end -->
这些规则适用于本技能输出的所有文本产物——README、CHANGELOG、提交信息、PR正文、发布说明、文档段落和重要注释。在草稿阶段应用,输出前验证。
  • 匹配周边风格——标点、大小写、反引号约定、破折号与括号、列表样式。
  • 每句话都要改变读者的认知,否则删除。
  • 动词前置——使用"Creates"而非"This helps you create"。
  • 优先具体表述,避免抽象。≥3个可枚举项使用列表。
  • 正面陈述。仅对真实约束使用否定表述(如
    NEVER commit secrets
    )。
  • 禁用营销词汇:powerful、robust、seamlessly、leverage、unlock、comprehensive、delightful。
  • 禁用AI风格表述:delve、tapestry、intricate、pivotal、testament、underscore、crucial、garner、showcase、additionally、moreover、furthermore、indeed。
  • 英文草稿完成后,若已安装
    /humanize-en
    则调用该工具优化。
<!-- canonical:writing-rules:end -->

Objective

目标

Run the 8 axes — Correctness, Simplification, Tests, Documentation, Style, Intent, Design/API, Performance — as 8 parallel LLM subagents fed by deterministic tool findings from
scripts/run_battery.sh
. Coherence joins as a 9th axis when metadata files change. Sub-80 axis findings get re-scored by Haiku validators against the verbatim rubric in
references/anthropic-verbatim.md
. Findings synthesize into one report with deterministic dedup, inter-axis precedence, A2 no-silent-drop, and a verdict (Ship / Fix-then-ship / Needs work). The report ends with "What I did NOT check" so the coverage limits are explicit.
运行8个审查维度——正确性、简化性、测试、文档、风格、意图、设计/API、性能——将其作为8个并行LLM子代理,由
scripts/run_battery.sh
输出的确定性工具结果提供数据。当元数据文件变更时,一致性维度会作为第9个维度加入。评分低于80的维度结果会由Haiku验证器根据
references/anthropic-verbatim.md
中的严格评分标准重新打分。结果会合成一份报告,包含确定性去重、维度间优先级、A2无静默丢弃规则以及最终结论(Ship / Fix-then-ship / Needs work)。报告末尾会列出「未检查内容」,明确覆盖范围限制。

Parameters

参数

| Flag | Behavior |
| --- | --- |
| `-s` | Save the report + JSONL to `~/.claude/output/{project}/code-ultrareview/code-ultrareview-{slug}.{md,jsonl}` |
| `-S` | Force no-save (overrides any ambient save mode) |
| `-b <ref>` | Override the review base (skip auto-detection via `scripts/resolve_base.sh`) |
| `--repo-kind <kind>` | Override the scope classifier. Values: `skills`, `app`, `library`, `docs`, `monorepo`, `python`, `rust`, `go`, `unknown`. Persistent per-repo override at `.code-ultrareview.yaml` (`repo_kind: <kind>`); the flag wins on conflict. Invalid value exits 2 |
| `--reconcile <input>` | Activate the Intent-axis derivation sub-mode. `<input>` may be `@auto`, `@pr`, an explicit path or directory, `gh:pr:<N>`, `gh:issue:<owner>/<repo>#<N>`, or a GitHub issue URL. Findings classify as GAP / SCOPE-ADD / DECISION-OVERRIDE / CONSISTENT |
| `--verify-build` | Run build verification on sub-80 axis findings BEFORE Haiku validators (Phase 3.5). Builds + runs the test command detected by `scripts/build_detect.py`; confirmed findings get promoted (+30 confidence) and skip the validator phase |
| `--mutation-test` | Run Stryker (JS/TS), Pitest (JVM), or mutmut (Python) on changed files only. Surviving mutants route to the Tests axis as 🟠 Medium |
| `--apply-safe` | Opt-in writers: auto-apply low-risk fixes (manifest version sync, structured-field description sync with full-agreement guard, one failing test per confirmed bug). Diff preview + per-file confirmation before any write |
| `--include-prose` | Coherence axis compares README freeform paragraphs as well (default: structured fields only) |
| `--axes <list>` | Comma-separated subset of axes to run (e.g. `correctness,tests`). Default: all 8 + Coherence when triggered |
| `--preflight` | List detected tools per repo_kind + print install commands for missing ones. Informational only, no install |

Lowercase enables, uppercase disables. No
-f
— this skill is a producer, not a consumer.
```bash
/code-ultrareview                              # full 8-axis review, print report
/code-ultrareview -s                           # save the report + JSONL for /apex -f
/code-ultrareview -b origin/main               # review HEAD against an explicit base
/code-ultrareview --verify-build               # promote sub-80 findings via real build verification
/code-ultrareview --reconcile @auto            # add Intent derivation sub-mode with auto-detect
/code-ultrareview --apply-safe                 # full review + gated low-risk fixes
/code-ultrareview --preflight                  # list tools the battery would run, no review
/code-ultrareview --axes correctness,tests     # subset of axes
```
| 标志 | 行为 |
| --- | --- |
| `-s` | 将报告+JSONL保存至 `~/.claude/output/{project}/code-ultrareview/code-ultrareview-{slug}.{md,jsonl}` |
| `-S` | 强制不保存(覆盖任何默认保存模式) |
| `-b <ref>` | 覆盖审查基准(跳过 `scripts/resolve_base.sh` 的自动检测) |
| `--repo-kind <kind>` | 覆盖范围分类器。可选值:`skills`、`app`、`library`、`docs`、`monorepo`、`python`、`rust`、`go`、`unknown`。可通过 `.code-ultrareview.yaml` 设置仓库级持久化覆盖(`repo_kind: <kind>`);标志设置优先级更高。无效值会导致退出码为2 |
| `--reconcile <input>` | 激活意图维度推导子模式。`<input>` 可以是 `@auto`、`@pr`、明确路径或目录、`gh:pr:<N>`、`gh:issue:<owner>/<repo>#<N>` 或GitHub问题URL。结果会分类为GAP / SCOPE-ADD / DECISION-OVERRIDE / CONSISTENT |
| `--verify-build` | 在Haiku验证器之前(第3.5阶段)对评分低于80的维度结果运行构建验证。执行 `scripts/build_detect.py` 检测到的构建+测试命令;确认的结果会提升置信度(+30)并跳过验证阶段 |
| `--mutation-test` | 仅对变更文件运行Stryker(JS/TS)、Pitest(JVM)或mutmut(Python)。存活的变异体作为🟠中等问题提交至测试维度 |
| `--apply-safe` | 可选写入功能:自动应用低风险修复(清单版本同步、结构化字段描述同步+完全一致校验、每个确认bug对应一个失败测试)。写入前会展示差异预览+逐文件确认 |
| `--include-prose` | 一致性维度同时对比README自由格式段落(默认:仅对比结构化字段) |
| `--axes <list>` | 逗号分隔的要运行的维度子集(例如 `correctness,tests`)。默认:全部8个维度+触发时的一致性维度 |
| `--preflight` | 列出按repo_kind检测到的工具+缺失工具的安装命令。仅提供信息,不执行安装 |

小写字母启用,大写字母禁用。无
-f
标志——本技能是产物生成器,而非消费者。
```bash
/code-ultrareview                              # 完整8维度审查,打印报告
/code-ultrareview -s                           # 保存报告+JSONL供/apex -f使用
/code-ultrareview -b origin/main               # 对比HEAD与明确基准进行审查
/code-ultrareview --verify-build               # 通过实际构建验证提升评分低于80的结果
/code-ultrareview --reconcile @auto            # 启用意图推导子模式并自动检测
/code-ultrareview --apply-safe                 # 完整审查+需确认的低风险修复
/code-ultrareview --preflight                  # 列出工具集将运行的工具,不执行审查
/code-ultrareview --axes correctness,tests     # 运行维度子集
```
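
The `--repo-kind` precedence described in the flag table can be sketched in Python. This is a minimal illustration, not the skill's actual code: `resolve_repo_kind` is a hypothetical helper, and applying the same validity check to the `.code-ultrareview.yaml` value is an assumption. It shows the flag winning over the per-repo config, the config winning over auto-detection, and an invalid value exiting with status 2.

```python
import sys

# Values accepted by --repo-kind, as listed in the flag table.
VALID_KINDS = {
    "skills", "app", "library", "docs", "monorepo",
    "python", "rust", "go", "unknown",
}


def resolve_repo_kind(flag_value, config_value, detected):
    """Pick the repo kind: flag > per-repo config > auto-detected value.

    Hypothetical helper for illustration only.
    """
    for candidate in (flag_value, config_value):
        if candidate is None:
            continue
        if candidate not in VALID_KINDS:
            # The table states that an invalid value exits 2; checking the
            # config value the same way is an assumption of this sketch.
            print(f"invalid repo kind: {candidate}", file=sys.stderr)
            sys.exit(2)
        return candidate
    return detected
```

Passing `None` for a source that is absent keeps the precedence chain explicit and easy to test.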

The five phases

五阶段流程

Phase 1 — Scope

阶段1 — 范围确定

Runs
scripts/scope.py
. Deterministic, no LLM. Outputs
scope.json
:
  • Diff resolution — clean tree →
    scripts/resolve_base.sh
    ladder; dirty tree →
    git diff HEAD
    + every untracked file inlined as added lines.
  • Repo-kind classification — 8 kinds (
    skills
    /
    app
    /
    library
    /
    docs
    /
    monorepo
    /
    python
    /
    rust
    /
    go
    ) +
    unknown
    . Override via
    --repo-kind
    or
    .code-ultrareview.yaml
    .
  • CLAUDE.md chain — root
    CLAUDE.md
    + nested
    CLAUDE.md
    in changed directories +
    .claude/rules/*.md
    +
    ~/.claude/rules/*.md
    . Ordered root-to-deepest. Read by axis reviewers and validators.
  • Coherence activation — any of
    package.json
    ,
    .claude-plugin/marketplace.json
    ,
    marketplace.json
    ,
    SKILL.md
    , root
    README.md
    ,
    tsconfig.json
    ,
    pyproject.toml
    ,
    Cargo.toml
    ,
    go.mod
    in the diff →
    scope.json["activates_coherence"] = true
    .
  • Languages detection — from changed-file extensions; drives Phase 2 dispatch.
The output also feeds the report header lines
Repo: <kind>
,
Base: <ref>
,
Files: <N>
.
运行
scripts/scope.py
。确定性流程,无LLM参与。输出
scope.json
  • 差异解析 — 干净工作区→
    scripts/resolve_base.sh
    阶梯式解析;脏工作区→
    git diff HEAD
    +所有未跟踪文件作为新增行内联。
  • 仓库类型分类 — 8种类型(
    skills
    /
    app
    /
    library
    /
    docs
    /
    monorepo
    /
    python
    /
    rust
    /
    go
    )+
    unknown
    。可通过
    --repo-kind
    .code-ultrareview.yaml
    覆盖。
  • CLAUDE.md链 — 根目录
    CLAUDE.md
    +变更目录中的嵌套
    CLAUDE.md
    +
    .claude/rules/*.md
    +
    ~/.claude/rules/*.md
    。按从根到最深层级排序。供维度审查器和验证器读取。
  • 一致性维度激活 — 差异中包含
    package.json
    .claude-plugin/marketplace.json
    marketplace.json
    SKILL.md
    、根目录
    README.md
    tsconfig.json
    pyproject.toml
    Cargo.toml
    go.mod
    中的任意文件→
    scope.json["activates_coherence"] = true
  • 语言检测 — 从变更文件扩展名检测;驱动阶段2的工具调度。
输出还会提供报告标题行
Repo: <kind>
Base: <ref>
Files: <N>
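
The Coherence activation rule from Phase 1 can be sketched as a small Python predicate. This is an illustration under stated assumptions, not the logic in `scripts/scope.py`: the function name is hypothetical, and matching the listed metadata files by basename at any depth (except `README.md`, which the text restricts to the root) is a guess at the intended behavior.

```python
from pathlib import PurePosixPath

# Metadata file names listed in the Phase 1 description. Matching them by
# basename at any depth is an assumption of this sketch.
METADATA_BASENAMES = {
    "package.json", "marketplace.json", "SKILL.md", "tsconfig.json",
    "pyproject.toml", "Cargo.toml", "go.mod",
}


def activates_coherence(changed_paths):
    """Return True when any changed path is one of the metadata files."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.name in METADATA_BASENAMES:
            return True
        # README.md only counts when it sits at the repository root.
        if path.parts == ("README.md",):
            return True
    return False
```

`.claude-plugin/marketplace.json` is covered by the `marketplace.json` basename, so it needs no separate entry here.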

Phase 2 — Tool battery

阶段2 — 工具集

Runs
scripts/run_battery.sh
. Deterministic CLIs feed
tool-findings.jsonl
tagged by axis with
confidence: 100
. Tools dispatch per
scope.json["languages"]
: npx-wrapped (
knip
/
jscpd
/
markdownlint-cli2
/
@microsoft/api-extractor
) and uvx-wrapped (
lizard
/
vulture
/
semgrep
/
vale
) tools run zero-install; native binaries (
oasdiff
/
atlas
/ Go
deadcode
/
gocyclo
/
dupl
/
cargo-machete
) fall back to PATH. Bundled
references/perf-rules/
carries the universal N+1 and sync-I/O semgrep rules. Per-tool axis routing lives in
scripts/battery_ingest.py
. Full Tool → Axis → Install table in
README.md
.
Graceful skip. Missing tools emit
WARN: <tool> not found — install: <command>
to stderr and append to
scope.json["tools_skipped"]
; the skill continues. The battery NEVER auto-installs — no
brew
,
cargo
,
go
,
pip
, or
npm
install runs.
Phase 2 extension —
--mutation-test
.
scripts/run_mutation.sh
dispatches Stryker (JS/TS), mutmut (Python), or pitest-maven (JVM) scoped to changed files only. Surviving mutants route to the Tests axis as 🟠 Medium with
confidence: 100
(skips Phase 4 validators). Runtime can exceed 10 minutes per language; default 600 s timeout overridable via
MUTATION_TIMEOUT
. Graceful skip on missing tool or config. Details:
references/ultra-execution.md
.
运行
scripts/run_battery.sh
。确定性CLI工具生成
tool-findings.jsonl
,按维度标记
confidence: 100
。工具根据
scope.json["languages"]
调度:npx封装的工具(
knip
/
jscpd
/
markdownlint-cli2
/
@microsoft/api-extractor
)和uvx封装的工具(
lizard
/
vulture
/
semgrep
/
vale
)支持零安装;原生二进制工具(
oasdiff
/
atlas
/Go的
deadcode
/
gocyclo
/
dupl
/
cargo-machete
)回退到PATH查找。
references/perf-rules/
包含通用的N+1和同步I/O semgrep规则。工具到维度的路由定义在
scripts/battery_ingest.py
中。完整的工具→维度→安装表见
README.md
优雅跳过。缺失工具会向stderr输出
WARN: <tool> not found — install: <command>
并添加到
scope.json["tools_skipped"]
;技能会继续运行。工具集绝不会自动安装——不会执行
brew
cargo
go
pip
npm
安装命令。
阶段2扩展 —
--mutation-test
scripts/run_mutation.sh
调度Stryker(JS/TS)、mutmut(Python)或pitest-maven(JVM),仅针对变更文件。存活的变异体作为🟠中等问题提交至测试维度,
confidence: 100
(跳过阶段4验证器)。每个语言的运行时间可能超过10分钟;默认超时600秒,可通过
MUTATION_TIMEOUT
覆盖。缺失工具或配置时优雅跳过。详情见
references/ultra-execution.md
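
The graceful-skip behavior in Phase 2 can be sketched in Python. The real battery is a shell script (`scripts/run_battery.sh`); this is only an illustration of the contract, with a hypothetical `check_tool` helper and a plain dict standing in for `scope.json`. It warns on stderr in the documented format, records the tool under `tools_skipped`, and never installs anything.

```python
import shutil
import sys


def check_tool(tool, install_hint, scope):
    """Return True if `tool` is on PATH; otherwise warn and record the skip.

    `scope` is a dict standing in for scope.json (assumption of this sketch).
    """
    if shutil.which(tool) is not None:
        return True
    # Warning format quoted from the Phase 2 description.
    print(f"WARN: {tool} not found — install: {install_hint}", file=sys.stderr)
    scope.setdefault("tools_skipped", []).append(tool)
    return False  # caller moves on to the next tool; nothing is installed
```

Returning a boolean instead of raising lets the caller keep iterating over the remaining tools.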

Phase 3 — Axis review

阶段3 — 维度审查

The orchestrator prepares 8 per-axis bundles (+ Coherence when active) via
scripts/axis_dispatch.py prepare
, then launches every bundle as a parallel
Explore
Task
in one message. Each subagent reads its axis brief, the rubric in
references/anthropic-verbatim.md
, the diff, and its filtered tool findings. Each emits canonical-schema JSONL on stdout. Subagents cannot spawn other subagents — the main thread launches both axis reviewers AND validators.
The 8 always-on axes: Correctness · Simplification · Tests · Documentation · Style · Intent · Design/API · Performance. Each maps to
references/axes/<name>.md
for scope + repo-kind branches. Coherence is the conditional 9th — added when
scope.json["activates_coherence"]
is true; when inactive, the header surfaces
Coherence axis: inactive
so the absence is visible. Full axis map, inter-axis precedence, and orchestration details (prepare CLI, bundle schema, no-silent-failure contract):
references/axes-overview.md
+
references/orchestration.md
.
协调器通过
scripts/axis_dispatch.py prepare
准备8个维度的包(激活时额外加一致性维度),然后在一条消息中启动所有包作为并行
Explore
Task
。每个子代理读取其维度说明、
references/anthropic-verbatim.md
中的评分标准、代码差异和过滤后的工具结果。每个子代理在stdout输出符合规范的JSONL格式结果。子代理无法生成其他子代理——主线程同时启动维度审查器和验证器。
始终启用的8个维度:正确性·简化性·测试·文档·风格·意图·设计/API·性能。每个维度对应
references/axes/<name>.md
中的范围+仓库类型分支。一致性维度是可选的第9个维度——当
scope.json["activates_coherence"]
为true时添加;未激活时,报告标题会显示
Coherence axis: inactive
,明确告知该维度未运行。完整的维度映射、维度间优先级和协调详情(准备CLI、包结构、无静默失败约定)见
references/axes-overview.md
+
references/orchestration.md

Phase 4 — Validation

阶段4 — 验证

The orchestrator prepares per-finding validator bundles via
scripts/run_validators.py prepare
, then launches one Haiku
Task
per finding in the same message — batched ≤10 parallel. Each validator receives the finding + diff context + the deepest matching CLAUDE.md snippet + the verbatim rubric, re-scores 0-100, and re-checks that the cited CLAUDE.md rule actually exists in
claude_md_chain
(demotes with
CLAUDE.md rule not found at <path>
if not).
Confidence threshold = 80 (
scripts/synthesis_core.py:CONFIDENCE_THRESHOLD
). Tool-battery findings (confidence 100) skip the validator phase — they are deterministic. Validators stay read-only — no Write / Edit / Bash, no nested subagent spawn.
Typical runtime. 5-15 sub-80 findings → one batch → ~30-60s. 25+ findings spread over 2-3 batches stay under ~2 min. Latency is dominated by Haiku launch overhead, not inference.
A2 contract. No sub-80 finding silently dropped. Each one is promoted to ≥80, demoted with reason, or surfaced in
### ⚠️ Unverified
with the validator's reason text.
Validator prepare CLI, bundle schema, ingest pass details:
references/orchestration.md
.
Phase 3.5 —
--verify-build
.
Build verification runs BEFORE validators via
scripts/run_build_verify.py
(composing
scripts/build_detect.py
+
synthesis_core.iterate_unverified
— +30 confidence, cap 95, floor 80). Sub-80 findings on
correctness
/
tests
/
design-api
/
performance
get promoted past the validator phase when the build fails. Other axes pass through unchanged. Details:
references/orchestration.md
.
协调器通过
scripts/run_validators.py prepare
准备每个结果的验证器包,然后在同一条消息中为每个结果启动一个Haiku
Task
——批量处理≤10个并行任务。每个验证器接收结果+差异上下文+最匹配的CLAUDE.md片段+严格评分标准,重新打分0-100,并检查引用的CLAUDE.md规则是否存在于
claude_md_chain
中(不存在则降级并标注
CLAUDE.md rule not found at <path>
)。
置信度阈值=80(
scripts/synthesis_core.py:CONFIDENCE_THRESHOLD
)。工具集结果(置信度100)跳过验证阶段——它们是确定性的。验证器保持只读——不执行Write/Edit/Bash操作,不生成嵌套子代理。
典型运行时间。5-15个评分低于80的结果→一批次→约30-60秒。25+个结果分2-3批次处理→约2分钟内完成。延迟主要由Haiku启动开销决定,而非推理时间。
A2约定。评分低于80的结果不会被静默丢弃。每个结果要么提升至≥80,要么降级并给出理由,要么在
### ⚠️ Unverified
中展示并附带验证器的理由文本。
验证器准备CLI、包结构、结果处理详情见
references/orchestration.md
阶段3.5 —
--verify-build
。构建验证在验证器之前通过
scripts/run_build_verify.py
运行(结合
scripts/build_detect.py
+
synthesis_core.iterate_unverified
——置信度+30,上限95,下限80)。正确性/测试/设计-API/性能维度中评分低于80的结果在构建失败时会跳过验证阶段直接提升。其他维度结果保持不变。详情见
references/orchestration.md
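
The Phase 4 routing and the Phase 3.5 confidence bump can be sketched in Python. This is an illustration, not the code in `scripts/run_validators.py` or `synthesis_core.iterate_unverified`: the function names and the `confidence` field are hypothetical, and reading "+30 confidence, cap 95, floor 80" as a clamp is an assumption.

```python
CONFIDENCE_THRESHOLD = 80  # threshold stated in the Phase 4 text
MAX_PARALLEL = 10          # validators launch in batches of at most 10


def plan_validator_batches(findings):
    """Return batches of sub-threshold findings; the rest skip validation."""
    pending = [f for f in findings if f["confidence"] < CONFIDENCE_THRESHOLD]
    return [
        pending[i:i + MAX_PARALLEL]
        for i in range(0, len(pending), MAX_PARALLEL)
    ]


def promote_confirmed(confidence):
    """Apply the build-verification bump: +30, floor 80, cap 95 (assumed clamp)."""
    return min(95, max(80, confidence + 30))
```

Tool-battery findings carry confidence 100, so the filter in `plan_validator_batches` leaves them out without a special case.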

Phase 5 — Synthesis

阶段5 — 报告合成

Runs
scripts/synthesize.py
on top of
scripts/synthesis_core.py
primitives:
  1. Dedup by
    (location, finding-text)
    .
  2. Inter-axis precedence — when 2+ axes flag the same
    file:line
    with the same finding wording, highest severity wins; ties resolve via
    Correctness > Design/API > Simplification > Tests > Documentation > Style > Intent > Performance > Coherence
    (
    scripts/synthesis_core.py:AXIS_PRIORITY
    ). Distinct findings at coincident lines (a Correctness null-deref and a Tests missing-assert on the same line) survive as separate entries.
  3. A2 routing — sub-80 stays in Unverified with the validator's reason.
  4. Verdict
    Ship
    /
    Fix-then-ship
    /
    Needs work
    (
    scripts/synthesis_core.py:compute_verdict
    ).
  5. Report emission — markdown to terminal +
    ~/.claude/output/{project}/code-ultrareview/code-ultrareview-{slug}.md
    . JSONL alongside with Conventional Comments labels (
    issue
    /
    suggestion
    /
    nitpick
    /
    question
    ).
The closing "What I did NOT check" section is mandatory and always present, even when nothing was skipped — it lists security (defers to
/security-review
), runtime performance / benchmarks (explicit non-goal), flaky test detection (explicit non-goal), and any tools from
scope.json["tools_skipped"]
.
基于
scripts/synthesis_core.py
的原语运行
scripts/synthesize.py
  1. 去重 — 按
    (location, finding-text)
    去重。
  2. 维度间优先级 — 当2+个维度标记同一
    file:line
    且结果表述相同时,严重程度最高者胜出;平局按
    Correctness > Design/API > Simplification > Tests > Documentation > Style > Intent > Performance > Coherence
    解决(
    scripts/synthesis_core.py:AXIS_PRIORITY
    )。同一行的不同结果(例如同一行的正确性空指针引用和测试缺失断言)会作为独立条目保留。
  3. A2路由 — 评分低于80的结果保留在Unverified部分并附带验证器理由。
  4. 结论
    Ship
    /
    Fix-then-ship
    /
    Needs work
    (
    scripts/synthesis_core.py:compute_verdict
    )。
  5. 报告输出 — markdown格式输出到终端+保存至
    ~/.claude/output/{project}/code-ultrareview/code-ultrareview-{slug}.md
    。同时输出符合Conventional Comments标签(
    issue
    /
    suggestion
    /
    nitpick
    /
    question
    )的JSONL格式。
末尾的**「未检查内容」**部分是必填项,始终存在,即使没有跳过任何内容——会列出安全问题(转至
/security-review
)、运行时性能/基准测试(明确非目标)、不稳定测试检测(明确非目标)以及
scope.json["tools_skipped"]
中的所有工具。
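
The dedup and inter-axis precedence steps of Phase 5 can be sketched in Python. This is an illustration, not `scripts/synthesis_core.py`: the axis order is copied from the text, while the field names (`location`, `text`, `axis`, `severity`) and the numeric severity ranks are assumptions.

```python
# Tie-break order quoted from the Phase 5 description.
AXIS_PRIORITY = [
    "Correctness", "Design/API", "Simplification", "Tests", "Documentation",
    "Style", "Intent", "Performance", "Coherence",
]
# Numeric ranks are an assumption; the text only names High / Medium / Low.
SEVERITY_RANK = {"High": 3, "Medium": 2, "Low": 1}


def _rank(finding):
    """Higher tuple wins: severity first, then axis priority."""
    return (
        SEVERITY_RANK[finding["severity"]],
        -AXIS_PRIORITY.index(finding["axis"]),
    )


def dedup(findings):
    """Keep one finding per (location, text); distinct texts both survive."""
    winners = {}
    for finding in findings:
        key = (finding["location"], finding["text"])
        current = winners.get(key)
        if current is None or _rank(finding) > _rank(current):
            winners[key] = finding
    return list(winners.values())
```

Keying on both location and text is what lets a Correctness finding and a Tests finding on the same line survive as separate entries.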

Final report layout

最终报告布局

templates/code-ultrareview.md
is the canonical wire format — every
##
section renders verbatim in template order with its emoji prefix; no rename, merge, reorder, or improvise. Terminal echo is mandatory — the full canonical report prints to the chat-terminal on every invocation;
-s
is purely additive (writes the same bytes to
~/.claude/output/{project}/code-ultrareview/code-ultrareview-{slug}.md
, byte-for-byte identical to terminal output). Severity marker mapping (🔴 High blocks ship · 🟠 Medium fix-soon · 🟢 Low nit · ⚠️ Unverified sub-80) lives in
scripts/synthesis_core.py:SEVERITY_MARKERS
.
templates/code-ultrareview.md
是标准格式——每个
##
部分都会按模板顺序带emoji前缀原样渲染;不允许重命名、合并、重新排序或即兴修改。终端输出是强制要求——每次调用都会在聊天终端打印完整的标准报告;
-s
仅作为附加功能(将相同内容写入
~/.claude/output/{project}/code-ultrareview/code-ultrareview-{slug}.md
,与终端输出字节完全一致)。严重程度标记映射(🔴高优先级阻止发布·🟠中等优先级尽快修复·🟢低优先级小问题·⚠️未验证评分低于80)定义在
scripts/synthesis_core.py:SEVERITY_MARKERS
中。

Trust model

信任模型

The skill ingests third-party content — CLAUDE.md files, PR bodies, planning artifacts (
--reconcile
), GitHub issue bodies — which can carry indirect prompt-injection. Axis reviewers and validators are read-only (no Write / Edit / Bash mutation). User review of the report is the trust boundary before any
--apply-safe
write;
--apply-safe
itself gates writes behind diff preview + per-file confirmation.
本技能会摄入第三方内容——CLAUDE.md文件、PR正文、规划产物(
--reconcile
)、GitHub问题正文——这些内容可能携带间接提示注入。维度审查器和验证器是只读的(不执行Write/Edit/Bash修改)。在执行
--apply-safe
写入之前,用户对报告的审查是信任边界;
--apply-safe
本身会在写入前展示差异预览+逐文件确认。

Rules

规则

  • Only new findings. Issues the diff introduces. Pre-existing findings carry the
    Pre-existing
    tier for context, never flip the verdict.
  • No silent drop (A2). Sub-80 findings surface in
    ### ⚠️ Unverified
    with rationale
    Sub-80 confidence ({score}) — verify locally before action.
  • Fail loud. A phase that cannot run (unresolvable base, missing tool with no skip path, dependency failure) appears in the header or as a finding. Never silent.
  • Cite precisely. Every finding carries
    file:line
    ; CLAUDE.md findings quote the violated rule verbatim; permalinks use
    https://github.com/<owner>/<repo>/blob/<full-sha>/<path>#L<n>-L<m>
    (full SHA via
    git rev-parse HEAD
    ).
  • Full report in chat every time. The complete report prints to the terminal on every invocation.
    -s
    writes the same bytes to disk; it never gates or summarises chat output.
  • NEVER auto-install tools. Missing tools surface install commands in the report and
    scope.json["tools_skipped"]
    . The user installs them.
  • NEVER modify code without
    --apply-safe
    .
    Default is read-only review.
    --apply-safe
    writers are surgical and per-file confirmed.
  • 仅报告新问题。仅关注代码差异引入的问题。预先存在的问题会标记为
    Pre-existing
    作为上下文,绝不会改变最终结论。
  • 无静默丢弃(A2)。评分低于80的结果会在
    ### ⚠️ Unverified
    中展示,理由为
    Sub-80 confidence ({score}) — verify locally before action.
  • 失败时明确提示。无法运行的阶段(无法解析基准、无跳过路径的缺失工具、依赖失败)会在标题或结果中显示。绝不会静默失败。
  • 精确引用。每个结果都携带
    file:line
    ;CLAUDE.md相关结果会逐字引用违反的规则;永久链接使用
    https://github.com/<owner>/<repo>/blob/<full-sha>/<path>#L<n>-L<m>
    (完整SHA通过
    git rev-parse HEAD
    获取)。
  • 每次调用都在聊天中输出完整报告。每次调用都会在终端打印完整报告。
    -s
    仅将相同内容写入磁盘;绝不会限制或总结聊天输出。
  • 绝不自动安装工具。缺失工具会在报告中显示安装命令并添加到
    scope.json["tools_skipped"]
    。由用户自行安装。
  • 未指定
    --apply-safe
    时绝不修改代码
    。默认仅执行只读审查。
    --apply-safe
    写入是精准的且需逐文件确认。
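
The permalink format from the "Cite precisely" rule can be sketched in Python. `build_permalink` is a hypothetical helper, not part of the skill. It takes the SHA as an argument (for example the output of `git rev-parse HEAD`) and rejects anything shorter than 40 characters, on the assumption that a shorter value is an abbreviated SHA.

```python
def build_permalink(owner, repo, full_sha, path, start_line, end_line):
    """Build a GitHub permalink in the format the Rules section specifies."""
    if len(full_sha) < 40:
        # Assumption: anything shorter than 40 characters is an abbreviated SHA.
        raise ValueError("permalinks need the full commit SHA")
    return (
        f"https://github.com/{owner}/{repo}/blob/{full_sha}/{path}"
        f"#L{start_line}-L{end_line}"
    )
```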

Deferrals

未覆盖范围

The closing "What I did NOT check" section always names these — explicit user-facing calibration of coverage:
  • Security →
    /security-review
    or Anthropic's
    claude-code-security-review
    . Security is a distinct concern with its own deeper review pattern.
  • Runtime performance / benchmarks → not covered. The Performance axis catches static patterns (N+1, sync I/O) but not runtime profiling.
  • Flaky test detection → not covered. The Tests axis catches structural smells, not flake.
  • Tools from
    scope.json["tools_skipped"]
    → listed explicitly so the user sees what they sacrificed by not installing the native binaries.
末尾的「未检查内容」部分始终会列出以下内容——明确告知用户覆盖范围:
  • 安全问题 →
    /security-review
    或Anthropic的
    claude-code-security-review
    。安全是独立的关注点,有专门的深度审查模式。
  • 运行时性能/基准测试 → 未覆盖。性能维度仅检测静态模式(N+1、同步I/O),不涉及运行时分析。
  • 不稳定测试检测 → 未覆盖。测试维度仅检测结构问题,不涉及不稳定测试。
  • scope.json["tools_skipped"]
    中的工具
    → 明确列出,让用户了解未安装原生工具所牺牲的功能。

Graceful degradation

优雅降级

  • No CLAUDE.md / no
    .claude/rules
    — Style axis runs without baseline; the report says
    Style axis: skipped — no rules baseline found
    .
  • No
    npx
    / no
    uvx
    — every wrappable tool skips; only PATH binaries run.
  • Missing native binary (
    oasdiff
    ,
    atlas
    ,
    cargo-machete
    , Go tools) — emits to stderr +
    scope.json["tools_skipped"]
    . The relevant axis loses its tool input but still runs LLM judgment.
  • Unresolvable base — fail loud with the resolver's hint line. Do not guess.
  • Unknown repo_kind — axes run with their
    unknown
    branch (no specialization).
  • Coherence inactive — when no metadata files change, the 9th axis simply does not launch. The report header says
    Coherence axis: inactive
    so the absence is visible.
  • 无CLAUDE.md/无
    .claude/rules
    — 风格维度在无基准规则的情况下运行;报告会显示
    Style axis: skipped — no rules baseline found
  • 无
    npx
    /无
    uvx
    — 所有可封装工具都会跳过;仅运行PATH中的二进制工具。
  • 缺失原生二进制工具(
    oasdiff
    atlas
    cargo-machete
    、Go工具) — 向stderr输出+添加到
    scope.json["tools_skipped"]
    。相关维度会失去工具输入,但仍会运行LLM判定。
  • 无法解析基准 — 明确提示解析器的提示信息。绝不猜测。
  • 未知仓库类型 — 维度使用其
    unknown
    分支运行(无特殊化处理)。
  • 一致性维度未激活 — 当无元数据文件变更时,第9个子代理不会启动。报告标题会显示
    Coherence axis: inactive
    ,明确告知该维度未运行。

Composition

组合使用

Bridge to the fix pass after the report ships:
  • /apex -f ~/.claude/output/{project}/code-ultrareview/code-ultrareview-{slug}.md
    — structured fix pass (requires
    -s
    ; pass the absolute path the report prints).
  • /oneshot "<finding>"
    — single-finding quick fix (takes a description, not a file).
报告生成后可衔接修复流程:
  • /apex -f ~/.claude/output/{project}/code-ultrareview/code-ultrareview-{slug}.md
    — 结构化修复流程(需要
    -s
    ;传入报告打印的绝对路径)。
  • /oneshot "<finding>"
    — 单问题快速修复(接收问题描述,而非文件)。

Opt-in flag composition

可选标志组合

The four opt-in flags layer orthogonally on the always-on pipeline: mutation tests join Phase 2 tool findings;
--verify-build
adds Phase 3.5;
--reconcile
enriches the Intent axis in Phase 3;
--apply-safe
writers run post-synthesis with diff preview + per-file confirmation. Without a flag, its feature is off. Full composition matrix and per-flag details:
references/ultra-execution.md
.
四个可选标志可在默认流水线基础上正交叠加:变异测试加入阶段2工具结果;
--verify-build
添加阶段3.5;
--reconcile
增强阶段3的意图维度;
--apply-safe
写入在合成后运行,带差异预览+逐文件确认。无标志时对应功能关闭。完整的组合矩阵和每个标志的详情见
references/ultra-execution.md

What this skill is NOT

本技能不具备的能力

  • Not a security audit. Defers to
    /security-review
    . The closing section makes this explicit on every report.
  • Not a linter or formatter. The deterministic tool battery (npx/uvx-wrapped CLIs) handles linting and dead-code detection. The skill layers LLM judgment on top of those signals.
  • Not Anthropic's remote
    /ultrareview
    .
    Distinct surface — this skill runs in-session on the user's subscription;
    /ultrareview
    runs in a remote sandbox and bills per run.
  • Not a fix tool. Report-only by default.
    --apply-safe
    covers three surgical writers; everything else routes to
    /apex
    or
    /oneshot
    .
  • 不是安全审计工具。转至
    /security-review
    。每份报告的末尾部分都会明确说明这一点。
  • 不是代码检查器或格式化工具。确定性工具集(npx/uvx封装的CLI)负责代码检查和死代码检测。本技能在这些信号之上叠加LLM判定。
  • 不是Anthropic的远程
    /ultrareview
    。界面不同——本技能在用户订阅的会话内运行;
    /ultrareview
    在远程沙箱中运行并按次计费。
  • 不是修复工具。默认仅生成报告。
    --apply-safe
    仅支持三种精准写入;其他修复需转至
    /apex
    /oneshot

References

参考文档

  • Reviewer primitives
    references/anthropic-verbatim.md
    (rubric + HIGH SIGNAL + false-positive taxonomy),
    references/axes-overview.md
    (8 axes + Coherence + inter-axis precedence),
    references/axes/<name>.md
    (per-axis briefs),
    references/orchestration.md
    (Phase 3 + 4 + 3.5 prepare CLIs and bundle schemas).
  • Opt-in flags
    references/ultra-execution.md
    covers
    --verify-build
    ,
    --mutation-test
    ,
    --reconcile
    ,
    --apply-safe
    in full.
  • Scripts
    scope.py
    (Phase 1),
    run_battery.sh
    +
    battery_ingest.py
    (Phase 2),
    axis_dispatch.py
    (Phase 3),
    run_validators.py
    (Phase 4),
    synthesize.py
    +
    synthesis_core.py
    +
    findings_to_jsonl.py
    (Phase 5). Opt-in:
    run_build_verify.py
    ,
    run_mutation.sh
    ,
    derivation/run.py
    ,
    apply_safe/{version_sync,description_sync,failing_test_writer}.py
    .
  • 审查者原语
    references/anthropic-verbatim.md
    (评分标准+高信号+误判分类)、
    references/axes-overview.md
    (8个维度+一致性维度+维度间优先级)、
    references/axes/<name>.md
    (各维度说明)、
    references/orchestration.md
    (阶段3+4+3.5准备CLI和包结构)。
  • 可选标志
    references/ultra-execution.md
    详细介绍
    --verify-build
    --mutation-test
    --reconcile
    --apply-safe
  • 脚本
    scope.py
    (阶段1)、
    run_battery.sh
    +
    battery_ingest.py
    (阶段2)、
    axis_dispatch.py
    (阶段3)、
    run_validators.py
    (阶段4)、
    synthesize.py
    +
    synthesis_core.py
    +
    findings_to_jsonl.py
    (阶段5)。可选脚本:
    run_build_verify.py
    run_mutation.sh
    derivation/run.py
    apply_safe/{version_sync,description_sync,failing_test_writer}.py

Gotchas

注意事项

  1. Sub-80 findings can be dropped instead of surfaced in
    ### ⚠️ Unverified
    .
    The A2 contract (
    scripts/synthesis_core.py:apply_a2
    ) is no-silent-drop. The model sometimes treats a sub-80 score as a rejection signal and omits the finding entirely. Fix: scan the
    ### ⚠️ Unverified
    section explicitly on every report; compare finding count to axis output to catch drops.
  2. First-run
    npx
    /
    uvx
    downloads add latency.
    Cold start adds ~5s per tool the first time the battery touches it; subsequent runs are fast (cached at
    ~/.npm/_npx
    /
    ~/.cache/uv
    ). The README install table documents this so users don't fear repeated downloads.
  3. Coherence activates silently on metadata changes. A single
    package.json
    touch triggers the 9th subagent automatically. Watch for the
    Coherence axis: active
    line in the report header — it tells you the axis ran without you asking.
  4. --reconcile @auto
    skips silently on malformed planning artifacts.
    A forge or apex file with broken YAML frontmatter (unclosed
    ---
    , tab indentation, unquoted colons) is dropped from the auto-detect list. Verify with
    head -20 ~/.claude/output/{project}/forge/forge-*.md
    before relying on
    @auto
    .
  1. 评分低于80的结果可能被丢弃而非显示在
    ### ⚠️ Unverified
    。A2约定(
    scripts/synthesis_core.py:apply_a2
    )要求无静默丢弃。模型有时会将评分低于80视为拒绝信号并完全省略结果。修复方法:每次报告都明确检查
    ### ⚠️ Unverified
    部分;对比结果数量与维度输出以发现丢失的结果。
  2. 首次运行
    npx
    /
    uvx
    下载会增加延迟
    。冷启动时每个工具首次运行会增加约5秒延迟;后续运行会很快(缓存于
    ~/.npm/_npx
    /
    ~/.cache/uv
    )。README安装表会说明这一点,避免用户担心重复下载。
  3. 元数据变更时一致性维度会自动激活。仅修改
    package.json
    就会自动触发第9个子代理。注意报告标题中的
    Coherence axis: active
    行——它会告知你该维度已运行,即使你未主动触发。
  4. --reconcile @auto
    会在规划产物格式错误时静默跳过
    。YAML前置格式错误的forge或apex文件(未闭合
    ---
    、制表符缩进、未加引号的冒号)会被从自动检测列表中移除。依赖
    @auto
    之前请用
    head -20 ~/.claude/output/{project}/forge/forge-*.md
    验证。
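
The count comparison suggested in the first gotcha can be sketched in Python. `find_silent_drops` and the `location` / `text` field names are hypothetical. The sketch assumes that both the axis output and the final report (including the Unverified section) have been parsed into lists of dicts, and it compares on the same `(location, text)` key that dedup uses, so legitimate merges are not flagged.

```python
def find_silent_drops(axis_findings, report_findings):
    """Return (location, text) keys emitted by axes but absent from the report."""
    emitted = {(f["location"], f["text"]) for f in axis_findings}
    reported = {(f["location"], f["text"]) for f in report_findings}
    return sorted(emitted - reported)
```

An empty result means every axis finding is accounted for somewhere in the report. Any returned key points at a finding that vanished between the axis output and synthesis.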