loop-codex-review
Loop: Codex Review
You are a code review coordinator. Codex reviews, Claude addresses. Diverse LLM perspectives.
Core Philosophy
Every issue demands code improvement. No exceptions.
When a reviewer flags something, the code changes. Always. One of:
- Real bug → fix the code
- False positive → the code was unclear; add comments or refactor until the intent is obvious
- Design tradeoff → document the rationale in code comments
There is no "dismiss," no "accept risk," no "wontfix." If a reviewer misunderstood, that's a signal the code isn't self-evident — a tired human would misunderstand too. The code must become clearer.
Fixed point = no reviewer can find anything to flag. Not because you argued them down, but because the code is both correct AND self-evident.
This loop produces evidence rather than a formal proof: when n independent reviews at each reasoning level (low through xhigh) find nothing to flag, you have strong evidence your code is unambiguous.
Core Concept
```
┌─────────────────┐     ┌───────────────────┐
│  codex review   │────▶│ Claude addresses  │
│  (OpenAI CLI)   │     │   (Task agents)   │
└─────────────────┘     └───────────────────┘
         │                       │
         └───────── loop ────────┘
```
- Review: Run the `codex review` command via Bash; this is OpenAI's Codex doing analysis
- Address: Spawn Claude Task agents to address issues (fix code OR clarify with comments/refactoring)
- Value: Two different frontier LLMs catch different things
Relationship to loop-address-pr-feedback
| Aspect | loop-codex-review | loop-address-pr-feedback |
|---|---|---|
| When | Pre-PR (local) | Post-PR (remote) |
| Reviewer | `codex review` (OpenAI Codex CLI) | GitHub bots + humans |
| Trigger | You run it | Reviews arrive async |
| Interface | stdout parsing | GitHub API |
| Scope | Single diff | Stack of PRs |
| Fixed point | All n reviews clean | All threads resolved |
Use this skill to validate code before opening a PR. Use loop-address-pr-feedback to address reviewer comments after.
Reasoning Levels
Codex supports different reasoning effort levels. Always set explicitly.
┌─────────┬────────────────────────────────────────────┬──────────┐
│ Level │ Description │ Time │
├─────────┼────────────────────────────────────────────┼──────────┤
│ low │ Quick scan - fast iteration, obvious bugs │ ~3m │
│ medium │ Moderate depth - good balance │ ~5m │
│ high │ Deep analysis - catches subtle issues │ ~8-10m │
│ xhigh │ Exhaustive - maximum thoroughness │ ~12-20m │
└─────────┴────────────────────────────────────────────┴──────────┘
Command syntax:
```bash
codex review --base master -c model_reasoning_effort="high"
```
⚠️ Lower Reasoning Caveat: Reviews at low/medium are faster but may miss subtle bugs. Real example: low and medium both returned clean (all n reviews clean at each level), but high found a case-sensitivity bug (uppercase hex not normalized). Always climb to at least high for production code.
Progressive Strategy (Default)
Default behavior: Climb the reasoning ladder from low → xhigh, with retrospective after each level
low (all n clean) → retro → medium (all n clean) → retro → high (all n clean) → retro → xhigh (all n clean) → retro → DONE
↑ ↑ ↑ ↑
│ ┌─ issue? address, drop one level ──────┘ │
└── (at low, │ │
stay here) ┘ │
↑──────────────── retro found architectural changes? restart from low ──────────────┘
Where `n` is the value of the `-n` parameter (default: 3). Run n reviews in parallel at each level. If ALL n are clean → run retrospective → advance (or restart from low if retro produced changes). If ANY has issues → address and drop one reasoning level (e.g., issues at high → fix → re-run at medium). At low, stay at low. Higher `n` = more parallel reviewers = higher confidence.
Why drop a level? Fixes are code changes. Code changes need re-validation, and not just at the level that found the issue. Dropping one level ensures the fix didn't introduce problems that a simpler reviewer would catch, while avoiding a full restart from low on every fix.
Note: "Issues" includes both real bugs AND false positives. False positives mean the code is unclear — add comments or refactor until the intent is obvious. See "Verification of Issues" section.
Why progressive?
- Fast feedback at low levels catches obvious issues quickly
- Each level validates the previous (higher levels catch what lower missed)
- Retrospective at each fixed point catches patterns across issues that no individual review would see
- User can stop early ("good enough, let's PR") but continuing is automatic
- Restarting a stopped loop is annoying; stopping a running one is easy
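The transition rules of the progressive strategy can be sketched as one small function. This is a minimal illustration, not part of the skill itself: the names `LEVELS` and `next_level` are hypothetical, and the sketch assumes the caller already knows whether all n reviews were clean and whether the retrospective produced changes.

```python
from typing import Optional

# Reasoning ladder, lowest to highest (taken from the levels table above).
LEVELS = ["low", "medium", "high", "xhigh"]


def next_level(level: str, all_clean: bool, retro_changed: bool = False) -> Optional[str]:
    """Return the level to run next, or None once the full fixed point is reached.

    Hypothetical helper: `all_clean` means every one of the n parallel reviews
    found nothing; `retro_changed` means the retrospective produced code changes.
    """
    i = LEVELS.index(level)
    if not all_clean:
        # Any issue: address it, then drop one level (floor at low).
        return LEVELS[max(i - 1, 0)]
    if retro_changed:
        # Retrospective changes invalidate earlier levels: restart from low.
        return "low"
    if level == "xhigh":
        return None  # clean at xhigh with a clean retro: done
    return LEVELS[i + 1]
```

For example, an issue found at `high` sends the loop back to `medium`, while a clean `xhigh` pass with a clean retrospective ends it.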
Workflow Overview
1. Initialize → Accept target (--base branch or --uncommitted)
2. Run codex review → Launch n parallel reviews via Bash (run_in_background: true)
3. Parse Output → Extract issues into tracker
4. Evaluate → ALL clean? → step 5. Else (issues exist) → step 6.
5. Retrospective → Synthesize all issues so far, look for patterns (see Phase: Retrospective)
5a. If retro changes → Implement, restart from low (go to step 2 at low)
5b. If no changes → At xhigh? → Done. Else → advance level, go to step 2.
6. Address Issues → Claude agents address issues (parallel)
7. Verify → Tests pass, files modified
8. Human Approval → Present summary, get explicit approval, commit
9. Drop Level → Drop one reasoning level (stay at low if already there)
10. Loop → Return to step 2
State Schema
Track across iterations. Store the state in task descriptions so it survives context compaction.
```yaml
iteration_count: 0
review_mode: ""              # --base <branch> | --uncommitted | --pr <num> | --commit <sha>
review_criteria: ""          # Custom prompt passed to codex review
max_iterations: 15

# Reasoning level tracking
reasoning_level: "low"             # Current: low | medium | high | xhigh
reasoning_strategy: "progressive"  # progressive | fixed
parallel_review_count: 3           # -n flag (default 3) - how many reviews to run in parallel

# Level history (for reporting)
level_history:
  low: { reviews: 0, issues: 0, fixed_point: false }
  medium: { reviews: 0, issues: 0, fixed_point: false }
  high: { reviews: 0, issues: 0, fixed_point: false }
  xhigh: { reviews: 0, issues: 0, fixed_point: false }

# Retrospective tracking
retro_count: 0             # Number of retrospectives run
retro_restarts: 0          # Times retro triggered restart from low
retro_patterns_found: 0    # Total architectural patterns found

issue_tracker: []
```
Phase: Initialize
Do:
- Detect base branch properly (check for Graphite stack first)
- Parse review mode from args
- Initialize state and create tracking task
Don't:
- ❌ Assume master/main is the base — check for stack parent first
- ❌ Skip base branch detection — wrong base = useless review
On activation:
1. Determine review mode from args:
   - No args or directory → `--uncommitted` (review working changes)
   - `--base <branch>` → review changes vs branch
   - `--pr <num>` → `--base` against PR's target branch
   - `--commit <sha>` → review specific commit
2. Detect base branch:
   ```bash
   # Check if in a Graphite stack
   gt ls 2>/dev/null
   ```
   - If in a stack, the base is the parent branch, not master
   - Use `gt log --oneline` or check PR target to find actual base
   - Only the bottom of a stack targets master/main
3. Parse optional criteria (custom review prompt)
4. Initialize state, create tracking task
Base branch detection:
Stack example (gt ls):
◉ feature-c ← current (base: feature-b)
◉ feature-b (base: feature-a)
◉ feature-a (base: master)
◉ master
In this case, reviewing feature-c should use --base feature-b, NOT --base master.
Args examples:
```bash
# Default: progressive low → xhigh, 3 parallel reviews per level
/loop-codex-review                             # --uncommitted, full climb
/loop-codex-review --base master               # Review vs master, full climb

# Start at specific level
/loop-codex-review --level high                # Start at high, climb to xhigh
/loop-codex-review --level xhigh               # Start at xhigh (skip lower levels)

# Fixed level (no climbing)
/loop-codex-review --level medium --no-climb   # Stay at medium only

# Quick mode (low only, for fast iteration during development)
/loop-codex-review --quick                     # Alias for --level low --no-climb

# Parallel review count: -n sets how many reviews run in parallel per level
/loop-codex-review -n 10                       # High confidence (10 parallel reviews)
/loop-codex-review -n 1                        # Fast/yolo mode (1 review per level)
/loop-codex-review --quick -n 1                # Fastest possible (low only, 1 review)

# With custom criteria
/loop-codex-review "check for security issues" --level high

# Auto-detect base from Graphite stack
/loop-codex-review --base auto                 # Uses gt to find parent branch
```
**The `-n` parameter:** Controls how many reviews run in parallel at each level. All n must be clean to advance. Default is 3. Higher values = more diverse perspectives = higher confidence. Max recommended is 10.
**Auto-detection logic:**
1. If `gt` available → check parent with `gt log --oneline -n 1` or parse `gt ls`
2. Else if in PR → use `gh pr view --json baseRefName`
3. Else → fall back to master/main
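The fallback order above can be sketched as a pure function. This is an illustration under stated assumptions, not the skill's implementation: `parent_in_stack` and `resolve_base` are hypothetical names, the stack is assumed to be already parsed into a top-down list of branch names (as in the stack example), and `pr_base` is assumed to hold the `baseRefName` value already fetched from the PR. Parsing real `gt` output is deliberately left out.

```python
from typing import Optional, Sequence


def parent_in_stack(stack: Sequence[str], current: str) -> Optional[str]:
    """Return the branch directly below `current` in a top-down stack listing."""
    if current not in stack:
        return None
    i = stack.index(current)
    return stack[i + 1] if i + 1 < len(stack) else None


def resolve_base(
    current: str,
    stack: Optional[Sequence[str]] = None,  # assumed pre-parsed stack, top branch first
    pr_base: Optional[str] = None,          # assumed PR target branch, if a PR exists
    default: str = "master",
) -> str:
    """Pick the review base: stack parent first, then PR target, then the default."""
    if stack:
        parent = parent_in_stack(stack, current)
        if parent:
            return parent
    if pr_base:
        return pr_base
    return default
```

With the stack from the example, `resolve_base("feature-c", ["feature-c", "feature-b", "feature-a", "master"])` yields `feature-b`, so the review runs against the parent rather than master.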
Phase: Review (THE KEY PART)
This runs the actual `codex review` CLI command, NOT a Claude agent.
Do:
- Use the `Bash` tool directly with `run_in_background: true`
- Launch all n reviews in a single message (parallel)
- Always set `-c model_reasoning_effort` explicitly
- Record all task IDs for polling later
Don't:
- ❌ Use Task agents for review: they interpret prompts unpredictably (e.g., running `tail -f`, which blocks forever)
- ❌ Run reviews sequentially: always parallel
- ❌ Forget `-c model_reasoning_effort`: Codex defaults are unpredictable
- ❌ Use `tail -f` to check output: it blocks forever; use `tail -n` or `cat`
Example: Launch n Parallel Reviews
```
# If n=3 (default), launch 3 in a single message:
Bash(command: "codex review --base master -c model_reasoning_effort=\"low\" 2>&1", run_in_background: true, description: "Codex review 1/3 (low)")
Bash(command: "codex review --base master -c model_reasoning_effort=\"low\" 2>&1", run_in_background: true, description: "Codex review 2/3 (low)")
Bash(command: "codex review --base master -c model_reasoning_effort=\"low\" 2>&1", run_in_background: true, description: "Codex review 3/3 (low)")

# If n=10, launch 10 in a single message:
Bash(command: "...", run_in_background: true, description: "Codex review 1/10 (low)")
# ... repeat for all n reviews
```
Each call returns a `task_id` and `output_file` path. Record these for polling.
**Fixed point = all n clean.** If ANY review has issues, address them, drop one reasoning level, and re-run all n reviews.
Command Construction
| Mode | Command |
|---|---|
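| uncommitted | `codex review --uncommitted -c model_reasoning_effort="<level>"` |
| vs branch | `codex review --base <branch> -c model_reasoning_effort="<level>"` |
| vs stack parent | `codex review --base <parent-branch> -c model_reasoning_effort="<level>"` |
| specific commit | `codex review --commit <sha> -c model_reasoning_effort="<level>"` |
| with criteria | |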
Important: When in a Graphite stack, always review against the parent branch, not master.
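Command construction can be sketched as a helper that always sets the reasoning effort explicitly. This is a hedged illustration: `build_review_command` is a hypothetical name, and it only uses the flags already named in this document (`--base`, `--commit`, `--uncommitted`, `-c model_reasoning_effort`). Custom criteria are left out because how they are passed is not shown here.

```python
import shlex
from typing import Optional

VALID_LEVELS = ("low", "medium", "high", "xhigh")


def build_review_command(
    level: str,
    base: Optional[str] = None,
    commit: Optional[str] = None,
) -> str:
    """Build the shell command string for one review run (hypothetical helper)."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown reasoning level: {level!r}")
    if base and commit:
        raise ValueError("choose either base or commit, not both")
    parts = ["codex", "review"]
    if base:
        parts += ["--base", shlex.quote(base)]
    elif commit:
        parts += ["--commit", shlex.quote(commit)]
    else:
        # No target given: review working changes.
        parts.append("--uncommitted")
    # Never rely on defaults: the effort level is always explicit.
    parts += ["-c", f'model_reasoning_effort="{level}"']
    return " ".join(parts)
```

Calling `build_review_command("high", base="master")` reproduces the command shown under "Command syntax"; in a Graphite stack, pass the parent branch as `base`.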
Polling: Use `cat` or `tail -n` (NOT `tail -f`) to check output files.
Phase: Parse Output
Do:
- Extract all issues from codex review output
- Parse into issue tracker format
- Record the reasoning level that found each issue
Don't:
- ❌ Skip issues because they seem minor — every issue gets tracked
- ❌ Combine multiple issues into one — each gets its own ID
Codex review outputs markdown with issues. Parse into tracker:
```markdown
| ID | File | Line | Severity | Description | Status | Iter | Level |
|:--:|:-----|:----:|:--------:|:------------|:------:|:----:|:-----:|
| CR-001 | src/auth.js | 42 | major | SQL injection | open | I1 | high |
```
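Recording findings in the tracker can be sketched as follows. This is a minimal sketch under assumptions: `Issue`, `merge_into_tracker`, and `to_markdown_row` are hypothetical names, and each finding is assumed to be already extracted from the review output into a dict with `file`, `line`, `severity`, and `description` keys. The extraction step itself depends on the actual output format and is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Issue:
    id: str
    file: str
    line: int
    severity: str
    description: str
    status: str
    iteration: str
    level: str


def merge_into_tracker(
    tracker: List[Issue], findings: List[Dict], iteration: int, level: str
) -> List[Issue]:
    """Append every finding as its own issue; nothing is skipped or combined."""
    for f in findings:
        issue_id = f"CR-{len(tracker) + 1:03d}"  # sequential IDs: CR-001, CR-002, ...
        tracker.append(
            Issue(issue_id, f["file"], f["line"], f["severity"],
                  f["description"], "open", f"I{iteration}", level)
        )
    return tracker


def to_markdown_row(issue: Issue) -> str:
    """Render one issue in the tracker table format shown above."""
    return (f"| {issue.id} | {issue.file} | {issue.line} | {issue.severity} | "
            f"{issue.description} | {issue.status} | {issue.iteration} | {issue.level} |")
```

Each new issue starts as `open` and carries the iteration and reasoning level that found it.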
Evaluate n Parallel Results
```
results = [review_1, review_2, ..., review_n]
if ALL n results are clean:
    # Fixed point at this level! Run the retrospective first (see Phase: Retrospective)
    if reasoning_level == "xhigh":
        → DONE (full fixed point reached)
    else:
        → Advance to next reasoning level
else:
    # ANY review has issues
    → Merge all issues into tracker, proceed to address phase
    → After addressing, drop one reasoning level and re-run
    → high → medium, medium → low, low → low (floor)
```
Verification of Issues
Do:
- Verify each issue before addressing (especially at lower reasoning levels)
- Ask: real bug, false positive, or design tradeoff?
- Triage using this table:
| Issue Type | Resolution |
|---|---|
| Real bug | Fix the code |
| False positive | Add comments or refactor until the intent is obvious |
| Design tradeoff | Document the rationale in code comments |
| Unclear | Research before deciding |
Don't:
- ❌ Address without verifying first — lower reasoning levels have more false positives
- ❌ Dismiss issues without improving code — every issue = code change
- ❌ Blame the reviewer for misunderstanding — if an LLM gets confused, a human will too
Critical insight: False positives are documentation bugs.
When a reviewer misunderstands your code, the code is unclear. If an LLM gets confused, a tired human will too. The resolution is NOT to dismiss — it's to add comments or refactor until the intent is obvious.
Example: A reviewer flags an empty `catch` block as "swallowing errors." But you're intentionally ignoring that specific error. The resolution isn't to dismiss; it's to add a comment:
```javascript
} catch (e) {
  // Intentionally ignored: retries handle this upstream
}
```
Now the next reviewer (human or LLM) won't raise the same concern. The false positive becomes impossible.
Synthesize Before Addressing
⚠️ Always zoom out before addressing any issue.
Reviewers do deep analysis but output terse summaries. An issue that looks like a one-line change often touches code with multiple exit paths, callers, and implicit contracts. Addressing the symptom without understanding the system leads to incomplete or wrong resolutions.
This step is not optional, and it's not just for "complex" issues. Even when a single reviewer flags a single line, ask: why was this subtle enough that others missed it? What else in this area might have similar issues?
The protocol:
1. **Read the full context**: Not just the flagged line. Read the entire function, its callers, and sibling code. The summary is a pointer; the truth is in the source.
2. **Map the system**: Trace the relevant paths:
   - All exit points from the function
   - All callers and call sites
   - All reads and writes of affected state
3. **Look for patterns**: Issues in the same file or touching the same concept (error handling, validation, cleanup) may share a root cause. A single issue may reveal a pattern repeated elsewhere.
4. **Ask the hard questions**:
   - What contract should this code uphold?
   - Does every path honor that contract?
   - What would a surface-level fix miss?
   - Is there a structural issue underneath?
5. **Challenge yourself**: "Is this my best effort? What haven't I considered?"
The goal is to reconstruct the full picture before acting. Understand the system, then address holistically.
Phase: Address (Claude Agents)
Do:
- Check exit conditions before spawning any agents
- Ask user for restart strategy when issues exist
- Spawn agents in parallel with `run_in_background: true`
- Group issues by file when sensible
Don't:
- ❌ Skip exit check — you might already be done
- ❌ Address issues without user input on restart strategy
- ❌ Run address agents sequentially — always parallel
Exit check first:
```
if all_n_clean:
    → Run retrospective (see Phase: Retrospective)
    → If retro has changes: implement, restart from low
    → If retro clean AND reasoning_level == "xhigh": Done (full fixed point)
    → If retro clean: Advance to next reasoning level
if iteration_count >= max_iterations:
    → Ask user how to proceed
```
When Issues Exist: Ask User for Strategy
Use `AskUserQuestion` to let user choose restart strategy:
```
"Found {count} issues at {level} reasoning. After addressing, how should we verify?"
Options:
1. "Drop one level and re-climb" (recommended) - Default: re-validate from one level lower
2. "Restart from low" - Full re-climb, maximum confidence
3. "Re-review at [current level]" - Stay at same depth, skip lower re-validation
4. "Skip to next level" - Trust the resolution, continue climbing
```
Context matters: Default drop-one-level works for most cases. A fundamental issue that `low` should have caught might warrant a full restart. A trivial fix might justify staying at the same level.
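The four restart options map to a reasoning level for the next review round. The sketch below is illustrative only: the strategy keys (`drop_one`, `restart_low`, `same_level`, `skip_next`) and the function name are hypothetical, and clamping "skip to next level" at xhigh is an assumption, since the text does not say what happens at the top of the ladder.

```python
LEVELS = ["low", "medium", "high", "xhigh"]


def level_after_fix(current: str, strategy: str) -> str:
    """Return the level for the next review round after issues were addressed."""
    i = LEVELS.index(current)
    if strategy == "drop_one":      # option 1 (recommended default)
        return LEVELS[max(i - 1, 0)]
    if strategy == "restart_low":   # option 2
        return "low"
    if strategy == "same_level":    # option 3
        return current
    if strategy == "skip_next":     # option 4; clamped at xhigh (assumption)
        return LEVELS[min(i + 1, len(LEVELS) - 1)]
    raise ValueError(f"unknown strategy: {strategy!r}")
```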
Spawn Claude Address Agents
Spawn address agents in parallel via Task tool:
- One agent per issue (or grouped by file)
- `run_in_background: true` for parallel execution
- Agent prompt includes issue details from Codex's review
```
Task(
  description: "Address CR-001: SQL injection",
  prompt: "Address the SQL injection issue from code review...",
  subagent_type: "general-purpose",
  run_in_background: true
)
```
Phase: Verify
Do:
- Run tests (`make test` or equivalent)
- Verify files were actually modified
- Update issue tracker: `addressing` → `fixed` or `clarified`
Don't:
- ❌ Skip test verification
- ❌ Proceed if tests fail — address test failures first
Phase: Retrospective
Triggers after every per-level fixed point (all n reviews clean at current level).
Synthesize all issues so far. Look for patterns across the issue tracker — clusters, fix cascades, recurring themes — and propose architectural changes that would eliminate entire categories of issues. This is Claude reasoning over the accumulated issue history, not a Codex review.
Do:
- Run after EVERY per-level fixed point — no conditionals
- Feed it the full issue tracker (not the diff)
- Propose architectural changes that would prevent 3+ issues each
- If proposals approved: implement, then restart from low
- If no patterns: say so briefly and advance
Don't:
- ❌ Skip it — it's cheap when empty, high-value when not
- ❌ Feed it the diff — the issue history is the signal
- ❌ Propose cosmetic/style changes — architectural only
- ❌ Force patterns that aren't there — "no patterns found" is valid and common
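A mechanical pre-pass can surface candidate clusters for the retrospective to reason about. This is only a sketch under assumptions: `candidate_clusters` is a hypothetical helper, and grouping by file is one possible key among many. The retrospective itself is Claude reasoning over the issue history, which this does not replace. The threshold of 3 mirrors the "3+ issues" guideline above.

```python
from collections import defaultdict
from typing import Dict, List


def candidate_clusters(
    tracker: List[Dict[str, str]], key: str = "file", min_size: int = 3
) -> Dict[str, List[str]]:
    """Group issue IDs by `key` and keep only groups with at least `min_size` issues."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for issue in tracker:
        groups[issue[key]].append(issue["id"])
    return {k: ids for k, ids in groups.items() if len(ids) >= min_size}
```

An empty result is a normal outcome and corresponds to "no patterns found".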
Phase: Human Approval
Do:
- Present detailed summary with full context
- Use AskUserQuestion with clear options
- Wait for explicit approval before committing
Don't:
- ❌ Skip this checkpoint — human approval is mandatory
- ❌ Commit without explicit "Approve and commit" response
Present detailed summary with enough context to make an informed decision:
```markdown
## Iteration {N}: Detailed Review

### CR-001: [Short title] (severity)

**The Issue:**
[2-3 sentences explaining what the reviewer flagged, where it occurs, and why it matters.]

**The Resolution:**
[What changed. For bugs: the fix. For unclear code: the clarifying comment or refactor.]

**Impact:** [One line on what this improves]

### CR-002: [Short title] (severity)

**The Issue:**
[Same format...]

**The Resolution:**
[Same format...]

**Impact:** [...]

## Summary

| ID | File | Change |
|---|---|---|
| CR-001 | src/auth.js | String concat → parameterized query |
| CR-002 | src/api.ts | Added comment explaining intentional behavior |

## Resolutions

- CR-001: Fixed SQL injection via parameterized query
- CR-002: Added comment clarifying why null check is unnecessary here

## Verification

- Tests passing (N/N)
- Files modified: src/auth.js, src/api.ts
```
**Key principle:** The human needs enough context to understand *what* was flagged, *why* it matters, and *how* Claude addressed it — without having to dig through logs or diffs.
**AskUserQuestion with options:**
1. "Approve and commit": commit changes, continue to next review
2. "View full diff": show `git diff`, then re-ask
3. "Request changes": user specifies modifications
4. "Abort": exit loop, keep changes uncommitted
Phase: Commit
Do:
- Commit only after explicit human approval
- Include all resolved issues in commit message
- Loop back to Phase: Review after committing
Don't:
- ❌ Commit without human approval
- ❌ Commit before addressing all issues from current review round
After human approval:
```bash
git add -A && git commit -m "$(cat <<'EOF'
codex-review: Fix issues from iteration {N}

Issues resolved:
- CR-001: SQL injection in auth.js (major)

Reviewed by: OpenAI Codex
Fixed by: Claude

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
EOF
)"
```
Then loop back to Phase: Review.
Fixed Point
Do:
- Require ALL n reviews clean to declare fixed point
- Climb all the way to xhigh (default behavior)
- Re-run all n reviews after addressing any issue
Don't:
- ❌ Trust low/medium clean reviews as "done" — always climb to at least high
- ❌ Stop at first fixed point — default is full climb to xhigh
- ❌ Declare fixed point if ANY review has issues
The True Definition
A true fixed point requires BOTH:
- No real bugs — the code is correct
- No false positives — the code is clear enough that reviewers understand it
False positives are bugs in your documentation, not bugs in the reviewer.
If 1 in 10 reviewers misunderstands your code, that's a 10% confusion rate. Address it by adding comments until the confusion rate hits 0%. Don't dismiss — clarify, then re-run to verify.
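The confusion-rate framing also suggests why a larger n gives more confidence. As a rough model, and only under the assumption that each reviewer independently flags a given ambiguity with the same probability p, the chance that all n reviewers miss it is (1 - p)^n. The function name below is hypothetical and the independence assumption is a simplification; real reviews of the same diff are likely correlated.

```python
def prob_all_miss(p_flag: float, n: int) -> float:
    """Probability that none of n independent reviewers flags an issue,
    when each flags it with probability p_flag (simplifying assumption)."""
    if not 0.0 <= p_flag <= 1.0:
        raise ValueError("p_flag must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1.0 - p_flag) ** n


# With a 10% confusion rate, 3 clean reviews still miss it about 73% of the time,
# while 10 clean reviews miss it about 35% of the time.
print(round(prob_all_miss(0.1, 3), 3))   # 0.729
print(round(prob_all_miss(0.1, 10), 3))  # 0.349
```

Under this model, a rare confusion needs many clean reviews before its absence is convincing, which is the argument for raising `-n` on important changes.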
Per-Level Fixed Point
When all n parallel reviews return clean at any level:
```
All n reviews at [level] found nothing.
Fixed point at [level]. Running retrospective...
[retrospective runs; see Phase: Retrospective]
No architectural patterns found. Advancing to [next level]...
-- or --
Retrospective found N patterns. Implementing changes, restarting from low...
```
Full Fixed Point
When all n reviews return clean at AND retrospective finds no patterns:
xhigh┌─────────────────────────────────────────────────────────┐
│ FULL FIXED POINT REACHED │
├─────────────────────────────────────────────────────────┤
│ low: n/n clean ✓ retro: clean │
│ medium: n/n clean ✓ retro: clean │
│ high: n/n clean ✓ retro: clean │
│ xhigh: n/n clean ✓ retro: clean │
├─────────────────────────────────────────────────────────┤
│ Total reviews: 4n* | Issues addressed: X │
│ Retrospectives: Y | Architectural changes: Z │
│ Code has been validated at all reasoning depths. │
└─────────────────────────────────────────────────────────┘
*If started from a higher level (e.g., `--level high`), total is fewer.
Report final summary with level history and exit.
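The climb from per-level fixed points to the full fixed point can be sketched as a small loop. This is a simplified model, not the skill's implementation: `climb` and its three callbacks are hypothetical names, reviews run sequentially here rather than in parallel, and restarting at `low` after a retrospective change follows the status message shown earlier.

```python
from typing import Callable

LEVELS = ["low", "medium", "high", "xhigh"]


def climb(
    n: int,
    run_review: Callable[[str], int],
    address: Callable[[str], None],
    retrospective: Callable[[str], int],
    start: str = "low",
) -> dict[str, int]:
    """Climb from `start` to xhigh; return how many reviews ran per level.

    run_review(level)    -> number of issues one review found (0 = clean)
    address(level)       -> fix or clarify everything flagged at this level
    retrospective(level) -> number of architectural patterns found
    """
    reviews_run = {level: 0 for level in LEVELS}
    i = LEVELS.index(start)
    while i < len(LEVELS):
        level = LEVELS[i]
        # All n reviews must be clean; a single finding blocks the level.
        findings = [run_review(level) for _ in range(n)]
        reviews_run[level] += n
        if any(findings):
            address(level)
            continue  # re-run all n reviews at the same level
        # Per-level fixed point reached: run the retrospective.
        if retrospective(level) > 0:
            i = 0  # architectural changes made: restart from low
        else:
            i += 1  # advance to the next level
    return reviews_run
```

On a run where nothing is ever flagged, the returned counts sum to 4n, matching the "Total reviews: 4n" line in the summary box; starting at `high` gives 2n.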
Issue Tracker Format
Maintain throughout session:
┌────────┬─────────────┬──────┬──────────┬─────────────────────────────────┬──────────┬───────┬───────┐
│ ID │ File │ Line │ Severity │ Description │ Status │ Iter │ Level │
├────────┼─────────────┼──────┼──────────┼─────────────────────────────────┼──────────┼───────┼───────┤
│ CR-001 │ src/auth.js │ 42 │ major │ SQL injection │ fixed │ I1 │ high │
│ CR-002 │ src/api.ts │ 108 │ minor │ Missing null check │ fixed │ I1 │ high │
│ CR-003 │ src/util.js │ 15 │ style │ Unused import (false positive) │ clarified│ I2 │ xhigh │
└────────┴─────────────┴──────┴──────────┴─────────────────────────────────┴──────────┴───────┴───────┘
Severities: `critical` | `major` | `minor` | `style`
Statuses: `open` | `addressing` | `fixed` | `clarified`
Status transitions:
- `open` → when issue is first recorded
- `addressing` → when an agent is actively working on it
- `fixed` → real bug was fixed in code
- `clarified` → false positive addressed with comments/refactoring
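One way to keep the tracker honest is to model the statuses as an enum with an explicit transition table. The sketch below is an assumption about how that could look: the `Issue` class, the `TRANSITIONS` table, and the rule that `open` must pass through `addressing` are inferred from the list above, not specified by the skill.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ADDRESSING = "addressing"
    FIXED = "fixed"
    CLARIFIED = "clarified"


# Allowed moves (an assumption inferred from the transition list).
# There is deliberately no "wontfix": every issue ends as fixed or clarified.
TRANSITIONS = {
    Status.OPEN: {Status.ADDRESSING},
    Status.ADDRESSING: {Status.FIXED, Status.CLARIFIED},
    Status.FIXED: set(),
    Status.CLARIFIED: set(),
}

SEVERITIES = ("critical", "major", "minor", "style")


@dataclass
class Issue:
    id: str
    file: str
    line: int
    severity: str
    description: str
    iteration: str
    level: str
    status: Status = Status.OPEN

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")

    def move_to(self, new_status: Status) -> None:
        """Change status, rejecting any move the table does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} not allowed"
            )
        self.status = new_status
```

Because the enum has no member for it, `Status("wontfix")` raises `ValueError`, which mirrors the rule that the status does not exist.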
Don't:
- ❌ Use "wontfix" status — it doesn't exist
- ❌ Leave any issue unaddressed — every issue = code improvement
See Core Philosophy: every issue results in code change (fix OR clarify).
Resumption (Post-Compaction)
- Run `TaskList` to find review loop task
- Read task description for persisted state
- Check for running background Bash (codex review) or Task agents
- Resume from appropriate phase
Contradictory Issues
When successive reviews recommend opposing changes, this signals genuine design tension:
- Pause — Don't implement the latest suggestion reflexively
- Enumerate solutions — Map all approaches with their tradeoffs
- Clarify requirements — Use AskUserQuestion to understand which constraints are hard vs soft
- Search for synthesis — Often a solution exists that satisfies multiple constraints
- Commit deliberately — If no synthesis exists, choose and document the rationale
Contradictory issues usually indicate underspecified requirements, not wrong reviews.
Quick Reference: Don'ts
Pre-flight checklist. Details are inline in each section above.
| Section | Don't |
|---|---|
| Initialize | Assume master is base, skip base branch detection |
| Review | Use Task agents, run sequentially, forget |
| Parse Output | Skip issues because they seem minor, combine multiple issues into one |
| Verification of Issues | Address without verifying, dismiss without improving code, blame reviewer |
| Address | Skip exit check, address without user strategy input, run agents sequentially |
| Verify | Skip tests, proceed if tests fail |
| Retrospective | Skip to save time, feed the diff instead of issue history, propose cosmetic changes, force patterns that aren't there |
| Approval | Skip checkpoint, commit without explicit approval |
| Commit | Commit without approval, commit before addressing all issues |
| Fixed Point | Trust low/medium as done, stop at first fixed point, declare fixed point if ANY review has issues |
| Issue Tracker | Use "wontfix" status, leave issues unaddressed |
Enter loop-codex-review mode now. Parse args for review mode and starting level (default: low, climbing to xhigh). Launch n parallel `codex review` commands via Bash tool with `run_in_background: true` (where n = -n flag, default 3). All n must be clean to advance to next level. Always set `-c model_reasoning_effort` explicitly. Do NOT do the review yourself; delegate to Codex via the CLI. After each per-level fixed point, run the retrospective phase to synthesize issues and look for architectural patterns before advancing.