escalation-governance

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Escalation Governance

模型升级治理

Overview

概述

Model escalation (haiku→sonnet→opus) trades speed/cost for reasoning capability. This trade-off must be justified.

Core principle: Escalation is for tasks that genuinely require deeper reasoning, not for "maybe a smarter model will figure it out."

模型升级（haiku→sonnet→opus）是以速度/成本为代价换取更强的推理能力。这种权衡必须具备充分的合理性。

核心原则： 升级仅适用于真正需要深度推理的任务，而非“说不定更智能的模型能搞定”的情况。

The Iron Law

铁律

NO ESCALATION WITHOUT INVESTIGATION FIRST

Verification: Run the command with

--help

flag to verify availability.

Escalation is never a shortcut. If you haven't understood why the current model is insufficient, escalation is premature.

NO ESCALATION WITHOUT INVESTIGATION FIRST

验证方式： 运行带

--help

参数的命令以确认可用性。

升级绝非捷径。如果你尚未理解当前模型为何无法胜任任务，那么升级操之过急。

When to Escalate

升级适用场景

Legitimate escalation triggers:

Trigger	Description	Example
Genuine complexity	Task inherently requires nuanced judgment	Security policy trade-offs
Reasoning depth	Multiple inference steps with uncertainty	Architecture decisions
Novel patterns	No existing patterns apply	First-of-kind implementation
High stakes	Error cost justifies capability investment	Production deployment
Ambiguity resolution	Multiple valid interpretations need weighing	Spec clarification

合理的升级触发条件：

触发条件	描述	示例
真实复杂度	任务本质上需要细致的判断	安全政策权衡
推理深度	存在不确定性的多步推理任务	架构决策
全新模式	无现有模式可套用	首创性实现
高风险	错误成本足以支撑算力投入	生产环境部署
歧义解决	需要权衡多种合理解读	需求规格澄清

When NOT to Escalate

升级禁止场景

Illegitimate escalation triggers:

Anti-Pattern	Why It's Wrong	What to Do Instead
"Maybe smarter model will figure it out"	This is thrashing	Investigate root cause
Multiple failed attempts	Suggests wrong approach, not insufficient capability	Question your assumptions
Time pressure	Urgency doesn't change task complexity	Systematic investigation is faster
Uncertainty without investigation	You haven't tried to understand yet	Gather evidence first
"Just to be safe"	False safety - wastes resources	Assess actual complexity

不合理的升级触发条件：

反模式	错误原因	替代方案
“说不定更智能的模型能搞定”	这属于盲目尝试	调查根本原因
多次尝试失败	说明方法错误，而非能力不足	质疑你的假设
时间压力	紧迫性不会改变任务复杂度	系统性调查反而更快
未调查就存在不确定性	你尚未尝试理解问题	先收集证据
“求稳妥”	虚假安全，浪费资源	评估实际复杂度

Decision Framework

决策框架

Before escalating, answer these questions:

在升级前，先回答以下问题：

1. Have I understood the problem?

1. 我是否理解问题？

Can I articulate why the current model is insufficient?
Have I identified what specific reasoning capability is missing?
Is this a capability gap or a knowledge gap?

If knowledge gap: Gather more information, don't escalate.

我能否清晰说明当前模型为何无法胜任？
我是否明确了缺失的具体推理能力？
这是能力缺口还是知识缺口？

如果是知识缺口： 收集更多信息，不要升级。

2. Have I investigated systematically?

2. 我是否进行了系统性调查？

Did I read error messages/outputs carefully?
Did I check for similar solved problems?
Did I form and test a hypothesis?

If not investigated: Complete investigation first.

我是否仔细阅读了错误信息/输出结果？
我是否查找过类似的已解决问题？
我是否形成并验证了假设？

如果未调查： 先完成调查。

3. Is escalation the right solution?

3. 升级是否是正确的解决方案？

Would a different approach work at current model level?
Is the task inherently complex, or am I making it complex?
Would breaking the task into smaller pieces help?

If decomposable: Break down, don't escalate.

在当前模型级别，换一种方法是否可行？
任务本身真的复杂，还是我把它复杂化了？
将任务拆分为更小的模块是否有帮助？

如果可拆分： 拆分任务，不要升级。

4. Can I justify the trade-off?

4. 我能否证明这种权衡的合理性？

What's the cost (latency, tokens, money) of escalation?
What's the benefit (accuracy, safety, completeness)?
Is the benefit proportional to the cost?

If not proportional: Don't escalate.

升级的成本是什么（延迟、token消耗、资金）？
升级的收益是什么（准确性、安全性、完整性）？
收益与成本是否成正比？

如果不成正比： 不要升级。

Escalation Protocol

升级流程

When escalation IS justified:

Document the reason - State why current model is insufficient
Specify the scope - What specific subtask needs higher capability?
Define success - How will you know the escalated task succeeded?
Return promptly - Drop back to efficient model after reasoning task

当升级具备合理性时：

记录原因 - 说明当前模型为何无法胜任
明确范围 - 具体哪个子任务需要更高的能力
定义成功标准 - 如何判断升级后的任务已成功完成
及时降级 - 完成推理任务后，切回高效的低阶模型

Common Rationalizations

常见合理化借口

Excuse	Reality
"This is complex"	Complex for whom? Have you tried?
"Better safe than sorry"	Safety theater wastes resources
"I tried and failed"	How many times? Did you investigate why?
"The user expects quality"	Quality comes from process, not model size
"Just this once"	Exceptions become habits
"Time is money"	Systematic approach is faster than thrashing

借口	实际情况
“这个任务很复杂”	对谁而言复杂？你尝试过解决吗？
“稳妥总比出错好”	形式上的安全只会浪费资源
“我试过了但失败了”	试了多少次？你调查过失败原因吗？
“用户期望高质量”	质量来自流程，而非模型规模
“就这一次”	例外会变成习惯
“时间就是金钱”	系统性方法比盲目尝试更快

Agent Schema

Agent 配置 Schema

Agents can declare escalation hints in frontmatter:

yaml

model: haiku
escalation:
  to: sonnet                 # Suggested escalation target
  hints:                     # Advisory triggers (orchestrator may override)
    - security_sensitive     # Touches auth, secrets, permissions
    - ambiguous_input        # Multiple valid interpretations
    - novel_pattern          # No existing patterns apply
    - high_stakes            # Error would be costly

Verification: Run the command with

--help

flag to verify availability.

Key points:

Hints are advisory, not mandatory
Orchestrator has final authority
Orchestrator can escalate without hints (broader context)
Orchestrator can ignore hints (task is actually simple)

Agents 可在前置声明中添加升级提示：

yaml

model: haiku
escalation:
  to: sonnet                 # 建议的升级目标
  hints:                     # 参考触发条件（编排器可覆盖）
    - security_sensitive     # 涉及认证、密钥、权限
    - ambiguous_input        # 存在多种合理解读
    - novel_pattern          # 无现有模式可套用
    - high_stakes            # 错误成本高昂

验证方式： 运行带

--help

参数的命令以确认可用性。

关键点：

提示仅作参考，不具强制性
编排器拥有最终决策权
编排器可在无提示的情况下升级（基于更广泛的上下文）
编排器可忽略提示（任务实际很简单）

Orchestrator Authority

编排器权限

The orchestrator (typically Opus) makes final escalation decisions:

Can follow hints: When hint matches observed conditions Can override to escalate: When context demands it (even without hints) Can override to stay: When task is simpler than hints suggest Can escalate beyond hint: Go to opus even if hint says sonnet

The orchestrator's judgment, informed by conversation context, supersedes static hints.

编排器（通常为Opus）拥有升级的最终决策权：

可遵循提示： 当提示与观察到的条件匹配时 可强制升级： 当上下文需要时（即使无提示） 可强制不升级： 当任务比提示中描述的更简单时 可超提示升级： 即使提示建议升级到sonnet，也可直接升级到opus

编排器基于对话上下文做出的判断，优先级高于静态提示。

Red Flags - STOP and Investigate

危险信号 - 停止并调查

If you catch yourself thinking:

"Let me try with a better model"
"This should be simple but isn't working"
"I've tried everything" (but haven't investigated why)
"The smarter model will know what to do"
"I don't understand why this isn't working"

ALL of these mean: STOP. Investigate first.

如果你出现以下想法，请立即停止：

“我试试用更好的模型”
“这个任务应该很简单但就是不行”
“我已经试过所有方法了”（但未调查原因）
“更智能的模型知道该怎么做”
“我不明白为什么这个任务搞不定”

以上所有情况都意味着：停止。先调查。

Integration with Agent Workflow

与 Agent 工作流的集成

**Verification:** Run the command with `--help` flag to verify availability.
Agent starts task at assigned model
├── Task succeeds → Complete
└── Task struggles →
    ├── Investigate systematically
    │   ├── Root cause found → Fix at current model
    │   └── Genuine capability gap → Escalate with justification
    └── Don't investigate → WRONG PATH
        └── "Maybe escalate?" → NO. Investigate first.

Verification: Run the command with

--help

flag to verify availability.

**验证方式：** 运行带`--help`参数的命令以确认可用性。
Agent 从分配的模型开始执行任务
├── 任务成功 → 完成
└── 任务遇到困难 →
    ├── 进行系统性调查
    │   ├── 找到根本原因 → 在当前模型级别修复
    │   └── 确认存在真实能力缺口 → 提交合理理由后升级
    └── 未调查 → 错误路径
        └── “要不要升级？” → 不行。先调查。

验证方式： 运行带

--help

参数的命令以确认可用性。

Quick Reference

快速参考

Situation	Action
Task inherently requires nuanced reasoning	Escalate
Agent uncertain but hasn't investigated	Investigate first
Multiple attempts failed	Question approach, not model
Security/high-stakes decision	Escalate
"Maybe smarter model knows"	Never escalate on this basis
Hint fires, task is actually simple	Override, stay at current model
No hint fires, task is actually complex	Override, escalate

场景	行动
任务本质需要细致推理	升级
Agent 不确定但未调查	先调查
多次尝试失败	质疑方法，而非模型
安全/高风险决策	升级
“说不定更智能的模型知道”	绝不能以此为理由升级
提示触发但任务实际简单	覆盖提示，保持当前模型
无提示但任务实际复杂	覆盖提示，执行升级

Model Capability Notes

模型能力说明

MCP Tool Search (Claude Code 2.1.7+): Haiku models do not support MCP tool search. If a workflow uses many MCP tools (descriptions exceeding 10% of context), those tools load upfront on haiku instead of being deferred. This can consume significant context. Consider escalating to sonnet for MCP-heavy workflows or ensure haiku agents use only native tools (Read, Write, Bash, etc.).

Claude.ai MCP Connectors (Claude Code 2.1.46+): Users with claude.ai connectors configured may have additional MCP tools auto-loaded, increasing the total tool description footprint. This makes it more likely that haiku agents will exceed the 10% tool search threshold. When escalation decisions involve MCP-heavy workflows, factor in claude.ai connector tool count via

/mcp

Effort Controls as Escalation Alternative (Opus 4.6 / Claude Code 2.1.32+): Opus 4.6 introduces adaptive thinking with effort levels (

low

medium

high

max

). Before escalating between models, consider whether adjusting effort level on the current model would suffice:

Instead of...	Consider...	When
Haiku → Sonnet	Stay on Haiku	Task is still deterministic, just needs more context
Sonnet → Opus	Opus@medium	Moderate reasoning, not deep architectural analysis
Opus@default → "maybe try again"	Opus@max	Genuine complexity that needs maximum reasoning depth

Effort controls do NOT replace the escalation governance framework — they provide an additional axis. The Iron Law still applies: investigate before changing either model or effort level.

MCP工具搜索（Claude Code 2.1.7+）：Haiku模型不支持MCP工具搜索。如果工作流使用大量MCP工具（描述占上下文的10%以上），这些工具会在Haiku模型启动时预先加载，而非延迟加载，这会消耗大量上下文。对于重度依赖MCP的工作流，可考虑升级到Sonnet，或确保Haiku Agents仅使用原生工具（Read、Write、Bash等）。

Claude.ai MCP连接器（Claude Code 2.1.46+）：配置了claude.ai连接器的用户可能会自动加载额外的MCP工具，增加工具描述的总占比。这会使Haiku Agents更易超过10%的工具搜索阈值。当升级决策涉及重度依赖MCP的工作流时，需通过

/mcp

命令考虑claude.ai连接器的工具数量。

作为升级替代方案的算力控制（Opus 4.6 / Claude Code 2.1.32+）：Opus 4.6引入了带算力等级的自适应思考功能（

low

、

medium

、

high

、

max

）。在跨模型升级前，可考虑调整当前模型的算力等级是否足够：

替代升级操作	考虑方案	适用场景
Haiku → Sonnet	保持使用Haiku	任务仍具确定性，只是需要更多上下文
Sonnet → Opus	Opus@medium	中等推理需求，无需深度架构分析
Opus@默认 → “再试一次”	Opus@max	真正复杂的任务，需要最大推理深度

算力控制不能替代升级治理框架——它只是提供了另一个调整维度。铁律仍然适用：在更改模型或算力等级前，必须先调查。

Troubleshooting

故障排除

Common Issues

常见问题

Command not found Ensure all dependencies are installed and in PATH

Permission errors Check file permissions and run with appropriate privileges

Unexpected behavior Enable verbose logging with

--verbose

flag

命令未找到 确保所有依赖已安装并在PATH中

权限错误 检查文件权限，使用适当权限运行

意外行为 添加

--verbose

参数启用详细日志

escalation-governance

Original

Translation

Table of Contents

目录

Escalation Governance

模型升级治理

Overview

概述

The Iron Law

铁律

When to Escalate

升级适用场景

When NOT to Escalate

升级禁止场景

Decision Framework

决策框架

1. Have I understood the problem?

1. 我是否理解问题？

2. Have I investigated systematically?

2. 我是否进行了系统性调查？

3. Is escalation the right solution?

3. 升级是否是正确的解决方案？

4. Can I justify the trade-off?

4. 我能否证明这种权衡的合理性？

Escalation Protocol

升级流程

Common Rationalizations

常见合理化借口

Agent Schema

Agent 配置 Schema

Orchestrator Authority

编排器权限

Red Flags - STOP and Investigate

危险信号 - 停止并调查

Integration with Agent Workflow

与 Agent 工作流的集成

Quick Reference

快速参考

Model Capability Notes

模型能力说明

Troubleshooting

故障排除

Common Issues

常见问题