escalation-governance
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseTable of Contents
目录
- Overview
- The Iron Law
- When to Escalate
- When NOT to Escalate
- Decision Framework
- 1. Have I understood the problem?
- 2. Have I investigated systematically?
- 3. Is escalation the right solution?
- 4. Can I justify the trade-off?
- Escalation Protocol
- Common Rationalizations
- Agent Schema
- Orchestrator Authority
- Red Flags - STOP and Investigate
- Integration with Agent Workflow
- Quick Reference
Escalation Governance
模型升级治理
Overview
概述
Model escalation (haiku→sonnet→opus) trades speed/cost for reasoning capability. This trade-off must be justified.
Core principle: Escalation is for tasks that genuinely require deeper reasoning, not for "maybe a smarter model will figure it out."
模型升级(haiku→sonnet→opus)是以速度/成本为代价换取更强的推理能力。这种权衡必须具备充分的合理性。
核心原则: 升级仅适用于真正需要深度推理的任务,而非“说不定更智能的模型能搞定”的情况。
The Iron Law
铁律
NO ESCALATION WITHOUT INVESTIGATION FIRSTVerification: Run the command with flag to verify availability.
--helpEscalation is never a shortcut. If you haven't understood why the current model is insufficient, escalation is premature.
NO ESCALATION WITHOUT INVESTIGATION FIRST验证方式: 运行带参数的命令以确认可用性。
--help升级绝非捷径。如果你尚未理解当前模型为何无法胜任任务,那么升级操之过急。
When to Escalate
升级适用场景
Legitimate escalation triggers:
| Trigger | Description | Example |
|---|---|---|
| Genuine complexity | Task inherently requires nuanced judgment | Security policy trade-offs |
| Reasoning depth | Multiple inference steps with uncertainty | Architecture decisions |
| Novel patterns | No existing patterns apply | First-of-kind implementation |
| High stakes | Error cost justifies capability investment | Production deployment |
| Ambiguity resolution | Multiple valid interpretations need weighing | Spec clarification |
合理的升级触发条件:
| 触发条件 | 描述 | 示例 |
|---|---|---|
| 真实复杂度 | 任务本质上需要细致的判断 | 安全政策权衡 |
| 推理深度 | 存在不确定性的多步推理任务 | 架构决策 |
| 全新模式 | 无现有模式可套用 | 首创性实现 |
| 高风险 | 错误成本足以支撑算力投入 | 生产环境部署 |
| 歧义解决 | 需要权衡多种合理解读 | 需求规格澄清 |
When NOT to Escalate
升级禁止场景
Illegitimate escalation triggers:
| Anti-Pattern | Why It's Wrong | What to Do Instead |
|---|---|---|
| "Maybe smarter model will figure it out" | This is thrashing | Investigate root cause |
| Multiple failed attempts | Suggests wrong approach, not insufficient capability | Question your assumptions |
| Time pressure | Urgency doesn't change task complexity | Systematic investigation is faster |
| Uncertainty without investigation | You haven't tried to understand yet | Gather evidence first |
| "Just to be safe" | False safety - wastes resources | Assess actual complexity |
不合理的升级触发条件:
| 反模式 | 错误原因 | 替代方案 |
|---|---|---|
| “说不定更智能的模型能搞定” | 这属于盲目尝试 | 调查根本原因 |
| 多次尝试失败 | 说明方法错误,而非能力不足 | 质疑你的假设 |
| 时间压力 | 紧迫性不会改变任务复杂度 | 系统性调查反而更快 |
| 未调查就存在不确定性 | 你尚未尝试理解问题 | 先收集证据 |
| “求稳妥” | 虚假安全,浪费资源 | 评估实际复杂度 |
Decision Framework
决策框架
Before escalating, answer these questions:
在升级前,先回答以下问题:
1. Have I understood the problem?
1. 我是否理解问题?
- Can I articulate why the current model is insufficient?
- Have I identified what specific reasoning capability is missing?
- Is this a capability gap or a knowledge gap?
If knowledge gap: Gather more information, don't escalate.
- 我能否清晰说明当前模型为何无法胜任?
- 我是否明确了缺失的具体推理能力?
- 这是能力缺口还是知识缺口?
如果是知识缺口: 收集更多信息,不要升级。
2. Have I investigated systematically?
2. 我是否进行了系统性调查?
- Did I read error messages/outputs carefully?
- Did I check for similar solved problems?
- Did I form and test a hypothesis?
If not investigated: Complete investigation first.
- 我是否仔细阅读了错误信息/输出结果?
- 我是否查找过类似的已解决问题?
- 我是否形成并验证了假设?
如果未调查: 先完成调查。
3. Is escalation the right solution?
3. 升级是否是正确的解决方案?
- Would a different approach work at current model level?
- Is the task inherently complex, or am I making it complex?
- Would breaking the task into smaller pieces help?
If decomposable: Break down, don't escalate.
- 在当前模型级别,换一种方法是否可行?
- 任务本身真的复杂,还是我把它复杂化了?
- 将任务拆分为更小的模块是否有帮助?
如果可拆分: 拆分任务,不要升级。
4. Can I justify the trade-off?
4. 我能否证明这种权衡的合理性?
- What's the cost (latency, tokens, money) of escalation?
- What's the benefit (accuracy, safety, completeness)?
- Is the benefit proportional to the cost?
If not proportional: Don't escalate.
- 升级的成本是什么(延迟、token消耗、资金)?
- 升级的收益是什么(准确性、安全性、完整性)?
- 收益与成本是否成正比?
如果不成正比: 不要升级。
Escalation Protocol
升级流程
When escalation IS justified:
- Document the reason - State why current model is insufficient
- Specify the scope - What specific subtask needs higher capability?
- Define success - How will you know the escalated task succeeded?
- Return promptly - Drop back to efficient model after reasoning task
当升级具备合理性时:
- 记录原因 - 说明当前模型为何无法胜任
- 明确范围 - 具体哪个子任务需要更高的能力
- 定义成功标准 - 如何判断升级后的任务已成功完成
- 及时降级 - 完成推理任务后,切回高效的低阶模型
Common Rationalizations
常见合理化借口
| Excuse | Reality |
|---|---|
| "This is complex" | Complex for whom? Have you tried? |
| "Better safe than sorry" | Safety theater wastes resources |
| "I tried and failed" | How many times? Did you investigate why? |
| "The user expects quality" | Quality comes from process, not model size |
| "Just this once" | Exceptions become habits |
| "Time is money" | Systematic approach is faster than thrashing |
| 借口 | 实际情况 |
|---|---|
| “这个任务很复杂” | 对谁而言复杂?你尝试过解决吗? |
| “稳妥总比出错好” | 形式上的安全只会浪费资源 |
| “我试过了但失败了” | 试了多少次?你调查过失败原因吗? |
| “用户期望高质量” | 质量来自流程,而非模型规模 |
| “就这一次” | 例外会变成习惯 |
| “时间就是金钱” | 系统性方法比盲目尝试更快 |
Agent Schema
Agent 配置 Schema
Agents can declare escalation hints in frontmatter:
yaml
model: haiku
escalation:
to: sonnet # Suggested escalation target
hints: # Advisory triggers (orchestrator may override)
- security_sensitive # Touches auth, secrets, permissions
- ambiguous_input # Multiple valid interpretations
- novel_pattern # No existing patterns apply
- high_stakes # Error would be costlyVerification: Run the command with flag to verify availability.
--helpKey points:
- Hints are advisory, not mandatory
- Orchestrator has final authority
- Orchestrator can escalate without hints (broader context)
- Orchestrator can ignore hints (task is actually simple)
Agents 可在前置声明中添加升级提示:
yaml
model: haiku
escalation:
to: sonnet # 建议的升级目标
hints: # 参考触发条件(编排器可覆盖)
- security_sensitive # 涉及认证、密钥、权限
- ambiguous_input # 存在多种合理解读
- novel_pattern # 无现有模式可套用
- high_stakes # 错误成本高昂验证方式: 运行带参数的命令以确认可用性。
--help关键点:
- 提示仅作参考,不具强制性
- 编排器拥有最终决策权
- 编排器可在无提示的情况下升级(基于更广泛的上下文)
- 编排器可忽略提示(任务实际很简单)
Orchestrator Authority
编排器权限
The orchestrator (typically Opus) makes final escalation decisions:
Can follow hints: When hint matches observed conditions
Can override to escalate: When context demands it (even without hints)
Can override to stay: When task is simpler than hints suggest
Can escalate beyond hint: Go to opus even if hint says sonnet
The orchestrator's judgment, informed by conversation context, supersedes static hints.
编排器(通常为Opus)拥有升级的最终决策权:
可遵循提示: 当提示与观察到的条件匹配时
可强制升级: 当上下文需要时(即使无提示)
可强制不升级: 当任务比提示中描述的更简单时
可超提示升级: 即使提示建议升级到sonnet,也可直接升级到opus
编排器基于对话上下文做出的判断,优先级高于静态提示。
Red Flags - STOP and Investigate
危险信号 - 停止并调查
If you catch yourself thinking:
- "Let me try with a better model"
- "This should be simple but isn't working"
- "I've tried everything" (but haven't investigated why)
- "The smarter model will know what to do"
- "I don't understand why this isn't working"
ALL of these mean: STOP. Investigate first.
如果你出现以下想法,请立即停止:
- “我试试用更好的模型”
- “这个任务应该很简单但就是不行”
- “我已经试过所有方法了”(但未调查原因)
- “更智能的模型知道该怎么做”
- “我不明白为什么这个任务搞不定”
以上所有情况都意味着:停止。先调查。
Integration with Agent Workflow
与 Agent 工作流的集成
**Verification:** Run the command with `--help` flag to verify availability.
Agent starts task at assigned model
├── Task succeeds → Complete
└── Task struggles →
├── Investigate systematically
│ ├── Root cause found → Fix at current model
│ └── Genuine capability gap → Escalate with justification
└── Don't investigate → WRONG PATH
└── "Maybe escalate?" → NO. Investigate first.Verification: Run the command with flag to verify availability.
--help**验证方式:** 运行带`--help`参数的命令以确认可用性。
Agent 从分配的模型开始执行任务
├── 任务成功 → 完成
└── 任务遇到困难 →
├── 进行系统性调查
│ ├── 找到根本原因 → 在当前模型级别修复
│ └── 确认存在真实能力缺口 → 提交合理理由后升级
└── 未调查 → 错误路径
└── “要不要升级?” → 不行。先调查。验证方式: 运行带参数的命令以确认可用性。
--helpQuick Reference
快速参考
| Situation | Action |
|---|---|
| Task inherently requires nuanced reasoning | Escalate |
| Agent uncertain but hasn't investigated | Investigate first |
| Multiple attempts failed | Question approach, not model |
| Security/high-stakes decision | Escalate |
| "Maybe smarter model knows" | Never escalate on this basis |
| Hint fires, task is actually simple | Override, stay at current model |
| No hint fires, task is actually complex | Override, escalate |
| 场景 | 行动 |
|---|---|
| 任务本质需要细致推理 | 升级 |
| Agent 不确定但未调查 | 先调查 |
| 多次尝试失败 | 质疑方法,而非模型 |
| 安全/高风险决策 | 升级 |
| “说不定更智能的模型知道” | 绝不能以此为理由升级 |
| 提示触发但任务实际简单 | 覆盖提示,保持当前模型 |
| 无提示但任务实际复杂 | 覆盖提示,执行升级 |
Model Capability Notes
模型能力说明
MCP Tool Search (Claude Code 2.1.7+): Haiku models do not support MCP tool search. If a workflow uses many MCP tools (descriptions exceeding 10% of context), those tools load upfront on haiku instead of being deferred. This can consume significant context. Consider escalating to sonnet for MCP-heavy workflows or ensure haiku agents use only native tools (Read, Write, Bash, etc.).
Claude.ai MCP Connectors (Claude Code 2.1.46+): Users with claude.ai connectors configured may have additional MCP tools auto-loaded, increasing the total tool description footprint. This makes it more likely that haiku agents will exceed the 10% tool search threshold. When escalation decisions involve MCP-heavy workflows, factor in claude.ai connector tool count via .
/mcpEffort Controls as Escalation Alternative (Opus 4.6 / Claude Code 2.1.32+): Opus 4.6 introduces adaptive thinking with effort levels (, , , ). Before escalating between models, consider whether adjusting effort level on the current model would suffice:
lowmediumhighmax| Instead of... | Consider... | When |
|---|---|---|
| Haiku → Sonnet | Stay on Haiku | Task is still deterministic, just needs more context |
| Sonnet → Opus | Opus@medium | Moderate reasoning, not deep architectural analysis |
| Opus@default → "maybe try again" | Opus@max | Genuine complexity that needs maximum reasoning depth |
Effort controls do NOT replace the escalation governance framework — they provide an additional axis. The Iron Law still applies: investigate before changing either model or effort level.
MCP工具搜索(Claude Code 2.1.7+):Haiku模型不支持MCP工具搜索。如果工作流使用大量MCP工具(描述占上下文的10%以上),这些工具会在Haiku模型启动时预先加载,而非延迟加载,这会消耗大量上下文。对于重度依赖MCP的工作流,可考虑升级到Sonnet,或确保Haiku Agents仅使用原生工具(Read、Write、Bash等)。
Claude.ai MCP连接器(Claude Code 2.1.46+):配置了claude.ai连接器的用户可能会自动加载额外的MCP工具,增加工具描述的总占比。这会使Haiku Agents更易超过10%的工具搜索阈值。当升级决策涉及重度依赖MCP的工作流时,需通过命令考虑claude.ai连接器的工具数量。
/mcp作为升级替代方案的算力控制(Opus 4.6 / Claude Code 2.1.32+):Opus 4.6引入了带算力等级的自适应思考功能(、、、)。在跨模型升级前,可考虑调整当前模型的算力等级是否足够:
lowmediumhighmax| 替代升级操作 | 考虑方案 | 适用场景 |
|---|---|---|
| Haiku → Sonnet | 保持使用Haiku | 任务仍具确定性,只是需要更多上下文 |
| Sonnet → Opus | Opus@medium | 中等推理需求,无需深度架构分析 |
| Opus@默认 → “再试一次” | Opus@max | 真正复杂的任务,需要最大推理深度 |
算力控制不能替代升级治理框架——它只是提供了另一个调整维度。铁律仍然适用:在更改模型或算力等级前,必须先调查。
Troubleshooting
故障排除
Common Issues
常见问题
Command not found
Ensure all dependencies are installed and in PATH
Permission errors
Check file permissions and run with appropriate privileges
Unexpected behavior
Enable verbose logging with flag
--verbose命令未找到
确保所有依赖已安装并在PATH中
权限错误
检查文件权限,使用适当权限运行
意外行为
添加参数启用详细日志
--verbose