escalation-governance

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Table of Contents

目录

Escalation Governance

模型升级治理

Overview

概述

Model escalation (haiku→sonnet→opus) trades speed/cost for reasoning capability. This trade-off must be justified.
Core principle: Escalation is for tasks that genuinely require deeper reasoning, not for "maybe a smarter model will figure it out."
模型升级(haiku→sonnet→opus)是以速度/成本为代价换取更强的推理能力。这种权衡必须具备充分的合理性。
核心原则: 升级仅适用于真正需要深度推理的任务,而非“说不定更智能的模型能搞定”的情况。

The Iron Law

铁律

NO ESCALATION WITHOUT INVESTIGATION FIRST
Verification: Run the command with
--help
flag to verify availability.
Escalation is never a shortcut. If you haven't understood why the current model is insufficient, escalation is premature.
NO ESCALATION WITHOUT INVESTIGATION FIRST
验证方式: 运行带
--help
参数的命令以确认可用性。
升级绝非捷径。如果你尚未理解当前模型为何无法胜任任务,那么升级操之过急。

When to Escalate

升级适用场景

Legitimate escalation triggers:
TriggerDescriptionExample
Genuine complexityTask inherently requires nuanced judgmentSecurity policy trade-offs
Reasoning depthMultiple inference steps with uncertaintyArchitecture decisions
Novel patternsNo existing patterns applyFirst-of-kind implementation
High stakesError cost justifies capability investmentProduction deployment
Ambiguity resolutionMultiple valid interpretations need weighingSpec clarification
合理的升级触发条件:
触发条件描述示例
真实复杂度任务本质上需要细致的判断安全政策权衡
推理深度存在不确定性的多步推理任务架构决策
全新模式无现有模式可套用首创性实现
高风险错误成本足以支撑算力投入生产环境部署
歧义解决需要权衡多种合理解读需求规格澄清

When NOT to Escalate

升级禁止场景

Illegitimate escalation triggers:
Anti-PatternWhy It's WrongWhat to Do Instead
"Maybe smarter model will figure it out"This is thrashingInvestigate root cause
Multiple failed attemptsSuggests wrong approach, not insufficient capabilityQuestion your assumptions
Time pressureUrgency doesn't change task complexitySystematic investigation is faster
Uncertainty without investigationYou haven't tried to understand yetGather evidence first
"Just to be safe"False safety - wastes resourcesAssess actual complexity
不合理的升级触发条件:
反模式错误原因替代方案
“说不定更智能的模型能搞定”这属于盲目尝试调查根本原因
多次尝试失败说明方法错误,而非能力不足质疑你的假设
时间压力紧迫性不会改变任务复杂度系统性调查反而更快
未调查就存在不确定性你尚未尝试理解问题先收集证据
“求稳妥”虚假安全,浪费资源评估实际复杂度

Decision Framework

决策框架

Before escalating, answer these questions:
在升级前,先回答以下问题:

1. Have I understood the problem?

1. 我是否理解问题?

  • Can I articulate why the current model is insufficient?
  • Have I identified what specific reasoning capability is missing?
  • Is this a capability gap or a knowledge gap?
If knowledge gap: Gather more information, don't escalate.
  • 我能否清晰说明当前模型为何无法胜任?
  • 我是否明确了缺失的具体推理能力?
  • 这是能力缺口还是知识缺口?
如果是知识缺口: 收集更多信息,不要升级。

2. Have I investigated systematically?

2. 我是否进行了系统性调查?

  • Did I read error messages/outputs carefully?
  • Did I check for similar solved problems?
  • Did I form and test a hypothesis?
If not investigated: Complete investigation first.
  • 我是否仔细阅读了错误信息/输出结果?
  • 我是否查找过类似的已解决问题?
  • 我是否形成并验证了假设?
如果未调查: 先完成调查。

3. Is escalation the right solution?

3. 升级是否是正确的解决方案?

  • Would a different approach work at current model level?
  • Is the task inherently complex, or am I making it complex?
  • Would breaking the task into smaller pieces help?
If decomposable: Break down, don't escalate.
  • 在当前模型级别,换一种方法是否可行?
  • 任务本身真的复杂,还是我把它复杂化了?
  • 将任务拆分为更小的模块是否有帮助?
如果可拆分: 拆分任务,不要升级。

4. Can I justify the trade-off?

4. 我能否证明这种权衡的合理性?

  • What's the cost (latency, tokens, money) of escalation?
  • What's the benefit (accuracy, safety, completeness)?
  • Is the benefit proportional to the cost?
If not proportional: Don't escalate.
  • 升级的成本是什么(延迟、token消耗、资金)?
  • 升级的收益是什么(准确性、安全性、完整性)?
  • 收益与成本是否成正比?
如果不成正比: 不要升级。

Escalation Protocol

升级流程

When escalation IS justified:
  1. Document the reason - State why current model is insufficient
  2. Specify the scope - What specific subtask needs higher capability?
  3. Define success - How will you know the escalated task succeeded?
  4. Return promptly - Drop back to efficient model after reasoning task
当升级具备合理性时:
  1. 记录原因 - 说明当前模型为何无法胜任
  2. 明确范围 - 具体哪个子任务需要更高的能力
  3. 定义成功标准 - 如何判断升级后的任务已成功完成
  4. 及时降级 - 完成推理任务后,切回高效的低阶模型

Common Rationalizations

常见合理化借口

ExcuseReality
"This is complex"Complex for whom? Have you tried?
"Better safe than sorry"Safety theater wastes resources
"I tried and failed"How many times? Did you investigate why?
"The user expects quality"Quality comes from process, not model size
"Just this once"Exceptions become habits
"Time is money"Systematic approach is faster than thrashing
借口实际情况
“这个任务很复杂”对谁而言复杂?你尝试过解决吗?
“稳妥总比出错好”形式上的安全只会浪费资源
“我试过了但失败了”试了多少次?你调查过失败原因吗?
“用户期望高质量”质量来自流程,而非模型规模
“就这一次”例外会变成习惯
“时间就是金钱”系统性方法比盲目尝试更快

Agent Schema

Agent 配置 Schema

Agents can declare escalation hints in frontmatter:
yaml
model: haiku
escalation:
  to: sonnet                 # Suggested escalation target
  hints:                     # Advisory triggers (orchestrator may override)
    - security_sensitive     # Touches auth, secrets, permissions
    - ambiguous_input        # Multiple valid interpretations
    - novel_pattern          # No existing patterns apply
    - high_stakes            # Error would be costly
Verification: Run the command with
--help
flag to verify availability.
Key points:
  • Hints are advisory, not mandatory
  • Orchestrator has final authority
  • Orchestrator can escalate without hints (broader context)
  • Orchestrator can ignore hints (task is actually simple)
Agents 可在前置声明中添加升级提示:
yaml
model: haiku
escalation:
  to: sonnet                 # 建议的升级目标
  hints:                     # 参考触发条件(编排器可覆盖)
    - security_sensitive     # 涉及认证、密钥、权限
    - ambiguous_input        # 存在多种合理解读
    - novel_pattern          # 无现有模式可套用
    - high_stakes            # 错误成本高昂
验证方式: 运行带
--help
参数的命令以确认可用性。
关键点:
  • 提示仅作参考,不具强制性
  • 编排器拥有最终决策权
  • 编排器可在无提示的情况下升级(基于更广泛的上下文)
  • 编排器可忽略提示(任务实际很简单)

Orchestrator Authority

编排器权限

The orchestrator (typically Opus) makes final escalation decisions:
Can follow hints: When hint matches observed conditions Can override to escalate: When context demands it (even without hints) Can override to stay: When task is simpler than hints suggest Can escalate beyond hint: Go to opus even if hint says sonnet
The orchestrator's judgment, informed by conversation context, supersedes static hints.
编排器(通常为Opus)拥有升级的最终决策权:
可遵循提示: 当提示与观察到的条件匹配时 可强制升级: 当上下文需要时(即使无提示) 可强制不升级: 当任务比提示中描述的更简单时 可超提示升级: 即使提示建议升级到sonnet,也可直接升级到opus
编排器基于对话上下文做出的判断,优先级高于静态提示。

Red Flags - STOP and Investigate

危险信号 - 停止并调查

If you catch yourself thinking:
  • "Let me try with a better model"
  • "This should be simple but isn't working"
  • "I've tried everything" (but haven't investigated why)
  • "The smarter model will know what to do"
  • "I don't understand why this isn't working"
ALL of these mean: STOP. Investigate first.
如果你出现以下想法,请立即停止:
  • “我试试用更好的模型”
  • “这个任务应该很简单但就是不行”
  • “我已经试过所有方法了”(但未调查原因)
  • “更智能的模型知道该怎么做”
  • “我不明白为什么这个任务搞不定”
以上所有情况都意味着:停止。先调查。

Integration with Agent Workflow

与 Agent 工作流的集成

**Verification:** Run the command with `--help` flag to verify availability.
Agent starts task at assigned model
├── Task succeeds → Complete
└── Task struggles →
    ├── Investigate systematically
    │   ├── Root cause found → Fix at current model
    │   └── Genuine capability gap → Escalate with justification
    └── Don't investigate → WRONG PATH
        └── "Maybe escalate?" → NO. Investigate first.
Verification: Run the command with
--help
flag to verify availability.
**验证方式:** 运行带`--help`参数的命令以确认可用性。
Agent 从分配的模型开始执行任务
├── 任务成功 → 完成
└── 任务遇到困难 →
    ├── 进行系统性调查
    │   ├── 找到根本原因 → 在当前模型级别修复
    │   └── 确认存在真实能力缺口 → 提交合理理由后升级
    └── 未调查 → 错误路径
        └── “要不要升级?” → 不行。先调查。
验证方式: 运行带
--help
参数的命令以确认可用性。

Quick Reference

快速参考

SituationAction
Task inherently requires nuanced reasoningEscalate
Agent uncertain but hasn't investigatedInvestigate first
Multiple attempts failedQuestion approach, not model
Security/high-stakes decisionEscalate
"Maybe smarter model knows"Never escalate on this basis
Hint fires, task is actually simpleOverride, stay at current model
No hint fires, task is actually complexOverride, escalate
场景行动
任务本质需要细致推理升级
Agent 不确定但未调查先调查
多次尝试失败质疑方法,而非模型
安全/高风险决策升级
“说不定更智能的模型知道”绝不能以此为理由升级
提示触发但任务实际简单覆盖提示,保持当前模型
无提示但任务实际复杂覆盖提示,执行升级

Model Capability Notes

模型能力说明

MCP Tool Search (Claude Code 2.1.7+): Haiku models do not support MCP tool search. If a workflow uses many MCP tools (descriptions exceeding 10% of context), those tools load upfront on haiku instead of being deferred. This can consume significant context. Consider escalating to sonnet for MCP-heavy workflows or ensure haiku agents use only native tools (Read, Write, Bash, etc.).
Claude.ai MCP Connectors (Claude Code 2.1.46+): Users with claude.ai connectors configured may have additional MCP tools auto-loaded, increasing the total tool description footprint. This makes it more likely that haiku agents will exceed the 10% tool search threshold. When escalation decisions involve MCP-heavy workflows, factor in claude.ai connector tool count via
/mcp
.
Effort Controls as Escalation Alternative (Opus 4.6 / Claude Code 2.1.32+): Opus 4.6 introduces adaptive thinking with effort levels (
low
,
medium
,
high
,
max
). Before escalating between models, consider whether adjusting effort level on the current model would suffice:
Instead of...Consider...When
Haiku → SonnetStay on HaikuTask is still deterministic, just needs more context
Sonnet → OpusOpus@mediumModerate reasoning, not deep architectural analysis
Opus@default → "maybe try again"Opus@maxGenuine complexity that needs maximum reasoning depth
Effort controls do NOT replace the escalation governance framework — they provide an additional axis. The Iron Law still applies: investigate before changing either model or effort level.
MCP工具搜索(Claude Code 2.1.7+):Haiku模型不支持MCP工具搜索。如果工作流使用大量MCP工具(描述占上下文的10%以上),这些工具会在Haiku模型启动时预先加载,而非延迟加载,这会消耗大量上下文。对于重度依赖MCP的工作流,可考虑升级到Sonnet,或确保Haiku Agents仅使用原生工具(Read、Write、Bash等)。
Claude.ai MCP连接器(Claude Code 2.1.46+):配置了claude.ai连接器的用户可能会自动加载额外的MCP工具,增加工具描述的总占比。这会使Haiku Agents更易超过10%的工具搜索阈值。当升级决策涉及重度依赖MCP的工作流时,需通过
/mcp
命令考虑claude.ai连接器的工具数量。
作为升级替代方案的算力控制(Opus 4.6 / Claude Code 2.1.32+):Opus 4.6引入了带算力等级的自适应思考功能(
low
medium
high
max
)。在跨模型升级前,可考虑调整当前模型的算力等级是否足够:
替代升级操作考虑方案适用场景
Haiku → Sonnet保持使用Haiku任务仍具确定性,只是需要更多上下文
Sonnet → OpusOpus@medium中等推理需求,无需深度架构分析
Opus@默认 → “再试一次”Opus@max真正复杂的任务,需要最大推理深度
算力控制不能替代升级治理框架——它只是提供了另一个调整维度。铁律仍然适用:在更改模型或算力等级前,必须先调查。

Troubleshooting

故障排除

Common Issues

常见问题

Command not found Ensure all dependencies are installed and in PATH
Permission errors Check file permissions and run with appropriate privileges
Unexpected behavior Enable verbose logging with
--verbose
flag
命令未找到 确保所有依赖已安装并在PATH中
权限错误 检查文件权限,使用适当权限运行
意外行为 添加
--verbose
参数启用详细日志