circuit-breaker
Overview
The circuit-breaker skill is a safety mechanism that prevents infinite loops, resource exhaustion, and accidental destruction during autonomous development. It operates at the loop level, complementing `resilient-execution`, which operates at the task level. Without circuit-breaker protection, autonomous loops can waste hours on stagnant problems, exhaust API limits, or accidentally destroy configuration files. This skill enforces hard boundaries that keep autonomous operations productive and safe.
Announce at start: "Circuit breaker is active — monitoring for stagnation, rate limits, and file protection."
Phase 1: Circuit State Check
Before each loop iteration, check the current circuit state:

    +-----------+    threshold     +-----------+    cooldown      +------------+
    |  CLOSED   |----exceeded----->|   OPEN    |----elapsed------>| HALF-OPEN  |
    | (normal)  |                  | (halted)  |                  |  (probe)   |
    +-----------+                  +-----------+                  +-----+------+
          ^                              ^                              |
          |                              |           failure            |
          |                              +------------------------------+
          |                           success                           |
          +-------------------------------------------------------------+

| State | Meaning | Action |
|---|---|---|
| CLOSED | Normal operation | Execute iteration, monitor all thresholds |
| OPEN | Halted due to threshold breach | Report status, wait for cooldown, or escalate |
| HALF-OPEN | Probing after cooldown | Allow ONE iteration. If success: close. If failure: re-open. |
STOP: Check circuit state BEFORE executing any loop iteration. Do NOT execute if circuit is OPEN.
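A minimal Python sketch of this state check follows, assuming the caller supplies times as plain seconds. `CircuitBreaker`, `may_run`, `trip`, and `record_probe` are hypothetical names, not part of any existing tool. It encodes the transitions in the diagram and table, plus the 30-minute default cooldown and the cooldown doubling on a failed probe described in Phase 3.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    CLOSED = "CLOSED"        # normal operation
    OPEN = "OPEN"            # halted after a threshold breach
    HALF_OPEN = "HALF-OPEN"  # cooldown elapsed; one probe iteration allowed


@dataclass
class CircuitBreaker:
    cooldown_s: float = 30 * 60        # default cooldown: 30 minutes
    state: State = State.CLOSED
    opened_at: Optional[float] = None  # time (seconds) the circuit last opened
    probe_started: bool = False        # has the single HALF-OPEN probe begun?

    def may_run(self, now: float) -> bool:
        """Call BEFORE every loop iteration; False means do not execute."""
        if self.state is State.OPEN:
            if self.opened_at is not None and now - self.opened_at < self.cooldown_s:
                return False                       # still cooling down
            self.state, self.probe_started = State.HALF_OPEN, False
        if self.state is State.HALF_OPEN:
            if self.probe_started:
                return False                       # only ONE probe iteration
            self.probe_started = True
        return True

    def trip(self, now: float) -> None:
        """A threshold was breached: open the circuit."""
        self.state, self.opened_at = State.OPEN, now

    def record_probe(self, success: bool, now: float) -> None:
        """Evaluate the HALF-OPEN probe: close on success, re-open on failure."""
        if success:
            self.state = State.CLOSED
        else:
            self.cooldown_s *= 2                   # a failed probe doubles the cooldown
            self.trip(now)
```

In a loop, `may_run` would be called at the top of every iteration and `trip` whenever any Phase 2 threshold is breached.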
Phase 2: Stagnation Detection
Monitor these thresholds continuously during autonomous operation:
| Condition | Threshold | Detection Method | Action |
|---|---|---|---|
| No progress | 3 consecutive loops with zero meaningful changes | Track files modified + tasks completed per loop | OPEN circuit |
| Identical errors | 5 consecutive loops producing the same error | Compare error messages across iterations | OPEN circuit |
| Output decline | 70% decline in output volume across iterations | Compare output line count across last 3 iterations | OPEN circuit |
| Permission denials | 3 consecutive tool permission failures | Track permission errors | OPEN circuit |
| Test fix loop | >80% of effort spent on test fixes only | Track work type per iteration | OPEN circuit, investigate root cause |
| Circular approach | Same 2-3 approaches alternating without resolution | Track approach history | OPEN circuit |
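The thresholds above can be checked against a rolling iteration history. This Python sketch uses a hypothetical `IterationRecord` and makes several interpretive assumptions: error messages are already normalized so identical errors compare equal, "70% decline" means the newest of the last 3 iterations produced at most 30% of the oldest one's output, the test-fix share is averaged over the last 3 iterations, a permission failure is tracked as one flag per iteration, and "circular approach" means only 2-3 distinct approach labels across the last 6 iterations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IterationRecord:
    """Hypothetical per-iteration record; field names are illustrative."""
    files_modified: int = 0
    tasks_completed: int = 0
    error: Optional[str] = None      # normalized error message, if any
    output_lines: int = 0
    permission_denied: bool = False
    test_fix_share: float = 0.0      # fraction of effort spent only on test fixes
    approach: Optional[str] = None   # short label for the approach tried


def breached_thresholds(history: list[IterationRecord]) -> list[str]:
    """Names of the stagnation conditions that the recent history breaches."""
    breached: list[str] = []
    last3, last5, last6 = history[-3:], history[-5:], history[-6:]

    # No progress: 3 consecutive loops with no file changes and no completed tasks.
    if len(last3) == 3 and all(r.files_modified == 0 and r.tasks_completed == 0 for r in last3):
        breached.append("no_progress")
    # Identical errors: 5 consecutive loops producing the same error.
    if len(last5) == 5 and last5[0].error and all(r.error == last5[0].error for r in last5):
        breached.append("identical_errors")
    # Output decline: output fell by at least 70% across the last 3 iterations.
    if (len(last3) == 3 and last3[0].output_lines > 0
            and last3[-1].output_lines <= 0.3 * last3[0].output_lines):
        breached.append("output_decline")
    # Permission denials: 3 consecutive iterations hit a tool permission failure.
    if len(last3) == 3 and all(r.permission_denied for r in last3):
        breached.append("permission_denials")
    # Test fix loop: more than 80% of recent effort went to test fixes only.
    if len(last3) == 3 and sum(r.test_fix_share for r in last3) / 3 > 0.8:
        breached.append("test_fix_loop")
    # Circular approach: only 2-3 distinct approaches across the last 6 iterations.
    labels = [r.approach for r in last6]
    if len(labels) == 6 and None not in labels and 2 <= len(set(labels)) <= 3:
        breached.append("circular_approach")
    return breached
```

Any non-empty result means the circuit should open.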
Stagnation Scoring
Each iteration, compute a progress score:
| Indicator | Score |
|---|---|
| New test passing that was previously failing | +3 |
| Task marked complete | +5 |
| File modified with meaningful changes | +1 |
| Build/lint error resolved | +2 |
| Same error as previous iteration | -2 |
| No files modified | -3 |
| Reverted previous changes | -1 |
Threshold: If the cumulative score across the last 3 iterations is negative, OPEN the circuit.
STOP: If any threshold is breached, OPEN the circuit immediately. Do NOT attempt "one more try."
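A small Python sketch of the scoring table follows. It assumes the positive indicators count once per occurrence (for example +1 per meaningfully modified file) and that "No files modified" means that file count is zero; `IterationOutcome`, `progress_score`, and `should_open_circuit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class IterationOutcome:
    """Hypothetical summary of one iteration, mirroring the scoring table."""
    tests_newly_passing: int = 0             # previously failing tests now passing (+3 each)
    tasks_completed: int = 0                 # tasks marked complete (+5 each)
    files_modified: int = 0                  # files with meaningful changes (+1 each)
    build_errors_resolved: int = 0           # build/lint errors resolved (+2 each)
    same_error_as_previous: bool = False     # -2
    reverted_previous_changes: bool = False  # -1


def progress_score(o: IterationOutcome) -> int:
    score = (3 * o.tests_newly_passing + 5 * o.tasks_completed
             + o.files_modified + 2 * o.build_errors_resolved)
    if o.same_error_as_previous:
        score -= 2
    if o.files_modified == 0:
        score -= 3                           # "No files modified"
    if o.reverted_previous_changes:
        score -= 1
    return score


def should_open_circuit(scores: list[int]) -> bool:
    """Open when the cumulative score over the last 3 iterations is negative."""
    return len(scores) >= 3 and sum(scores[-3:]) < 0
```

For example, an iteration that repeats the previous error and touches no files scores -2 - 3 = -5, so two such iterations outweigh one iteration that completes a task and edits two files (+7).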
Phase 3: Recovery Protocol
When the circuit opens, follow this recovery sequence:
Cooldown Period
- Default: 30 minutes before retry
- Purpose: Prevents rapid cycling through the same failing state
- After cooldown: Circuit enters HALF-OPEN state
HALF-OPEN Behavior
- Allow exactly ONE iteration to execute
- If successful (positive progress score): Close circuit, resume normal operation
- If failed (same stagnation pattern): Re-open circuit, double the cooldown timer
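Doubling the cooldown on each failed probe spaces the probes out quickly. This Python sketch computes when each probe would run, assuming every probe before the last one fails, the doubling compounds without a cap, and probe runtime is negligible. `probe_times` is a hypothetical helper.

```python
def probe_times(n_probes: int, base_cooldown_min: float = 30.0) -> list[float]:
    """Minutes after the circuit first opens at which each HALF-OPEN probe runs."""
    times: list[float] = []
    t, cooldown = 0.0, base_cooldown_min
    for _ in range(n_probes):
        t += cooldown        # wait out the current cooldown
        times.append(t)      # HALF-OPEN: one probe iteration runs here
        cooldown *= 2        # if it fails, the next cooldown is twice as long
    return times
```

With the 30-minute default, three failed probes in a row push the fourth probe to 450 minutes (7.5 hours) after the circuit first opened, which is why repeated failures should lead to escalation rather than waiting.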
Recovery Strategy Decision Table
| Stagnation Type | Strategy 1 | Strategy 2 | Strategy 3 | Strategy 4 |
|---|---|---|---|---|
| No progress (stuck on same task) | Regenerate plan with fresh analysis | Break stuck task into 3+ subtasks | Skip to next task, return later | Escalate to user |
| Identical errors (same error repeating) | Change approach entirely | Check if error is environmental | Search for known issue/workaround | Escalate with error log |
| Test fix loop (tests keep breaking) | Review test assumptions | Check if implementation approach is flawed | Simplify implementation scope | Escalate with test analysis |
| Circular approach (alternating same fixes) | Step back and re-analyze root cause | Try approach NOT yet attempted | Reduce scope to minimal working version | Escalate with approach history |
STOP: After recovery, monitor the next 3 iterations closely. If stagnation recurs, escalate immediately.
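The decision table can be read as an escalation ladder per stagnation type. This Python sketch assumes the strategies are tried left to right (Strategy 1 first) and that once the ladder is exhausted the final escalation step repeats; `RECOVERY_STRATEGIES` and `next_strategy` are hypothetical names, and the strategy strings are copied from the table.

```python
RECOVERY_STRATEGIES: dict[str, list[str]] = {
    "no_progress": [
        "Regenerate plan with fresh analysis",
        "Break stuck task into 3+ subtasks",
        "Skip to next task, return later",
        "Escalate to user",
    ],
    "identical_errors": [
        "Change approach entirely",
        "Check if error is environmental",
        "Search for known issue/workaround",
        "Escalate with error log",
    ],
    "test_fix_loop": [
        "Review test assumptions",
        "Check if implementation approach is flawed",
        "Simplify implementation scope",
        "Escalate with test analysis",
    ],
    "circular_approach": [
        "Step back and re-analyze root cause",
        "Try approach NOT yet attempted",
        "Reduce scope to minimal working version",
        "Escalate with approach history",
    ],
}


def next_strategy(stagnation_type: str, attempts_so_far: int) -> str:
    """Pick the next rung of the ladder; stay on the escalation step once exhausted."""
    ladder = RECOVERY_STRATEGIES[stagnation_type]
    return ladder[min(attempts_so_far, len(ladder) - 1)]
```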
Phase 4: Rate Limiting
Track and enforce API usage limits:
| Parameter | Default | Purpose |
|---|---|---|
| MAX_CALLS_PER_HOUR | 100 | Prevents API overuse |
| Reset window | Hourly (rolling) | Automatic counter reset |
| Countdown display | Active | Shows remaining calls before limit |
Rate Limit Behavior
- Track API calls per rolling hour
- At 80% of limit: display warning, prioritize remaining calls
- At 100% of limit: pause execution, display countdown to reset
- Never exceed limit — wait for reset window
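A rolling-hour limit can be tracked with a queue of call timestamps. This is a Python sketch, assuming the caller passes timestamps in seconds; `RollingRateLimiter` and its methods are hypothetical. The 80% warning and the pause at 100% follow the rules above, and `seconds_until_slot` supplies the countdown value.

```python
from collections import deque


class RollingRateLimiter:
    """Rolling one-hour call budget (default MAX_CALLS_PER_HOUR = 100)."""

    def __init__(self, max_calls_per_hour: int = 100, window_s: float = 3600.0) -> None:
        self.max_calls = max_calls_per_hour
        self.window_s = window_s
        self.calls: deque[float] = deque()   # timestamps of calls still inside the window

    def _evict(self, now: float) -> None:
        # Calls older than the window no longer count against the limit.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()

    def remaining(self, now: float) -> int:
        self._evict(now)
        return self.max_calls - len(self.calls)

    def status(self, now: float) -> str:
        used = self.max_calls - self.remaining(now)
        if used >= self.max_calls:
            return "paused"                  # 100%: pause and show the countdown
        if used >= 0.8 * self.max_calls:
            return "warning"                 # 80%: warn and prioritize remaining calls
        return "ok"

    def seconds_until_slot(self, now: float) -> float:
        """Countdown until the oldest call leaves the window (0 if a call is allowed)."""
        if self.remaining(now) > 0:
            return 0.0
        return self.calls[0] + self.window_s - now

    def try_call(self, now: float) -> bool:
        """Record a call only if it stays within the limit; never exceed it."""
        if self.remaining(now) <= 0:
            return False
        self.calls.append(now)
        return True
```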
Three-Layer Timeout Detection
For long-running operations (especially API calls with extended limits):
| Layer | Detection | Fallback |
|---|---|---|
| 1. Timeout guard | Exit code 124 or timeout signal | Capture partial output, log what completed |
| 2. JSON validation | Parse response structure | Attempt text extraction from raw response |
| 3. Text fallback | Raw output capture | Log everything, report for human review |
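The three layers can be sketched in Python, assuming the long-running operation is an external command run through `subprocess`. Exit code 124 is the status GNU coreutils `timeout` returns when it kills a command; the brace-slicing extraction in layer 2 is an assumed heuristic, not a prescribed method, and `run_with_fallback` is a hypothetical name.

```python
import json
import subprocess

TIMEOUT_EXIT_CODE = 124  # status GNU coreutils `timeout` returns when it kills a command


def _as_text(data) -> str:
    """TimeoutExpired.stdout may be None or bytes even when text=True."""
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_with_fallback(cmd: list[str], timeout_s: float) -> dict:
    """Run cmd and classify its output through three fallback layers."""
    # Layer 1: timeout guard (Python-side timeout, or exit code 124 from a `timeout` wrapper).
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        return {"layer": "timeout", "partial_output": _as_text(exc.stdout)}
    raw = proc.stdout
    if proc.returncode == TIMEOUT_EXIT_CODE:
        return {"layer": "timeout", "partial_output": raw}

    # Layer 2: JSON validation, then an attempt to extract an embedded JSON object.
    try:
        return {"layer": "json", "data": json.loads(raw)}
    except json.JSONDecodeError:
        start, end = raw.find("{"), raw.rfind("}")
        if 0 <= start < end:
            try:
                return {"layer": "extracted", "data": json.loads(raw[start:end + 1])}
            except json.JSONDecodeError:
                pass

    # Layer 3: text fallback; keep everything for logging and human review.
    return {"layer": "text", "raw": raw}
```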
Phase 5: File Protection
<HARD-GATE>
Configuration files must NEVER be deleted during autonomous operations. This is non-negotiable.
</HARD-GATE>
Protected Paths
| Path | Type | Why Protected |
|---|---|---|
|  | Directory | Loop state and configuration |
|  | File | Ralph configuration |
|  | File | Current plan — source of truth for loop |
|  | File | Agent definitions |
|  | Directory | Specifications — source of truth for features |
|  | Directory | Claude Code configuration |
|  | File | Agent operating manual |
|  | Directory | Persisted learnings across sessions |
Protection Mechanisms
| Mechanism | How It Works | When It Triggers |
|---|---|---|
| Allowlist enforcement | Only permitted tools can modify protected files | Before any file write to protected path |
| Integrity validation | Check protected files exist after each iteration | End of every loop iteration |
| Pre-operation checks | Verify protected files before destructive operations | Before |
| Restricted commands | Block | When command targets protected path |
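The integrity-validation row can be sketched as an end-of-iteration check in Python. The caller must supply the protected list (the actual paths come from the Protected Paths table); the names `example-plan.md` and `example-specs` used in the usage example are hypothetical placeholders, as are `ProtectedFileMissing` and `verify_protected`.

```python
from pathlib import Path


class ProtectedFileMissing(RuntimeError):
    """Raised when a protected path disappears during an iteration."""


def verify_protected(root: Path, protected: list[str]) -> None:
    """Integrity validation at the end of every loop iteration.

    `protected` holds paths relative to `root`, for files and directories alike.
    """
    missing = [p for p in protected if not (root / p).exists()]
    if missing:
        # Halt immediately; restoring the missing files is a separate step.
        raise ProtectedFileMissing(f"protected paths missing: {missing}")
```

Usage, with hypothetical paths: `verify_protected(Path("."), ["example-plan.md", "example-specs"])` returns silently while both exist and raises as soon as either is gone.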
Pre-Destructive Operation Checklist
Before any `rm`, `git clean`, or `git checkout .`:
- List all files that will be affected
- Check each against the protected paths list
- If ANY protected file would be affected: ABORT and report
- If safe: proceed with caution
- After operation: verify all protected files still exist
STOP: If a protected file is missing after any operation, halt immediately and restore it.
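The first three checklist steps amount to a path-prefix comparison. This Python sketch assumes the affected files are already known as repo-relative paths; for `git clean`, the real `git clean -n` dry run prints what would be removed, and for `git checkout .`, `git diff --name-only` lists tracked files with unstaged changes, though parsing either output is left out here. `protected_hits` and `guard_destructive` are hypothetical names, and the example paths in the usage line are placeholders.

```python
from pathlib import PurePosixPath


def protected_hits(affected: list[str], protected: list[str]) -> list[str]:
    """Affected paths that are a protected path or lie inside a protected directory.

    Paths are compared as repo-relative POSIX paths; normalize them first,
    because PurePosixPath does not resolve "..".
    """
    return [a for a in affected
            if any(PurePosixPath(a).is_relative_to(p) for p in protected)]


def guard_destructive(affected: list[str], protected: list[str]) -> None:
    """List, compare, and ABORT if any protected file would be affected."""
    hits = protected_hits(affected, protected)
    if hits:
        raise PermissionError(f"ABORT: would affect protected paths {hits}")
```

Comparing path components rather than raw string prefixes means a sibling such as `example-specs-old/` is not mistaken for the protected `example-specs/` directory. `is_relative_to` requires Python 3.9 or later.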
Phase 6: Monitoring and Metrics
Track these metrics across loop iterations:
| Metric | Purpose | Alert Threshold |
|---|---|---|
| Loop count | Total iterations executed | >20 for a single task |
| Tasks completed | Progress measurement | 0 for 3+ iterations |
| Files modified | Change velocity | 0 for 3+ iterations |
| Test pass rate | Quality trend | Declining for 3+ iterations |
| Error frequency | Stagnation early warning | Increasing for 3+ iterations |
| Output volume | Productivity trend | 70% decline |
| API calls remaining | Rate limit proximity | <20% remaining |
| Progress score | Overall health | Negative for 3 iterations |
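Several alert thresholds in this table are trend-based. This Python sketch evaluates a subset of them, assuming "declining/increasing for 3+ iterations" means the metric moved in that direction on each of the last 3 iteration-to-iteration steps (so 4 samples are needed). The zero-tasks and zero-files rows match the Phase 2 no-progress check and are omitted here. `trending` and `metric_alerts` are hypothetical names.

```python
def trending(values: list[float], rising: bool, n: int = 3) -> bool:
    """True if the metric moved in the same direction on each of the last n steps."""
    if len(values) < n + 1:
        return False
    recent = values[-(n + 1):]
    steps = list(zip(recent, recent[1:]))
    return all((b > a) if rising else (b < a) for a, b in steps)


def metric_alerts(loop_count: int, pass_rates: list[float], error_counts: list[int],
                  calls_remaining: int, call_limit: int, scores: list[int]) -> list[str]:
    """Evaluate a subset of the alert thresholds from the metrics table."""
    alerts: list[str] = []
    if loop_count > 20:                          # >20 iterations on a single task
        alerts.append("loop_count")
    if trending(pass_rates, rising=False):       # test pass rate declining
        alerts.append("test_pass_rate")
    if trending(error_counts, rising=True):      # error frequency increasing
        alerts.append("error_frequency")
    if calls_remaining < 0.2 * call_limit:       # <20% of API calls remaining
        alerts.append("api_calls_remaining")
    if len(scores) >= 3 and all(s < 0 for s in scores[-3:]):  # negative for 3 iterations
        alerts.append("progress_score")
    return alerts
```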
Per-Iteration Status Log
```markdown
Iteration [N] — [timestamp]
- Circuit state: CLOSED / HALF-OPEN
- Tasks completed: [N]
- Files modified: [list]
- Tests: [X passed, Y failed, Z skipped]
- Errors encountered: [list]
- Progress score: [+/- N]
- API calls remaining: [N]
- Stagnation risk: LOW / MEDIUM / HIGH
---
```
Anti-Patterns / Common Mistakes
| What NOT to Do | Why It Fails | What to Do Instead |
|---|---|---|
| Ignore stagnation signals | Wastes hours on unsolvable problems | Open circuit at threshold breach |
| Manually override open circuit | Bypasses safety mechanism | Follow recovery protocol properly |
| Skip file protection checks | Config deletion derails entire project | Always verify protected files after operations |
| Set cooldown to zero | Rapid cycling through same failure | Respect 30-minute minimum cooldown |
| Count test-fix-only iterations as progress | Masks the real problem (flawed approach) | Flag >80% test-fix effort as stagnation |
| Delete and recreate protected files | Loses configuration state | Never delete protected files, only update |
| Ignore rate limit warnings | Hits hard limit mid-operation | Prioritize when at 80% of limit |
| Run destructive commands without pre-checks | May delete protected files | Always check affected files first |
Anti-Rationalization Guards
| Thought | Reality |
|---|---|
| "One more try will fix it" | That is what you said 3 iterations ago. Open the circuit. |
| "The error is almost fixed" | "Almost" for 5 iterations means the approach is wrong. |
| "I cannot stop now, I am so close" | Sunk cost fallacy. Open circuit, reassess. |
| "The cooldown is too long" | The cooldown prevents wasting more time on the same failure. |
| "These config files are not important" | They are protected for a reason. Do not delete them. |
| "The rate limit will not be hit" | Track it. Do not guess. |
| "This is a different error" | Check if it is truly different or the same root cause manifesting differently. |
Do NOT override an open circuit. Follow the recovery protocol.
Integration Points
| Skill | Relationship |
|---|---|
| `resilient-execution` | Task-level retries (3 attempts). Circuit-breaker activates AFTER `resilient-execution` exhausts retries within individual tasks. |
|  | Circuit-breaker monitors the loop. Opens the circuit when loop-level stagnation is detected. |
|  | Status block provides metrics for stagnation detection. |
|  | Circuit-breaker ensures verification passes before closing a loop. |
|  | Stagnation patterns are persisted to memory for future avoidance. |
|  | Circuit-breaker events feed into improvement metrics. |
Scope Clarification
| Scope | Skill | Behavior |
|---|---|---|
| Task-level | `resilient-execution` | Try 3 approaches for a single failing task |
| Loop-level | `circuit-breaker` | Halt the entire loop when patterns indicate systemic failure |
The circuit breaker activates AFTER resilient-execution has exhausted its retries within individual tasks. If tasks keep failing despite 3 retries each, the circuit breaker detects the pattern.
Process Summary
- Before each loop iteration: Check circuit state (CLOSED/HALF-OPEN/OPEN)
- If OPEN: Report status, wait for cooldown, or escalate
- If HALF-OPEN: Allow one probe iteration, evaluate result
- If CLOSED: Execute normally, monitor all thresholds
- After each iteration: Update metrics, compute progress score, evaluate thresholds
- If threshold exceeded: Open circuit, report reason, begin cooldown
- After cooldown: Enter HALF-OPEN, allow one probe
- After probe: Close if successful, re-open with doubled cooldown if failed
Skill Type
RIGID — Thresholds and protection rules must be followed exactly. Do not relax circuit breaker conditions. Do not override open circuits. Do not skip file protection checks. Do not ignore stagnation signals.