resilient-execution

Overview

The resilient-execution skill prevents premature failure by enforcing a minimum of 3 genuinely different approaches before escalating to the user. It provides a structured error classification system, an approach cascade methodology, and transparent logging of each attempt. Without this skill, agents give up too early — with it, they systematically exhaust alternatives and only escalate with full evidence.
Announce at start: "I'm using the resilient-execution skill — I will try multiple approaches before escalating."

Phase 1: Error Classification

When an approach fails, immediately classify the error before retrying:

| Error Type | Definition | Indicators | Correct Response |
| --- | --- | --- | --- |
| Transient | Temporary infrastructure failure | Network timeout, rate limit, 503 error, lock contention | Wait briefly, retry the same approach |
| Environmental | Missing or misconfigured dependency | Module not found, wrong version, missing env var, permission denied | Fix the environment, then retry the same approach |
| Logical | Wrong approach or incorrect assumption | Wrong output, unexpected behavior, type mismatch, wrong API usage | Rethink the approach entirely |
| Fundamental | Genuinely impossible with available tools | API does not exist, hardware limitation, missing capability | Escalate to user with evidence |

<HARD-GATE> You MUST try at least 3 different approaches before telling the user something cannot be done. "I tried and it didn't work" is not acceptable without evidence of 3 genuine attempts with meaningfully different strategies. </HARD-GATE>

STOP: Classify the error before choosing your next approach. Wrong classification leads to wasted retries.
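
The classification table above can be sketched as a small lookup. This is a minimal illustration, not part of the skill itself: the keyword lists, the `classify` function, and the choice to default unmatched errors to Logical are all assumptions made for the example.

```python
from enum import Enum


class ErrorType(Enum):
    TRANSIENT = "Transient"
    ENVIRONMENTAL = "Environmental"
    LOGICAL = "Logical"
    FUNDAMENTAL = "Fundamental"


# Hypothetical keyword lists loosely based on the "Indicators" column.
# Substring matching is only a stand-in for real diagnosis.
INDICATORS = {
    ErrorType.TRANSIENT: ("timeout", "timed out", "rate limit", "503", "lock contention"),
    ErrorType.ENVIRONMENTAL: ("module not found", "no module named", "wrong version",
                              "env var", "permission denied"),
    ErrorType.FUNDAMENTAL: ("api does not exist", "hardware limitation", "missing capability"),
}

# Responses copied from the "Correct Response" column.
RESPONSES = {
    ErrorType.TRANSIENT: "Wait briefly, retry the same approach",
    ErrorType.ENVIRONMENTAL: "Fix the environment, then retry the same approach",
    ErrorType.LOGICAL: "Rethink the approach entirely",
    ErrorType.FUNDAMENTAL: "Escalate to user with evidence",
}


def classify(message: str) -> ErrorType:
    """Return the first error type whose indicator appears in the message.

    Assumption: anything unmatched is treated as Logical, which forces a
    rethink of the approach instead of a blind retry.
    """
    text = message.lower()
    for error_type, keywords in INDICATORS.items():
        if any(keyword in text for keyword in keywords):
            return error_type
    return ErrorType.LOGICAL
```

For example, `RESPONSES[classify("HTTP 503 Service Unavailable")]` yields the Transient response, so the same approach is retried after a short wait.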

Phase 2: Approach Cascade

Execute the cascade systematically. Each attempt must be a genuinely different strategy.
Attempt 1: Primary approach (most direct solution)
    | fails
    v
Classify error -> Can same approach work with a fix?
    | YES -> Fix and retry (does NOT count as a new attempt)
    | NO  -> Proceed to Attempt 2
    v
Attempt 2: Alternative approach 1 (different technique)
    | fails
    v
Classify error -> Is this fundamentally blocked?
    | YES -> Proceed directly to escalation
    | NO  -> Proceed to Attempt 3
    v
Attempt 3: Alternative approach 2 (different path entirely)
    | fails
    v
Circuit breaker -> Present findings to user with full evidence
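
One way to read the cascade is as a loop over approaches in which fix-and-retry stays inside the current attempt. The sketch below is an assumption-laden illustration: `ApproachError`, `Approach`, `run_cascade`, and `max_fix_retries` are hypothetical names, and the code applies both checks from the diagram (fixable? fundamentally blocked?) at every attempt, not only at the step where the diagram shows them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ApproachError(Exception):
    """Failure raised by an approach, tagged with its Phase 1 classification."""

    def __init__(self, message: str, classification: str,
                 fix: Optional[Callable[[], None]] = None):
        super().__init__(message)
        self.classification = classification
        self.fix = fix  # optional repair step (wait, install, set env var)


@dataclass
class Approach:
    name: str
    run: Callable[[], object]


def run_cascade(approaches, max_fix_retries=1):
    """Try each approach in order; return (result, failures).

    A result of None means every approach failed (or one was fundamentally
    blocked) and the failures list should feed the escalation report.
    """
    failures = []
    for approach in approaches:
        fix_retries = 0
        while True:
            try:
                return approach.run(), failures
            except ApproachError as err:
                failures.append((approach.name, err.classification, str(err)))
                if err.classification == "Fundamental":
                    return None, failures  # proceed directly to escalation
                fixable = err.classification in ("Transient", "Environmental")
                if fixable and fix_retries < max_fix_retries:
                    if err.fix is not None:
                        err.fix()
                    fix_retries += 1
                    continue  # same approach again: does NOT count as a new attempt
                break  # move on to the next, genuinely different approach
    return None, failures
```

Keeping the retry inside the inner loop is what prevents a fixed-and-retried approach from being counted as a second attempt.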

For Each Attempt, Log:

Attempt N: [Approach Name]

Strategy: [what makes this different from previous attempts]
What I tried: [specific description with commands/code]
What happened: [exact error or unexpected result]
Why it failed: [root cause analysis]
Classification: [Transient / Environmental / Logical / Fundamental]
What to try next: [reasoning for next approach]

> **STOP: Log every attempt before moving to the next. Do NOT skip logging — it is evidence for the escalation report.**
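
The log fields above map naturally onto a record type. The following is a sketch only; the `AttemptLog` class, its field names, and the `render` method are hypothetical and simply mirror the template.

```python
from dataclasses import dataclass

VALID_CLASSIFICATIONS = ("Transient", "Environmental", "Logical", "Fundamental")


@dataclass
class AttemptLog:
    number: int
    approach: str
    strategy: str
    what_i_tried: str
    what_happened: str
    why_it_failed: str
    classification: str
    what_to_try_next: str

    def __post_init__(self):
        # Reject entries that skip or misspell the classification step.
        if self.classification not in VALID_CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.classification!r}")

    def render(self) -> str:
        """Format the entry in the same field order as the log template."""
        return "\n".join([
            f"Attempt {self.number}: {self.approach}",
            f"Strategy: {self.strategy}",
            f"What I tried: {self.what_i_tried}",
            f"What happened: {self.what_happened}",
            f"Why it failed: {self.why_it_failed}",
            f"Classification: {self.classification}",
            f"What to try next: {self.what_to_try_next}",
        ])
```

Validating the classification at construction time means an attempt cannot be logged without first being classified.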

---

Phase 3: Alternative Approach Selection

When the primary approach fails, select the next approach using this decision table:

| Failure Type | Strategy 1 | Strategy 2 | Strategy 3 |
| --- | --- | --- | --- |
| Library/API does not work | Different library | Direct implementation (no library) | Shell command / external tool |
| Algorithm produces wrong result | Different algorithm | Decompose into smaller steps | Simplify constraints, solve easier version |
| Permission/access denied | Different access method | Escalate with manual steps | Work around via alternative path |
| Tool limitation | Different tool | Combine multiple tools | Provide manual instructions |
| Integration failure | Mock the dependency | Use alternative interface | Isolate and test components separately |
| Performance issue | Different data structure | Batch/stream processing | Approximate solution |
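
The decision table can be encoded as a plain mapping from failure type to an ordered list of strategies. This is a hedged sketch: the `STRATEGIES` dictionary and `next_strategy` helper are hypothetical names, and returning `None` to signal "table exhausted, escalate" is an assumption of the example.

```python
from typing import Optional

# Keys and values are copied from the decision table; order is Strategy 1..3.
STRATEGIES = {
    "Library/API does not work": (
        "Different library",
        "Direct implementation (no library)",
        "Shell command / external tool",
    ),
    "Algorithm produces wrong result": (
        "Different algorithm",
        "Decompose into smaller steps",
        "Simplify constraints, solve easier version",
    ),
    "Permission/access denied": (
        "Different access method",
        "Escalate with manual steps",
        "Work around via alternative path",
    ),
    "Tool limitation": (
        "Different tool",
        "Combine multiple tools",
        "Provide manual instructions",
    ),
    "Integration failure": (
        "Mock the dependency",
        "Use alternative interface",
        "Isolate and test components separately",
    ),
    "Performance issue": (
        "Different data structure",
        "Batch/stream processing",
        "Approximate solution",
    ),
}


def next_strategy(failure_type: str, strategies_tried: int) -> Optional[str]:
    """Return the next untried strategy for a failure type, or None if exhausted."""
    if strategies_tried < 0:
        raise ValueError("strategies_tried must be non-negative")
    options = STRATEGIES[failure_type]
    if strategies_tried >= len(options):
        return None  # nothing left in the table: time for the escalation report
    return options[strategies_tried]
```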

Alternative Strategy Hierarchy

Try these in order of preference:
  1. Different tool — use a different library, API, or command
  2. Different algorithm — solve the same problem a different way
  3. Decompose — break the problem into smaller, solvable parts
  4. Simplify — remove constraints and solve a simpler version first
  5. Work around — achieve the goal through a different path entirely
  6. Manual steps — provide clear instructions the user can follow themselves

Phase 4: Escalation Report

After 3 genuine attempts with different approaches, produce this report:

Execution Report

I tried 3 different approaches to [goal]:

Attempt 1: [Approach Name]

Strategy: [description]
Result: Failed because [specific reason]
Error: [exact error message or unexpected output]

Attempt 2: [Approach Name]

Strategy: [description]
Result: Failed because [specific reason]
Error: [exact error message or unexpected output]

Attempt 3: [Approach Name]

Strategy: [description]
Result: Failed because [specific reason]
Error: [exact error message or unexpected output]

Root Cause Analysis

[Why all three approaches failed, identifying the common blocker]

Recommended Next Steps

  • Option A: [what the user could try]
  • Option B: [alternative path]
  • Option C: [if applicable]

What I Need From You to Proceed

[Specific ask: access, information, permission, or decision]

> **STOP: Do NOT escalate without this report. The user needs evidence that 3 genuine attempts were made.**
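
A report builder can enforce the three-attempt gate mechanically. The sketch below is illustrative only: `FailedAttempt`, `build_report`, and `MIN_ATTEMPTS` are hypothetical names, and the plain-text layout is an assumption that follows the template's section order.

```python
from typing import NamedTuple

MIN_ATTEMPTS = 3  # the HARD-GATE minimum stated by the skill


class FailedAttempt(NamedTuple):
    approach: str
    strategy: str
    reason: str
    error: str


def build_report(goal, attempts, root_cause, options, ask) -> str:
    """Render the escalation report; refuse if fewer than 3 attempts are logged."""
    if len(attempts) < MIN_ATTEMPTS:
        raise ValueError(
            f"only {len(attempts)} attempt(s) logged; {MIN_ATTEMPTS} required before escalating"
        )
    lines = ["Execution Report", "",
             f"I tried {len(attempts)} different approaches to {goal}:", ""]
    for number, attempt in enumerate(attempts, start=1):
        lines += [
            f"Attempt {number}: {attempt.approach}",
            f"Strategy: {attempt.strategy}",
            f"Result: Failed because {attempt.reason}",
            f"Error: {attempt.error}",
            "",
        ]
    lines += ["Root Cause Analysis", root_cause, "", "Recommended Next Steps"]
    # Label options A, B, C, ... in the order given.
    lines += [f"- Option {chr(ord('A') + i)}: {opt}" for i, opt in enumerate(options)]
    lines += ["", "What I Need From You to Proceed", ask]
    return "\n".join(lines)
```

Raising an error on too few attempts makes it impossible to produce the report, and therefore to escalate, without the evidence the skill requires.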

---

Decision Table: When Retries Count as "Genuine"

| Counts as Genuine Attempt | Does NOT Count |
| --- | --- |
| Different library or tool | Same library with different import |
| Different algorithm or data structure | Same algorithm with tweaked parameters |
| Different architectural approach | Same approach with minor code changes |
| Manual workaround vs automated | Same automation with retry loop |
| Breaking problem into sub-problems | Same monolithic approach with logging added |
| Using an entirely different API | Same API with different authentication method (unless auth was the error) |
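
A rough way to apply this table is to describe each attempt by its tool and technique only, so that parameter tweaks, import changes, and retry loops collapse into the same entry. This is a simplified sketch under that assumption; `AttemptKey` and `count_genuine` are hypothetical, and real judgments about "meaningfully different" need more than string comparison.

```python
from typing import NamedTuple


class AttemptKey(NamedTuple):
    tool: str       # library, API, or command used
    technique: str  # algorithm or architectural approach


def count_genuine(attempts) -> int:
    """Count attempts that differ in tool or technique.

    Parameters, imports, logging, and retry loops are deliberately not part
    of the key, so variations in those do not create a new attempt.
    """
    seen = set()
    for attempt in attempts:
        seen.add((attempt.tool.strip().lower(), attempt.technique.strip().lower()))
    return len(seen)
```

Under this scheme, retrying `AttemptKey("requests", "REST call")` three times counts as one genuine attempt, while switching either the tool or the technique counts as a new one.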

Anti-Patterns / Common Mistakes

| What NOT to Do | Why It Fails | What to Do Instead |
| --- | --- | --- |
| Retry the same approach 3 times and call it "3 attempts" | Same approach = same failure. Not genuine alternatives. | Each attempt must use a meaningfully different strategy |
| Give up after 1 failure | Misses 2+ viable approaches | Always try at least 3 genuinely different approaches |
| Skip error classification | Without classification, you retry wrong things | Classify BEFORE choosing next approach |
| Hide failed attempts from the user | User cannot help without context | Log and report every attempt transparently |
| Escalate without trying manual workaround | Many things that fail in automation work manually | Always consider manual steps as Approach 3 |
| Blame the platform without investigation | "Platform limitation" is often wrong | Search for workarounds before declaring impossible |
| Fix environment issues and count as new attempt | Fixing env + retrying same approach is 1 attempt | Only count genuinely different strategies |
| Skip logging intermediate attempts | Loses evidence trail, cannot produce escalation report | Log every attempt immediately |

Anti-Rationalization Guards

| Thought | Reality |
| --- | --- |
| "This genuinely cannot be done" | Have you tried 3 different approaches? Probably not. |
| "The error is clear, I know what is wrong" | Clear errors can have hidden root causes. Investigate. |
| "I have already tried everything" | List what you tried. There are always more options. |
| "The user should fix this themselves" | Provide a manual path, but try 3 approaches first. |
| "This is a platform limitation" | Limitations often have workarounds. Search for them. |
| "The same error keeps happening" | Same error with different approaches = different root cause. Classify. |
| "This is taking too long" | Giving up takes longer when the user has to start over. |
| "A simpler version would not be useful" | A working simple version beats a broken complex one. |

Do NOT escalate without 3 genuine attempts. Period.

Integration Points

| Skill | Relationship |
| --- | --- |
| circuit-breaker | Activated after resilient-execution exhausts retries at the loop level |
| task-management | Invokes resilient-execution when a task step fails |
| self-learning | Records failure patterns to avoid repeating them in future sessions |
| planning | Uses failure history to choose more robust approaches |
| auto-improvement | Tracks retry success rates and approach effectiveness |
| verification-before-completion | Invokes resilient-execution if verification fails |

Concrete Examples

Example: File Parsing Failure

Attempt 1: JSON.parse() on the file
  Result: SyntaxError — file contains comments (JSONC format)
  Classification: Logical — wrong parser for this format

Attempt 2: Strip comments with regex, then JSON.parse()
  Result: Failed — nested block comments not handled
  Classification: Logical — regex too simple for comment stripping

Attempt 3: Use `jsonc-parser` library (handles JSONC natively)
  Result: Success — file parsed correctly

Example: API Integration Failure

Attempt 1: Direct HTTP request to API endpoint
  Result: 403 Forbidden — authentication required
  Classification: Environmental — missing auth config

  Fix: Add API key from .env, then retry (still Attempt 1)
  Result: 429 Too Many Requests — rate limited
  Classification: Transient — wait and retry
  Retry after waiting (still Attempt 1)
  Result: 200 OK but response format changed from docs
  Classification: Logical — API version mismatch

Attempt 2: Use official SDK instead of raw HTTP
  Result: SDK throws "unsupported region" error
  Classification: Environmental — region config needed

Attempt 3: Use GraphQL endpoint instead of REST
  Result: Success — GraphQL endpoint supports all regions

Key Principles

  • Never give up silently — always show what was tried
  • Genuine alternatives — each attempt must be a meaningfully different approach, not the same thing with minor tweaks
  • Root cause analysis — understand WHY before trying the next approach
  • Learn from failure — update memory with what did not work and why
  • Transparent — show the user your reasoning at each step
  • Classify first — error type determines whether to retry same approach or try a new one

Skill Type

RIGID — The 3-attempt minimum is a HARD-GATE. Error classification is mandatory before each retry. The escalation report format must be followed exactly. Do not relax these requirements regardless of perceived simplicity.