resilient-execution
Overview
The resilient-execution skill prevents premature failure by enforcing a minimum of 3 genuinely different approaches before escalating to the user. It provides a structured error classification system, an approach cascade methodology, and transparent logging of each attempt. Without this skill, agents give up too early — with it, they systematically exhaust alternatives and only escalate with full evidence.
Announce at start: "I'm using the resilient-execution skill — I will try multiple approaches before escalating."
Phase 1: Error Classification
When an approach fails, immediately classify the error before retrying:
| Error Type | Definition | Indicators | Correct Response |
|---|---|---|---|
| Transient | Temporary infrastructure failure | Network timeout, rate limit, 503 error, lock contention | Wait briefly, retry the same approach |
| Environmental | Missing or misconfigured dependency | Module not found, wrong version, missing env var, permission denied | Fix the environment, then retry same approach |
| Logical | Wrong approach or incorrect assumption | Wrong output, unexpected behavior, type mismatch, wrong API usage | Rethink the approach entirely |
| Fundamental | Genuinely impossible with available tools | API does not exist, hardware limitation, missing capability | Escalate to user with evidence |
STOP: Classify the error before choosing your next approach. Wrong classification leads to wasted retries.
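The table above can be sketched as a small keyword classifier. This is a minimal illustration, not part of the skill itself: the `INDICATORS` keyword lists, the `classify` function, and the fallback to Logical for unrecognized messages are all assumptions made for the example.

```python
from enum import Enum


class ErrorType(Enum):
    TRANSIENT = "Transient"
    ENVIRONMENTAL = "Environmental"
    LOGICAL = "Logical"
    FUNDAMENTAL = "Fundamental"


# Hypothetical keyword lists based on the Indicators column; real errors need richer checks.
INDICATORS = {
    ErrorType.TRANSIENT: ("timeout", "timed out", "rate limit", "503", "lock contention"),
    ErrorType.ENVIRONMENTAL: ("module not found", "no module named", "wrong version",
                              "missing env var", "permission denied"),
    ErrorType.FUNDAMENTAL: ("api does not exist", "hardware limitation", "missing capability"),
}

# Copied from the Correct Response column of the table.
CORRECT_RESPONSE = {
    ErrorType.TRANSIENT: "Wait briefly, retry the same approach",
    ErrorType.ENVIRONMENTAL: "Fix the environment, then retry same approach",
    ErrorType.LOGICAL: "Rethink the approach entirely",
    ErrorType.FUNDAMENTAL: "Escalate to user with evidence",
}


def classify(message: str) -> ErrorType:
    """Return the first error type whose keywords appear in the message."""
    text = message.lower()
    for error_type, keywords in INDICATORS.items():
        if any(keyword in text for keyword in keywords):
            return error_type
    # Assumption of this sketch: anything unrecognized is treated as Logical.
    return ErrorType.LOGICAL
```

For example, `CORRECT_RESPONSE[classify("Permission denied: /etc/hosts")]` yields the Environmental response. Keyword matching is deliberately crude; a misclassified error should be reclassified by hand.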
Phase 2: Approach Cascade
Execute the cascade systematically. Each attempt must be a genuinely different strategy.
Attempt 1: Primary approach (most direct solution)
| fails
v
Classify error -> Can same approach work with a fix?
| YES -> Fix and retry (does NOT count as a new attempt)
| NO -> Proceed to Attempt 2
v
Attempt 2: Alternative approach 1 (different technique)
| fails
v
Classify error -> Is this fundamentally blocked?
| YES -> Proceed directly to escalation
| NO -> Proceed to Attempt 3
v
Attempt 3: Alternative approach 2 (different path entirely)
| fails
v
Circuit breaker -> Present findings to user with full evidence
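The cascade can be sketched as a loop over approaches. This is an illustrative sketch under stated assumptions: `run_cascade`, its `classify` and `fix` callbacks, and the `(result, log)` return shape are hypothetical names, and each approach is assumed to raise an exception on failure and return a non-None value on success. As a simplification, the sketch escalates on a Fundamental error at any attempt and allows one fix-and-retry per approach.

```python
from typing import Callable, List, Tuple

# Error types where the same approach can work after a wait or a fix.
FIXABLE = {"Transient", "Environmental"}


def run_cascade(
    approaches: List[Tuple[str, Callable[[], object]]],
    classify: Callable[[Exception], str],
    fix: Callable[[str, Exception], None] = lambda name, exc: None,
):
    """Try each approach in order and return (result, log).

    result is None when escalation is due; log holds one entry per failure.
    """
    log = []  # entries: (attempt_number, approach_name, classification, error_text)
    for number, (name, approach) in enumerate(approaches, start=1):
        retried = False
        while True:
            try:
                return approach(), log
            except Exception as exc:
                kind = classify(exc)
                log.append((number, name, kind, str(exc)))
                if kind in FIXABLE and not retried:
                    fix(name, exc)    # repair or wait, then retry the same approach
                    retried = True    # the retry does not count as a new attempt
                    continue
                if kind == "Fundamental":
                    return None, log  # blocked: go straight to escalation
                break                 # move on to a genuinely different approach
    return None, log                  # circuit breaker: every approach failed
```

The attempt number in each log entry stays the same across a fix-and-retry, which keeps the count of genuine attempts honest.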
For Each Attempt, Log:
Attempt N: [Approach Name]
Strategy: [what makes this different from previous attempts]
What I tried: [specific description with commands/code]
What happened: [exact error or unexpected result]
Why it failed: [root cause analysis]
Classification: [Transient / Environmental / Logical / Fundamental]
What to try next: [reasoning for next approach]
STOP: Log every attempt before moving to the next. Do NOT skip logging; it is evidence for the escalation report.
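One way to keep these logs uniform is a small record type that renders the template. This is a sketch only: the `AttemptLog` class and its field names are hypothetical, chosen to mirror the template fields one to one.

```python
from dataclasses import dataclass


@dataclass
class AttemptLog:
    number: int
    approach: str
    strategy: str        # what makes this different from previous attempts
    tried: str           # specific description with commands/code
    happened: str        # exact error or unexpected result
    why_failed: str      # root cause analysis
    classification: str  # Transient / Environmental / Logical / Fundamental
    next_step: str       # reasoning for next approach

    def render(self) -> str:
        """Render the entry in the same field order as the logging template."""
        return "\n".join([
            f"Attempt {self.number}: {self.approach}",
            f"Strategy: {self.strategy}",
            f"What I tried: {self.tried}",
            f"What happened: {self.happened}",
            f"Why it failed: {self.why_failed}",
            f"Classification: {self.classification}",
            f"What to try next: {self.next_step}",
        ])


entry = AttemptLog(
    number=1,
    approach="JSON.parse()",
    strategy="Most direct solution",
    tried="JSON.parse() on the file",
    happened="SyntaxError",
    why_failed="File contains comments (JSONC format)",
    classification="Logical",
    next_step="Strip comments, then parse",
)
print(entry.render())
```

Because every field is required, an incomplete log entry fails at construction time rather than surfacing later as a gap in the escalation report.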
Phase 3: Alternative Approach Selection
When the primary approach fails, select the next approach using this decision table:
| Failure Type | Strategy 1 | Strategy 2 | Strategy 3 |
|---|---|---|---|
| Library/API does not work | Different library | Direct implementation (no library) | Shell command / external tool |
| Algorithm produces wrong result | Different algorithm | Decompose into smaller steps | Simplify constraints, solve easier version |
| Permission/access denied | Different access method | Escalate with manual steps | Work around via alternative path |
| Tool limitation | Different tool | Combine multiple tools | Provide manual instructions |
| Integration failure | Mock the dependency | Use alternative interface | Isolate and test components separately |
| Performance issue | Different data structure | Batch/stream processing | Approximate solution |
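The decision table maps naturally onto a lookup structure. In the sketch below, the `STRATEGIES` dictionary copies the table rows verbatim, while the `next_strategy` helper and its "first untried strategy" rule are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Rows of the decision table: failure type -> strategies in column order.
STRATEGIES = {
    "Library/API does not work": (
        "Different library", "Direct implementation (no library)", "Shell command / external tool"),
    "Algorithm produces wrong result": (
        "Different algorithm", "Decompose into smaller steps",
        "Simplify constraints, solve easier version"),
    "Permission/access denied": (
        "Different access method", "Escalate with manual steps", "Work around via alternative path"),
    "Tool limitation": (
        "Different tool", "Combine multiple tools", "Provide manual instructions"),
    "Integration failure": (
        "Mock the dependency", "Use alternative interface", "Isolate and test components separately"),
    "Performance issue": (
        "Different data structure", "Batch/stream processing", "Approximate solution"),
}


def next_strategy(failure_type: str, already_tried: Iterable[str] = ()) -> Optional[str]:
    """Return the first strategy for this failure type that has not been tried yet."""
    tried = set(already_tried)
    for strategy in STRATEGIES[failure_type]:
        if strategy not in tried:
            return strategy
    return None  # all three strategies are exhausted
```

For example, `next_strategy("Tool limitation", ["Different tool"])` returns `"Combine multiple tools"`.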
Alternative Strategy Hierarchy
Try these in order of preference:
- Different tool — use a different library, API, or command
- Different algorithm — solve the same problem a different way
- Decompose — break the problem into smaller, solvable parts
- Simplify — remove constraints and solve a simpler version first
- Work around — achieve the goal through a different path entirely
- Manual steps — provide clear instructions the user can follow themselves
Phase 4: Escalation Report
After 3 genuine attempts with different approaches, produce this report:
Execution Report
I tried 3 different approaches to [goal]:
Attempt 1: [Approach Name]
Strategy: [description]
Result: Failed because [specific reason]
Error: [exact error message or unexpected output]
Attempt 2: [Approach Name]
Strategy: [description]
Result: Failed because [specific reason]
Error: [exact error message or unexpected output]
Attempt 3: [Approach Name]
Strategy: [description]
Result: Failed because [specific reason]
Error: [exact error message or unexpected output]
Root Cause Analysis
[Why all three approaches failed: identify the common blocker]
Recommended Next Steps
- Option A: [what the user could try]
- Option B: [alternative path]
- Option C: [if applicable]
What I Need From You to Proceed
[Specific ask: access, information, permission, or decision]
STOP: Do NOT escalate without this report. The user needs evidence that 3 genuine attempts were made.
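The report can be assembled mechanically from the attempt logs. The following is a minimal sketch, assuming a hypothetical `Attempt` record and `build_report` helper. It refuses to produce a report with fewer than 3 attempts, which is one possible way to enforce the gate in code.

```python
from typing import List, NamedTuple


class Attempt(NamedTuple):
    name: str
    strategy: str
    reason: str  # why the attempt failed
    error: str   # exact error message or unexpected output


def build_report(goal: str, attempts: List[Attempt], root_cause: str,
                 options: List[str], ask: str) -> str:
    """Fill in the escalation report template; requires at least 3 attempts."""
    if len(attempts) < 3:
        raise ValueError("escalation requires at least 3 genuine attempts")
    lines = ["Execution Report",
             f"I tried {len(attempts)} different approaches to {goal}:"]
    for number, attempt in enumerate(attempts, start=1):
        lines += [
            f"Attempt {number}: {attempt.name}",
            f"Strategy: {attempt.strategy}",
            f"Result: Failed because {attempt.reason}",
            f"Error: {attempt.error}",
        ]
    lines += ["Root Cause Analysis", root_cause, "Recommended Next Steps"]
    # Label options A, B, C, ... in the order given.
    lines += [f"- Option {chr(ord('A') + i)}: {option}" for i, option in enumerate(options)]
    lines += ["What I Need From You to Proceed", ask]
    return "\n".join(lines)
```

Whether the attempts passed in are genuinely different is not checked here; that judgment still has to be made before the report is built.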
Decision Table: When Retries Count as "Genuine"
| Counts as Genuine Attempt | Does NOT Count |
|---|---|
| Different library or tool | Same library with different import |
| Different algorithm or data structure | Same algorithm with tweaked parameters |
| Different architectural approach | Same approach with minor code changes |
| Manual workaround vs automated | Same automation with retry loop |
| Breaking problem into sub-problems | Same monolithic approach with logging added |
| Using an entirely different API | Same API with different authentication method (unless auth was the error) |
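The counting rule can be made concrete by separating a strategy from its variants. This sketch assumes each attempt is recorded as a hypothetical `(strategy, variant)` pair, where tweaked parameters, a changed import, or a retry loop change only the variant; the `count_genuine` helper is illustrative.

```python
from typing import Iterable, Tuple


def count_genuine(attempts: Iterable[Tuple[str, str]]) -> int:
    """Count distinct strategies; variants of the same strategy count once."""
    return len({strategy for strategy, _variant in attempts})


history = [
    ("JSON.parse", "first run"),
    ("JSON.parse", "same parser inside a retry loop"),  # does NOT count
    ("regex comment stripping + JSON.parse", "first run"),
]

# Two genuine attempts so far, so escalation is not yet allowed.
may_escalate = count_genuine(history) >= 3
```

Deciding what qualifies as a new strategy rather than a variant remains a judgment call guided by the table above; the code only keeps the tally consistent once that call is made.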
Anti-Patterns / Common Mistakes
| What NOT to Do | Why It Fails | What to Do Instead |
|---|---|---|
| Retry the same approach 3 times and call it "3 attempts" | Same approach = same failure. Not genuine alternatives. | Each attempt must use a meaningfully different strategy |
| Give up after 1 failure | Misses 2+ viable approaches | Always try at least 3 genuinely different approaches |
| Skip error classification | Without classification, you retry wrong things | Classify BEFORE choosing next approach |
| Hide failed attempts from the user | User cannot help without context | Log and report every attempt transparently |
| Escalate without trying manual workaround | Many things that fail in automation work manually | Always consider manual steps as Approach 3 |
| Blame the platform without investigation | "Platform limitation" is often wrong | Search for workarounds before declaring impossible |
| Fix environment issues and count as new attempt | Fixing env + retrying same approach is 1 attempt | Only count genuinely different strategies |
| Skip logging intermediate attempts | Loses evidence trail, cannot produce escalation report | Log every attempt immediately |
Anti-Rationalization Guards
| Thought | Reality |
|---|---|
| "This genuinely cannot be done" | Have you tried 3 different approaches? Probably not. |
| "The error is clear, I know what is wrong" | Clear errors can have hidden root causes. Investigate. |
| "I have already tried everything" | List what you tried. There are always more options. |
| "The user should fix this themselves" | Provide a manual path, but try 3 approaches first. |
| "This is a platform limitation" | Limitations often have workarounds. Search for them. |
| "The same error keeps happening" | If different approaches hit the same error, the root cause is probably not what you assumed. Reclassify. |
| "This is taking too long" | Giving up takes longer when the user has to start over. |
| "A simpler version would not be useful" | A working simple version beats a broken complex one. |
Do NOT escalate without 3 genuine attempts. Period.
Integration Points
| Skill | Relationship |
|---|---|
| | Activated after resilient-execution exhausts retries at the loop level |
| | Invokes resilient-execution when a task step fails |
| | Records failure patterns to avoid repeating them in future sessions |
| | Uses failure history to choose more robust approaches |
| | Tracks retry success rates and approach effectiveness |
| | Invokes resilient-execution if verification fails |
Concrete Examples
Example: File Parsing Failure
Attempt 1: JSON.parse() on the file
Result: SyntaxError — file contains comments (JSONC format)
Classification: Logical — wrong parser for this format
Attempt 2: Strip comments with regex, then JSON.parse()
Result: Failed — nested block comments not handled
Classification: Logical — regex too simple for comment stripping
Attempt 3: Use `jsonc-parser` library (handles JSONC natively)
Result: Success (file parsed correctly)
Example: API Integration Failure
Attempt 1: Direct HTTP request to API endpoint
Result: 403 Forbidden — authentication required
Classification: Environmental — missing auth config
Fix: Add API key from .env
Result: 429 Too Many Requests — rate limited
Classification: Transient — wait and retry
Result: 200 OK but response format changed from docs
Classification: Logical — API version mismatch
Attempt 2: Use official SDK instead of raw HTTP
Result: SDK throws "unsupported region" error
Classification: Environmental — region config needed
Attempt 3: Use GraphQL endpoint instead of REST
Result: Success (GraphQL endpoint supports all regions)
Key Principles
- Never give up silently — always show what was tried
- Genuine alternatives — each attempt must be a meaningfully different approach, not the same thing with minor tweaks
- Root cause analysis — understand WHY before trying the next approach
- Learn from failure — update memory with what did not work and why
- Transparent — show the user your reasoning at each step
- Classify first — error type determines whether to retry same approach or try a new one
Skill Type
RIGID — The 3-attempt minimum is a HARD-GATE. Error classification is mandatory before each retry. The escalation report format must be followed exactly. Do not relax these requirements regardless of perceived simplicity.