error-recovery
Error Recovery
Overview
Handle failures gracefully with structured recovery.
Core principle: When things break, don't panic. Assess, preserve, recover, verify.
Announce at start: "I'm using error-recovery to handle this failure."
The Recovery Protocol

    Error Detected
           │
           ▼
    ┌─────────────┐
    │ 1. ASSESS   │ ← Severity? Scope? Impact?
    └──────┬──────┘
           │
           ▼
    ┌─────────────┐
    │ 2. PRESERVE │ ← Capture evidence before it's lost
    └──────┬──────┘
           │
           ▼
    ┌─────────────┐
    │ 3. RECOVER  │ ← Follow decision tree
    └──────┬──────┘
           │
           ▼
    ┌─────────────┐
    │ 4. VERIFY   │ ← Confirm clean state
    └──────┬──────┘
           │
           ▼
    ┌─────────────┐
    │ 5. DOCUMENT │ ← Record what happened
    └─────────────┘

Step 1: Assess Severity
Severity Levels
| Level | Description | Examples |
|---|---|---|
| Critical | System unusable, data at risk | Build completely broken, tests cause data loss |
| Major | Significant functionality broken | Feature doesn't work, many tests failing |
| Minor | Isolated issue, workaround exists | Single test flaky, style error |
| Info | Warning only, not blocking | Deprecation notice, performance hint |
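
The levels in the table can be encoded as a small triage helper. This is a minimal sketch, not part of the original skill: the function name `classify_severity` and its four yes/no arguments are assumptions chosen to mirror the table's descriptions, and the exact precedence between levels is one possible reading of it.

```bash
# Hypothetical helper: map four yes/no answers to a severity level from the table.
# Usage: classify_severity <system_usable> <data_at_risk> <workaround_exists> <blocking>
classify_severity() {
  local system_usable="$1" data_at_risk="$2" workaround_exists="$3" blocking="$4"

  if [ "$system_usable" = "no" ] || [ "$data_at_risk" = "yes" ]; then
    # System unusable or data at risk
    echo "Critical"
  elif [ "$blocking" = "no" ]; then
    # Warning only, not blocking
    echo "Info"
  elif [ "$workaround_exists" = "yes" ]; then
    # Isolated issue with a workaround
    echo "Minor"
  else
    # Blocking, no workaround
    echo "Major"
  fi
}
```

For example, `classify_severity yes no yes yes` prints `Minor`.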
Assessment Questions
```markdown
## Error Assessment

Error: [Description of error]
Location: [Where it occurred]

### Severity Checklist
- Is the system still functional?
- Is any data at risk?
- Are other features affected?
- Is this blocking progress?

### Scope
- Files affected: [list]
- Features affected: [list]
- Users affected: [none/some/all]
```

Step 2: Preserve Evidence
Capture BEFORE attempting fixes:
Error Logs
```bash
# Capture error output
pnpm test 2>&1 | tee error-log.txt

# Or from failed command
./failing-command 2>&1 | tee error-log.txt
```

Stack Traces
```markdown
## Stack Trace

Error: Connection refused
    at Database.connect (src/db/connection.ts:45)
    at UserService.init (src/services/user.ts:23)
    at main (src/index.ts:12)
```

State Capture
```bash
# Git state
git status
git diff

# Environment state
env | grep -E "NODE|NPM|PATH"

# Dependency state
pnpm list
```

Screenshot (if visual)
For UI errors, capture screenshots before changes.
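
The capture commands in this step can be bundled so that nothing is forgotten under pressure. The sketch below is an illustration only: the function name `capture_evidence`, the `evidence/` default directory, and the output file names are assumptions, and it leaves out `pnpm list` so that it runs without a Node toolchain.

```bash
# Hypothetical helper: snapshot repository and environment state into a
# timestamped directory before any fix is attempted.
# Usage: capture_evidence [base_dir]   (prints the directory it created)
capture_evidence() {
  local base_dir="${1:-evidence}"
  local dir
  dir="$base_dir/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dir" || return 1

  # Git state, recorded only when git is available and we are inside a repository
  if command -v git >/dev/null 2>&1 && git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    git status > "$dir/git-status.txt" 2>&1
    git diff > "$dir/git-diff.patch" 2>&1
  else
    echo "not a git repository" > "$dir/git-status.txt"
  fi

  # Environment state; `|| true` keeps going when nothing matches
  env | grep -E "NODE|NPM|PATH" > "$dir/env.txt" || true

  echo "$dir"
}
```

Review `env.txt` before attaching it anywhere public, since a variable such as `NPM_TOKEN` would match the pattern.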
Step 3: Recover
Decision Tree

                What type of failure?
                          │
        ┌───────────┬─────┴─────┬───────────┐
        │           │           │           │
      Code        Build    Environment  External
      Error       Error       Issue      Service
        │           │           │           │
        ▼           ▼           ▼           ▼
    ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
    │  Git   │  │ Clean  │  │Re-init │  │ Wait/  │
    │recovery│  │ build  │  │        │  │ Retry  │
    └────────┘  └────────┘  └────────┘  └────────┘

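The four branches of the decision tree amount to a lookup from failure type to recovery path. A minimal sketch of that lookup follows; the function name `recovery_action` and the returned labels are hypothetical and only name the sections that come next.

```bash
# Hypothetical helper: map a failure type from the decision tree to its recovery path.
# Usage: recovery_action <code|build|environment|external>
recovery_action() {
  case "$1" in
    code)        echo "git-recovery" ;;
    build)       echo "clean-build" ;;
    environment) echo "re-init" ;;
    external)    echo "wait-retry" ;;
    *)
      # Anything else is outside the tree and needs manual triage
      echo "unknown failure type: $1" >&2
      return 1
      ;;
  esac
}
```
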
Code Error Recovery
**Single file broken:**
```bash
# Revert just that file
git checkout HEAD -- path/to/file.ts
```

**Feature broken (multiple files):**
```bash
# Find last good commit
git log --oneline

# Revert to that commit (soft reset keeps changes staged)
git reset --soft [GOOD_COMMIT]

# Or hard reset (discards uncommitted changes)
git reset --hard [GOOD_COMMIT]
```

**Working directory is a mess:**
```bash
# Stash current changes (add --include-untracked to also stash new files)
git stash

# Verify clean state
git status

# Optionally recover stash later
git stash pop
```

Build Error Recovery
```bash
# Clean build artifacts
rm -rf node_modules dist build .cache

# Reinstall dependencies
pnpm install --frozen-lockfile  # Clean install from lock file

# Rebuild
pnpm build
```

Environment Error Recovery
```bash
# Check environment
env | grep -E "NODE|PNPM"

# Reset Node modules
rm -rf node_modules
pnpm install --frozen-lockfile

# If using nvm, verify version
nvm use

# Re-run init script
./scripts/init.sh
```

External Service Error
```bash
# Check if service is up
curl -I https://service.example.com/health

# If down, wait and retry
sleep 60
curl -I https://service.example.com/health

# If still down, check status page
# Document as external blocker
```
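
The wait-and-retry step can be wrapped in a loop so the number of attempts is bounded. The following is a sketch under assumptions: `retry_with_backoff` is a hypothetical name, and doubling the delay between attempts is a design choice of this example, not something the skill prescribes.

```bash
# Hypothetical helper: run a command up to <max_attempts> times, doubling the
# wait between attempts. Returns 0 on the first success, 1 if every attempt fails.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command> [args...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $attempt/$max_attempts failed" >&2
    # Sleep only when another attempt is still to come
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
      delay=$((delay * 2))
    fi
    attempt=$((attempt + 1))
  done

  return 1
}
```

A possible call is `retry_with_backoff 3 60 curl -fsI https://service.example.com/health`. The `-f` flag matters here: without it, `curl` exits with status 0 even when the server answers with an HTTP error, so the loop would never retry.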
Step 4: Verify
After recovery, verify clean state:
Basic Verification
```bash
# Clean working directory
git status
# Expected: "nothing to commit, working tree clean" or known changes

# Tests pass
pnpm test

# Build succeeds
pnpm build

# Types check
pnpm typecheck
```
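
Running the checks one by one makes it easy to stop at the first green result and skip the rest. A small runner can execute all of them and report a single verdict. This is a sketch only; `run_checks` is a hypothetical name, and it deliberately hides each command's output, so rerun a failing check directly to see why it failed.

```bash
# Hypothetical helper: run each check command in order, report the outcome,
# and return non-zero if any check failed. Each argument is one command line.
# Usage: run_checks "pnpm test" "pnpm build" "pnpm typecheck"
run_checks() {
  local failed=0 check
  for check in "$@"; do
    # sh -c lets a single string argument carry a command with its arguments
    if sh -c "$check" >/dev/null 2>&1; then
      echo "PASS: $check"
    else
      echo "FAIL: $check"
      failed=1
    fi
  done
  return "$failed"
}
```

A working-tree check can be added as one more argument, for example `'test -z "$(git status --porcelain)"'`, which passes only when there are no uncommitted changes.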
Functionality Verification
```bash
# Run the specific thing that was broken
pnpm test --grep "specific test"

# Or verify the feature manually
```

Step 5: Document
Issue Comment
```bash
gh issue comment [ISSUE_NUMBER] --body "## Error Recovery

**Error encountered:** [Description]
**Severity:** Major

**Evidence:**
\`\`\`
[Error output]
\`\`\`

**Recovery actions:**
1. [Action 1]
2. [Action 2]

**Verification:**
- [x] Tests pass
- [x] Build succeeds

**Root cause:** [If known]
**Prevention:** [If applicable]
"
```
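
Escaping backticks inside a double-quoted `--body` string is easy to get wrong once real error output is pasted in. One alternative is to generate the comment body separately and hand it to `gh` as a file or stream. The helper below is a hypothetical sketch (`build_recovery_comment` and its three arguments are assumptions) and covers only the first half of the template.

```bash
# Hypothetical helper: print the first part of the recovery comment to stdout.
# Usage: build_recovery_comment <description> <severity> <evidence_file>
build_recovery_comment() {
  local description="$1" severity="$2" evidence_file="$3"

  printf '## Error Recovery\n\n'
  printf '**Error encountered:** %s\n' "$description"
  printf '**Severity:** %s\n\n' "$severity"
  printf '**Evidence:**\n'
  # \140 is the octal code for a backtick; three of them open a code fence
  printf '\140\140\140\n'
  # Include only the last 20 lines so the comment stays readable
  tail -n 20 "$evidence_file"
  printf '\140\140\140\n'
}
```

The output can then be piped into the CLI, for example `build_recovery_comment "Build failed" Major error-log.txt | gh issue comment [ISSUE_NUMBER] --body-file -`, where `--body-file -` tells `gh` to read the body from standard input.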
Knowledge Graph
```javascript
// Store for future reference
mcp__memory__add_observations({
  observations: [{
    entityName: "Issue #[NUMBER]",
    contents: [
      "Encountered [error type] on [date]",
      "Caused by: [root cause]",
      "Resolved by: [recovery action]"
    ]
  }]
});
```

Common Recovery Patterns
"Tests were passing, now failing"
```bash
# What changed?
git diff HEAD~3

# Did dependencies change?
git diff HEAD~3 -- pnpm-lock.yaml

# Clean reinstall
rm -rf node_modules && pnpm install --frozen-lockfile
```

"Works locally, fails in CI"
```bash
# Check for environment differences
# - Node version
# - OS differences
# - Env vars

# Run with CI-like settings
CI=true pnpm test
```

"Build was working, now broken"
```bash
# Check TypeScript errors
pnpm typecheck

# Check for circular dependencies
pnpm dlx madge --circular src/

# Clean build
rm -rf dist && pnpm build
```

"I broke everything"
```bash
# Don't panic

# Find last known good state
git log --oneline

# Reset to that state
git reset --hard [GOOD_COMMIT]

# Verify
pnpm test

# Start again more carefully
```

Escalation
If recovery fails after 2-3 attempts:
```markdown
## Escalation: Unrecoverable Error

Issue: #[NUMBER]
Error: [Description]

Recovery attempts:
- [Attempt 1] - [Result]
- [Attempt 2] - [Result]

Current state: [Broken/Partially working]
Evidence preserved: [Links to logs, screenshots]
Requesting help with: [Specific question]
```

Mark issue as Blocked and await human input.
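
The "2-3 attempts, then escalate" rule can be made mechanical. The sketch below is one way to do it, with assumed names throughout: `recover_or_escalate` tries each recovery command once, in order, and prints an abbreviated form of the escalation template only when all of them fail.

```bash
# Hypothetical helper: try recovery commands in order, stop at the first one
# that succeeds, and print an escalation summary if all of them fail.
# Usage: recover_or_escalate <issue_number> <recovery_command>...
recover_or_escalate() {
  local issue="$1" cmd
  shift

  for cmd in "$@"; do
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "Recovered with: $cmd"
      return 0
    fi
  done

  # Every attempt failed: emit the summary for a human to pick up
  printf '## Escalation: Unrecoverable Error\n\n'
  printf 'Issue: #%s\n' "$issue"
  printf 'Recovery attempts:\n'
  for cmd in "$@"; do
    printf '%s\n' "- $cmd - failed"
  done
  return 1
}
```

The remaining template fields (current state, evidence links, the specific question) still need to be filled in by hand before the issue is marked as Blocked.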
Checklist
When error occurs:
- Severity assessed
- Evidence preserved (logs, state, screenshots)
- Recovery action selected
- Recovery executed
- Clean state verified
- Tests pass
- Build succeeds
- Issue documented
Integration
This skill is called by:
- `issue-driven-development` - When errors occur
- `ci-monitoring` - CI failures

This skill may trigger:
- `research-after-failure` - If cause is unknown
- Issue update via `issue-lifecycle`