openclaw-self-healing

OpenClaw Self-Healing System

"The system that heals itself — or calls for help when it can't."
A 4-tier autonomous self-healing system for OpenClaw Gateway.

Architecture

Level 1: Watchdog (180s)     → Process monitoring (OpenClaw built-in)
Level 2: Health Check (300s) → HTTP 200 + 3 retries
Level 3: Claude Recovery     → 30min AI-powered diagnosis 🧠
Level 4: Discord Alert       → Human escalation
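
The levels above can be read as an escalation ladder: each automated tier gets a chance to recover the gateway, and a human is alerted only when all of them fail. The sketch below illustrates that control flow only. The names `escalate`, `health_check`, `claude_recovery`, and `discord_alert` are hypothetical stand-ins, not the project's actual scripts, and the stubs merely simulate outcomes.

```bash
#!/usr/bin/env bash
# Illustrative escalation ladder. All function names here are hypothetical.

# escalate ALERT_CMD TIER...
# Runs each tier in order and stops at the first one that succeeds.
# If every tier fails, runs ALERT_CMD and returns 1.
escalate() {
  local alert="$1"
  shift
  local tier
  for tier in "$@"; do
    if "$tier"; then
      echo "resolved at: $tier"
      return 0
    fi
  done
  "$alert"
  return 1
}

# Stubs that simulate tier outcomes (assumptions for the demo).
health_check()    { return 1; }   # Level 2 fails
claude_recovery() { return 0; }   # Level 3 succeeds
discord_alert()   { echo "ALERT: manual intervention needed"; }

escalate discord_alert health_check claude_recovery
# prints "resolved at: claude_recovery"
```

Passing the tiers as arguments keeps the ordering explicit and makes it easy to exercise the ladder with stubs.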

What's Special (v2.0)

  • Claude Code as the Level 3 emergency doctor (a world first, according to the authors)
  • Persistent Learning - Automatic recovery documentation (symptom → cause → solution → prevention)
  • Reasoning Logs - Explainable AI decision-making process
  • Multi-Channel Alerts - Discord + Telegram support
  • Metrics Dashboard - Success rate, recovery time, trending analysis
  • Production-tested (verified recovery Feb 5-6, 2026)
  • macOS LaunchAgent integration
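
To make the persistent-learning idea concrete, here is a minimal sketch of how a recovery record with the four fields named above (symptom, cause, solution, prevention) could be appended to a file. The function name `record_recovery`, the record layout, and the example values are assumptions for illustration. The format actually written by `emergency-recovery-v2.sh` is not shown in this document and may differ.

```bash
#!/usr/bin/env bash
# Hypothetical record format, for illustration only.

# record_recovery FILE SYMPTOM CAUSE SOLUTION PREVENTION
# Appends one timestamped recovery record to FILE.
record_recovery() {
  local file="$1" symptom="$2" cause="$3" solution="$4" prevention="$5"
  {
    printf '## %s\n' "$(date +%Y-%m-%dT%H:%M:%S)"
    printf -- '- Symptom: %s\n' "$symptom"
    printf -- '- Cause: %s\n' "$cause"
    printf -- '- Solution: %s\n' "$solution"
    printf -- '- Prevention: %s\n' "$prevention"
    printf '\n'
  } >> "$file"
}

# Example call with made-up values:
# record_recovery ./recovery-notes.md "gateway returns HTTP 500" \
#   "invalid config" "restored backup config" "validate config before reload"
```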

Quick Setup

1. Install Dependencies

bash
brew install tmux
npm install -g @anthropic-ai/claude-code

2. Configure Environment

bash
# Copy template to OpenClaw config directory
cp .env.example ~/.openclaw/.env

# Edit and add your Discord webhook (optional)
nano ~/.openclaw/.env

3. Install Scripts

bash
# Copy scripts
cp scripts/*.sh ~/openclaw/scripts/
chmod +x ~/openclaw/scripts/*.sh

# Install LaunchAgent
cp launchagent/com.openclaw.healthcheck.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.openclaw.healthcheck.plist

4. Verify

bash
# Check that the health check is running
launchctl list | grep openclaw.healthcheck

# View logs
tail -f ~/openclaw/memory/healthcheck-$(date +%Y-%m-%d).log

Scripts

| Script | Level | Description |
|---|---|---|
| gateway-healthcheck.sh | 2 | HTTP 200 check + 3 retries + escalation |
| emergency-recovery.sh | 3 | Claude Code PTY session for AI diagnosis (v1) |
| emergency-recovery-v2.sh | 3 | Enhanced with learning + reasoning logs (v2) ⭐ |
| emergency-recovery-monitor.sh | 4 | Discord/Telegram notification on failure |
| metrics-dashboard.sh | - | Visualize recovery statistics (NEW) |
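
As a rough illustration of the Level 2 behaviour described for `gateway-healthcheck.sh` (an HTTP 200 check with a bounded number of retries), the sketch below shows one way such a loop could look. It is not the project's script: the function names `probe_http_200` and `check_with_retries` and the 5-second timeout are assumptions, and this version only re-probes between attempts rather than restarting the gateway.

```bash
#!/usr/bin/env bash
# Illustrative retry loop. Function names and timeout are assumptions.

# probe_http_200 URL
# Returns 0 only if URL answers with HTTP status 200.
probe_http_200() {
  local url="$1" code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")" || return 1
  [ "$code" = "200" ]
}

# check_with_retries MAX_RETRIES DELAY_SECONDS PROBE_CMD [ARGS...]
# Runs the probe up to MAX_RETRIES times, sleeping DELAY_SECONDS between tries.
check_with_retries() {
  local max_retries="$1" delay="$2"
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_retries; attempt++ )); do
    if "$@"; then
      echo "healthy after attempt $attempt"
      return 0
    fi
    if (( attempt < max_retries )); then
      sleep "$delay"
    fi
  done
  echo "unhealthy after $max_retries attempts"
  return 1
}

# Example: three attempts, five seconds apart, against the default gateway URL.
# check_with_retries 3 5 probe_http_200 "http://localhost:18789/"
```

Taking the probe as a command argument keeps the retry logic independent of curl, so it can be tested with a stub probe.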

Configuration

All settings are configured via environment variables in ~/.openclaw/.env:

| Variable | Default | Description |
|---|---|---|
| DISCORD_WEBHOOK_URL | (none) | Discord webhook for alerts |
| OPENCLAW_GATEWAY_URL | http://localhost:18789/ | Gateway health check URL |
| HEALTH_CHECK_MAX_RETRIES | 3 | Restart attempts before escalation |
| EMERGENCY_RECOVERY_TIMEOUT | 1800 | Claude recovery timeout (30 min) |
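
A script that consumes these settings might load the file and fall back to the documented defaults when a variable is missing. The sketch below shows one common way to do that in bash. The function name `load_config` is hypothetical, and the project's scripts may read their configuration differently.

```bash
#!/usr/bin/env bash
# Illustrative config loader. The function name is an assumption.

# load_config [ENV_FILE]
# Sources ENV_FILE (default: ~/.openclaw/.env) if it exists, then applies
# the defaults listed in the Configuration table to anything unset or empty.
load_config() {
  local env_file="${1:-$HOME/.openclaw/.env}"
  if [ -f "$env_file" ]; then
    set -a          # export every variable assigned while sourcing
    # shellcheck disable=SC1090
    . "$env_file"
    set +a
  fi
  : "${DISCORD_WEBHOOK_URL:=}"
  : "${OPENCLAW_GATEWAY_URL:=http://localhost:18789/}"
  : "${HEALTH_CHECK_MAX_RETRIES:=3}"
  : "${EMERGENCY_RECOVERY_TIMEOUT:=1800}"
}

# Example:
# load_config
# echo "Checking $OPENCLAW_GATEWAY_URL with up to $HEALTH_CHECK_MAX_RETRIES retries"
```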

Testing

Test Level 2 (Health Check)

bash
# Run manually
bash ~/openclaw/scripts/gateway-healthcheck.sh

# Expected output:
# ✅ Gateway healthy

Test Level 3 (Claude Recovery)

bash
# Inject a config error (backup first!)
cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak

# Wait for Health Check to detect and escalate (~8 min)
tail -f ~/openclaw/memory/emergency-recovery-*.log
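
Once the Level 3 test is finished, the backup taken in the first step can be copied back into place. The helper below is a small sketch of that cleanup. The name `restore_config` is hypothetical, and the sketch assumes the backup sits next to the config with a `.bak` suffix, as in the `cp` command above.

```bash
#!/usr/bin/env bash
# Illustrative cleanup helper. The function name is an assumption.

# restore_config [CONFIG_PATH]
# Copies CONFIG_PATH.bak over CONFIG_PATH (default: ~/.openclaw/openclaw.json).
# Returns 1 without touching anything if no backup exists.
restore_config() {
  local config="${1:-$HOME/.openclaw/openclaw.json}"
  local backup="${config}.bak"
  if [ ! -f "$backup" ]; then
    echo "no backup found at $backup" >&2
    return 1
  fi
  cp "$backup" "$config"
  echo "restored $config"
}

# Example:
# restore_config
```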

Links

License

MIT License - do whatever you want with it.
Built by @ramsbaby + Jarvis 🦞