services.json

| Service | Process Pattern | Health File | Port | Stale Threshold |
|---|---|---|---|---|
| api-server | gunicorn.*app:app | /tmp/api_health.json | 8000 | 300s |
| worker | celery.*worker | /tmp/worker_health.json | - | 300s |
| cache | redis-server | - | 6379 | - |
**Step 3: Validate manifest**
- Confirm each process pattern is specific enough to avoid false matches
- Verify health file paths are absolute
- Ensure port numbers are within valid range (1-65535)
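
The three validation checks above can be sketched in Python. This is a minimal illustration, not the author's implementation: the entry keys (`name`, `process_pattern`, `health_file`, `port`), the `TOO_GENERIC` list, and both function names are assumptions, since the source does not show the manifest's on-disk format or define "specific enough".

```python
import posixpath
import re

# Assumption: a small, hypothetical deny-list of patterns that would match
# many unrelated processes. The source does not define "specific enough".
TOO_GENERIC = {"python", "python3", "node", "java", "sh", "bash"}


def validate_service(entry):
    """Return a list of problems found in one manifest entry (empty if valid)."""
    problems = []
    name = entry.get("name", "<unnamed>")

    pattern = entry.get("process_pattern") or ""
    if not pattern:
        problems.append(f"{name}: missing process pattern")
    elif pattern.strip().lower() in TOO_GENERIC:
        problems.append(f"{name}: process pattern '{pattern}' is too generic")
    else:
        # pgrep uses POSIX extended regex; compiling with Python's re
        # is only a rough syntax check.
        try:
            re.compile(pattern)
        except re.error as exc:
            problems.append(f"{name}: invalid process pattern ({exc})")

    # A missing health file or port ('-' in the table) is allowed.
    health_file = entry.get("health_file")
    if health_file is not None and not posixpath.isabs(health_file):
        problems.append(f"{name}: health file path '{health_file}' is not absolute")

    port = entry.get("port")
    if port is not None and not (isinstance(port, int) and 1 <= port <= 65535):
        problems.append(f"{name}: port {port!r} is outside 1-65535")

    return problems


def validate_manifest(services):
    """Collect problems for all entries; an empty manifest is itself a problem."""
    problems = [p for entry in services for p in validate_service(entry)]
    if not services:
        problems.append("manifest contains no services")
    return problems
```

An empty list returned by `validate_manifest` corresponds to a passing gate: at least one service and no validation problems.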
**Gate**: Service manifest complete with at least one service. Proceed only when gate passes.

- `pgrep -f "<process_pattern>"`
- `ss -tlnp "sport = :<port>"`

SERVICE HEALTH REPORT
=====================
Checked: N services
Healthy: X/N
RESULTS:
service-name [OK ] HEALTHY PID 12345, uptime 2d 4h
background-worker [WARN] WARNING Health file stale (15 min)
cache-service [DOWN] DOWN Process not found
RECOMMENDATIONS:
background-worker: Restart recommended - health file not updated in 900s
cache-service: Start service - process not running
SUGGESTED ACTIONS:
systemctl restart background-worker
systemctl start cache-service

`ps aux | grep`, `ls -la`, `systemctl restart`, `gunicorn.*myapp:app`

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "Process is running, must be healthy" | Running ≠ functional | Check health file and port |
| "Health file looks fine" | File could be stale from before crash | Verify timestamp freshness |
| "Just restart it" | Restart masks root cause | Report first, restart only if flagged |
| "No config, skip the check" | User still needs an answer | Ask user for service details |
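
The table's main point, that a running process alone does not prove health, can be expressed as a small decision function. The sketch below is an illustration under assumptions: `classify` and `port_is_listening` are hypothetical names, grading a closed port as WARNING is a guess (the source does not say), and the TCP connect attempt is a rough stand-in for `ss -tlnp`, not an equivalent.

```python
import socket


def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """Rough stand-in for `ss -tlnp`: try to connect to a local TCP port.

    Assumption: the service accepts connections on the IPv4 loopback address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def classify(process_running, health_fresh=None, port_listening=None):
    """Combine independent checks into a (status, reason) pair.

    health_fresh and port_listening are None when the service has no
    health file or port configured ('-' in the manifest).
    """
    if not process_running:
        return "DOWN", "Process not found"
    if health_fresh is False:
        return "WARNING", "Health file stale"
    if port_listening is False:
        # Assumption: the source does not say how to grade this case.
        return "WARNING", "Port not listening"
    return "HEALTHY", "All configured checks passed"
```

The function only reports a status; in line with the table, it never triggers a restart itself.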
{
"timestamp": "ISO8601, updated every 30-60s",
"status": "healthy|degraded|error",
"connection": "connected|disconnected|reconnecting",
"last_activity": "ISO8601 of last meaningful action",
"running": true,
"uptime_seconds": 12345,
"metrics": {}
}
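
A health file in this shape can be checked for freshness by comparing its `timestamp` field with the current time. The following is a minimal sketch, assuming the file is valid JSON with the fields shown above; the function names, the 300-second default (taken from the manifest's Stale Threshold column), and the treatment of naive timestamps as UTC are assumptions.

```python
import json
from datetime import datetime, timezone


def health_file_age_seconds(payload, now=None):
    """Seconds elapsed since the health file's `timestamp` was written."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat() before Python 3.11 rejects a trailing 'Z', so normalize it.
    written = datetime.fromisoformat(payload["timestamp"].replace("Z", "+00:00"))
    if written.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        written = written.replace(tzinfo=timezone.utc)
    return (now - written).total_seconds()


def check_health_file(path, stale_threshold=300, now=None):
    """Load a health file and report whether it is fresh and self-reports healthy."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    age = health_file_age_seconds(payload, now)
    fresh = age <= stale_threshold
    ok = fresh and payload.get("status") == "healthy" and payload.get("running") is True
    return {"fresh": fresh, "age_seconds": age, "status": payload.get("status"), "ok": ok}
```

With a 300-second threshold, a file last written 900 seconds ago is reported as stale, which matches the stale-worker case in the sample report.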