kubernetes-health
Kubernetes Health Diagnostics
Dynamic, discovery-driven health checks for any Kubernetes cluster configuration
BEFORE YOU START
| Impact | Value |
|---|---|
| Token Savings | ~70% vs manual kubectl exploration |
| Setup Time | 0 min (uses existing kubectl config) |
| Coverage | Adapts to installed operators automatically |
Known Issues Prevented
| Problem | Root Cause | How This Skill Helps |
|---|---|---|
| Missing operator health | Static checklists miss CRDs | Dynamic API discovery detects all installed operators |
| Stale diagnostics | Manual checks become outdated | Real-time cluster API interrogation |
| Incomplete coverage | Unknown cluster configuration | Automatically activates relevant sub-agents |
Quick Start
- Verify cluster access: Ensure `kubectl` is configured and can reach your cluster
- Run discovery: Execute `discover_apis.py` to detect installed operators
- Dispatch agents: Use the orchestrator to run health checks based on discovery
```bash
# Step 1: Verify kubectl context
kubectl config current-context
kubectl cluster-info

# Step 2: Run API discovery
uv run .claude/skills/kubernetes-health/scripts/discover_apis.py

# Step 3: Review detected operators and dispatch health agents
```
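To make the discovery step more concrete, here is a minimal sketch of how API groups could be read from a cluster. It is not the contents of `discover_apis.py`, which this page does not show. It assumes discovery can be approximated by parsing the output of `kubectl api-versions`, and the function names `parse_api_groups` and `discover_api_groups` are hypothetical.

```python
import subprocess


def parse_api_groups(api_versions_output: str) -> set[str]:
    """Return the API group names found in `kubectl api-versions` output."""
    groups = set()
    for line in api_versions_output.splitlines():
        entry = line.strip()
        # Entries look like "group/version"; the core group is a bare "v1"
        # and is skipped because core checks always run.
        if "/" in entry:
            groups.add(entry.split("/", 1)[0])
    return groups


def discover_api_groups() -> set[str]:
    """Query the current kubectl context (read-only) for its API groups."""
    completed = subprocess.run(
        ["kubectl", "api-versions"],
        capture_output=True,
        text=True,
        check=True,  # raise if kubectl cannot reach the cluster
    )
    return parse_api_groups(completed.stdout)
```

Keeping the parser separate from the `kubectl` call means the parsing logic can be exercised with canned output, without a live cluster.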
Critical Rules
Always
- Verify kubectl context before running health checks
- Use read-only kubectl commands (get, describe, logs)
- Run core health checks before operator-specific checks
- Aggregate results using the provided scoring methodology
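The read-only rule can be enforced mechanically rather than by convention. The sketch below is one possible guard, not part of the skill itself; `READ_ONLY_VERBS` and `is_read_only` are hypothetical names, and the check assumes the verb is the argument immediately after `kubectl`.

```python
# Verbs taken from the rule above; anything else is treated as unsafe.
READ_ONLY_VERBS = frozenset({"get", "describe", "logs"})


def is_read_only(command: list[str]) -> bool:
    """True only when the argument right after `kubectl` is an allowed verb."""
    return (
        len(command) >= 2
        and command[0] == "kubectl"
        and command[1] in READ_ONLY_VERBS
    )
```

The check is deliberately conservative: a command that places global flags before the verb is rejected rather than parsed. Context-verification commands such as `kubectl config current-context` would need to be added to the allowlist explicitly if this guard wrapped every call.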
Never
- Modify cluster resources during health checks
- Expose secret values in health reports (metadata only)
- Skip context verification for production clusters
- Assume operator presence without API discovery
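As an illustration of the "metadata only" rule for secrets, the following sketch reduces a Secret object (as returned by `kubectl get secret -o json`) to fields that are safe to report. The function name `summarize_secret` and the choice of reported fields are assumptions, not the skill's actual report format.

```python
def summarize_secret(secret: dict) -> dict:
    """Return report-safe metadata for a Secret; values are never copied."""
    metadata = secret.get("metadata", {})
    return {
        "name": metadata.get("name"),
        "namespace": metadata.get("namespace"),
        "type": secret.get("type"),
        # Only the number of keys is reported, not key names or values.
        "key_count": len(secret.get("data") or {}),
    }
```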
Common Mistakes
| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| Hardcoding operator checks | Misses installed operators, checks missing ones | Use API discovery to detect what's installed |
| Sequential agent dispatch | Slow for multi-operator clusters | Run operator agents in parallel (same priority) |
| Raw kubectl output | Token inefficient, hard to parse | Use scripts for condensed JSON output |
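The dispatch ordering described above (core checks first, then operator agents in parallel) might be sketched as follows. How the orchestrator really launches agents is not shown on this page, so `dispatch_agents` and the `run_agent` callable are hypothetical stand-ins that only illustrate the ordering.

```python
from concurrent.futures import ThreadPoolExecutor

CORE_AGENT = "k8s-core-health-agent"


def dispatch_agents(agents, run_agent):
    """Run the core agent first, then all remaining agents in parallel.

    `run_agent` is a hypothetical callable that takes an agent name and
    returns that agent's result.
    """
    results = {}
    if CORE_AGENT in agents:
        results[CORE_AGENT] = run_agent(CORE_AGENT)

    operator_agents = [name for name in agents if name != CORE_AGENT]
    if operator_agents:
        with ThreadPoolExecutor(max_workers=len(operator_agents)) as pool:
            # pool.map preserves input order, so names and results line up.
            for name, result in zip(operator_agents, pool.map(run_agent, operator_agents)):
                results[name] = result
    return results
```

Threads suit this case because each agent spends most of its time waiting on cluster API calls rather than computing.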
Bundled Resources
Scripts
| Script | Purpose |
|---|---|
| `discover_apis.py` | Discovers all API groups and detects installed operators |
| | Maps discovered APIs to specialized health agents |
| | Aggregates multi-agent results into unified report |
References
| File | Contents |
|---|---|
| | Detailed health checks for each supported operator |
| | Scoring methodology and weight assignments |
Templates
| File | Purpose |
|---|---|
| | JSON schema for health report output |
Dependencies
Required
| Package | Version | Purpose |
|---|---|---|
| kubectl | Latest | Cluster interaction |
| Python | >= 3.11 | Script execution |
| uv | Latest | Python script runner |
Optional
| Package | Version | Purpose |
|---|---|---|
| kubernetes | >= 28.1.0 | Python client (for advanced discovery) |
Supported Operators
The skill automatically detects and dispatches specialized agents for:
| Operator | API Group | Agent |
|---|---|---|
| Core K8s | (always) | k8s-core-health-agent |
| Crossplane | crossplane.io | k8s-crossplane-health-agent |
| ArgoCD | argoproj.io | k8s-argocd-health-agent |
| Cert-Manager | cert-manager.io | k8s-certmanager-health-agent |
| Prometheus | monitoring.coreos.com | k8s-prometheus-health-agent |
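The table above can be read as a lookup from API group to agent. The sketch below shows one way to turn a set of discovered API groups into the list of agents to dispatch. The suffix matching (so that a group such as `pkg.crossplane.io` activates the Crossplane agent) is an assumption about how triggers work, and `select_agents` is a hypothetical name.

```python
CORE_AGENT = "k8s-core-health-agent"

# API group -> agent, copied from the Supported Operators table.
AGENT_BY_API_GROUP = {
    "crossplane.io": "k8s-crossplane-health-agent",
    "argoproj.io": "k8s-argocd-health-agent",
    "cert-manager.io": "k8s-certmanager-health-agent",
    "monitoring.coreos.com": "k8s-prometheus-health-agent",
}


def select_agents(discovered_groups) -> list[str]:
    """Return the core agent plus one agent per detected operator."""
    agents = [CORE_AGENT]  # core checks always run
    for group_suffix, agent in AGENT_BY_API_GROUP.items():
        # Assumption: sub-groups of an operator's domain also count as a match.
        if any(
            group == group_suffix or group.endswith("." + group_suffix)
            for group in discovered_groups
        ):
            agents.append(agent)
    return agents
```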
Health Scoring
| Status | Score Range | Criteria |
|---|---|---|
| HEALTHY | 90-100 | All checks pass, no warnings |
| DEGRADED | 60-89 | Some warnings, no critical issues |
| CRITICAL | 0-59 | Critical issues affecting availability |
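The score ranges above translate directly into a small classification function. The aggregation half of this sketch is more speculative: the actual weights live in the scoring reference file, which is not reproduced here, so the weighted mean in `aggregate_score` and any weight values passed to it are illustrative assumptions.

```python
def status_for_score(score: float) -> str:
    """Map a 0-100 health score to the status bands in the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "HEALTHY"
    if score >= 60:
        return "DEGRADED"
    return "CRITICAL"


def aggregate_score(scores: dict, weights: dict) -> float:
    """Weighted mean of per-agent scores (weights are hypothetical)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight
```

For example, a core score of 100 with weight 2 and an ArgoCD score of 50 with weight 1 gives (200 + 50) / 3 ≈ 83.3, which falls in the DEGRADED band.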
Troubleshooting
kubectl connection issues
```bash
# Verify context
kubectl config current-context

# Test connectivity
kubectl cluster-info

# Check permissions
kubectl auth can-i get pods --all-namespaces
```
Discovery returns empty results
- Ensure cluster is reachable
- Check RBAC permissions for API discovery
- Verify kubectl version compatibility
Agent dispatch failures
- Confirm discovered API group matches agent trigger
- Check agent file exists in `.claude/agents/specialized/kubernetes/`
- Review agent tool restrictions
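Checking for missing agent files is easy to script. The sketch below assumes each agent is stored as `<agent-name>.md` in the agents directory. That naming convention is a guess, since the page does not state it, and `missing_agent_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_agent_files(
    agent_names,
    agent_dir=".claude/agents/specialized/kubernetes",
    suffix=".md",  # assumed file extension; adjust to the real convention
):
    """Return the agent names that have no matching file in `agent_dir`."""
    base = Path(agent_dir)
    return [name for name in agent_names if not (base / f"{name}{suffix}").is_file()]
```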
Setup Checklist
- kubectl configured and connected to cluster
- Python 3.11+ installed
- uv installed for script execution
- Read permissions on cluster resources
- Agent files present in `.claude/agents/specialized/kubernetes/`
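The local parts of this checklist (tools on `PATH`, Python version) can be verified with a short preflight script. This is a sketch rather than something shipped with the skill; `check_setup` is a hypothetical name, and cluster connectivity and permissions still need the `kubectl` commands from the Troubleshooting section.

```python
import shutil
import sys


def check_setup(which=shutil.which, version=sys.version_info) -> dict:
    """Return a pass/fail map for the locally checkable setup items.

    `which` and `version` are injectable so the check can be tested
    without changing the real environment.
    """
    return {
        "kubectl on PATH": which("kubectl") is not None,
        "uv on PATH": which("uv") is not None,
        "Python 3.11+": tuple(version[:2]) >= (3, 11),
    }


if __name__ == "__main__":
    for item, ok in check_setup().items():
        print(f"[{'x' if ok else ' '}] {item}")
```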