kubernetes-health

Kubernetes Health Diagnostics

Dynamic, discovery-driven health checks for any Kubernetes cluster configuration

BEFORE YOU START

| Impact | Value |
| --- | --- |
| Token Savings | ~70% vs manual kubectl exploration |
| Setup Time | 0 min (uses existing kubectl config) |
| Coverage | Adapts to installed operators automatically |

Known Issues Prevented

| Problem | Root Cause | How This Skill Helps |
| --- | --- | --- |
| Missing operator health | Static checklists miss CRDs | Dynamic API discovery detects all installed operators |
| Stale diagnostics | Manual checks become outdated | Real-time cluster API interrogation |
| Incomplete coverage | Unknown cluster configuration | Automatically activates relevant sub-agents |

Quick Start

  1. Verify cluster access: Ensure kubectl is configured and can reach your cluster
  2. Run discovery: Execute discover_apis.py to detect installed operators
  3. Dispatch agents: Use the orchestrator to run health checks based on discovery

    # Step 1: Verify kubectl context
    kubectl config current-context
    kubectl cluster-info

    # Step 2: Run API discovery
    uv run .claude/skills/kubernetes-health/scripts/discover_apis.py

    # Step 3: Review detected operators and dispatch health agents

Critical Rules

Always

  • Verify kubectl context before running health checks
  • Use read-only kubectl commands (get, describe, logs)
  • Run core health checks before operator-specific checks
  • Aggregate results using the provided scoring methodology
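
The read-only rule above could be enforced with a small guard before any command is executed. This is a minimal sketch, not the skill's actual implementation: the function name `is_read_only` and the allowlist constant are hypothetical, and only the three verbs named in the rules are allowed.

```python
import shlex

# Verbs the rules above list as read-only (assumed to be the full allowlist).
READ_ONLY_VERBS = {"get", "describe", "logs"}


def is_read_only(command: str) -> bool:
    """Return True if a kubectl command line uses an allowed read-only verb."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "kubectl":
        return False
    # The verb is taken to be the first token after "kubectl" that is not a flag.
    # Simplification: a flag with a separate value placed before the verb
    # (e.g. "kubectl -n kube-system get pods") is rejected rather than parsed.
    for token in tokens[1:]:
        if not token.startswith("-"):
            return token in READ_ONLY_VERBS
    return False
```

The guard fails closed: anything it cannot recognise is treated as not read-only. Context verification commands such as `kubectl config current-context` are not in the allowlist and would need their own entry.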

Never

  • Modify cluster resources during health checks
  • Expose secret values in health reports (metadata only)
  • Skip context verification for production clusters
  • Assume operator presence without API discovery
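
One way to honour the "metadata only" rule for Secrets is to redact values before a manifest reaches the report. The sketch below assumes the manifest is the parsed JSON of a Secret (for example from `kubectl get secret -o json`); the function name `redact_secret` and the `<redacted>` placeholder are hypothetical.

```python
import copy

# Annotation in which kubectl may store a full copy of the applied manifest.
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def redact_secret(manifest: dict) -> dict:
    """Return a copy of a Secret manifest with values removed and keys kept."""
    redacted = copy.deepcopy(manifest)
    for field in ("data", "stringData"):
        if redacted.get(field):
            # Keep key names so the report can show which entries exist.
            redacted[field] = {key: "<redacted>" for key in redacted[field]}
    # Drop the last-applied annotation, which can embed the secret values.
    annotations = (redacted.get("metadata") or {}).get("annotations") or {}
    annotations.pop(LAST_APPLIED, None)
    return redacted
```

The input is deep-copied, so the original manifest is left untouched for any other check that still needs it.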

Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
| --- | --- | --- |
| Hardcoding operator checks | Misses installed operators, checks missing ones | Use API discovery to detect what's installed |
| Sequential agent dispatch | Slow for multi-operator clusters | Run operator agents in parallel (same priority) |
| Raw kubectl output | Token inefficient, hard to parse | Use scripts for condensed JSON output |
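
To illustrate the parallel dispatch recommended above, here is a minimal sketch using Python's standard `concurrent.futures`. It is not how the skill's orchestrator is known to work: `run_in_parallel`, the shape of each check (a zero-argument callable returning a dict), and the `"ERROR"` status are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_in_parallel(
    checks: dict[str, Callable[[], dict]], max_workers: int = 4
) -> dict[str, dict]:
    """Run same-priority health checks concurrently and collect their results."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every check first so they all run at the same time.
        futures = {name: pool.submit(check) for name, check in checks.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing check must not hide the results of the others.
                results[name] = {"status": "ERROR", "error": str(exc)}
    return results
```

Threads suit this workload because each check mostly waits on the cluster API rather than using the CPU.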

Bundled Resources

Scripts

| Script | Purpose |
| --- | --- |
| scripts/discover_apis.py | Discovers all API groups and detects installed operators |
| scripts/health_orchestrator.py | Maps discovered APIs to specialized health agents |
| scripts/aggregate_report.py | Aggregates multi-agent results into unified report |

References

| File | Contents |
| --- | --- |
| references/operator-checks.md | Detailed health checks for each supported operator |
| references/health-scoring.md | Scoring methodology and weight assignments |

Templates

| File | Purpose |
| --- | --- |
| templates/health-report.json | JSON schema for health report output |

Dependencies

Required

| Package | Version | Purpose |
| --- | --- | --- |
| kubectl | Latest | Cluster interaction |
| Python | >= 3.11 | Script execution |
| uv | Latest | Python script runner |

Optional

| Package | Version | Purpose |
| --- | --- | --- |
| kubernetes | >= 28.1.0 | Python client (for advanced discovery) |

Supported Operators

The skill automatically detects and dispatches specialized agents for:

| Operator | API Group | Agent |
| --- | --- | --- |
| Core K8s | (always) | k8s-core-health-agent |
| Crossplane | crossplane.io | k8s-crossplane-health-agent |
| ArgoCD | argoproj.io | k8s-argocd-health-agent |
| Cert-Manager | cert-manager.io | k8s-certmanager-health-agent |
| Prometheus | monitoring.coreos.com | k8s-prometheus-health-agent |
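
The mapping above can be sketched as a lookup over the output of `kubectl api-versions`, which prints one `group/version` entry per line. This is an illustrative sketch, not the logic of `discover_apis.py` or `health_orchestrator.py`: the function name `select_agents` and the suffix-matching rule are assumptions. Suffix matching is used because operators often register sub-groups such as `pkg.crossplane.io`.

```python
# API group -> agent, copied from the table above.
OPERATOR_AGENTS = {
    "crossplane.io": "k8s-crossplane-health-agent",
    "argoproj.io": "k8s-argocd-health-agent",
    "cert-manager.io": "k8s-certmanager-health-agent",
    "monitoring.coreos.com": "k8s-prometheus-health-agent",
}
CORE_AGENT = "k8s-core-health-agent"


def select_agents(api_versions: str) -> list[str]:
    """Pick health agents from the text output of `kubectl api-versions`."""
    groups = set()
    for line in api_versions.splitlines():
        line = line.strip()
        # The core API appears as a bare "v1" with no group, so it is skipped.
        if "/" in line:
            groups.add(line.split("/", 1)[0])

    agents = [CORE_AGENT]  # the core agent always runs
    for suffix, agent in OPERATOR_AGENTS.items():
        # Match the group itself or any sub-group ending in ".<suffix>".
        if any(g == suffix or g.endswith("." + suffix) for g in groups):
            agents.append(agent)
    return agents
```

For example, output containing `pkg.crossplane.io/v1` and `cert-manager.io/v1` selects the core, Crossplane, and Cert-Manager agents. Note that `argoproj.io` is also the API group of other Argo projects such as Argo Workflows and Argo Rollouts, so a group match alone may not prove ArgoCD itself is installed.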

Health Scoring

| Status | Score Range | Criteria |
| --- | --- | --- |
| HEALTHY | 90-100 | All checks pass, no warnings |
| DEGRADED | 60-89 | Some warnings, no critical issues |
| CRITICAL | 0-59 | Critical issues affecting availability |

Troubleshooting

kubectl connection issues

    # Verify context
    kubectl config current-context

    # Test connectivity
    kubectl cluster-info

    # Check permissions
    kubectl auth can-i get pods --all-namespaces
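
The three commands above could be wrapped in a small preflight helper so that a script can stop before running any health check. This is a sketch under assumptions: the names `PREFLIGHT`, `run_kubectl`, and `preflight` are hypothetical, and each step is judged only by its exit code, on the assumption that kubectl exits non-zero when a step fails or the permission is denied.

```python
import subprocess
from typing import Callable

# The three troubleshooting commands above, as argument lists.
PREFLIGHT = [
    ("context", ["kubectl", "config", "current-context"]),
    ("connectivity", ["kubectl", "cluster-info"]),
    ("permissions", ["kubectl", "auth", "can-i", "get", "pods", "--all-namespaces"]),
]


def run_kubectl(argv: list[str]) -> tuple[int, str]:
    """Run one command and return (exit code, trimmed output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # kubectl is missing or the call hung: report it as a failure.
        return 1, str(exc)
    return proc.returncode, (proc.stdout or proc.stderr).strip()


def preflight(
    run: Callable[[list[str]], tuple[int, str]] = run_kubectl,
) -> dict[str, bool]:
    """Run every preflight step and report which ones succeeded."""
    results = {}
    for name, argv in PREFLIGHT:
        code, _output = run(argv)
        results[name] = code == 0
    return results
```

The `run` parameter is injectable so the helper can be exercised without a cluster. In normal use `preflight()` calls kubectl directly, and health checks would proceed only when every value is True.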

Discovery returns empty results

  • Ensure cluster is reachable
  • Check RBAC permissions for API discovery
  • Verify kubectl version compatibility

Agent dispatch failures

  • Confirm discovered API group matches agent trigger
  • Check agent file exists in .claude/agents/specialized/kubernetes/
  • Review agent tool restrictions

Setup Checklist

  • kubectl configured and connected to cluster
  • Python 3.11+ installed
  • uv installed for script execution
  • Read permissions on cluster resources
  • Agent files present in .claude/agents/specialized/kubernetes/