kubernetes-health

Kubernetes Health Diagnostics

Dynamic, discovery-driven health checks for any Kubernetes cluster configuration

BEFORE YOU START

Impact	Value
Token Savings	~70% vs manual kubectl exploration
Setup Time	0 min (uses existing kubectl config)
Coverage	Adapts to installed operators automatically

Known Issues Prevented

Problem	Root Cause	How This Skill Helps
Missing operator health	Static checklists miss CRDs	Dynamic API discovery detects all installed operators
Stale diagnostics	Manual checks become outdated	Real-time cluster API interrogation
Incomplete coverage	Unknown cluster configuration	Automatically activates relevant sub-agents

Quick Start

Verify cluster access: Ensure `kubectl` is configured and can reach your cluster
Run discovery: Execute `discover_apis.py` to detect installed operators
Dispatch agents: Use the orchestrator to run health checks based on discovery

```bash
# Step 1: Verify kubectl context
kubectl config current-context
kubectl cluster-info

# Step 2: Run API discovery
uv run .claude/skills/kubernetes-health/scripts/discover_apis.py

# Step 3: Review detected operators and dispatch health agents
```

Critical Rules

Always

Verify kubectl context before running health checks
Use read-only kubectl commands (get, describe, logs)
Run core health checks before operator-specific checks
Aggregate results using the provided scoring methodology
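
The read-only rule can be checked mechanically before any command runs. The following is a minimal sketch, assuming a hypothetical `is_read_only` helper and an allowlist limited to the three verbs named above; the bundled scripts may enforce the rule differently.

```python
# Hypothetical guard; the allowlist mirrors the verbs listed in the rule above.
READ_ONLY_VERBS = {"get", "describe", "logs"}


def is_read_only(kubectl_args: list[str]) -> bool:
    """Return True if the first argument after `kubectl` is a read-only verb.

    Global flags placed before the verb (for example `--context prod get pods`)
    are rejected rather than parsed, so the guard fails closed.
    """
    return bool(kubectl_args) and kubectl_args[0] in READ_ONLY_VERBS
```

For example, `is_read_only(["get", "pods", "-A"])` is accepted, while `is_read_only(["delete", "pod", "web-0"])` is rejected.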

Never

Modify cluster resources during health checks
Expose secret values in health reports (metadata only)
Skip context verification for production clusters
Assume operator presence without API discovery
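
To illustrate the "metadata only" rule for secrets, here is a small sketch that reduces a Secret object (as returned by `kubectl get secret -o json`) to fields that are safe to report. The `secret_metadata` helper and the shape of its output are assumptions for illustration, not part of the skill.

```python
def secret_metadata(secret: dict) -> dict:
    """Reduce a Secret object to reportable metadata; values are never copied."""
    meta = secret.get("metadata", {})
    return {
        "name": meta.get("name"),
        "namespace": meta.get("namespace"),
        "type": secret.get("type"),
        # Key names only; the base64-encoded values under `data` are dropped.
        "keys": sorted(secret.get("data") or {}),
    }
```

Only the key names survive, so a report can show that a Secret contains a `password` entry without ever carrying its value.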

Common Mistakes

Mistake	Why It's Wrong	Correct Approach
Hardcoding operator checks	Misses installed operators, checks missing ones	Use API discovery to detect what's installed
Sequential agent dispatch	Slow for multi-operator clusters	Run operator agents in parallel (same priority)
Raw kubectl output	Token inefficient, hard to parse	Use scripts for condensed JSON output
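
The parallel-dispatch advice can be sketched with the standard library. This example assumes each health check is a plain callable returning a dict; the `dispatch` function and the `Check` alias are hypothetical names, and the real orchestrator may dispatch agents in a different way.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical shape of a health check: no arguments, returns a result dict.
Check = Callable[[], dict]


def dispatch(core_check: Check, operator_checks: dict[str, Check]) -> dict[str, dict]:
    """Run the core check first, then all operator checks concurrently."""
    results = {"core": core_check()}
    if not operator_checks:
        return results
    # Operator checks share one priority level, so they can run side by side.
    with ThreadPoolExecutor(max_workers=len(operator_checks)) as pool:
        futures = {name: pool.submit(check) for name, check in operator_checks.items()}
        for name, future in futures.items():
            results[name] = future.result()
    return results
```

Threads suit this case because each check mostly waits on the cluster API rather than on local CPU work.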

Bundled Resources

Scripts

Script	Purpose
`scripts/discover_apis.py`	Discovers all API groups and detects installed operators
`scripts/health_orchestrator.py`	Maps discovered APIs to specialized health agents
`scripts/aggregate_report.py`	Aggregates multi-agent results into unified report

References

File	Contents
`references/operator-checks.md`	Detailed health checks for each supported operator
`references/health-scoring.md`	Scoring methodology and weight assignments

Templates

File	Purpose
`templates/health-report.json`	JSON schema for health report output

Dependencies

Required

Package	Version	Purpose
kubectl	Latest	Cluster interaction
Python	>= 3.11	Script execution
uv	Latest	Python script runner

Optional

Package	Version	Purpose
kubernetes	>= 28.1.0	Python client (for advanced discovery)

Supported Operators

The skill automatically detects and dispatches specialized agents for:

Operator	API Group	Agent
Core K8s	(always)	k8s-core-health-agent
Crossplane	crossplane.io	k8s-crossplane-health-agent
ArgoCD	argoproj.io	k8s-argocd-health-agent
Cert-Manager	cert-manager.io	k8s-certmanager-health-agent
Prometheus	monitoring.coreos.com	k8s-prometheus-health-agent
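
The table above amounts to a lookup from API group to agent. The sketch below shows one way such a mapping could work on the output of `kubectl api-versions`, which prints one `group/version` entry per line. The function names are hypothetical, and the suffix matching (so that a group such as `pkg.crossplane.io` counts as Crossplane) is an assumption; `discover_apis.py` and `health_orchestrator.py` may detect operators differently.

```python
# Agent names and API groups are taken from the table above.
AGENT_BY_GROUP = {
    "crossplane.io": "k8s-crossplane-health-agent",
    "argoproj.io": "k8s-argocd-health-agent",
    "cert-manager.io": "k8s-certmanager-health-agent",
    "monitoring.coreos.com": "k8s-prometheus-health-agent",
}


def parse_api_groups(api_versions_output: str) -> set[str]:
    """Extract group names from `kubectl api-versions` output."""
    groups = set()
    for line in api_versions_output.splitlines():
        line = line.strip()
        # The core group appears as a bare version ("v1") and has no group name.
        if "/" in line:
            groups.add(line.split("/", 1)[0])
    return groups


def select_agents(groups: set[str]) -> list[str]:
    """Return the core agent plus one agent per detected operator."""
    agents = ["k8s-core-health-agent"]  # always dispatched
    for suffix, agent in AGENT_BY_GROUP.items():
        # Assumption: sub-groups such as "acme.cert-manager.io" also count.
        if any(g == suffix or g.endswith("." + suffix) for g in groups):
            agents.append(agent)
    return agents
```

With this approach, a cluster that exposes `argoproj.io/v1alpha1` and `pkg.crossplane.io/v1` would get the core, Crossplane, and ArgoCD agents, and nothing else.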

Health Scoring

Status	Score Range	Criteria
HEALTHY	90-100	All checks pass, no warnings
DEGRADED	60-89	Some warnings, no critical issues
CRITICAL	0-59	Critical issues affecting availability
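
As a rough illustration of the thresholds above, the following sketch maps a numeric score to a status and combines per-agent scores with a weighted average. The weighted average and the example weights are assumptions; the actual methodology and weights live in `references/health-scoring.md`.

```python
def classify(score: float) -> str:
    """Map a 0-100 score to the status bands in the table above."""
    if score >= 90:
        return "HEALTHY"
    if score >= 60:
        return "DEGRADED"
    return "CRITICAL"


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-agent scores; the weights here are hypothetical."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight
```

For instance, a core score of 100 with weight 3 and an ArgoCD score of 50 with weight 1 give `(100*3 + 50*1) / 4 = 87.5`, which falls in the DEGRADED band. Fractional scores between 89 and 90 are treated as DEGRADED in this sketch, since the table only defines integer ranges.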

Troubleshooting

kubectl connection issues

```bash
# Verify context
kubectl config current-context

# Test connectivity
kubectl cluster-info

# Check permissions
kubectl auth can-i get pods --all-namespaces
```

Discovery returns empty results

Ensure cluster is reachable
Check RBAC permissions for API discovery
Verify kubectl version compatibility

Agent dispatch failures

Confirm discovered API group matches agent trigger
Check agent file exists in `.claude/agents/specialized/kubernetes/`
Review agent tool restrictions

Setup Checklist

kubectl configured and connected to cluster
Python 3.11+ installed
uv installed for script execution
Read permissions on cluster resources
Agent files present in `.claude/agents/specialized/kubernetes/`
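
The local parts of this checklist can be verified with a short script. This is a sketch only: the `preflight` function is a hypothetical helper, and it deliberately does not test cluster connectivity or read permissions, which still need the `kubectl` commands from the Troubleshooting section.

```python
import shutil
import sys
from pathlib import Path

# Directory named in the checklist above, relative to the project root.
AGENT_DIR = Path(".claude/agents/specialized/kubernetes")


def preflight(agent_dir: Path = AGENT_DIR) -> dict[str, bool]:
    """Check the locally verifiable checklist items and report each as a bool."""
    return {
        "kubectl on PATH": shutil.which("kubectl") is not None,
        "python >= 3.11": sys.version_info >= (3, 11),
        "uv on PATH": shutil.which("uv") is not None,
        "agent directory exists": agent_dir.is_dir(),
    }
```

Running `preflight()` from the project root returns one entry per item, so any `False` value points directly at the missing prerequisite.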
