Kubernetes debugging methodology and scripts. Use for pod crashes, CrashLoopBackOff, OOMKilled, deployment issues, resource problems, or container failures.
npx skill4agent add incidentfox/incidentfox kubernetes-debug

.claude/skills/infrastructure-kubernetes/scripts/

python .claude/skills/infrastructure-kubernetes/scripts/list_pods.py -n <namespace> [--label <selector>]
# Examples:
python .claude/skills/infrastructure-kubernetes/scripts/list_pods.py -n otel-demo
python .claude/skills/infrastructure-kubernetes/scripts/list_pods.py -n otel-demo --label app.kubernetes.io/name=payment

python .claude/skills/infrastructure-kubernetes/scripts/get_events.py <pod-name> -n <namespace>
# Example:
python .claude/skills/infrastructure-kubernetes/scripts/get_events.py payment-7f8b9c6d5-x2k4m -n otel-demo

python .claude/skills/infrastructure-kubernetes/scripts/get_logs.py <pod-name> -n <namespace> [--tail N] [--container NAME]
# Examples:
python .claude/skills/infrastructure-kubernetes/scripts/get_logs.py payment-7f8b9c6d5-x2k4m -n otel-demo --tail 100
python .claude/skills/infrastructure-kubernetes/scripts/get_logs.py payment-7f8b9c6d5-x2k4m -n otel-demo --container payment

python .claude/skills/infrastructure-kubernetes/scripts/describe_pod.py <pod-name> -n <namespace>

python .claude/skills/infrastructure-kubernetes/scripts/get_resources.py <pod-name> -n <namespace>

python .claude/skills/infrastructure-kubernetes/scripts/describe_deployment.py <deployment-name> -n <namespace>

python .claude/skills/infrastructure-kubernetes/scripts/get_history.py <deployment-name> -n <namespace>

list_pods.py, get_events.py, describe_pod.py, get_logs.py
get_events.py, get_resources.py, get_logs.py, describe_pod.py
describe_deployment.py, list_pods.py, get_events.py, get_history.py

| Event Reason | Meaning | Action |
|---|---|---|
| OOMKilled | Container exceeded memory limit | Increase limits or fix memory leak |
| ImagePullBackOff | Can't pull image | Check image name, registry auth |
| CrashLoopBackOff | Container keeps crashing | Check logs for startup errors |
| FailedScheduling | No node can run pod | Check node resources, taints |
| Unhealthy | Liveness, readiness, or startup probe failed | Check probe config, app health |
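
The table above can be read as a lookup from a reason string to a next step. A minimal sketch in Python, assuming the reason has already been extracted from the output of `get_events.py` or `describe_pod.py`. The names `ACTIONS` and `suggest_action` are hypothetical and are not part of the skill's scripts.

```python
# Hypothetical helper (not part of the skill): maps a reason from the
# table above to the action the table recommends.
ACTIONS = {
    "OOMKilled": "Increase limits or fix memory leak",
    "ImagePullBackOff": "Check image name, registry auth",
    "CrashLoopBackOff": "Check logs for startup errors",
    "FailedScheduling": "Check node resources, taints",
    "Unhealthy": "Check probe config, app health",
}


def suggest_action(reason: str) -> str:
    """Return '<reason>: <action>' for known reasons, or a fallback line."""
    action = ACTIONS.get(reason)
    if action is None:
        # Reasons outside the table need manual investigation.
        return f"{reason}: not in the table; inspect events and logs manually"
    return f"{reason}: {action}"


if __name__ == "__main__":
    for reason in ("OOMKilled", "Evicted"):
        print(suggest_action(reason))
```

Unknown reasons fall through to a generic message rather than raising, since the table covers only the most common cases.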
## Kubernetes Analysis
**Pod**: <name>
**Namespace**: <namespace>
**Status**: <phase> (Restarts: N)
### Events
- [timestamp] <reason>: <message>
### Issues Found
1. [Issue description with evidence]
### Root Cause Hypothesis
[Based on events and logs]
### Recommended Action
[Specific remediation step]
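
One way to fill in the template above is to collect the findings in a small data structure and render it. The following is a sketch only: `PodReport` and `render` are hypothetical names, and the sample event and issue text in the usage section is made up for illustration (only the pod name and namespace come from the examples earlier in this document).

```python
from dataclasses import dataclass, field


@dataclass
class PodReport:
    """Hypothetical container for the fields the template expects."""
    pod: str
    namespace: str
    phase: str
    restarts: int
    # Each event is a (timestamp, reason, message) tuple.
    events: list[tuple[str, str, str]] = field(default_factory=list)
    issues: list[str] = field(default_factory=list)
    hypothesis: str = ""
    action: str = ""


def render(report: PodReport) -> str:
    """Render the report in the same markdown layout as the template."""
    lines = [
        "## Kubernetes Analysis",
        f"**Pod**: {report.pod}",
        f"**Namespace**: {report.namespace}",
        f"**Status**: {report.phase} (Restarts: {report.restarts})",
        "### Events",
    ]
    lines += [f"- [{ts}] {reason}: {msg}" for ts, reason, msg in report.events]
    lines.append("### Issues Found")
    lines += [f"{n}. {issue}" for n, issue in enumerate(report.issues, start=1)]
    lines += [
        "### Root Cause Hypothesis",
        report.hypothesis,
        "### Recommended Action",
        report.action,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only; not output from a real cluster.
    example = PodReport(
        pod="payment-7f8b9c6d5-x2k4m",
        namespace="otel-demo",
        phase="Running",
        restarts=4,
        events=[("<timestamp>", "OOMKilled", "<message>")],
        issues=["Container was OOMKilled (see events)"],
        hypothesis="Memory limit is too low for the workload",
        action="Increase limits or fix memory leak",
    )
    print(render(example))
```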