Respond to Kubernetes incidents with runbooks and diagnostics. Use for outages, pod failures, node issues, network problems, and emergency response.
npx skill4agent add rohitg00/kubectl-mcp-server k8s-incident

| Priority | Rule | Impact | Tools |
|---|---|---|---|
| 1 | Check control plane first | CRITICAL | |
| 2 | Assess node health | CRITICAL | |
| 3 | Gather events before changes | HIGH | |
| 4 | Document timeline | HIGH | Manual notes |
| 5 | Rollback if safe | MEDIUM | |

| Incident | First Tool | Next Steps |
|---|---|---|
| Pod failure | | |
| Node down | | Check kubelet logs |
| Service unreachable | | |
| Control plane | | Check API server logs |

get_nodes()
get_pods(namespace="kube-system")
get_events(namespace)

| Indicator | Severity | Action |
|---|---|---|
| Multiple nodes NotReady | Critical | Escalate immediately |
| kube-system pods failing | Critical | Control plane issue |
| Single pod CrashLoop | Medium | Debug pod |
| High latency | Medium | Check resources |
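The severity table above can be read as an ordered set of checks, most severe first. The following is a minimal sketch of that ordering, not part of the skill itself: the function name `classify_incident` and the input shapes (plain dicts of node name to status and pod name to phase) are assumptions made for illustration, and the "Unclassified" fallback is a placeholder for cases the table does not cover.

```python
from typing import Dict, List, Tuple


def classify_incident(
    node_status: Dict[str, str],
    kube_system_phases: Dict[str, str],
    crashlooping_pods: List[str],
    high_latency: bool,
) -> Tuple[str, str]:
    """Map observed indicators to (severity, action), following the table order.

    All inputs are hypothetical summaries that a caller would build from
    tool output; the tools do not necessarily return data in this shape.
    """
    # Multiple nodes NotReady -> Critical, escalate immediately.
    not_ready = [name for name, status in node_status.items() if status != "Ready"]
    if len(not_ready) > 1:
        return ("Critical", "Escalate immediately")

    # Any kube-system pod that is neither Running nor Succeeded -> Critical.
    failing = [
        name
        for name, phase in kube_system_phases.items()
        if phase not in ("Running", "Succeeded")
    ]
    if failing:
        return ("Critical", "Control plane issue")

    # Exactly one crash-looping pod -> Medium (the table only covers this case).
    if len(crashlooping_pods) == 1:
        return ("Medium", "Debug pod")

    # High latency with nothing more severe found -> Medium.
    if high_latency:
        return ("Medium", "Check resources")

    # Placeholder for situations the table does not list.
    return ("Unclassified", "No matching indicator")
```

For example, `classify_incident({"n1": "NotReady", "n2": "NotReady"}, {}, [], False)` returns the Critical escalation row, because critical indicators are checked before medium ones.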

get_pod_logs(name, namespace, previous=True)
describe_pod(name, namespace)
get_events(namespace, field_selector="involvedObject.name=<pod>")
get_pod_metrics(name, namespace)

describe_pod(name, namespace)
get_secrets(namespace)

describe_pod(name, namespace)
get_nodes()
get_events(namespace)

describe_node(name)
get_events(namespace="", field_selector="involvedObject.name=<node>")
node_logs_tool(name, "kubelet")

describe_node(name)
get_pods(field_selector="spec.nodeName=<node>")

get_services(namespace)
get_endpoints(namespace)
get_pods(namespace, label_selector="<service-selector>")
get_network_policies(namespace)

get_pods(namespace="kube-system", label_selector="k8s-app=kube-dns")
get_pod_logs("coredns-xxx", "kube-system")

cilium_status_tool()
cilium_endpoints_list_tool(namespace)
hubble_flows_query_tool(namespace)

istio_analyze_tool(namespace)
istio_proxy_status_tool()

describe_pvc(name, namespace)
get_storage_classes()
get_events(namespace)

describe_pod(name, namespace)
get_pvc(namespace)
get_events(namespace)

get_pods(namespace="kube-system", label_selector="component=kube-apiserver")
get_events(namespace="kube-system")

get_pods(namespace="kube-system", label_selector="component=etcd")
get_pod_logs("etcd-xxx", "kube-system")

delete_pod(name, namespace, grace_period=0, force=True)

rollback_deployment(name, namespace, revision=0)

rollback_helm_release(name, namespace, revision=1)

for context in ["prod-1", "prod-2", "staging"]:
    get_nodes(context=context)
    get_pods(namespace="kube-system", context=context)
    get_events(namespace="kube-system", context=context)
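
When sweeping several clusters during an incident, it can help to keep going if one context is unreachable and to collect the results in one place. The sketch below illustrates that idea under stated assumptions: `sweep_not_ready` is a hypothetical helper, the `get_nodes` callable is passed in rather than imported, and the return shape (a list of dicts with `name` and `status` keys) is assumed for illustration and may not match what the real tool returns. `fake_get_nodes` is a stand-in used only to make the example runnable.

```python
from typing import Callable, Dict, Iterable, List


def sweep_not_ready(
    contexts: Iterable[str],
    get_nodes: Callable[..., List[dict]],
) -> Dict[str, List[str]]:
    """Return, per context, the names of nodes whose status is not Ready.

    A failure to reach one cluster is recorded as an error entry so the
    sweep continues with the remaining contexts.
    """
    report: Dict[str, List[str]] = {}
    for context in contexts:
        try:
            nodes = get_nodes(context=context)
        except Exception as exc:  # keep sweeping even if one cluster fails
            report[context] = [f"error: {exc}"]
            continue
        report[context] = [n["name"] for n in nodes if n.get("status") != "Ready"]
    return report


def fake_get_nodes(context: str) -> List[dict]:
    """Hypothetical stand-in for the get_nodes tool, for demonstration only."""
    if context == "staging":
        raise ConnectionError("unreachable")
    second_status = "NotReady" if context == "prod-2" else "Ready"
    return [
        {"name": f"{context}-node-1", "status": "Ready"},
        {"name": f"{context}-node-2", "status": second_status},
    ]


report = sweep_not_ready(["prod-1", "prod-2", "staging"], fake_get_nodes)
```

With the stand-in data, `report` lists no unhealthy nodes for `prod-1`, one for `prod-2`, and an error entry for `staging`.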