Kubernetes

Pod Failure Troubleshooting

| Status | Common Causes | Debug Steps |
|---|---|---|
| CrashLoopBackOff | App crash, bad entrypoint, missing deps | kubectl logs <pod> --previous |
| ImagePullBackOff | Wrong image/tag, no auth, registry down | Check image name, kubectl get events |
| Pending | No resources, node selector mismatch, PVC pending | kubectl describe pod, check node capacity |
| OOMKilled | Memory limit exceeded | Increase limits.memory or fix leak |
| Evicted | Node disk/memory pressure | Check node conditions, clean up |
| CreateContainerError | Bad securityContext, missing configmap/secret | kubectl describe pod for specific error |

Resource Configuration Gotchas

Requests vs Limits

  • Requests: Scheduling guarantee. Pod won't schedule if node lacks capacity.
  • Limits: Hard ceiling. Container killed (OOM) or throttled (CPU) if exceeded.
  • No limits = unbounded (can consume entire node)
  • requests > limits is invalid (the API server rejects the Pod)
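
As an illustration of the points above, here is a minimal sketch of a Pod that sets both requests and limits. The Pod name, container name, image, and all quantities are placeholders chosen for the example, not values from the original.

```yaml
# Hypothetical Pod; names, image, and quantities are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: resources-demo
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      resources:
        requests:
          cpu: 250m        # scheduler only places the Pod on a node with this much free
          memory: 256Mi
        limits:
          cpu: 500m        # usage above this is throttled
          memory: 512Mi    # usage above this gets the container OOM-killed
```

Each request is kept at or below its matching limit, which is the only valid ordering.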

Probe Timing

yaml
livenessProbe:
  initialDelaySeconds: 10  # Wait before first check
  periodSeconds: 5         # Check interval
  timeoutSeconds: 1        # Max wait for response
  failureThreshold: 3      # Failures before action
  • Liveness failure → container restart
  • Readiness failure → removed from service endpoints
  • A startupProbe disables liveness and readiness probes until it succeeds (use for slow-starting apps)
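
A sketch of how the three probes might be combined for a slow-starting app. The container name, image, port, and the /healthz and /ready paths are assumptions made for the example; substitute whatever endpoints the application actually exposes.

```yaml
# Hypothetical container fragment; endpoints and timings are illustrative only.
containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    ports:
      - containerPort: 8080
    startupProbe:
      httpGet: {path: /healthz, port: 8080}
      periodSeconds: 10
      failureThreshold: 30    # allows up to 30 x 10 s = 300 s for startup
    livenessProbe:
      httpGet: {path: /healthz, port: 8080}
      periodSeconds: 5
      timeoutSeconds: 1
      failureThreshold: 3     # restart after 3 consecutive failures (about 15 s)
    readinessProbe:
      httpGet: {path: /ready, port: 8080}
      periodSeconds: 5
      failureThreshold: 3     # removed from endpoints after 3 consecutive failures
```

With a startupProbe in place, initialDelaySeconds on the other probes is usually unnecessary, since they do not run until startup has succeeded.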

Security Context Inheritance

A Pod-level securityContext applies to all containers, but a container-level securityContext overrides it for any field set at both levels:
yaml
spec:
  securityContext:
    runAsNonRoot: true      # Pod default
  containers:
    - securityContext:
        runAsUser: 1000     # Container override
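
To make the field-by-field behaviour concrete, here is a sketch with two containers, one inheriting the Pod defaults and one overriding a single field. All names, images, and IDs are hypothetical.

```yaml
# Hypothetical Pod; names, images, and numeric IDs are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000            # can only be set at Pod level
  containers:
    - name: app
      image: registry.example.com/app:1.0
      # No securityContext here: inherits runAsNonRoot: true and runAsUser: 1000
    - name: sidecar
      image: registry.example.com/sidecar:1.0
      securityContext:
        runAsUser: 3000                   # overrides the Pod-level runAsUser for this container
        allowPrivilegeEscalation: false   # can only be set at container level
```

The sidecar still inherits runAsNonRoot: true, because only the fields it sets itself are overridden.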

RBAC Patterns

Minimal Role for Pod Logs

yaml
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
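
The rules above are only a fragment. A complete, appliable version could look like the following sketch, which wraps them in a Role and binds it to a ServiceAccount. The namespace demo, the Role name pod-log-reader, and the ServiceAccount log-viewer are hypothetical names introduced for the example.

```yaml
# Hypothetical Role and RoleBinding; all names and the namespace are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-log-reader
  namespace: demo
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-log-reader-binding
  namespace: demo
subjects:
  - kind: ServiceAccount
    name: log-viewer
    namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-log-reader
```

A Role is namespaced, so this grants log access only within the demo namespace.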

Common API Groups

  • "" (empty): Core resources (pods, services, configmaps)
  • apps: Deployments, StatefulSets, DaemonSets
  • networking.k8s.io: Ingress, NetworkPolicy
  • rbac.authorization.k8s.io: Roles, bindings
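
As a sketch of how these groups appear in practice, the read-only Role below uses one rule per API group, each listing resources that belong to that group. The Role name, namespace, and choice of verbs are assumptions for illustration.

```yaml
# Hypothetical read-only Role; name, namespace, and verbs are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-viewer
  namespace: demo
rules:
  - apiGroups: [""]                           # core group
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "daemonsets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses", "networkpolicies"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: ["roles", "rolebindings"]
    verbs: ["get", "list", "watch"]
```

A resource listed under the wrong apiGroups entry simply never matches, which is a common cause of unexpected "forbidden" errors.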

NetworkPolicy Gotchas

  • No NetworkPolicy = all traffic allowed
  • Any NetworkPolicy selecting a pod = default deny for that direction
  • An empty podSelector: {} selects all pods in the namespace
  • namespaceSelector: {} selects all namespaces
  • Selectors in separate list items (each starting with -) are ORed; selectors placed in the same list item are ANDed
yaml
ingress:
  - from:
      - podSelector: {matchLabels: {app: frontend}}  # AND
        namespaceSelector: {matchLabels: {env: prod}}
  - from:  # OR (separate rule)
      - podSelector: {matchLabels: {app: monitoring}}
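
Two complete policies that illustrate the rules above, as a sketch: a namespace-wide default deny for ingress, and a policy whose single from list holds two separate items and is therefore an OR. The namespace demo, the policy names, and the app: api label are hypothetical.

```yaml
# Hypothetical policies; names, namespace, and labels are illustrative only.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: demo
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:
    - Ingress              # no ingress rules listed, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-or-prod
  namespace: demo
spec:
  podSelector: {matchLabels: {app: api}}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {matchLabels: {app: frontend}}     # item 1
        - namespaceSelector: {matchLabels: {env: prod}}   # item 2: OR with item 1
```

Policies are additive, so pods labelled app: api accept traffic matched by the second policy even though the first denies everything else.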