# Kubernetes
## Pod Failure Troubleshooting
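| Status | Common Causes | Debug Steps |
|---|---|---|
| CrashLoopBackOff | App crash, bad entrypoint, missing deps | `kubectl logs <pod> --previous` |
| ImagePullBackOff | Wrong image/tag, no auth, registry down | Check image name and tag, `imagePullSecrets`, and events in `kubectl describe pod <pod>` |
| Pending | No resources, node selector mismatch, PVC pending | Check the Events section of `kubectl describe pod <pod>` |
| OOMKilled | Memory limit exceeded | Increase `resources.limits.memory` or reduce the app's memory use |
| Evicted | Node disk/memory pressure | Check node conditions (`kubectl describe node <node>`), clean up disk usage |
| CreateContainerError, CreateContainerConfigError | Bad securityContext, missing configmap/secret | `kubectl describe pod <pod>`, verify referenced ConfigMaps/Secrets exist |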
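To match a live Pod against the table, you can read the reasons Kubernetes records in the Pod's status. The sketch below is a hypothetical helper (`triage` is not a kubectl feature) that takes the parsed output of `kubectl get pod <pod> -o json` and reports which statuses apply. It is simplified: init containers are ignored, and it assumes the standard `status.containerStatuses`, `state`/`lastState`, and `PodScheduled` condition fields.

```python
from typing import Any

# Container "waiting" reasons covering the table rows, plus the closely
# related ErrImagePull and CreateContainerConfigError.
WAITING_REASONS = {
    "CrashLoopBackOff",
    "ImagePullBackOff",
    "ErrImagePull",
    "CreateContainerError",
    "CreateContainerConfigError",
}


def triage(pod: dict[str, Any]) -> list[str]:
    """Return failure statuses found in a Pod object (hypothetical helper).

    `pod` is the parsed JSON of `kubectl get pod <pod> -o json`.
    Init containers are ignored to keep the sketch short.
    """
    status = pod.get("status", {})
    found: list[str] = []

    # Evicted Pods are marked Failed with a Pod-level reason of "Evicted".
    if status.get("reason") == "Evicted":
        found.append("Evicted")

    # An unschedulable Pod stays Pending with the PodScheduled condition False.
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            found.append("Pending")

    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting", {})
        if waiting.get("reason") in WAITING_REASONS:
            found.append(waiting["reason"])
        # An OOM kill is recorded on the current or the previous termination.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated", {})
            if terminated.get("reason") == "OOMKilled" and "OOMKilled" not in found:
                found.append("OOMKilled")
    return found
```

A container waiting in `CrashLoopBackOff` whose previous run was OOM-killed yields both statuses, which points at the OOMKilled row as the root cause rather than an application bug.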
## Resource Configuration Gotchas

### Requests vs Limits
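- Requests: Scheduling guarantee. The scheduler only places the Pod on a node whose unreserved allocatable capacity covers the requests; if no node qualifies, the Pod stays Pending.
- Limits: Hard ceiling. A container that exceeds its memory limit is OOM-killed; one that exceeds its CPU limit is throttled.
- No limits = unbounded (the container can consume all free capacity on the node), unless a namespace `LimitRange` supplies defaults
- `requests` > `limits` is invalid (the API server rejects the Pod)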
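These rules can be checked mechanically. Below is a minimal sketch, assuming a container's `resources` block has already been parsed into a dict. `parse_quantity` and `check_resources` are hypothetical helpers; the parser only understands the common suffixes (`m`, `k`, `M`, `G`, `Ki`, `Mi`, `Gi`), and `LimitRange` defaults are not modeled.

```python
from fractions import Fraction

# Subset of Kubernetes quantity suffixes (binary and decimal, plus milli).
SUFFIXES = {
    "Ki": Fraction(2**10), "Mi": Fraction(2**20), "Gi": Fraction(2**30),
    "k": Fraction(10**3), "M": Fraction(10**6), "G": Fraction(10**9),
    "m": Fraction(1, 1000),
}


def parse_quantity(q: str) -> Fraction:
    """Parse '500m', '1', '256Mi', ... exactly (Fraction avoids float rounding)."""
    for suffix, factor in SUFFIXES.items():
        if q.endswith(suffix):
            return Fraction(q[: -len(suffix)]) * factor
    return Fraction(q)


def check_resources(resources: dict) -> list[str]:
    """Flag missing limits (warning) and requests above limits (invalid)."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    problems = []
    for name in ("cpu", "memory"):
        if name not in limits:
            problems.append(f"warning: no {name} limit (unbounded)")
        elif name in requests and parse_quantity(requests[name]) > parse_quantity(limits[name]):
            problems.append(f"invalid: {name} request {requests[name]} > limit {limits[name]}")
    return problems
```

For example, `{"requests": {"memory": "256Mi"}, "limits": {"memory": "128Mi"}}` is reported as invalid for memory. When only a limit is given, Kubernetes copies it into the request, so a missing request is not flagged here.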
### Probe Timing
```yaml
livenessProbe:
  httpGet:                  # A handler is required; /healthz and 8080 are placeholders
    path: /healthz
    port: 8080
  initialDelaySeconds: 10   # Wait before first check
  periodSeconds: 5          # Check interval
  timeoutSeconds: 1         # Max wait for response
  failureThreshold: 3       # Consecutive failures before action
```

- Liveness failure → container restart
- Readiness failure → Pod removed from Service endpoints (no restart)
- `startupProbe` disables liveness and readiness checks until it succeeds (use for slow-starting apps)
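The timing fields combine into a detection delay. The function below is a rough, hypothetical model (it ignores kubelet jitter and assumes checks start at `initialDelaySeconds` and then every `periodSeconds`) of how long after container start a liveness or startup probe triggers a restart when every check fails.

```python
def seconds_until_action(initial_delay: int, period: int, timeout: int, failure_threshold: int) -> int:
    """Approximate seconds from container start until the container is
    restarted, for a liveness or startup probe whose every check fails.

    Rough model: checks start at initial_delay, initial_delay + period, ...;
    the failure_threshold-th failing check may take up to `timeout`
    seconds before it counts as failed.
    """
    last_check_start = initial_delay + (failure_threshold - 1) * period
    return last_check_start + timeout
```

With the values in the YAML above, `seconds_until_action(10, 5, 1, 3)` gives about 21 seconds. For startup probes, the Kubernetes documentation sizes the startup budget with the simpler rule `failureThreshold × periodSeconds` (for example 30 × 10 = 300 s).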
### Security Context Inheritance
Pod-level `securityContext` applies to all containers, but container-level settings override it:

```yaml
spec:
  securityContext:
    runAsNonRoot: true      # Pod default, inherited by every container
  containers:
  - securityContext:
      runAsUser: 1000       # Container-level; overrides a Pod-level runAsUser if set
```
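The override rule behaves like a field-by-field merge. A minimal sketch, assuming both `securityContext` blocks are plain dicts; `effective_security_context` is a hypothetical helper, and `SHARED_FIELDS` lists only some of the fields that exist at both levels (Pod-only fields such as `fsGroup` and `supplementalGroups` are left out).

```python
# Some fields that exist on both the Pod-level and the container-level
# securityContext (not exhaustive).
SHARED_FIELDS = {"runAsUser", "runAsGroup", "runAsNonRoot", "seLinuxOptions", "seccompProfile"}


def effective_security_context(pod_ctx: dict, container_ctx: dict) -> dict:
    """Pod-level shared fields act as defaults; container-level values win."""
    effective = {k: v for k, v in pod_ctx.items() if k in SHARED_FIELDS}
    effective.update(container_ctx)  # container settings override on overlap
    return effective
```

Applied to the YAML above, the container ends up with `runAsNonRoot: true` from the Pod and `runAsUser: 1000` from itself; if the Pod also set `runAsUser`, the container's value would still win.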
## RBAC Patterns

### Minimal Role for Pod Logs
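```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-log-reader      # Example name
rules:
- apiGroups: [""]           # "" = core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
```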
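RBAC is additive: a request is allowed if any rule matches its API group, resource, and verb, and a subresource such as `pods/log` must be listed explicitly (`kubectl logs` issues a `get` on `pods/log`). The sketch below is a simplified model of that matching, not the authorizer's actual code; `PolicyRule` and `allows` are hypothetical names.

```python
from typing import NamedTuple


class PolicyRule(NamedTuple):
    """Simplified RBAC rule: tuples of API groups, resources, and verbs."""
    api_groups: tuple
    resources: tuple
    verbs: tuple


def _matches(allowed: tuple, value: str) -> bool:
    # "*" is the RBAC wildcard; otherwise the value must be listed exactly.
    return "*" in allowed or value in allowed


def allows(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """Allowed if ANY rule matches the group, the resource, and the verb."""
    return any(
        _matches(r.api_groups, api_group)
        and _matches(r.resources, resource)
        and _matches(r.verbs, verb)
        for r in rules
    )


# The Role above, expressed as data.
POD_LOG_READER = [PolicyRule(("",), ("pods", "pods/log"), ("get", "list"))]
```

With this model, a rule that lists only `pods` does not allow a `get` on `pods/log`, which is why the Role names both.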
### Common API Groups
- `""` (empty): Core resources (pods, services, configmaps)
- `apps`: Deployments, StatefulSets, DaemonSets
- `networking.k8s.io`: Ingress, NetworkPolicy
- `rbac.authorization.k8s.io`: Roles, RoleBindings (and their cluster-scoped variants)
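The `apiGroups` value in a rule is the part of a manifest's `apiVersion` before the slash; core resources use a bare version such as `v1`, hence the empty string. A small illustrative sketch (the helper and the kind table are hypothetical and cover only the kinds listed above):

```python
# apiVersion of the kinds listed above (current stable versions).
API_VERSIONS = {
    "Pod": "v1", "Service": "v1", "ConfigMap": "v1",
    "Deployment": "apps/v1", "StatefulSet": "apps/v1", "DaemonSet": "apps/v1",
    "Ingress": "networking.k8s.io/v1", "NetworkPolicy": "networking.k8s.io/v1",
    "Role": "rbac.authorization.k8s.io/v1", "RoleBinding": "rbac.authorization.k8s.io/v1",
}


def api_group(api_version: str) -> str:
    """Return the RBAC `apiGroups` entry for an apiVersion string."""
    group, _, _version = api_version.rpartition("/")
    return group  # "" for core resources such as "v1"
```

`kubectl api-resources` shows the group and version of every resource a cluster actually serves.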
## NetworkPolicy Gotchas
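- No NetworkPolicy = all traffic allowed (policies are only enforced if the cluster's network plugin supports them)
- Any NetworkPolicy selecting a pod = default deny for that direction (ingress and/or egress, per its `policyTypes`); only traffic some applicable policy allows gets through
- Empty `podSelector: {}` selects all pods in the namespace
- `namespaceSelector: {}` selects all namespaces
- Combine selectors as separate `-` entries (OR) vs fields of the same entry (AND):

```yaml
ingress:
- from:
  - podSelector: {matchLabels: {app: frontend}}     # AND: frontend pods...
    namespaceSelector: {matchLabels: {env: prod}}   # ...in prod namespaces
- from:                                             # OR (separate rule)
  - podSelector: {matchLabels: {app: monitoring}}   # Only the policy's own namespace
```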
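The AND/OR rules, and the easy-to-miss fact that a `podSelector` without a `namespaceSelector` only matches Pods in the policy's own namespace, can be modeled in a few lines. This is a simplified sketch, not the behavior of any particular network plugin: it assumes the policy already selects the destination Pod, handles only `matchLabels` (no `matchExpressions`, `ipBlock`, or ports), and the names `Source`, `peer_matches`, and `ingress_allowed` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """A Pod sending traffic: its namespace and labels."""
    namespace: str
    labels: dict


def _selects(selector: Optional[dict], labels: dict) -> bool:
    """matchLabels only; a missing or empty selector {} matches everything."""
    wanted = (selector or {}).get("matchLabels", {})
    return all(labels.get(k) == v for k, v in wanted.items())


def peer_matches(peer: dict, src: Source, ns_labels: dict, policy_ns: str) -> bool:
    """Selectors inside one `from` entry are ANDed."""
    if "namespaceSelector" in peer:
        ns_ok = _selects(peer["namespaceSelector"], ns_labels.get(src.namespace, {}))
    else:
        # No namespaceSelector: only Pods in the policy's own namespace.
        ns_ok = src.namespace == policy_ns
    return ns_ok and _selects(peer.get("podSelector"), src.labels)


def ingress_allowed(policy: dict, src: Source, ns_labels: dict, policy_ns: str) -> bool:
    """Rules and `from` entries are ORed: any match allows the traffic."""
    for rule in policy.get("ingress", []):
        peers = rule.get("from")
        if not peers:
            return True  # a rule without `from` admits every source
        if any(peer_matches(p, src, ns_labels, policy_ns) for p in peers):
            return True
    return False  # the Pod is selected but no rule matched: denied


# The ingress rules from the YAML above, as parsed data.
POLICY = {
    "ingress": [
        {"from": [{"podSelector": {"matchLabels": {"app": "frontend"}},
                   "namespaceSelector": {"matchLabels": {"env": "prod"}}}]},
        {"from": [{"podSelector": {"matchLabels": {"app": "monitoring"}}}]},
    ]
}
```

With a policy in a namespace labeled `env: prod`, a `frontend` Pod from that namespace is allowed but one from an `env: dev` namespace is not (the AND), and a `monitoring` Pod is allowed only from the policy's own namespace; adding `namespaceSelector: {}` to that entry would admit `monitoring` Pods from every namespace.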