kubernetes-operations

Kubernetes Operations

Deployment Manifest

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        version: v1
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.2.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server
Always set resource requests and limits. Use topology spread constraints for high availability.
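The manifest above spreads replicas across nodes using kubernetes.io/hostname. As a hedged sketch, a second constraint could also spread them across zones. This assumes the nodes carry the well-known topology.kubernetes.io/zone label; the second constraint and its ScheduleAnyway setting are illustrative additions, not part of the original manifest.

```yaml
# Fragment of spec.template.spec (not a complete manifest).
topologySpreadConstraints:
  # Original constraint: at most 1 pod of skew between nodes.
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api-server
  # Assumed addition: prefer an even spread across zones, but still
  # schedule the pod when an even spread is not possible.
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: api-server
```

DoNotSchedule leaves a pod Pending when the constraint cannot be met, while ScheduleAnyway treats the constraint as a preference, so the stricter setting is best reserved for the failure domain that matters most.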

Helm Chart Structure

chart/
  Chart.yaml
  values.yaml
  values-staging.yaml
  values-production.yaml
  templates/
    deployment.yaml
    service.yaml
    ingress.yaml
    hpa.yaml
    _helpers.tpl

values.yaml

yaml
replicaCount: 2
image:
  repository: registry.example.com/api
  tag: "1.2.0"
  pullPolicy: IfNotPresent
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilization: 70
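To show how the autoscaling values might be consumed, here is a minimal sketch of templates/hpa.yaml. The original does not show the template, so this layout is an assumption. Naming the resources after .Release.Name is also an assumption; a real chart would more likely use a helper defined in _helpers.tpl, whose contents are not shown here.

```yaml
# templates/hpa.yaml (hypothetical contents, rendered only when autoscaling is enabled)
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ .Release.Name }}            # assumed naming scheme
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ .Release.Name }}          # must match the Deployment's name
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilization }}
{{- end }}
```

Rendering locally with something like helm template api ./chart -f ./chart/values-production.yaml (the release name api is hypothetical) is a quick way to check the output before installing. When the HPA is enabled, the Deployment template would usually omit replicas so the two do not fight over the replica count.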

HorizontalPodAutoscaler

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
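The stabilizationWindowSeconds: 300 setting makes the autoscaler act on the highest replica recommendation from the last five minutes before scaling down, which dampens flapping. As a hedged sketch, the same behavior block could also cap how fast replicas are removed. The policies entry and the scaleUp section below are illustrative assumptions, not part of the original manifest.

```yaml
# Fragment of the HorizontalPodAutoscaler spec shown above.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # from the original manifest
    policies:                         # assumed addition
      - type: Percent
        value: 50                     # remove at most 50% of current replicas...
        periodSeconds: 60             # ...in any 60-second period
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load immediately (the default)
```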

Troubleshooting Commands

bash
# Pod diagnostics
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -c <container> --previous
kubectl exec -it <pod-name> -- /bin/sh

# Resource usage
kubectl top pods -n <namespace> --sort-by=memory
kubectl top nodes

# Network debugging
kubectl run debug --image=nicolaka/netshoot --rm -it -- bash
# Run inside the debug pod's shell:
nslookup <service-name>.<namespace>.svc.cluster.local

# Events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Find pods not running
kubectl get pods -A --field-selector=status.phase!=Running

Anti-Patterns

  • Running containers as root without securityContext.runAsNonRoot: true
  • Missing resource requests/limits (causes scheduling issues and noisy neighbors)
  • Using the "latest" tag instead of pinned image versions
  • Not setting PodDisruptionBudget for critical workloads
  • Storing secrets in ConfigMaps instead of Secrets (or external secret managers)
  • Ignoring pod anti-affinity for replicated deployments
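Two of the items above map directly onto small pieces of configuration. The following is a sketch under stated assumptions: it reuses the app: api-server label from the Deployment manifest, while the minAvailable value and the UID are hypothetical and would need tuning per workload.

```yaml
# Assumed PodDisruptionBudget for the api-server Deployment.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server
spec:
  minAvailable: 2          # illustrative; with 3 replicas, allows one voluntary eviction at a time
  selector:
    matchLabels:
      app: api-server
```

```yaml
# Fragment of a container spec (not a complete manifest).
securityContext:
  runAsNonRoot: true               # kubelet refuses to start the container as UID 0
  runAsUser: 10001                 # hypothetical non-root UID
  allowPrivilegeEscalation: false
```

A PodDisruptionBudget only limits voluntary disruptions such as node drains; it does not protect against node failures.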

Checklist

  • All containers have resource requests and limits
  • Liveness and readiness probes configured
  • Images use specific version tags, not "latest"
  • Secrets stored in Kubernetes Secrets or external vault
  • PodDisruptionBudget set for production workloads
  • NetworkPolicies restrict traffic between namespaces
  • Topology spread constraints or anti-affinity for HA
  • Helm values split per environment (staging, production)
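For the NetworkPolicy item, a minimal sketch of a policy that admits ingress only from pods in the same namespace might look like the following. The namespace and policy name are hypothetical, and enforcement assumes the cluster's network plugin supports NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace   # hypothetical name
  namespace: production        # hypothetical namespace
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}      # any pod in this same namespace
```

Because the from clause has no namespaceSelector, traffic from other namespaces is dropped; an ingress controller running in another namespace would need its own explicit rule.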