
# Containerization & Kubernetes

Production-grade container orchestration for data engineering workloads with Docker and Kubernetes.

## Quick Start

```dockerfile
# Dockerfile for PySpark data application
FROM python:3.12-slim

# Install Java for Spark
RUN apt-get update && apt-get install -y openjdk-17-jdk-headless && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install dependencies first (cache optimization)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/
COPY config/ ./config/

# Non-root user for security
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

ENV PYTHONPATH=/app
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

ENTRYPOINT ["python", "-m", "src.main"]
```
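The `ENTRYPOINT` above launches `src.main` as a module. What that module contains is application-specific; a hypothetical minimal skeleton (only the module name comes from the Dockerfile, everything inside it is illustrative):

```python
# src/main.py -- hypothetical minimal entrypoint for the image above
import logging
import os

def main():
    # LOG_LEVEL is commonly injected via the pod env; default to INFO.
    logging.basicConfig(level=os.environ.get("LOG_LEVEL", "INFO"))
    log = logging.getLogger("etl")
    # JAVA_HOME set in the Dockerfile is what pyspark's launcher would use.
    log.info("JAVA_HOME=%s", os.environ.get("JAVA_HOME", "<unset>"))
    # ... build the SparkSession and run the pipeline here ...
    return 0

if __name__ == "__main__":
    main()
```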

## Core Concepts

### 1. Multi-Stage Builds

```dockerfile
# Build stage
FROM python:3.12 AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Runtime stage
FROM python:3.12-slim AS runtime
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY src/ /app/src/
WORKDIR /app
USER 1000
CMD ["python", "-m", "src.main"]
```

### 2. Kubernetes Deployment

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etl-worker
  labels:
    app: etl-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: etl-worker
  template:
    metadata:
      labels:
        app: etl-worker
    spec:
      containers:
        - name: etl-worker
          image: company/etl-worker:v1.2.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
            - name: LOG_LEVEL
              value: "INFO"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: etl-worker
                topologyKey: kubernetes.io/hostname
```
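The liveness and readiness probes in the manifest expect the container to answer HTTP on port 8080 at `/health` and `/ready`. A minimal stdlib-only sketch of such an endpoint (the paths and port match the probe config; the handler itself is illustrative, not part of the manifest):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

READY = threading.Event()  # flip on once startup work (DB connect, etc.) finishes

class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Liveness: the process is up and able to answer requests.
            self._reply(200, {"status": "ok"})
        elif self.path == "/ready":
            # Readiness: only accept traffic after initialization completes.
            if READY.is_set():
                self._reply(200, {"status": "ready"})
            else:
                self._reply(503, {"status": "starting"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, body):
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep probe polling out of stdout

def serve(port=8080):
    """Start the probe server on a daemon thread; returns the server object."""
    server = HTTPServer(("0.0.0.0", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The worker would call `serve()` at startup and `READY.set()` after initialization; until then the readiness probe fails with 503 and the pod receives no traffic, while the liveness probe already passes.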

### 3. Kubernetes CronJob for ETL

```yaml
# cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-etl
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 7200  # 2 hour timeout
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: etl-job
              image: company/etl-pipeline:v1.0.0
              resources:
                requests:
                  memory: "4Gi"
                  cpu: "2000m"
                limits:
                  memory: "8Gi"
                  cpu: "4000m"
              env:
                - name: EXECUTION_DATE
                  value: "{{ .Date }}"
              volumeMounts:
                - name: config
                  mountPath: /app/config
                  readOnly: true
          volumes:
            - name: config
              configMap:
                name: etl-config
```
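The job template injects `EXECUTION_DATE` through the environment (the `{{ .Date }}` placeholder is filled by whatever templating renders the manifest). Inside the container, the ETL entrypoint might resolve it like this; a sketch where only the variable name comes from the manifest, and the today-UTC fallback is an assumption. Exiting non-zero on bad input matters because the Job controller then retries up to `backoffLimit` times:

```python
import datetime
import os
import sys

def resolve_execution_date(env=None):
    """Return the logical date this ETL run should process.

    Prefers EXECUTION_DATE from the CronJob env; falls back to today (UTC)
    so the job can also be run ad hoc outside Kubernetes.
    """
    env = os.environ if env is None else env
    raw = env.get("EXECUTION_DATE", "")
    if raw:
        try:
            return datetime.date.fromisoformat(raw)
        except ValueError:
            # Malformed date: fail fast so the Job records a failure/retry.
            print(f"invalid EXECUTION_DATE: {raw!r}", file=sys.stderr)
            sys.exit(1)
    return datetime.datetime.now(datetime.timezone.utc).date()
```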

### 4. Helm Chart Structure

```yaml
# Chart.yaml
apiVersion: v2
name: data-pipeline
version: 1.0.0
appVersion: "2.0.0"
description: Data pipeline Helm chart
```

```yaml
# values.yaml
replicaCount: 3

image:
  repository: company/data-pipeline
  tag: "latest"
  pullPolicy: IfNotPresent

resources:
  requests:
    memory: "1Gi"
    cpu: "500m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

env:
  LOG_LEVEL: INFO
  BATCH_SIZE: "1000"

secrets:
  - name: DATABASE_URL
    secretName: db-credentials
    key: url
```

### 5. Docker Compose for Local Dev

```yaml
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: datawarehouse
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U admin"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  airflow-webserver:
    image: apache/airflow:2.8.0-python3.11
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://admin:${DB_PASSWORD}@postgres/datawarehouse
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    ports:
      - "8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags
      - ./plugins:/opt/airflow/plugins

volumes:
  postgres_data:
```
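`depends_on` with `condition: service_healthy` gates start order at the Compose level, but an application can also guard itself at runtime by polling its dependency's port before doing real work. A stdlib-only sketch (host, port, and timeout values are illustrative):

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connect to host:port succeeds, or raise TimeoutError.

    Useful at startup when a dependency (e.g. the postgres service above)
    is running but not yet accepting connections.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not reachable after {timeout}s")
            time.sleep(interval)
```

For example, the Airflow container's entrypoint wrapper could call `wait_for_port("postgres", 5432)` before launching the webserver.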

## Tools & Technologies

| Tool | Purpose | Version (2025) |
|------|---------|----------------|
| Docker | Containerization | 25+ |
| Kubernetes | Orchestration | 1.29+ |
| Helm | K8s package manager | 3.14+ |
| ArgoCD | GitOps deployment | 2.10+ |
| Kustomize | K8s config management | Built-in |
| containerd | Container runtime | 1.7+ |
| Podman | Docker alternative | 4.8+ |

## Troubleshooting Guide

| Issue | Symptoms | Root Cause | Fix |
|-------|----------|------------|-----|
| OOMKilled | Pod restarts, exit code 137 | Memory limit exceeded | Increase limits, optimize code |
| CrashLoopBackOff | Pod keeps restarting | App crash, bad config | Check logs: `kubectl logs <pod>` |
| ImagePullBackOff | Pod stuck in Pending | Image not found, auth | Check image name, pull secrets |
| Pending Pod | Pod won't schedule | No resources, node selector | Check resources, affinity rules |
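The 137 in the OOMKilled row is not arbitrary: when a container exceeds its memory limit the kernel kills it with SIGKILL, and the convention for signal deaths is an exit code of 128 + the signal number. A quick stdlib check (Linux/macOS, where `SIGKILL` is 9):

```python
import signal

# Exit codes above 128 encode "terminated by signal (code - 128)".
OOM_EXIT_CODE = 137
sig = OOM_EXIT_CODE - 128
assert sig == int(signal.SIGKILL)  # SIGKILL is 9; the OOM killer uses it
print(f"exit {OOM_EXIT_CODE} -> signal {signal.Signals(sig).name}")
```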

## Debug Commands

```bash
# Check pod status and events
kubectl describe pod <pod-name>

# View container logs
kubectl logs <pod-name> -c <container-name> --previous

# Execute shell in container
kubectl exec -it <pod-name> -- /bin/sh

# Check resource usage
kubectl top pods

# Debug networking
kubectl run debug --image=busybox -it --rm -- sh
```

## Best Practices

```dockerfile
# ✅ DO: Use specific image tags
FROM python:3.12.1-slim

# ✅ DO: Use non-root user
USER 1000

# ✅ DO: Use multi-stage builds
# ✅ DO: Set resource limits
# ✅ DO: Use health checks

# ❌ DON'T: Run as root
# ❌ DON'T: Use the latest tag
# ❌ DON'T: Store secrets in images
```
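The last rule, keeping secrets out of images, in practice means reading them at runtime from a mounted Secret volume or an environment variable instead of baking them into a layer. A sketch of that pattern (the `/etc/secrets` path is an assumption; Kubernetes mounts a Secret wherever the `volumeMount` says):

```python
import os
from pathlib import Path

def read_secret(name, secret_dir="/etc/secrets", env=None):
    """Fetch a secret at runtime: mounted file first, env var fallback.

    Neither source ends up in an image layer, unlike COPY'ing a
    credentials file at build time.
    """
    env = os.environ if env is None else env
    path = Path(secret_dir) / name
    if path.is_file():
        return path.read_text().strip()
    value = env.get(name.upper())
    if value is None:
        raise KeyError(f"secret {name!r} not found in {secret_dir} or environment")
    return value
```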

## Resources

**Skill Certification Checklist:**

- Can write production Dockerfiles
- Can deploy applications to Kubernetes
- Can create Helm charts
- Can debug container issues
- Can implement health checks and probes