docker-production

Docker Production Skill

Master production-grade Docker deployments with monitoring, logging, health checks, and resource management.

Purpose

Configure containers for production with proper observability, resource limits, and deployment strategies.

Parameters

| Parameter  | Type   | Required | Default    | Description         |
|------------|--------|----------|------------|---------------------|
| monitoring | enum   | No       | prometheus | prometheus/datadog  |
| logging    | enum   | No       | json-file  | json-file/loki/elk  |
| replicas   | number | No       | 1          | Number of replicas  |

Production Configuration

生产环境配置

Health Checks

dockerfile
HEALTHCHECK --interval=30s --timeout=3s --retries=3 --start-period=60s \
  CMD curl -f http://localhost:3000/health || exit 1

yaml
# Compose health check
services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
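
A deploy script often needs to block until the health check above passes. The following is a minimal sketch, not a definitive implementation: `wait_healthy` is a hypothetical helper name, and the defaults of 20 attempts and a 3-second pause are assumptions to tune against your `start_period`.

```bash
# Poll a container's health status until it reports "healthy".
# Usage: wait_healthy <container> [max_attempts] [pause_seconds]
wait_healthy() {
  container="$1"
  max_attempts="${2:-20}"   # assumed default
  pause="${3:-3}"           # assumed default, in seconds
  attempt=1
  status=""

  while [ "$attempt" -le "$max_attempts" ]; do
    # Prints starting, healthy, or unhealthy for containers with a health check.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)"
    if [ "$status" = "healthy" ]; then
      echo "$container is healthy after $attempt check(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$pause"
  done

  echo "$container did not become healthy (last status: ${status:-unknown})" >&2
  return 1
}
```

For example, `wait_healthy app 20 3 || exit 1` would stop a deploy script when the container never reports healthy.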

Resource Limits

yaml
services:
  app:
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
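
To confirm that the limits were actually applied, you can read them back from the running container. This is a sketch under one assumption: that the CPU limit is recorded in `HostConfig.NanoCpus` (some setups may express it through CPU quota and period instead). `show_limits` is a hypothetical helper name.

```bash
# Print the memory and CPU limits Docker recorded for a container.
# Usage: show_limits <container>
show_limits() {
  container="$1"
  # HostConfig.Memory is in bytes; HostConfig.NanoCpus is in billionths of a CPU.
  # Both are 0 when no limit is set.
  limits="$(docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' "$container")" || return 1
  mem_bytes="${limits%% *}"
  nano_cpus="${limits##* }"
  cpus="$(awk -v n="$nano_cpus" 'BEGIN { printf "%.2f", n / 1e9 }')"
  echo "memory_mib=$((mem_bytes / 1048576)) cpus=$cpus"
}
```

With the configuration above, a 1G memory limit would be reported as 1024 MiB.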

Logging Configuration

yaml
services:
  app:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
        labels: "app,environment"
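
With rotation configured this way, the disk used by logs is bounded by roughly `max-size` times `max-file` per container. The sketch below works through that arithmetic and shows how to check which driver a container is using. Both function names are hypothetical.

```bash
# Upper bound on json-file log disk usage, in megabytes.
# Usage: log_budget_mb <max_size_mb> <max_file> [containers]
log_budget_mb() {
  max_size_mb="$1"
  max_file="$2"
  containers="${3:-1}"
  echo $((max_size_mb * max_file * containers))
}

# Print the logging driver a container was started with.
# Usage: log_driver <container>
log_driver() {
  docker inspect --format '{{.HostConfig.LogConfig.Type}}' "$1"
}
```

For the settings above, `log_budget_mb 10 3` gives 30, so one container keeps at most about 30 MB of logs on disk.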

Monitoring Stack

Prometheus + Grafana

yaml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"

Prometheus Config

yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'docker-containers'
    docker_sd_configs:
      # Prometheus needs access to this socket, e.g. mounted into its container
      - host: unix:///var/run/docker.sock
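
Once Prometheus is running, one way to check that scraping works is to query its targets endpoint. This is a rough sketch: `count_up_targets` is a hypothetical helper, the default URL assumes the `9090:9090` port mapping shown earlier, and the `grep` match assumes the compact JSON that the API normally returns.

```bash
# Count scrape targets that Prometheus currently reports as "up".
# Usage: count_up_targets [base_url]
count_up_targets() {
  base_url="${1:-http://localhost:9090}"
  # /api/v1/targets lists active targets, each with a "health" field.
  curl -fsS "$base_url/api/v1/targets" \
    | grep -o '"health":"up"' \
    | wc -l \
    | tr -d ' '
}
```

Before starting the container, `promtool check config prometheus.yml` can catch syntax errors in the file itself.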

Deployment Strategies

Rolling Update (Zero Downtime)

yaml
deploy:
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    order: start-first
  rollback_config:
    parallelism: 1
    delay: 10s
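
The `update_config` and `rollback_config` keys take effect when the stack runs as Swarm services. A minimal sketch of triggering an update and reading back the result follows. It assumes a Swarm deployment; `rolling_update`, the service name `myapp_app`, and the image tag `myapp:2.0` are hypothetical.

```bash
# Roll a Swarm service to a new image, then print the final update state.
# Usage: rolling_update <service> <image>
rolling_update() {
  service="$1"
  image="$2"
  # Applies the service's update_config (parallelism, delay, order).
  docker service update --image "$image" "$service" >/dev/null
  # Reports states such as completed, paused, or rollback_completed.
  docker service inspect --format '{{.UpdateStatus.State}}' "$service"
}
```

For example, `rolling_update myapp_app myapp:2.0` would print `rollback_completed` if `failure_action: rollback` was triggered.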

Blue-Green

bash
# Deploy new version
docker compose -p myapp-green up -d

# Switch traffic (update nginx/load balancer)

# Remove old version
docker compose -p myapp-blue down
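
The three blue-green steps can be wrapped in a small script so that the old stack is removed only after the new one is up. This is a sketch under several assumptions: the two stacks do not publish conflicting host ports, your Compose version supports `up --wait`, and `switch_traffic` is a hypothetical placeholder for whatever updates your nginx or load balancer.

```bash
# Return the color that is not currently live.
other_color() {
  case "$1" in
    blue) echo green ;;
    green) echo blue ;;
    *) echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Hypothetical placeholder: replace with your load balancer update.
switch_traffic() {
  echo "switch load balancer to myapp-$1 here"
}

# Usage: blue_green_deploy <live_color>
blue_green_deploy() {
  live="$1"
  next="$(other_color "$live")" || return 1
  # --wait blocks until services are running (or healthy, if a health check exists).
  docker compose -p "myapp-$next" up -d --wait || return 1
  switch_traffic "$next" || return 1
  # The old stack is removed only after traffic has moved.
  docker compose -p "myapp-$live" down
}
```

Calling `blue_green_deploy blue` would bring up `myapp-green`, switch traffic, and then take down `myapp-blue`.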

Error Handling

Common Errors

| Error        | Cause                | Solution                              |
|--------------|----------------------|---------------------------------------|
| unhealthy    | Health check failing | Check endpoint, increase start_period |
| OOMKilled    | Memory exceeded      | Increase limit or optimize            |
| restart loop | App crash            | Check logs, fix application           |
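
The three errors in the table can be told apart from a single `docker inspect` call. The sketch below is illustrative only: `diagnose` is a hypothetical helper, and treating three or more restarts as a restart loop is an arbitrary threshold chosen for the example.

```bash
# Map a container's state to the common errors listed above.
# Usage: diagnose <container>
diagnose() {
  container="$1"
  # Health is absent when the container has no health check, so guard it.
  state="$(docker inspect --format \
    '{{.State.OOMKilled}} {{.RestartCount}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
    "$container")" || return 1
  set -- $state
  oom="$1"
  restarts="$2"
  health="$3"

  if [ "$oom" = "true" ]; then
    echo "OOMKilled: raise the memory limit or reduce memory use"
  elif [ "$health" = "unhealthy" ]; then
    echo "unhealthy: check the health endpoint and consider a longer start_period"
  elif [ "$restarts" -ge 3 ]; then   # assumed threshold
    echo "restart loop ($restarts restarts): check logs and fix the application"
  else
    echo "no known failure pattern detected"
  fi
}
```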

Recovery

  1. Check logs:
    docker logs --tail 100 <container>
  2. Verify health:
    docker inspect --format='{{.State.Health.Status}}' <container>
  3. Rollback if needed

Troubleshooting

Debug Checklist

  • Health check passing?
  • Resources sufficient?
    docker stats
  • Logs showing errors?
  • Metrics collecting?

Diagnostics

bash
# Resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# Restart count
docker inspect --format='{{.RestartCount}}' <container>

# Recent events
docker events --filter 'container=<name>' --since 1h
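
The restart-count check can be run across all running containers at once. A minimal sketch, with `restart_report` as a hypothetical helper name:

```bash
# List running containers that have restarted at least once.
restart_report() {
  docker ps --format '{{.Names}}' | while read -r name; do
    count="$(docker inspect --format '{{.RestartCount}}' "$name")"
    if [ "$count" -gt 0 ]; then
      echo "$name restarted $count time(s)"
    fi
  done
}
```

An empty result means no running container has been restarted by its restart policy.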

Usage

Skill("docker-production")

Related Skills

  • docker-debugging
  • docker-ci-cd
  • docker-security