docker-production
Docker Production Skill
Master production-grade Docker deployments with monitoring, logging, health checks, and resource management.
Purpose
Configure containers for production with proper observability, resource limits, and deployment strategies.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| monitoring | enum | No | prometheus | prometheus/datadog |
| logging | enum | No | json-file | json-file/loki/elk |
| replicas | number | No | 1 | Number of replicas |
Production Configuration
Health Checks
```dockerfile
HEALTHCHECK \
  CMD curl -f http://localhost:3000/health || exit 1
```

```yaml
# Compose health check
services:
  app:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```
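A deploy script usually needs to block until the health check reports success before moving on. The following is a minimal sketch of that polling loop. `wait_healthy` is a hypothetical helper name, and the default attempt count and pause are assumptions, not values from this skill.

```bash
#!/usr/bin/env bash
# Poll a container's health status until it reports "healthy".
# wait_healthy is a hypothetical helper, not a Docker command.
wait_healthy() {
  local container="$1"
  local attempts="${2:-10}"   # number of polls (assumed default)
  local pause="${3:-5}"       # seconds between polls (assumed default)
  local i=1 status
  while [ "$i" -le "$attempts" ]; do
    # Health status is one of: starting, healthy, unhealthy
    status="$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null)"
    case "$status" in
      healthy)   return 0 ;;
      unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
    esac
    sleep "$pause"
    i=$((i + 1))
  done
  echo "$container did not become healthy after $attempts checks" >&2
  return 1
}
```

For example, `wait_healthy app 12 5` polls for roughly a minute, which matches the `start_period: 60s` used in the Compose health check.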
Resource Limits
```yaml
services:
  app:
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
```
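Once a memory limit is set, it helps to notice when a container approaches it. The sketch below reads the `MemPerc` column of `docker stats` and compares it to a threshold. `mem_above` is a hypothetical helper and the 80 percent default is an assumption.

```bash
#!/usr/bin/env bash
# Exit 0 when the container's memory use is at or above the threshold
# (percent of its limit), 1 when below, 2 when docker stats fails.
# mem_above is a hypothetical helper; 80 is an assumed default.
mem_above() {
  local container="$1"
  local threshold="${2:-80}"
  local pct
  pct="$(docker stats --no-stream --format '{{.MemPerc}}' "$container")" || return 2
  pct="${pct%\%}"   # strip the trailing percent sign, "42.10%" -> "42.10"
  awk -v p="$pct" -v t="$threshold" 'BEGIN { exit !(p + 0 >= t + 0) }'
}
```

A cron job or CI step could call `mem_above app 80 && echo "app is close to its memory limit"`.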
Logging Configuration
```yaml
services:
  app:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
        labels: "app,environment"
```
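With rotation enabled, the worst-case disk use per container is roughly `max-size` times `max-file`. The sketch below does that arithmetic and also prints the driver a container actually runs with. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Worst-case disk use of rotated json-file logs for one container, in MB.
# Arguments: numeric part of max-size (in MB) and the max-file count.
log_cap_mb() {
  echo $(( $1 * $2 ))
}

# Print the logging driver a container is actually using.
log_driver() {
  docker inspect --format '{{.HostConfig.LogConfig.Type}}' "$1"
}
```

With the settings above, `log_cap_mb 10 3` prints `30`, so each container keeps at most about 30 MB of logs.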
Monitoring Stack
Prometheus + Grafana
```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
```
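After bringing the stack up, a quick smoke test can confirm that the services answer. The sketch below probes Prometheus's `/-/ready` endpoint and Grafana's `/api/health` endpoint on the host ports published in the Compose file. The helper names and the 5 second timeout are assumptions.

```bash
#!/usr/bin/env bash
# Return 0 when the URL answers with a successful HTTP status.
endpoint_ok() {
  curl -fsS -o /dev/null --max-time 5 "$1"
}

# Probe the monitoring services; return non-zero if any probe fails.
# check_monitoring is a hypothetical helper, ports follow the Compose file.
check_monitoring() {
  local failed=0 url
  for url in \
    "http://localhost:9090/-/ready" \
    "http://localhost:3001/api/health"; do
    if endpoint_ok "$url"; then
      echo "ok   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}
```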
Prometheus Config
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'docker-containers'
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
```
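It can be worth validating the file before restarting Prometheus. The sketch below runs `promtool check config`, which ships in the `prom/prometheus` image, against a `prometheus.yml` in the current directory. `validate_prometheus_config` is a hypothetical helper. Note also that `docker_sd_configs` can only reach `unix:///var/run/docker.sock` if that socket is mounted into the Prometheus container and readable by its user.

```bash
#!/usr/bin/env bash
# Validate ./prometheus.yml with promtool from the prom/prometheus image.
# validate_prometheus_config is a hypothetical helper name.
validate_prometheus_config() {
  docker run --rm \
    --entrypoint promtool \
    -v "$PWD/prometheus.yml:/etc/prometheus/prometheus.yml:ro" \
    prom/prometheus:latest \
    check config /etc/prometheus/prometheus.yml
}
```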
Deployment Strategies
Rolling Update (Zero Downtime)
```yaml
deploy:
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    order: start-first
  rollback_config:
    parallelism: 1
    delay: 10s
```
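`update_config` and `rollback_config` are acted on by Swarm when the file is deployed as a stack. A plain `docker compose up` does not perform rolling updates. The sketch below shows how the rollout and a manual rollback might be triggered. The function names, the stack name, and the default file name are assumptions.

```bash
#!/usr/bin/env bash
# Deploy or update a stack; Swarm applies update_config to each service.
# rolling_deploy is a hypothetical helper name.
rolling_deploy() {
  local stack="$1"
  local file="${2:-docker-compose.yml}"   # assumed default file name
  docker stack deploy --compose-file "$file" "$stack"
}

# Manually revert one service to its previous definition.
# Stack services are named <stack>_<service>.
rollback_service() {
  docker service rollback "$1"
}
```

For example, `rolling_deploy myapp` followed, if needed, by `rollback_service myapp_app`.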
Blue-Green
```bash
# Deploy new version
docker compose -p myapp-green up -d

# Switch traffic (update nginx/load balancer)

# Remove old version
docker compose -p myapp-blue down
```
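The three steps can be wrapped in one script so that the old environment is removed only after the new one is up and traffic has moved. This is a sketch under assumptions: `other_color`, `switch_traffic`, and `blue_green_deploy` are hypothetical names, `switch_traffic` is a placeholder for your own nginx or load balancer update, and `--wait` needs a recent Compose v2.

```bash
#!/usr/bin/env bash
# Return the colour that is not currently live.
other_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown colour: $1" >&2; return 1 ;;
  esac
}

# Placeholder: replace with the real nginx / load balancer update.
switch_traffic() {
  echo "switch traffic to myapp-$1 here"
}

# Bring up the idle colour, move traffic, then remove the old colour.
blue_green_deploy() {
  local live="$1" next
  next="$(other_color "$live")" || return 1
  # --wait blocks until services are running (and healthy, if a check exists)
  docker compose -p "myapp-$next" up -d --wait || return 1
  switch_traffic "$next" || return 1
  docker compose -p "myapp-$live" down
}
```

Calling `blue_green_deploy blue` deploys `myapp-green` and tears down `myapp-blue`. If the new project fails to come up, the old one is left untouched.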
Error Handling
Common Errors
| Cause | Solution |
|---|---|
| Health check failing | Check endpoint, increase start_period |
| Memory exceeded | Increase limit or optimize |
| App crash | Check logs, fix application |
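To tell these causes apart, the container's recorded state is often enough. The sketch below prints the exit code, the OOM flag, and the restart count in one line. `crash_summary` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print exit code, OOM-kill flag, and restart count for a container.
# crash_summary is a hypothetical helper name.
crash_summary() {
  docker inspect \
    --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}} restarts={{.RestartCount}}' \
    "$1"
}
```

An `oom=true` result points at the memory limit, while a non-zero exit code with `oom=false` suggests an application crash worth checking in the logs.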
Recovery
- Check logs: `docker logs --tail 100 <container>`
- Verify health: `docker inspect --format='{{.State.Health.Status}}' <container>`
- Rollback if needed
Troubleshooting
Debug Checklist
- Health check passing?
- Resources sufficient? `docker stats`
- Logs showing errors?
- Metrics collecting?
Diagnostics
```bash
# Resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# Restart count
docker inspect --format='{{.RestartCount}}' <container>

# Recent events
docker events --filter 'container=<name>' --since 1h
```
Usage
Skill("docker-production")
Related Skills
- docker-debugging
- docker-ci-cd
- docker-security