Deployment Patterns

Production deployment workflows and CI/CD best practices.

When to Activate

  • Setting up CI/CD pipelines
  • Dockerizing an application
  • Planning deployment strategy (blue-green, canary, rolling)
  • Implementing health checks and readiness probes
  • Preparing for a production release
  • Configuring environment-specific settings

Deployment Strategies

Rolling Deployment (Default)

Replace instances gradually — old and new versions run simultaneously during rollout.
Instance 1: v1 → v2  (update first)
Instance 2: v1        (still running v1)
Instance 3: v1        (still running v1)

Instance 1: v2
Instance 2: v1 → v2  (update second)
Instance 3: v1

Instance 1: v2
Instance 2: v2
Instance 3: v1 → v2  (update last)
Pros: Zero downtime, gradual rollout
Cons: Two versions run simultaneously; requires backward-compatible changes
Use when: Standard deployments, backward-compatible changes
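
The rollout sequence above can be sketched in a few lines of TypeScript. This is a minimal illustration rather than a real orchestrator: `rollingUpdate` and the version-label model are hypothetical, and an actual rollout would also wait for each new instance to pass its health check before moving on.

```typescript
// Hypothetical model: each instance is represented only by its version label.
type Version = string;

// Returns the fleet state after each step of a one-at-a-time rolling update.
function rollingUpdate(fleet: Version[], target: Version): Version[][] {
  const states: Version[][] = [];
  const current = [...fleet];
  for (let i = 0; i < current.length; i++) {
    current[i] = target;       // replace one instance
    states.push([...current]); // snapshot: old and new versions coexist
  }
  return states;
}

// Three instances moving from v1 to v2, one per step.
const steps = rollingUpdate(["v1", "v1", "v1"], "v2");
console.log(steps);
```

Every intermediate state except the last contains both versions, which is why the change being rolled out has to be backward-compatible.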

Blue-Green Deployment

Run two identical environments. Switch traffic atomically.
Blue  (v1) ← traffic
Green (v2)   idle, running new version

After verification:
Blue  (v1)   idle (becomes standby)
Green (v2) ← traffic

Pros: Instant rollback (switch back to blue), clean cutover
Cons: Requires 2x infrastructure during deployment
Use when: Critical services, zero tolerance for issues
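
The cutover and rollback described above might be modeled as follows. This is a sketch under stated assumptions: `BlueGreenRouter`, `Environment`, and the `healthy` flag are hypothetical names, and in practice the switch is usually a load balancer or DNS change rather than a field assignment.

```typescript
type Color = "blue" | "green";

interface Environment {
  version: string;
  healthy: boolean; // result of pre-switch verification (assumed to be known)
}

class BlueGreenRouter {
  private live: Color = "blue";
  private envs: Record<Color, Environment>;

  constructor(envs: Record<Color, Environment>) {
    this.envs = envs;
  }

  // Version currently receiving traffic.
  get liveVersion(): string {
    return this.envs[this.live].version;
  }

  // Move all traffic to the idle environment, but only if it is healthy.
  // Calling it again switches back, which is the rollback path.
  switchTraffic(): boolean {
    const idle: Color = this.live === "blue" ? "green" : "blue";
    if (!this.envs[idle].healthy) return false;
    this.live = idle; // one assignment stands in for the atomic cutover
    return true;
  }
}

const router = new BlueGreenRouter({
  blue: { version: "v1", healthy: true },
  green: { version: "v2", healthy: true },
});
router.switchTraffic(); // green (v2) now serves traffic
router.switchTraffic(); // rollback: blue (v1) serves traffic again
```

Because the previous environment stays running as standby, rollback is the same operation as the cutover, which is where the 2x infrastructure cost comes from.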

Canary Deployment

Route a small percentage of traffic to the new version first.
v1: 95% of traffic
v2:  5% of traffic  (canary)

If metrics look good:
v1: 50% of traffic
v2: 50% of traffic

Final:
v2: 100% of traffic

Pros: Catches issues with real traffic before full rollout
Cons: Requires traffic splitting infrastructure, monitoring
Use when: High-traffic services, risky changes, feature flags
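
One common way to implement the percentage split is to hash a stable identifier into a bucket and compare it with the current canary percentage. The sketch below assumes that approach; `bucketOf`, `routeVersion`, and the use of a user ID as the routing key are illustrative choices, not something the strategy prescribes.

```typescript
// Deterministically map a user ID to a bucket in the range [0, 100).
function bucketOf(userId: string): number {
  let hash = 0;
  for (let i = 0; i < userId.length; i++) {
    // Simple multiplicative string hash, kept as an unsigned 32-bit integer.
    hash = (hash * 31 + userId.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
}

// Users whose bucket falls below the canary percentage get the new version.
function routeVersion(userId: string, canaryPercent: number): "v1" | "v2" {
  return bucketOf(userId) < canaryPercent ? "v2" : "v1";
}

// The same user always lands in the same bucket, so raising the percentage
// from 5 to 50 to 100 only ever moves users from v1 to v2, never back.
for (const percent of [5, 50, 100]) {
  console.log(percent, routeVersion("user-42", percent));
}
```

Keeping assignment sticky per user avoids a single session bouncing between versions while the rollout percentage increases.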

Docker

Multi-Stage Dockerfile (Node.js)

dockerfile
# Stage 1: Install dependencies
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --include=dev

# Stage 2: Build
FROM node:22-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build
RUN npm prune --omit=dev

# Stage 3: Production image
FROM node:22-alpine AS runner
WORKDIR /app

RUN addgroup -g 1001 -S appgroup && adduser -S appuser -u 1001 -G appgroup
USER appuser

COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/package.json ./

ENV NODE_ENV=production
EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/server.js"]

Multi-Stage Dockerfile (Go)

dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server ./cmd/server

FROM alpine:3.19 AS runner
RUN apk --no-cache add ca-certificates
RUN adduser -D -u 1001 appuser
USER appuser

COPY --from=builder /server /server

EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:8080/health || exit 1
CMD ["/server"]

Multi-Stage Dockerfile (Python/Django)

dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install --no-cache-dir uv
COPY requirements.txt .
RUN uv pip install --system --no-cache -r requirements.txt

FROM python:3.12-slim AS runner
WORKDIR /app

RUN useradd -r -u 1001 appuser
USER appuser

COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY . .

ENV PYTHONUNBUFFERED=1
EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=3s CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health/')" || exit 1
CMD ["gunicorn", "config.wsgi:application", "--bind", "0.0.0.0:8000", "--workers", "4"]

Docker Best Practices

GOOD practices

  • Use specific version tags (node:22-alpine, not node:latest)
  • Multi-stage builds to minimize image size
  • Run as non-root user
  • Copy dependency files first (layer caching)
  • Use .dockerignore to exclude node_modules, .git, tests
  • Add HEALTHCHECK instruction
  • Set resource limits in docker-compose or k8s

BAD practices

  • Running as root
  • Using :latest tags
  • Copying entire repo in one COPY layer
  • Installing dev dependencies in production image
  • Storing secrets in image (use env vars or secrets manager)

CI/CD Pipeline

GitHub Actions (Standard Pipeline)

yaml
name: CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck
      - run: npm test -- --coverage
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: coverage
          path: coverage/

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write   # lets GITHUB_TOKEN push the image to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - name: Deploy to production
        run: |
          # Platform-specific deployment command
          # Railway: railway up
          # Vercel: vercel --prod
          # K8s: kubectl set image deployment/app app=ghcr.io/${{ github.repository }}:${{ github.sha }}
          echo "Deploying ${{ github.sha }}"

Pipeline Stages

PR opened:
  lint → typecheck → unit tests → integration tests → preview deploy

Merged to main:
  lint → typecheck → unit tests → integration tests → build image → deploy staging → smoke tests → deploy production
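
The two flows above share their first four stages and differ only in what follows. A small sketch can make the ordering and the fail-fast behavior explicit; `stagesFor`, `runPipeline`, and the trigger names are hypothetical and stand in for whatever the CI system provides.

```typescript
type Trigger = "pull_request" | "merge_to_main";

// Checks shared by both flows, in the order listed above.
const sharedChecks = ["lint", "typecheck", "unit tests", "integration tests"];

function stagesFor(trigger: Trigger): string[] {
  const tail =
    trigger === "pull_request"
      ? ["preview deploy"]
      : ["build image", "deploy staging", "smoke tests", "deploy production"];
  return [...sharedChecks, ...tail];
}

// Runs stages in order and stops at the first failure.
// Returns the name of the failed stage, or null if every stage passed.
function runPipeline(
  stages: string[],
  runStage: (name: string) => boolean
): string | null {
  for (const stage of stages) {
    if (!runStage(stage)) return stage;
  }
  return null;
}

// Example: a failing smoke test stops the run before the production deploy.
const executed: string[] = [];
const failedAt = runPipeline(stagesFor("merge_to_main"), (name) => {
  executed.push(name);
  return name !== "smoke tests";
});
console.log(failedAt, executed);
```

Placing smoke tests between staging and production means a broken build is caught after it has run in a realistic environment but before users see it.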

Health Checks

Health Check Endpoint

typescript
// Simple health check
app.get("/health", (req, res) => {
  res.status(200).json({ status: "ok" });
});

// Detailed health check (for internal monitoring)
app.get("/health/detailed", async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    externalApi: await checkExternalApi(),
  };

  const allHealthy = Object.values(checks).every(c => c.status === "ok");

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || "unknown",
    uptime: process.uptime(),
    checks,
  });
});

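// Assumed shape of each dependency check result
interface HealthCheck {
  status: "ok" | "error";
  latency_ms?: number;
  message?: string;
}

async function checkDatabase(): Promise<HealthCheck> {
  const started = Date.now();
  try {
    await db.query("SELECT 1");
    // Report the measured round-trip time rather than a fixed value
    return { status: "ok", latency_ms: Date.now() - started };
  } catch (err) {
    return { status: "error", message: "Database unreachable" };
  }
}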
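
A dependency that hangs instead of failing can stall the detailed health endpoint itself. One way to guard against that is to race each check against a timer. The helper below is a sketch of that idea; `timedCheck`, `CheckResult`, and the timeout values are hypothetical and not part of the endpoint shown above.

```typescript
interface CheckResult {
  status: "ok" | "error";
  latency_ms: number;
  message?: string;
}

// Runs a probe, measures how long it took, and reports an error
// if the probe throws or does not settle within timeoutMs.
async function timedCheck(
  probe: () => Promise<unknown>,
  timeoutMs: number
): Promise<CheckResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  try {
    await Promise.race([probe(), timeout]);
    return { status: "ok", latency_ms: Date.now() - started };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: "error", latency_ms: Date.now() - started, message };
  } finally {
    if (timer !== undefined) clearTimeout(timer); // avoid a dangling timer
  }
}

// Usage: a probe that never settles is reported as an error after 20 ms.
timedCheck(() => new Promise(() => {}), 20).then((result) => {
  console.log(result.status, result.message);
});
```

The timeout should stay well below the probe timeout of whatever calls the endpoint, so the endpoint can still answer with a 503 rather than timing out itself.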

Kubernetes Probes

yaml
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 2

startupProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 30    # 30 * 5s = 150s max startup time
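
The three probes serve different purposes: a failing liveness probe restarts the container, a failing readiness probe removes the pod from service endpoints, and the startup probe holds off the other two until the application has started. The arithmetic in the startup comment above can be generalized as a rough budget. The helper below is only an approximation (it ignores `timeoutSeconds`, and `failureBudgetSeconds` is a hypothetical name), but it helps when tuning the values.

```typescript
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Rough upper bound on the time until a probe is declared failed:
// initial delay plus one period per allowed failure.
function failureBudgetSeconds(p: ProbeTiming): number {
  return p.initialDelaySeconds + p.periodSeconds * p.failureThreshold;
}

// Values taken from the probe configuration above.
const startup = failureBudgetSeconds({
  initialDelaySeconds: 0,
  periodSeconds: 5,
  failureThreshold: 30,
});
const liveness = failureBudgetSeconds({
  initialDelaySeconds: 10,
  periodSeconds: 30,
  failureThreshold: 3,
});
const readiness = failureBudgetSeconds({
  initialDelaySeconds: 5,
  periodSeconds: 10,
  failureThreshold: 2,
});
console.log({ startup, liveness, readiness }); // 150, 100, and 25 seconds
```

With these values, readiness reacts fastest and liveness slowest, so a struggling pod stops receiving traffic well before it is restarted.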

Environment Configuration

Twelve-Factor App Pattern

bash
# All config via environment variables, never in code
DATABASE_URL=postgres://user:pass@host:5432/db
REDIS_URL=redis://host:6379/0
API_KEY=${API_KEY}           # injected by secrets manager
LOG_LEVEL=info
PORT=3000

# Environment-specific behavior
NODE_ENV=production          # or staging, development
APP_ENV=production           # explicit app environment

Configuration Validation

typescript
import { z } from "zod";

const envSchema = z.object({
  NODE_ENV: z.enum(["development", "staging", "production"]),
  PORT: z.coerce.number().default(3000),
  DATABASE_URL: z.string().url(),
  REDIS_URL: z.string().url(),
  JWT_SECRET: z.string().min(32),
  LOG_LEVEL: z.enum(["debug", "info", "warn", "error"]).default("info"),
});

// Validate at startup — fail fast if config is wrong
export const env = envSchema.parse(process.env);

Rollback Strategy

Instant Rollback

bash
# Docker/Kubernetes: point to previous image
kubectl rollout undo deployment/app

# Vercel: promote previous deployment
vercel rollback

# Railway: redeploy previous commit
railway up --commit <previous-sha>

# Database: roll back migration (if reversible)
# Note: this records the migration as rolled back in Prisma's migration
# history; the schema changes themselves must be reverted separately
npx prisma migrate resolve --rolled-back <migration-name>

Rollback Checklist

  • Previous image/artifact is available and tagged
  • Database migrations are backward-compatible (no destructive changes)
  • Feature flags can disable new features without deploy
  • Monitoring alerts configured for error rate spikes
  • Rollback tested in staging before production release

Production Readiness Checklist

Before any production deployment:

Application

  • All tests pass (unit, integration, E2E)
  • No hardcoded secrets in code or config files
  • Error handling covers all edge cases
  • Logging is structured (JSON) and does not contain PII
  • Health check endpoint returns meaningful status

Infrastructure

  • Docker image builds reproducibly (pinned versions)
  • Environment variables documented and validated at startup
  • Resource limits set (CPU, memory)
  • Horizontal scaling configured (min/max instances)
  • SSL/TLS enabled on all endpoints

Monitoring

  • Application metrics exported (request rate, latency, errors)
  • Alerts configured for error rate > threshold
  • Log aggregation set up (structured logs, searchable)
  • Uptime monitoring on health endpoint

Security

  • Dependencies scanned for CVEs
  • CORS configured for allowed origins only
  • Rate limiting enabled on public endpoints
  • Authentication and authorization verified
  • Security headers set (CSP, HSTS, X-Frame-Options)

Operations

  • Rollback plan documented and tested
  • Database migration tested against production-sized data
  • Runbook for common failure scenarios
  • On-call rotation and escalation path defined