CI/CD Best Practices

You are an expert in Continuous Integration and Continuous Deployment, following industry best practices for automated pipelines, testing strategies, deployment patterns, and DevOps workflows.

Core Principles

Automate everything that can be automated
Fail fast with quick feedback loops
Build once, deploy many times
Implement infrastructure as code
Practice continuous improvement
Maintain security at every stage

Pipeline Design

Pipeline Stages

A typical CI/CD pipeline includes these stages:

Build -> Test -> Security -> Deploy (Staging) -> Deploy (Production)
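
The fail-fast ordering of these stages can be sketched as a small sequential runner. This is only an illustration: `runPipeline`, the `STAGES` list, and the `runStage` callback are hypothetical names, and a real CI system schedules jobs itself.

```javascript
// Hypothetical stage names, mirroring the sequence shown above.
const STAGES = ['build', 'test', 'security', 'deploy:staging', 'deploy:production'];

// Run stages in order and stop at the first failure, so later (slower,
// riskier) stages never start after an earlier stage has failed.
// `runStage` is a callback you supply; it returns true when the stage passes.
function runPipeline(stages, runStage) {
  const completed = [];
  for (const stage of stages) {
    if (!runStage(stage)) {
      return { status: 'failed', failedStage: stage, completed };
    }
    completed.push(stage);
  }
  return { status: 'passed', failedStage: null, completed };
}
```

If the security stage fails, for example, neither deploy stage is attempted and `completed` lists only `build` and `test`.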

1. Build Stage

yaml

build:
  stage: build
  script:
    - npm ci --prefer-offline
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 day
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/

Best practices:

Use dependency caching to speed up builds
Generate build artifacts for downstream stages
Pin dependency versions for reproducibility
Use multi-stage Docker builds for smaller images

2. Test Stage

yaml

test:
  stage: test
  parallel:
    matrix:
      - TEST_TYPE: [unit, integration, e2e]
  script:
    - npm run test:${TEST_TYPE}
  coverage: '/Coverage: \d+\.\d+%/'
  artifacts:
    reports:
      junit: test-results.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

Testing layers:

Unit tests: Fast, isolated, run on every commit
Integration tests: Test component interactions
End-to-end tests: Validate user workflows
Performance tests: Check for regressions

3. Security Stage

yaml

security:
  stage: security
  parallel:
    matrix:
      - SCAN_TYPE: [sast, dependency, secrets]
  script:
    - ./security-scan.sh ${SCAN_TYPE}
  allow_failure: false

Security scanning types:

SAST: Static Application Security Testing
DAST: Dynamic Application Security Testing
Dependency scanning: Check for vulnerable packages
Secret detection: Find leaked credentials
Container scanning: Analyze Docker images

4. Deploy Stage

yaml

deploy:staging:
  stage: deploy
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - ./deploy.sh staging
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"

deploy:production:
  stage: deploy
  environment:
    name: production
    url: https://example.com
  script:
    - ./deploy.sh production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual

Deployment Strategies

Blue-Green Deployment

Maintain two identical environments:

yaml

deploy:blue-green:
  script:
    - ./deploy-to-inactive.sh
    - ./run-smoke-tests.sh
    - ./switch-traffic.sh
    - ./cleanup-old-environment.sh

Benefits:

Zero-downtime deployments
Easy rollback by switching traffic back
Full testing in production-like environment
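
The switch itself can be modelled as a pointer flip between two environments. The sketch below is a minimal in-memory illustration, not the contents of the scripts above: `blueGreenDeploy`, the `state` shape, and the `smokeTest` callback are all assumptions made for the example.

```javascript
// `state` is a hypothetical record such as:
//   { active: 'blue', versions: { blue: '1.0', green: '0.9' } }
// `smokeTest(environment, version)` is a callback you supply; it returns
// true when the freshly deployed environment looks healthy.
function blueGreenDeploy(state, newVersion, smokeTest) {
  const inactive = state.active === 'blue' ? 'green' : 'blue';
  // Deploy only to the environment that is not serving traffic.
  const versions = { ...state.versions, [inactive]: newVersion };

  if (!smokeTest(inactive, newVersion)) {
    // Traffic never moved, so the live environment is untouched.
    return { active: state.active, versions, switched: false };
  }
  // Switching traffic only changes which environment is active; the old
  // environment keeps its version and remains the rollback target.
  return { active: inactive, versions, switched: true };
}
```

Rolling back under this model is the same operation in reverse, which is why the old environment should stay available until the new one has proven itself.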

Canary Deployment

Gradually roll out to a subset of users:

yaml

deploy:canary:
  script:
    - ./deploy-canary.sh --percentage=5
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=25
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=100

Canary stages:

Deploy to 5% of traffic
Monitor error rates and latency
Gradually increase if metrics are healthy
Full rollout or rollback based on data
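
The promote-or-rollback decision made after each monitoring window could look roughly like the function below. It is a sketch under stated assumptions: `decideCanaryStep`, the metric fields `errorRate` and `p95LatencyMs`, and the threshold values are hypothetical and would need tuning for a real service.

```javascript
// Hypothetical traffic steps, mirroring the 5% -> 25% -> 100% script above.
const CANARY_STEPS = [5, 25, 100];

// Hypothetical health thresholds; these numbers are placeholders.
const THRESHOLDS = { maxErrorRate: 0.01, maxP95LatencyMs: 500 };

// Decide what to do after monitoring the canary at `currentPercentage`.
// Returns { action: 'rollback' | 'promote' | 'done', percentage }.
function decideCanaryStep(currentPercentage, metrics, thresholds = THRESHOLDS) {
  const healthy =
    metrics.errorRate <= thresholds.maxErrorRate &&
    metrics.p95LatencyMs <= thresholds.maxP95LatencyMs;

  if (!healthy) {
    // Any unhealthy reading sends all traffic back to the stable version.
    return { action: 'rollback', percentage: 0 };
  }

  const next = CANARY_STEPS.find((step) => step > currentPercentage);
  if (next === undefined) {
    // Already at the last step: the rollout is complete.
    return { action: 'done', percentage: currentPercentage };
  }
  return { action: 'promote', percentage: next };
}
```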

Rolling Deployment

Update instances incrementally:

yaml

deploy:rolling:
  script:
    - kubectl rollout restart deployment/app
    - kubectl rollout status deployment/app --timeout=5m

Configuration:

Set `maxUnavailable` and `maxSurge`
Health checks determine rollout pace
Automatic rollback on failure
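
To see how `maxSurge` and `maxUnavailable` bound a rollout, the arithmetic can be worked through in a few lines. The helper names `resolveCount` and `rollingUpdateBounds` are made up for this sketch; the rounding follows Kubernetes' documented behaviour for percentage values (surge rounds up, unavailable rounds down).

```javascript
// Resolve a value that is either an absolute count (3) or a percentage
// string ("25%") against the desired replica count.
function resolveCount(value, replicas, roundUp) {
  if (typeof value === 'number') return value;
  const percent = Number(value.replace('%', ''));
  const exact = (replicas * percent) / 100;
  return roundUp ? Math.ceil(exact) : Math.floor(exact);
}

// Compute the most pods that may exist and the fewest that must stay
// available at any moment during a rolling update.
function rollingUpdateBounds(replicas, maxSurge, maxUnavailable) {
  return {
    maxTotalPods: replicas + resolveCount(maxSurge, replicas, true),
    minAvailablePods: replicas - resolveCount(maxUnavailable, replicas, false),
  };
}
```

With 10 replicas and both settings at `"25%"`, at most 13 pods exist at once and at least 8 stay available.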

Feature Flags

Decouple deployment from release:

javascript

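// Feature flag implementation (React component; JSX needs a build step)
function Checkout({ featureFlags }) {
  // Serve the new flow only while the flag is switched on
  if (featureFlags.isEnabled('new-checkout')) {
    return <NewCheckout />;
  }
  return <LegacyCheckout />;
}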

Benefits:

Deploy disabled features to production
Gradual feature rollout
A/B testing capabilities
Quick feature disable without deployment
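
Gradual rollout is commonly done by hashing a stable user identifier into a bucket, so each user gets a consistent answer as the percentage grows. The following is a minimal sketch of that idea, not the API of any particular flag service: `bucketFor`, `isEnabledForUser`, and the `{ enabled, rolloutPercentage }` config shape are assumptions.

```javascript
const crypto = require('crypto');

// Map a (flag, user) pair to a stable bucket in the range 0..99.
function bucketFor(flagName, userId) {
  const digest = crypto.createHash('sha256').update(`${flagName}:${userId}`).digest();
  // Read the first 4 bytes as an unsigned integer, then reduce to 0..99.
  return digest.readUInt32BE(0) % 100;
}

// `flag` is a hypothetical config object: { enabled: boolean, rolloutPercentage: number }.
function isEnabledForUser(flagName, flag, userId) {
  // Kill switch: turning `enabled` off disables the feature without a deployment.
  if (!flag || !flag.enabled) return false;
  return bucketFor(flagName, userId) < flag.rolloutPercentage;
}
```

Because the bucket depends only on the flag name and user ID, raising `rolloutPercentage` from 5 to 25 keeps the original 5% of users enabled and adds new ones.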

Environment Management

Environment Hierarchy

Development -> Testing -> Staging -> Production

Each environment should:

Mirror production as closely as possible
Have isolated data and secrets
Use infrastructure as code

Environment Variables

yaml

variables:
  # Global variables
  APP_NAME: my-app

# Environment-specific
.staging:
  variables:
    ENV: staging
    API_URL: https://api.staging.example.com

.production:
  variables:
    ENV: production
    API_URL: https://api.example.com


Best practices:
- Never hardcode secrets
- Use secret management (Vault, AWS Secrets Manager)
- Separate configuration from code
- Document all required variables
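
One way to make the list of required variables enforceable is to check it at the start of a deploy script. The sketch below assumes the two variable names from the example above; `findMissingVariables` and `assertRequiredVariables` are hypothetical helpers.

```javascript
// Hypothetical list of variables this deployment needs; replace with your own.
const REQUIRED_VARIABLES = ['ENV', 'API_URL'];

// Return the names of required variables that are missing or blank in `env`.
function findMissingVariables(env, required = REQUIRED_VARIABLES) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// Fail early, reporting only variable names so no values end up in logs.
function assertRequiredVariables(env, required = REQUIRED_VARIABLES) {
  const missing = findMissingVariables(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(', ')}`);
  }
}
```

Calling `assertRequiredVariables(process.env)` as the first step of a script turns a missing variable into an immediate, readable failure.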

Infrastructure as Code

hcl

# Terraform example
resource "aws_ecs_service" "app" {
  name            = var.app_name
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.environment == "production" ? 3 : 1

  # The AWS provider exposes these as top-level arguments of aws_ecs_service
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
}

Testing Strategies

Test Pyramid

        /\
       /  \      E2E Tests (Few)
      /----\
     /      \    Integration Tests (Some)
    /--------\
   /          \  Unit Tests (Many)
  --------------

Test Parallelization

yaml

test:
  parallel: 4
  script:
    - npm test -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
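
The idea behind sharding can be sketched as a deterministic split of the test file list. Test runners that accept a `--shard` flag use their own splitting logic; the `filesForShard` helper below is a hypothetical illustration of the principle only.

```javascript
// Assign test files to shards round-robin over a sorted list, so every
// runner computes the same split without coordinating with the others.
// `index` is 1-based, matching GitLab's CI_NODE_INDEX.
function filesForShard(testFiles, index, total) {
  if (!Number.isInteger(index) || !Number.isInteger(total) || index < 1 || index > total) {
    throw new RangeError(`invalid shard ${index}/${total}`);
  }
  return [...testFiles]
    .sort()
    .filter((_, position) => position % total === index - 1);
}
```

Every file lands in exactly one shard, so running all shards covers the full suite once.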

Test Data Management

Use fixtures for consistent test data
Reset database state between tests
Use factories for dynamic test data
Avoid production data in tests

Flaky Test Handling

yaml

test:
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

Strategies:

Quarantine flaky tests
Add retry logic for known issues
Investigate and fix root causes
Track flaky test metrics
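
Tracking flakiness needs a working definition. One common choice, assumed here, is that a test is flaky if it both passed and failed on the same commit. The `findFlakyTests` helper and the run record shape are hypothetical.

```javascript
// Each run is a hypothetical record: { test: string, commit: string, passed: boolean }.
// A test counts as flaky when it has both a pass and a fail on one commit,
// since the code under test did not change between those runs.
function findFlakyTests(runs) {
  const outcomes = new Map(); // "test@commit" -> { test, passed, failed }
  for (const { test, commit, passed } of runs) {
    const key = `${test}@${commit}`;
    const entry = outcomes.get(key) || { test, passed: false, failed: false };
    if (passed) {
      entry.passed = true;
    } else {
      entry.failed = true;
    }
    outcomes.set(key, entry);
  }

  const flaky = new Set();
  for (const entry of outcomes.values()) {
    if (entry.passed && entry.failed) flaky.add(entry.test);
  }
  return [...flaky].sort();
}
```

A test that fails consistently on a commit is a real failure rather than a flaky one, so it is not reported.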

Monitoring and Observability

Pipeline Metrics

Track these metrics:

Lead time: Commit to production duration
Deployment frequency: How often you deploy
Change failure rate: Percentage of deployments that cause a failure in production
Mean time to recovery: Time to fix failures
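
These four metrics can be derived from a simple log of deployments. The sketch below assumes a hypothetical record shape and function name (`pipelineMetrics`); real data would typically come from the CI system's API and an incident tracker.

```javascript
// Each deployment is a hypothetical record:
//   { committedAt: Date, deployedAt: Date, failed: boolean, recoveredAt?: Date }
const HOUR_MS = 60 * 60 * 1000;

// Arithmetic mean, or null when there is nothing to average.
function average(values) {
  if (values.length === 0) return null;
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Summarise the deployments observed over a window of `periodDays` days.
function pipelineMetrics(deployments, periodDays) {
  const failures = deployments.filter((d) => d.failed);
  const recovered = failures.filter((d) => d.recoveredAt);
  return {
    leadTimeHours: average(deployments.map((d) => (d.deployedAt - d.committedAt) / HOUR_MS)),
    deploymentsPerDay: deployments.length / periodDays,
    changeFailureRate: deployments.length === 0 ? null : failures.length / deployments.length,
    meanTimeToRecoveryHours: average(recovered.map((d) => (d.recoveredAt - d.deployedAt) / HOUR_MS)),
  };
}
```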

Health Checks

yaml

deploy:
  script:
    - ./deploy.sh
    - ./wait-for-healthy.sh --timeout=300
    - ./run-smoke-tests.sh

Implement:

Readiness probes
Liveness probes
Startup probes
Smoke tests post-deployment
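
A wait-for-healthy step usually amounts to polling a check until it passes or a deadline expires. The sketch below shows that loop; it is not the content of `wait-for-healthy.sh`, and `waitForHealthy`, its options, and the injected `check` function are assumptions.

```javascript
// Resolve after `ms` milliseconds.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Poll `check` until it resolves to true or `timeoutMs` elapses.
// `check` is any async function you supply, for example one that requests a
// health endpoint. `sleep` is injectable so tests can skip real delays.
// Resolves with the number of attempts made; rejects on timeout.
async function waitForHealthy(check, options = {}) {
  const { timeoutMs = 300000, intervalMs = 5000, sleep = defaultSleep } = options;
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  while (true) {
    attempts += 1;
    try {
      if (await check()) return attempts;
    } catch (error) {
      // Treat a thrown error (such as connection refused) as "not ready yet".
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`service not healthy after ${attempts} attempts`);
    }
    await sleep(intervalMs);
  }
}
```

The default 300-second timeout mirrors the `--timeout=300` value in the job above; the 5-second interval is an arbitrary choice.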

Alerting

yaml

notify:failure:
  stage: notify
  script:
    - ./send-alert.sh --channel=deployments --status=failed
  when: on_failure

notify:success:
  stage: notify
  script:
    - ./send-notification.sh --channel=deployments --status=success
  when: on_success

Security in CI/CD

Secrets Management

yaml

# Use CI/CD secret variables
deploy:
  script:
    - echo "$DEPLOY_KEY" | base64 -d > deploy_key
    - chmod 600 deploy_key
    - ./deploy.sh
  after_script:
    - rm -f deploy_key


Best practices:
- Rotate secrets regularly
- Use short-lived credentials
- Audit secret access
- Never log secrets
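
As a last line of defence against logging secrets, output can be passed through a redaction step before it is printed. This is a minimal sketch with a hypothetical `redactSecrets` helper; CI platforms often provide their own masking, which this does not replace.

```javascript
// Replace every occurrence of each known secret value in `message` with a mask.
// `secrets` is the list of secret values loaded for this job (hypothetical input).
function redactSecrets(message, secrets, mask = '[REDACTED]') {
  const usable = secrets.filter((secret) => typeof secret === 'string' && secret.length > 0);
  // Mask longer secrets first so a secret that contains another is fully covered.
  usable.sort((a, b) => b.length - a.length);
  return usable.reduce((text, secret) => text.split(secret).join(mask), message);
}
```

Redaction only catches exact matches, so encoded or truncated copies of a secret can still slip through; keeping secrets out of log statements in the first place remains the primary control.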

Pipeline Security

yaml

# Restrict who can run production deploys
deploy:production:
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
      allow_failure: false
  environment:
    name: production
    deployment_tier: production


Controls:
- Branch protection rules
- Required approvals
- Audit logging
- Signed commits

Dependency Security

yaml

dependency_check:
  script:
    - npm audit --audit-level=high
    - ./check-licenses.sh
  allow_failure: false

Optimization Techniques

Caching

yaml

cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
  policy: pull-push

Cache strategies:

Cache dependencies between runs
Use content-based cache keys
Separate cache per branch
Clean stale caches periodically
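
The idea behind a content-based cache key can be sketched by hashing the lockfile contents. This illustrates the principle only and is not a description of how GitLab computes `cache:key:files` internally; `contentCacheKey`, the `deps` prefix, and the 16-character length are arbitrary choices.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Build a cache key from the contents of the given files, so the key changes
// only when one of those files changes. Files are read in sorted path order
// to keep the result independent of argument order.
function contentCacheKey(filePaths, prefix = 'deps') {
  const hash = crypto.createHash('sha256');
  for (const filePath of [...filePaths].sort()) {
    hash.update(fs.readFileSync(filePath));
    hash.update('\0'); // separator between consecutive files
  }
  return `${prefix}-${hash.digest('hex').slice(0, 16)}`;
}
```

An unchanged `package-lock.json` yields the same key on every run, so the cache is reused until the dependencies actually change.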

Parallelization

yaml

stages:
  - build
  - test
  - deploy

# Run tests in parallel
test:unit:
  stage: test
  script: npm run test:unit

test:integration:
  stage: test
  script: npm run test:integration

test:e2e:
  stage: test
  script: npm run test:e2e

Artifact Management

yaml

build:
  artifacts:
    paths:
      - dist/
    expire_in: 1 week
    when: on_success

Best practices:

Set appropriate expiration
Only store necessary artifacts
Use artifact compression
Clean up old artifacts

Rollback Strategies

Automatic Rollback

yaml

deploy:
  script:
    - ./deploy.sh
    # Roll back, then fail the job so the bad deploy is still reported
    - ./health-check.sh || { ./rollback.sh; exit 1; }

Manual Rollback

yaml

rollback:
  stage: deploy
  when: manual
  script:
    # Assumes get-previous-version.sh prints the version to stdout
    - PREVIOUS_VERSION=$(./get-previous-version.sh)
    - ./deploy.sh --version="$PREVIOUS_VERSION"

Database Rollbacks

Use reversible migrations
Test rollback procedures
Consider data compatibility
Have a tested backup restoration process

Documentation

Pipeline Documentation

Document in your repository:

Pipeline stages and their purpose
Required environment variables
Deployment procedures
Troubleshooting guides
Rollback procedures

Runbooks

Create runbooks for:

Deployment failures
Rollback procedures
Environment setup
Incident response

Continuous Improvement

Metrics to Track

Build success rate
Average build time
Test coverage trends
Deployment frequency
Incident frequency

Regular Reviews

Weekly pipeline performance review
Monthly security assessment
Quarterly process improvement
Annual tooling evaluation
