ci-cd-best-practices
CI/CD Best Practices
You are an expert in Continuous Integration and Continuous Deployment, following industry best practices for automated pipelines, testing strategies, deployment patterns, and DevOps workflows.
Core Principles
- Automate everything that can be automated
- Fail fast with quick feedback loops
- Build once, deploy many times
- Implement infrastructure as code
- Practice continuous improvement
- Maintain security at every stage
Pipeline Design
Pipeline Stages
A typical CI/CD pipeline includes these stages:
Build -> Test -> Security -> Deploy (Staging) -> Deploy (Production)
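
In GitLab CI syntax, which the job examples in this guide appear to use, the stage order above could be declared roughly as follows. The `notify` stage is included only because the alerting jobs later in this guide reference it; treat the exact stage names as assumptions to adapt.

```yaml
# Sketch: stages run in the order listed; jobs in the same stage run in parallel.
stages:
  - build
  - test
  - security
  - deploy    # staging and production deploy jobs both live here
  - notify    # assumed: used by the alerting jobs
```

Without an explicit `stages:` list, GitLab falls back to its default stages (`build`, `test`, `deploy`), so a job assigned to `security` or `notify` would be rejected for referencing an undefined stage.
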
1. Build Stage
yaml
build:
  stage: build
  script:
    # npm ci deletes node_modules before installing, so cache npm's download cache instead
    - npm ci --cache .npm --prefer-offline
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 day
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - .npm/
Best practices:
- Use dependency caching to speed up builds
- Generate build artifacts for downstream stages
- Pin dependency versions for reproducibility
- Use multi-stage Docker builds for smaller images
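
One way to apply the pinning advice to the build environment itself is to pin the job's container image to an exact tag rather than a floating one. The tag below is only an illustrative assumption; use whatever version your project actually targets.

```yaml
build:
  # Assumed example tag; a floating tag such as node:latest can change between runs.
  image: node:20.11.1-alpine
  stage: build
  script:
    - npm ci    # installs exactly what package-lock.json specifies
    - npm run build
```
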
2. Test Stage
yaml
test:
  stage: test
  parallel:
    matrix:
      - TEST_TYPE: [unit, integration, e2e]
  script:
    - npm run test:${TEST_TYPE}
  coverage: '/Coverage: \d+\.\d+%/'
  artifacts:
    reports:
      junit: test-results.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml
Testing layers:
- Unit tests: Fast, isolated, run on every commit
- Integration tests: Test component interactions
- End-to-end tests: Validate user workflows
- Performance tests: Check for regressions
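
A sketch of how the layers might be scheduled differently, assuming GitLab CI `rules`: unit tests on every pipeline, and slower performance tests only on scheduled pipelines. The `test:performance` npm script is hypothetical.

```yaml
test:unit:
  stage: test
  script:
    - npm run test:unit          # fast and isolated, so it runs on every commit
test:performance:
  stage: test
  script:
    - npm run test:performance   # hypothetical script name
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"   # e.g. a nightly scheduled pipeline
```
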
3. Security Stage
yaml
security:
  stage: security
  parallel:
    matrix:
      - SCAN_TYPE: [sast, dependency, secrets]
  script:
    - ./security-scan.sh ${SCAN_TYPE}
  allow_failure: false
Security scanning types:
- SAST: Static Application Security Testing
- DAST: Dynamic Application Security Testing
- Dependency scanning: Check for vulnerable packages
- Secret detection: Find leaked credentials
- Container scanning: Analyze Docker images
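
If the pipeline runs on GitLab, some of these scans are available as bundled templates rather than a custom script. The sketch below assumes a GitLab version that ships these template paths; availability of individual scanners depends on version and tier, so verify against your instance.

```yaml
include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Secret-Detection.gitlab-ci.yml
```

These template jobs run in the `test` stage by default, so they would sit alongside the test jobs unless their stage is overridden.
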
4. Deploy Stage
yaml
deploy:staging:
  stage: deploy
  environment:
    name: staging
    url: https://staging.example.com
  script:
    - ./deploy.sh staging
  rules:
    - if: $CI_COMMIT_BRANCH == "develop"
deploy:production:
  stage: deploy
  environment:
    name: production
    url: https://example.com
  script:
    - ./deploy.sh production
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
Deployment Strategies
Blue-Green Deployment
Maintain two identical environments:
yaml
deploy:blue-green:
  script:
    - ./deploy-to-inactive.sh
    - ./run-smoke-tests.sh
    - ./switch-traffic.sh
    - ./cleanup-old-environment.sh
Benefits:
- Zero-downtime deployments
- Easy rollback by switching traffic back
- Full testing in production-like environment
Canary Deployment
Gradually roll out to a subset of users:
yaml
deploy:canary:
  script:
    - ./deploy-canary.sh --percentage=5
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=25
    - ./monitor-metrics.sh --duration=30m
    - ./deploy-canary.sh --percentage=100
Canary stages:
- Deploy to 5% of traffic
- Monitor error rates and latency
- Gradually increase if metrics are healthy
- Full rollout or rollback based on data
Rolling Deployment
Update instances incrementally:
yaml
deploy:rolling:
  script:
    - kubectl rollout restart deployment/app
    - kubectl rollout status deployment/app --timeout=5m
Configuration:
- Set maxUnavailable and maxSurge
- Health checks determine rollout pace
- Automatic rollback on failure
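
For a Kubernetes Deployment, these settings could look roughly like the manifest below. The replica count, probe path, port, and image are illustrative assumptions; only the name `app` comes from the commands above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1      # at most one pod below the desired count during the rollout
      maxSurge: 1            # at most one extra pod above the desired count
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.2.3   # hypothetical image
          readinessProbe:    # the rollout only advances as new pods report ready
            httpGet:
              path: /healthz
              port: 8080
```

Kubernetes itself reports a stalled rollout rather than reverting it, so an automatic rollback would typically be a pipeline step such as `kubectl rollout undo deployment/app` that runs when `kubectl rollout status` fails.
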
Feature Flags
Decouple deployment from release:
javascript
// Feature flag check inside a JSX component.
// `featureFlags` is assumed to be your flag client (for example, an SDK wrapper).
function Checkout() {
  if (featureFlags.isEnabled('new-checkout')) {
    return <NewCheckout />;
  }
  return <LegacyCheckout />;
}
Benefits:
- Deploy disabled features to production
- Gradual feature rollout
- A/B testing capabilities
- Quick feature disable without deployment
Environment Management
Environment Hierarchy
Development -> Testing -> Staging -> Production
Each environment should:
- Mirror production as closely as possible
- Have isolated data and secrets
- Use infrastructure as code
Environment Variables
yaml
variables:
  # Global variables
  APP_NAME: my-app
# Environment-specific
.staging:
  variables:
    ENV: staging
    API_URL: https://api.staging.example.com
.production:
  variables:
    ENV: production
    API_URL: https://api.example.com
Best practices:
- Never hardcode secrets
- Use secret management (Vault, AWS Secrets Manager)
- Separate configuration from code
- Document all required variables
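
The hidden `.staging` and `.production` entries above do nothing until a job pulls them in. A minimal sketch, assuming GitLab CI's `extends` keyword and the `./deploy.sh` script used elsewhere in this guide:

```yaml
.staging:
  variables:
    ENV: staging
    API_URL: https://api.staging.example.com

deploy:staging:
  extends: .staging          # merges the template's variables into this job
  stage: deploy
  script:
    - ./deploy.sh "$ENV"     # ENV resolves to "staging" here
```
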
Infrastructure as Code
hcl
# Terraform example
resource "aws_ecs_service" "app" {
  name            = var.app_name
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.environment == "production" ? 3 : 1

  # In the Terraform AWS provider these limits are top-level arguments
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
}
Testing Strategies
Test Pyramid
       /\
      /  \       E2E Tests (Few)
     /----\
    /      \     Integration Tests (Some)
   /--------\
  /          \   Unit Tests (Many)
 --------------
Test Parallelization
yaml
test:
  parallel: 4
  script:
    - npm test -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
Test Data Management
- Use fixtures for consistent test data
- Reset database state between tests
- Use factories for dynamic test data
- Avoid production data in tests
Flaky Test Handling
yaml
test:
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
Strategies:
- Quarantine flaky tests
- Add retry logic for known issues
- Investigate and fix root causes
- Track flaky test metrics
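
Quarantining can be sketched as a separate job that still runs the flaky tests but cannot fail the pipeline. The `test:quarantine` npm script is hypothetical; how tests are tagged as quarantined depends on your test runner.

```yaml
test:quarantine:
  stage: test
  script:
    - npm run test:quarantine   # hypothetical: runs only tests marked as quarantined
  allow_failure: true           # failures show as warnings instead of blocking the pipeline
```

Keeping the quarantined tests running this way preserves their pass/fail history, which feeds the flaky-test metrics mentioned above.
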
Monitoring and Observability
Pipeline Metrics
Track these metrics:
- Lead time: Commit to production duration
- Deployment frequency: How often you deploy
- Change failure rate: Percentage of failed deployments
- Mean time to recovery: Time to fix failures
Health Checks
yaml
deploy:
  script:
    - ./deploy.sh
    - ./wait-for-healthy.sh --timeout=300
    - ./run-smoke-tests.sh
Implement:
- Readiness probes
- Liveness probes
- Startup probes
- Smoke tests post-deployment
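
On Kubernetes, the three probe types could be declared on a container roughly like this. The paths, port, image, and timings are illustrative assumptions, not recommended values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.2.3   # hypothetical image
      startupProbe:            # gives a slow-starting app time before other probes begin
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
      readinessProbe:          # gates whether the pod receives traffic
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
      livenessProbe:           # restarts the container if it stops responding
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
```
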
Alerting
yaml
notify:failure:
  stage: notify
  script:
    - ./send-alert.sh --channel=deployments --status=failed
  when: on_failure
notify:success:
  stage: notify
  script:
    - ./send-notification.sh --channel=deployments --status=success
  when: on_success
Security in CI/CD
Secrets Management
yaml
# Use CI/CD secret variables
deploy:
  script:
    - echo "$DEPLOY_KEY" | base64 -d > deploy_key
    - chmod 600 deploy_key
    - ./deploy.sh
  after_script:
    - rm -f deploy_key
Best practices:
- Rotate secrets regularly
- Use short-lived credentials
- Audit secret access
- Never log secrets
Pipeline Security
yaml
# Restrict who can run production deploys
deploy:production:
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
      allow_failure: false
  environment:
    name: production
    deployment_tier: production
Controls:
- Branch protection rules
- Required approvals
- Audit logging
- Signed commits
Dependency Security
yaml
dependency_check:
  script:
    - npm audit --audit-level=high
    - ./check-licenses.sh
  allow_failure: false
Optimization Techniques
Caching
yaml
cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
  policy: pull-push
Cache strategies:
- Cache dependencies between runs
- Use content-based cache keys
- Separate cache per branch
- Clean stale caches periodically
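
One possible refinement, assuming GitLab CI: let a single job write the cache and have downstream jobs only read it, which avoids re-uploading an unchanged cache from every job. The job names here are illustrative.

```yaml
.node_cache:
  cache:
    key:
      files:
        - package-lock.json    # content-based key: changes only when the lockfile changes
    paths:
      - node_modules/

install:
  extends: .node_cache
  stage: build
  script:
    - npm ci                   # rebuilds node_modules from the lockfile
  cache:
    policy: push               # upload only; nothing useful to download first

test:
  extends: .node_cache
  stage: test
  script:
    - npm test
  cache:
    policy: pull               # read-only: never re-upload from test jobs
```

Caches are best-effort, so a job that relies on a pulled `node_modules/` should be able to cope with a cache miss, for example by running `npm ci` when the directory is absent.
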
Parallelization
yaml
stages:
  - build
  - test
  - deploy
# Run tests in parallel
test:unit:
  stage: test
  script: npm run test:unit
test:integration:
  stage: test
  script: npm run test:integration
test:e2e:
  stage: test
  script: npm run test:e2e
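
Jobs in the same stage already run in parallel. To go further, GitLab's `needs` keyword lets a job start as soon as the jobs it depends on finish, instead of waiting for the whole previous stage. A sketch, assuming a `build` job exists as in the earlier examples; the `lint` npm script is hypothetical.

```yaml
lint:
  stage: test
  needs: []                  # no dependencies: starts at once, alongside the build stage
  script:
    - npm ci
    - npm run lint           # hypothetical npm script
test:e2e:
  stage: test
  needs: [build]             # starts as soon as build finishes; fetches its artifacts
  script:
    - npm ci
    - npm run test:e2e
```
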
Artifact Management
yaml
build:
  artifacts:
    paths:
      - dist/
    expire_in: 1 week
    when: on_success
Best practices:
- Set appropriate expiration
- Only store necessary artifacts
- Use artifact compression
- Clean up old artifacts
Rollback Strategies
Automatic Rollback
yaml
deploy:
  script:
    - ./deploy.sh
    # Roll back on a failed health check, then fail the job so the failure stays visible
    - ./health-check.sh || { ./rollback.sh; exit 1; }
Manual Rollback
yaml
rollback:
  stage: deploy
  when: manual
  script:
    # Assumes get-previous-version.sh prints the version to stdout
    - PREVIOUS_VERSION=$(./get-previous-version.sh)
    - ./deploy.sh --version="$PREVIOUS_VERSION"
Database Rollbacks
- Use reversible migrations
- Test rollback procedures
- Consider data compatibility
- Have a backup restoration process
Documentation
Pipeline Documentation
Document in your repository:
- Pipeline stages and their purpose
- Required environment variables
- Deployment procedures
- Troubleshooting guides
- Rollback procedures
Runbooks
Create runbooks for:
- Deployment failures
- Rollback procedures
- Environment setup
- Incident response
Continuous Improvement
Metrics to Track
- Build success rate
- Average build time
- Test coverage trends
- Deployment frequency
- Incident frequency
Regular Reviews
- Weekly pipeline performance review
- Monthly security assessment
- Quarterly process improvement
- Annual tooling evaluation