devops-flow

Description: Infrastructure as Code, CI/CD pipeline automation, and deployment management
Category: DevOps & Deployment
Complexity: High (multi-cloud + orchestration + automation)

Purpose

Automates infrastructure provisioning, CI/CD pipelines, and deployment processes based on SPEC documents and ADR decisions. Ensures consistent, repeatable, and secure deployments across environments.

Capabilities

1. Infrastructure as Code (IaC)

  • Terraform: Cloud-agnostic infrastructure provisioning
  • CloudFormation: AWS-native infrastructure
  • Ansible: Configuration management and provisioning
  • Pulumi: Modern IaC with standard programming languages
  • Kubernetes manifests: Container orchestration

2. CI/CD Pipeline Generation

  • GitHub Actions: Workflow automation
  • GitLab CI: Pipeline configuration
  • Jenkins: Pipeline as code
  • CircleCI: Cloud-native CI/CD
  • Azure DevOps: Microsoft ecosystem integration

3. Container Configuration

  • Dockerfile: Container image definition
  • Docker Compose: Multi-container applications
  • Kubernetes: Production orchestration
  • Helm charts: Kubernetes package management
  • Container registry: Image storage and versioning

4. Deployment Strategies

  • Blue-Green: Zero-downtime deployments
  • Canary: Gradual rollout with monitoring
  • Rolling: Sequential instance updates
  • Feature flags: Progressive feature enablement
  • Rollback procedures: Automated failure recovery

5. Environment Management

  • Environment separation: dev, staging, production
  • Configuration management: Environment-specific configs
  • Secret management: Vault, AWS Secrets Manager, etc.
  • Infrastructure versioning: State management
  • Cost optimization: Resource tagging and monitoring
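
The environment-separation idea reduces to a strict mapping from environment name to configuration file, with anything unrecognized rejected early. A minimal sketch (the `env_config` helper is hypothetical; it mirrors the `environments/*.tfvars` layout shown later in this document):

```shell
#!/bin/bash
# Map an environment name to its Terraform variable file,
# failing fast on unknown environments.
set -euo pipefail

env_config() {
  case "$1" in
    dev|staging|prod) echo "environments/$1.tfvars" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

env_config dev       # prints environments/dev.tfvars
env_config staging   # prints environments/staging.tfvars
```

Failing fast here prevents a typo like `--env prodution` from silently provisioning against default values.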

6. Monitoring & Observability

  • Logging: Centralized log aggregation
  • Metrics: Performance and health monitoring
  • Alerting: Incident response automation
  • Tracing: Distributed request tracking
  • Dashboards: Real-time visualization
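
An alerting rule ultimately encodes a threshold comparison over a metric. The sketch below shows that logic in bash with illustrative numbers; a real setup would query a metrics backend such as Prometheus rather than shell variables.

```shell
#!/bin/bash
# Threshold-check sketch: compute an error rate and compare it
# to an alert threshold. Numbers are illustrative only.
set -euo pipefail

ERRORS=12
REQUESTS=2400
THRESHOLD_PCT=1   # alert when the error rate exceeds 1%

# awk handles the floating-point division that bash lacks.
RATE=$(awk -v e="$ERRORS" -v r="$REQUESTS" 'BEGIN { printf "%.2f", 100 * e / r }')

if awk -v rate="$RATE" -v t="$THRESHOLD_PCT" 'BEGIN { exit !(rate > t) }'; then
  echo "ALERT: error rate ${RATE}% above ${THRESHOLD_PCT}%"
else
  echo "OK: error rate ${RATE}%"
fi
```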

7. Security & Compliance

  • Security scanning: Container and infrastructure
  • Compliance checks: Policy enforcement
  • Access control: IAM and RBAC
  • Network security: Firewall rules, VPC configuration
  • Audit logging: Change tracking

8. Disaster Recovery

  • Backup automation: Data and configuration backups
  • Recovery procedures: Automated restoration
  • Failover: Multi-region redundancy
  • Data replication: Cross-region sync
  • RTO/RPO: Recovery objectives implementation
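
Backup automation under a retention window can be sketched as a prune step: anything older than the window goes. The paths below are hypothetical; in practice the retention value would come from the RPO defined in the REQ document.

```shell
#!/bin/bash
# Retention-driven backup pruning sketch (GNU coreutils assumed).
set -euo pipefail

BACKUP_DIR="${BACKUP_DIR:-$(mktemp -d)}"
RETENTION_DAYS=30

# Simulate one stale and one recent backup for illustration.
touch -d "40 days ago" "$BACKUP_DIR/db-old.dump"
touch "$BACKUP_DIR/db-new.dump"

# Delete backups older than the retention window.
find "$BACKUP_DIR" -name '*.dump' -mtime +"$RETENTION_DAYS" -delete

ls "$BACKUP_DIR"   # only db-new.dump survives
```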

DevOps Workflow

mermaid
graph TD
    A[SPEC Document] --> B[Extract Requirements]
    B --> C{Infrastructure Needed?}
    C -->|Yes| D[Generate IaC Templates]
    C -->|No| E[Generate CI/CD Pipeline]

    D --> F[Terraform/CloudFormation]
    F --> G[Validate Infrastructure Code]
    G --> H{Validation Pass?}
    H -->|No| I[Report Issues]
    H -->|Yes| J[Generate Deployment Pipeline]

    E --> J
    J --> K[CI/CD Configuration]
    K --> L[Add Build Stage]
    L --> M[Add Test Stage]
    M --> N[Add Security Scan]
    N --> O[Add Deploy Stage]

    O --> P[Environment Strategy]
    P --> Q{Deployment Type}
    Q -->|Blue-Green| R[Generate Blue-Green Config]
    Q -->|Canary| S[Generate Canary Config]
    Q -->|Rolling| T[Generate Rolling Config]

    R --> U[Add Monitoring]
    S --> U
    T --> U

    U --> V[Add Rollback Procedure]
    V --> W[Generate Documentation]
    W --> X[Review & Deploy]

    I --> X

Usage Instructions

Generate Infrastructure from SPEC

bash
devops-flow generate-infra \
  --spec specs/SPEC-API-V1.md \
  --cloud aws \
  --output infrastructure/
Generated Terraform structure:
infrastructure/
├── main.tf              # Main configuration
├── variables.tf         # Input variables
├── outputs.tf           # Output values
├── providers.tf         # Cloud provider config
├── modules/
│   ├── vpc/            # Network infrastructure
│   ├── compute/        # EC2, Lambda, etc.
│   ├── database/       # RDS, DynamoDB
│   └── storage/        # S3, EBS
└── environments/
    ├── dev.tfvars      # Development config
    ├── staging.tfvars  # Staging config
    └── prod.tfvars     # Production config

Generate CI/CD Pipeline

bash
devops-flow generate-pipeline \
  --type github-actions \
  --language python \
  --deploy-strategy blue-green \
  --output .github/workflows/
Generated GitHub Actions workflow:
yaml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  PYTHON_VERSION: '3.11'
  AWS_REGION: us-east-1

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install dependencies
        run: |
          pip install ruff mypy
      - name: Run linters
        run: |
          ruff check .
          mypy .

  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install dependencies
        run: |
          pip install -r requirements.txt pytest pytest-cov
      - name: Run tests
        run: |
          pytest --cov=. --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3

  security:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Security scan
        run: |
          pip install bandit
          bandit -r . -f json -o security-report.json
      - name: Upload security report
        uses: actions/upload-artifact@v3
        with:
          name: security-report
          path: security-report.json

  build:
    needs: [lint, test, security]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker image
        run: |
          docker build -t registry.example.com/app:${{ github.sha }} .
      - name: Push to registry
        run: |
          docker push registry.example.com/app:${{ github.sha }}

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy to staging
        run: |
          aws ecs update-service \
            --cluster staging-cluster \
            --service app-service \
            --force-new-deployment

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy blue-green
        run: |
          # Deploy to green environment
          ./scripts/deploy-green.sh
          # Run smoke tests
          ./scripts/smoke-tests.sh
          # Switch traffic to green
          ./scripts/switch-traffic.sh
          # Keep blue for rollback

Generate Kubernetes Configuration

bash
devops-flow generate-k8s \
  --spec specs/SPEC-API-V1.md \
  --replicas 3 \
  --output k8s/
Generated Kubernetes manifests:

k8s/deployment.yaml
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  labels:
    app: api
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        version: v1
    spec:
      containers:
        - name: api
          image: registry.example.com/api:latest
          ports:
            - containerPort: 8000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5

k8s/service.yaml
yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer

k8s/hpa.yaml
yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Generate Deployment Scripts from REQ

bash
devops-flow generate-deployment-scripts \
  --req docs/07_REQ/REQ-NN.md \
  --spec docs/09_SPEC/SPEC-NN.yaml \
  --output scripts/
Generated shell scripts structure:
scripts/
├── setup.sh              # Initial environment setup
├── install.sh            # Application installation
├── deploy.sh             # Main deployment orchestration
├── rollback.sh           # Rollback to previous version
├── health-check.sh       # Health verification
└── cleanup.sh            # Cleanup old versions
Script Generation Logic:
  • Parse REQ Section 9.5.3 for script requirements
  • Parse SPEC deployment section for technical details
  • Apply script standards (Bash 4.0+, error handling, logging)
  • Reference cloud provider from REQ @adr tags
  • Use environment-specific configurations from REQ 9.5.2
Example generated script (setup.sh):
bash
#!/bin/bash
set -euo pipefail

# Setup environment for deployment

LOG_FILE="logs/deployment_$(date +%Y%m%d_%H%M%S).log"
mkdir -p logs

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}

log "Starting environment setup..."

# Install dependencies
if [ ! -f .tool-versions ]; then
  log "Installing Python dependencies..."
  pip install -r requirements.txt
fi

# Configure environment variables
if [ -f .env.deployment ]; then
  log "Loading deployment environment variables..."
  export $(grep -v '^#' .env.deployment | xargs)
fi

log "Environment setup complete"
exit 0

Example generated script (deploy.sh):
bash
#!/bin/bash
set -euo pipefail

# Main deployment orchestration script

LOG_FILE="logs/deployment_$(date +%Y%m%d_%H%M%S).log"
ENVIRONMENT="${1:-staging}"

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}

# Step 1: Setup
log "Running setup..."
./scripts/setup.sh

# Step 2: Install
log "Installing application..."
./scripts/install.sh --env "$ENVIRONMENT"

# Step 3: Deploy
log "Deploying application..."
if [ "$ENVIRONMENT" = "production" ]; then
  ./scripts/deploy-prod.sh
else
  ./scripts/deploy-staging.sh
fi

# Step 4: Health check
log "Running health check..."
if ./scripts/health-check.sh --env "$ENVIRONMENT"; then
  log "Deployment successful"
else
  log "Deployment failed, initiating rollback..."
  ./scripts/rollback.sh --env "$ENVIRONMENT"
  exit 1
fi

Example generated script (health-check.sh):
bash
#!/bin/bash
set -euo pipefail

# Health verification script

HEALTH_URL="${1:-http://localhost:8000/health/live}"
TIMEOUT=60
RETRIES=3

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

log "Starting health check..."

for i in $(seq 1 $RETRIES); do
  log "Attempt $i of $RETRIES..."
  # "|| true" keeps a failed curl (e.g. connection refused) from
  # aborting the retry loop under "set -e".
  RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" --max-time $TIMEOUT "$HEALTH_URL" || true)

  if [ "$RESPONSE" = "200" ]; then
    log "Health check passed"
    exit 0
  fi

  log "Health check failed, sleeping before retry..."
  sleep 5
done

log "Health check failed after $RETRIES attempts"
exit 1

Generate Ansible Playbooks from REQ

bash
devops-flow generate-ansible-playbooks \
  --req docs/07_REQ/REQ-NN.md \
  --spec docs/09_SPEC/SPEC-NN.yaml \
  --output ansible/
Generated Ansible playbooks structure:
ansible/
├── provision_infra.yml         # Infrastructure provisioning
├── configure_instances.yml      # Instance configuration
├── deploy_app.yml              # Application deployment
├── configure_monitoring.yml     # Monitoring setup
├── configure_security.yml       # Security hardening
└── backup_restore.yml          # Backup/restore procedures
Playbook Generation Logic:
  • Parse REQ Section 9.5.4 for playbook requirements
  • Parse Section 9.5.1 for infrastructure configuration
  • Apply Ansible standards (2.9+, modular roles, idempotency)
  • Reference cloud provider from REQ @adr tags
  • Use environment-specific variables from REQ 9.5.2
Example generated playbook (provision_infra.yml):
yaml
---
- name: Provision Infrastructure
  hosts: localhost
  gather_facts: no
  vars_files:
    - "environments/{{ target_env }}.yml"

  tasks:
    - name: Create VPC
      ec2_vpc_net:
        name: "{{ vpc_name }}"
        cidr_block: "{{ vpc_cidr }}"
        region: "{{ aws_region }}"
        tags:
          Project: "{{ project_name }}"
          Environment: "{{ target_env }}"
          ManagedBy: "Ansible"

    - name: Create security groups
      ec2_security_group:
        name: "{{ security_group_name }}"
        description: "Security group for {{ application_name }}"
        vpc_id: "{{ vpc.vpc_id }}"
        rules:
          - proto: tcp
            from_port: 80
            to_port: 80
            cidr_ip: 0.0.0.0/0
          - proto: tcp
            from_port: 443
            to_port: 443
            cidr_ip: 0.0.0.0/0
        region: "{{ aws_region }}"
        tags:
          Project: "{{ project_name }}"
          Environment: "{{ target_env }}"

    - name: Create RDS instance
      rds:
        db_name: "{{ db_name }}"
        engine: postgres
        engine_version: "{{ db_version }}"
        instance_type: "{{ db_instance_class }}"
        allocated_storage: "{{ db_storage_gb }}"
        username: "{{ db_username }}"
        password: "{{ db_password }}"
        vpc_security_group_ids:
          - "{{ security_group.group_id }}"
        subnet_group_name: "{{ db_subnet_group }}"
        backup_retention_period: "{{ backup_retention_days }}"
        multi_az: true
        region: "{{ aws_region }}"
        tags:
          Project: "{{ project_name }}"
          Environment: "{{ target_env }}"
          ManagedBy: "Ansible"
Example generated playbook (deploy_app.yml):
yaml
---
- name: Deploy Application
  hosts: app_servers
  gather_facts: yes
  become: yes
  vars_files:
    - "environments/{{ target_env }}.yml"

  tasks:
    - name: Ensure application directory exists
      file:
        path: "{{ app_directory }}"
        state: directory
        mode: '0755'
        owner: "{{ app_user }}"
        group: "{{ app_group }}"

    - name: Copy application code
      synchronize:
        src: "{{ app_source_directory }}/"
        dest: "{{ app_directory }}/"
        delete: yes
        recursive: yes

    - name: Install Python dependencies
      pip:
        requirements: "{{ app_directory }}/requirements.txt"
        virtualenv: "{{ app_venv }}"
        state: present

    - name: Configure application
      template:
        src: "templates/{{ target_env }}_config.yml"
        dest: "{{ app_directory }}/config.yml"
        owner: "{{ app_user }}"
        group: "{{ app_group }}"
        mode: '0640'

    - name: Restart application service
      systemd:
        name: "{{ app_service_name }}"
        state: restarted
        daemon_reload: yes
      notify: Run Health Check

    - name: Wait for application to be ready
      wait_for:
        port: 8000
        host: "{{ inventory_hostname }}"
        timeout: 300

  handlers:
    - name: Run Health Check
      uri:
        url: "http://localhost:8000/health/ready"
        method: GET
        status_code: 200
      register: health_check


Infrastructure Templates

AWS Infrastructure (Terraform)

main.tf
hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket = "terraform-state-bucket"
    key    = "infrastructure/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "Terraform"
    }
  }
}

# VPC Module
module "vpc" {
  source = "./modules/vpc"

  vpc_cidr             = var.vpc_cidr
  availability_zones   = var.availability_zones
  public_subnet_cidrs  = var.public_subnet_cidrs
  private_subnet_cidrs = var.private_subnet_cidrs
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "${var.project_name}-${var.environment}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# Application Load Balancer
resource "aws_lb" "main" {
  name               = "${var.project_name}-${var.environment}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = module.vpc.public_subnet_ids

  enable_deletion_protection = var.environment == "production"
}

# RDS Database
resource "aws_db_instance" "main" {
  identifier     = "${var.project_name}-${var.environment}-db"
  engine         = "postgres"
  engine_version = "15.3"
  instance_class = var.db_instance_class

  allocated_storage     = var.db_allocated_storage
  max_allocated_storage = var.db_max_allocated_storage
  storage_encrypted     = true

  db_name  = var.db_name
  username = var.db_username
  password = random_password.db_password.result

  vpc_security_group_ids = [aws_security_group.db.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = var.environment == "production" ? 30 : 7
  skip_final_snapshot     = var.environment != "production"

  tags = {
    Name = "${var.project_name}-${var.environment}-db"
  }
}

Docker Configuration

Dockerfile
dockerfile
FROM python:3.11-slim AS base

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8000/health')"

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

# Multi-stage build for a smaller production image
FROM base AS production
ENV ENVIRONMENT=production
RUN pip install --no-cache-dir gunicorn
CMD ["gunicorn", "main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]

Docker Compose (Local Development)

```yaml
# docker-compose.yml
version: '3.8'

services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
      target: base
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:password@db:5432/appdb
      - REDIS_URL=redis://redis:6379/0
      - ENVIRONMENT=development
    volumes:
      - .:/app
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload

  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: appdb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api

volumes:
  postgres_data:
  redis_data:
```

---

Deployment Strategies

Blue-Green Deployment

```bash
#!/bin/bash
# deploy-blue-green.sh

set -e

BLUE_ENV="production-blue"
GREEN_ENV="production-green"
CURRENT_ENV=$(get_active_environment)

if [ "$CURRENT_ENV" == "$BLUE_ENV" ]; then
  TARGET_ENV="$GREEN_ENV"
  OLD_ENV="$BLUE_ENV"
else
  TARGET_ENV="$BLUE_ENV"
  OLD_ENV="$GREEN_ENV"
fi

echo "Deploying to $TARGET_ENV (current: $OLD_ENV)"

# Deploy to target environment
deploy_to_environment "$TARGET_ENV"

# Run smoke tests
if ! run_smoke_tests "$TARGET_ENV"; then
  echo "Smoke tests failed, rolling back"
  exit 1
fi

# Switch traffic
switch_load_balancer "$TARGET_ENV"

# Monitor for 5 minutes
monitor_environment "$TARGET_ENV" 300

# If all good, keep old environment for quick rollback
echo "Deployment successful. Old environment $OLD_ENV kept for rollback."
```
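The blue/green toggle at the top of the script can also be expressed as a small pure function, which is easier to unit-test than shell branching. A minimal sketch (the function and its defaults are illustrative, not part of the script above):

```python
def select_target(current_env: str,
                  blue: str = "production-blue",
                  green: str = "production-green") -> tuple[str, str]:
    """Return (target_env, old_env): deploy to whichever color is idle."""
    if current_env == blue:
        return green, blue
    return blue, green
```

The idle environment always receives the new release; the previously active one is left untouched so a rollback is just switching the load balancer back.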

Canary Deployment

```yaml
# k8s/canary-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api        # matches both stable and canary pods
  ports:
    - port: 80
      targetPort: 8000
---
# Stable version (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: api
      version: stable
  template:
    metadata:
      labels:
        app: api
        version: stable
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v1.0.0
---
# Canary version (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
      version: canary
  template:
    metadata:
      labels:
        app: api
        version: canary
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v1.1.0
```
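With a plain Kubernetes Service (no service mesh), the 90/10 split comes purely from pod counts: the Service load-balances across every pod matching `app: api`, so the canary's traffic share is its fraction of total replicas. A quick sketch of that arithmetic (illustrative; assumes roughly uniform per-pod load balancing):

```python
def replica_split(total: int, canary_fraction: float) -> tuple[int, int]:
    """Return (stable_replicas, canary_replicas) approximating the
    requested canary traffic share under per-pod load balancing."""
    if not 0.0 < canary_fraction < 1.0:
        raise ValueError("canary_fraction must be strictly between 0 and 1")
    canary = max(1, round(total * canary_fraction))  # always run at least one canary pod
    return total - canary, canary

# 10 total replicas at a 10% canary share -> 9 stable, 1 canary,
# matching the two Deployments above.
```

Finer-grained splits (e.g. 1% canary without 100 replicas) need weighted routing from an ingress controller or service mesh rather than replica counts.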

---

Monitoring & Observability

Prometheus Configuration

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'api-service'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: api

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - /etc/prometheus/alerts/*.yml
```

Alert Rules

```yaml
# alerts/api-alerts.yml
groups:
  - name: api-alerts
    interval: 30s
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "{{ $labels.instance }} has error rate {{ $value }}"

      - alert: HighLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "95th percentile latency is {{ $value }}s"

      - alert: PodDown
        expr: up{job="api-service"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Pod is down"
          description: "{{ $labels.instance }} has been down for 2 minutes"
```
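Note that `rate(http_requests_total{status=~"5.."}[5m])` is a per-second rate of 5xx responses (counter increase divided by the window), not an error percentage, so the `0.05` threshold means roughly one error every 20 seconds. The underlying arithmetic, sketched (PromQL's real `rate()` additionally handles counter resets and extrapolation):

```python
def per_second_rate(count_now: float, count_earlier: float,
                    window_seconds: float) -> float:
    """Approximate PromQL rate(): counter increase over the window, per second.

    Assumes the counter did not reset within the window.
    """
    return (count_now - count_earlier) / window_seconds

# 30 new 5xx responses over a 5-minute (300 s) window:
# 30 / 300 = 0.1 errors/sec -> breaches the 0.05 HighErrorRate threshold
```

To alert on an error *ratio* instead, divide the 5xx rate by the total request rate in the expression.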

---

Security Configuration

Network Security

```hcl
# security-groups.tf

# ALB Security Group
resource "aws_security_group" "alb" {
  name_prefix = "${var.project_name}-alb-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS from internet"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Application Security Group
resource "aws_security_group" "app" {
  name_prefix = "${var.project_name}-app-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 8000
    to_port         = 8000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
    description     = "From ALB"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Database Security Group
resource "aws_security_group" "db" {
  name_prefix = "${var.project_name}-db-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
    description     = "From application"
  }
}
```
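These three groups form a strict chain: the internet reaches only the ALB (443), the ALB reaches only the application tier (8000), and only the application tier reaches the database (5432). A toy model of that intent, usable in a policy unit test (illustrative only, not derived from real AWS state):

```python
# Permitted ingress source for each tier, mirroring the security groups above
INGRESS_SOURCE = {
    "alb": "internet",  # 443 from 0.0.0.0/0
    "app": "alb",       # 8000 from the ALB security group
    "db": "app",        # 5432 from the application security group
}

def ingress_allowed(src: str, dst: str) -> bool:
    """True only when src is the single permitted ingress source for dst."""
    return INGRESS_SOURCE.get(dst) == src
```

In particular `ingress_allowed("internet", "db")` is false: the database is never directly exposed, and all access is mediated by the application tier.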

---

Tool Access

Required tools:
  • Read: Read SPEC documents and ADRs
  • Write: Generate infrastructure and pipeline files
  • Bash: Execute Terraform, Docker, kubectl commands
  • Grep: Search for configuration patterns

Required software:
  • Terraform / OpenTofu
  • Docker / Podman
  • kubectl / helm
  • aws-cli / gcloud / az-cli
  • Ansible (optional)

Integration Points

With doc-flow
  • Extract infrastructure requirements from SPEC documents
  • Validate ADR compliance in infrastructure code
  • Generate deployment documentation

With security-audit
  • Security scanning of infrastructure code
  • Vulnerability assessment of containers
  • Compliance validation

With test-automation
  • Integration with CI/CD for automated testing
  • Deployment smoke tests
  • Infrastructure validation tests

With analytics-flow
  • Deployment metrics and trends
  • Infrastructure cost tracking
  • Performance monitoring integration

Best Practices

  1. Infrastructure as Code: All infrastructure versioned in Git
  2. Immutable infrastructure: Replace, don't modify
  3. Environment parity: Dev/staging/prod consistency
  4. Secret management: Never commit secrets
  5. Monitoring from day one: Observability built-in
  6. Automated rollbacks: Fast failure recovery
  7. Cost optimization: Tag resources, monitor spending
  8. Security by default: Least privilege, encryption
  9. Documentation: Runbooks for common operations
  10. Disaster recovery: Regular backup testing

Success Criteria

  • Zero manual infrastructure provisioning
  • Deployment time < 15 minutes
  • Rollback time < 5 minutes
  • Zero-downtime deployments
  • Infrastructure drift detection automated
  • Security compliance 100%
  • Cost variance < 10% from budget
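Automated drift detection is commonly built on `terraform plan -detailed-exitcode`, which exits 0 when state matches the configuration, 1 on error, and 2 when changes are pending. A sketch of the interpretation step (the subprocess invocation itself is omitted):

```python
def interpret_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a drift verdict.

    0 = no changes, 1 = plan error, 2 = changes pending (drift or new config).
    """
    verdicts = {0: "in-sync", 1: "error", 2: "drift-detected"}
    return verdicts.get(code, "unknown")
```

A scheduled CI job can run the plan against each environment, feed the exit code through this mapping, and alert on "drift-detected".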

Notes

  • Generated configurations require review before production use
  • Cloud provider credentials must be configured separately
  • State management (Terraform) requires backend configuration
  • Multi-region deployments require additional configuration
  • Cost estimation available from terraform plan output (e.g., via Infracost or Terraform Cloud)