devops-flow

Description: Infrastructure as Code, CI/CD pipeline automation, and deployment management
Category: DevOps & Deployment
Complexity: High (multi-cloud + orchestration + automation)

Purpose

Automates infrastructure provisioning, CI/CD pipelines, and deployment processes based on SPEC documents and ADR decisions. Ensures consistent, repeatable, and secure deployments across environments.

Capabilities

1. Infrastructure as Code (IaC)

  • Terraform: Cloud-agnostic infrastructure provisioning
  • CloudFormation: AWS-native infrastructure
  • Ansible: Configuration management and provisioning
  • Pulumi: Modern IaC with standard programming languages
  • Kubernetes manifests: Container orchestration

2. CI/CD Pipeline Generation

  • GitHub Actions: Workflow automation
  • GitLab CI: Pipeline configuration
  • Jenkins: Pipeline as code
  • CircleCI: Cloud-native CI/CD
  • Azure DevOps: Microsoft ecosystem integration

3. Container Configuration

  • Dockerfile: Container image definition
  • Docker Compose: Multi-container applications
  • Kubernetes: Production orchestration
  • Helm charts: Kubernetes package management
  • Container registry: Image storage and versioning

4. Deployment Strategies

  • Blue-Green: Zero-downtime deployments
  • Canary: Gradual rollout with monitoring
  • Rolling: Sequential instance updates
  • Feature flags: Progressive feature enablement
  • Rollback procedures: Automated failure recovery

5. Environment Management

  • Environment separation: dev, staging, production
  • Configuration management: Environment-specific configs
  • Secret management: Vault, AWS Secrets Manager, etc.
  • Infrastructure versioning: State management
  • Cost optimization: Resource tagging and monitoring
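
The environment-separation idea reduces to a strict mapping from environment name to configuration file, with anything unrecognized rejected early. A minimal sketch (the `env_config` helper is hypothetical; it mirrors the `environments/*.tfvars` layout shown later in this document):

```shell
#!/bin/bash
# Map an environment name to its Terraform variable file,
# failing fast on unknown environments.
set -euo pipefail

env_config() {
  case "$1" in
    dev|staging|prod) echo "environments/$1.tfvars" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

env_config dev       # prints environments/dev.tfvars
env_config staging   # prints environments/staging.tfvars
```

Failing fast here prevents a typo like `--env prodution` from silently provisioning against default values.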

6. Monitoring & Observability

  • Logging: Centralized log aggregation
  • Metrics: Performance and health monitoring
  • Alerting: Incident response automation
  • Tracing: Distributed request tracking
  • Dashboards: Real-time visualization
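
An alerting rule ultimately encodes a threshold comparison over a metric. The sketch below shows that logic in bash with illustrative numbers; a real setup would query a metrics backend such as Prometheus rather than shell variables.

```shell
#!/bin/bash
# Threshold-check sketch: compute an error rate and compare it
# to an alert threshold. Numbers are illustrative only.
set -euo pipefail

ERRORS=12
REQUESTS=2400
THRESHOLD_PCT=1   # alert when the error rate exceeds 1%

# awk handles the floating-point division that bash lacks.
RATE=$(awk -v e="$ERRORS" -v r="$REQUESTS" 'BEGIN { printf "%.2f", 100 * e / r }')

if awk -v rate="$RATE" -v t="$THRESHOLD_PCT" 'BEGIN { exit !(rate > t) }'; then
  echo "ALERT: error rate ${RATE}% above ${THRESHOLD_PCT}%"
else
  echo "OK: error rate ${RATE}%"
fi
```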

7. Security & Compliance

  • Security scanning: Container and infrastructure
  • Compliance checks: Policy enforcement
  • Access control: IAM and RBAC
  • Network security: Firewall rules, VPC configuration
  • Audit logging: Change tracking

8. Disaster Recovery

  • Backup automation: Data and configuration backups
  • Recovery procedures: Automated restoration
  • Failover: Multi-region redundancy
  • Data replication: Cross-region sync
  • RTO/RPO: Recovery objectives implementation
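
Backup automation under a retention window can be sketched as a prune step: anything older than the window goes. The paths below are hypothetical; in practice the retention value would come from the RPO defined in the REQ document.

```shell
#!/bin/bash
# Retention-driven backup pruning sketch (GNU coreutils assumed).
set -euo pipefail

BACKUP_DIR="${BACKUP_DIR:-$(mktemp -d)}"
RETENTION_DAYS=30

# Simulate one stale and one recent backup for illustration.
touch -d "40 days ago" "$BACKUP_DIR/db-old.dump"
touch "$BACKUP_DIR/db-new.dump"

# Delete backups older than the retention window.
find "$BACKUP_DIR" -name '*.dump' -mtime +"$RETENTION_DAYS" -delete

ls "$BACKUP_DIR"   # only db-new.dump survives
```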

DevOps Workflow

mermaid
graph TD
    A[SPEC Document] --> B[Extract Requirements]
    B --> C{Infrastructure Needed?}
    C -->|Yes| D[Generate IaC Templates]
    C -->|No| E[Generate CI/CD Pipeline]

    D --> F[Terraform/CloudFormation]
    F --> G[Validate Infrastructure Code]
    G --> H{Validation Pass?}
    H -->|No| I[Report Issues]
    H -->|Yes| J[Generate Deployment Pipeline]

    E --> J
    J --> K[CI/CD Configuration]
    K --> L[Add Build Stage]
    L --> M[Add Test Stage]
    M --> N[Add Security Scan]
    N --> O[Add Deploy Stage]

    O --> P[Environment Strategy]
    P --> Q{Deployment Type}
    Q -->|Blue-Green| R[Generate Blue-Green Config]
    Q -->|Canary| S[Generate Canary Config]
    Q -->|Rolling| T[Generate Rolling Config]

    R --> U[Add Monitoring]
    S --> U
    T --> U

    U --> V[Add Rollback Procedure]
    V --> W[Generate Documentation]
    W --> X[Review & Deploy]

    I --> X

Usage Instructions

Generate Infrastructure from SPEC

bash
devops-flow generate-infra \
  --spec specs/SPEC-API-V1.md \
  --cloud aws \
  --output infrastructure/
Generated Terraform structure:
infrastructure/
├── main.tf              # Main configuration
├── variables.tf         # Input variables
├── outputs.tf           # Output values
├── providers.tf         # Cloud provider config
├── modules/
│   ├── vpc/            # Network infrastructure
│   ├── compute/        # EC2, Lambda, etc.
│   ├── database/       # RDS, DynamoDB
│   └── storage/        # S3, EBS
└── environments/
    ├── dev.tfvars      # Development config
    ├── staging.tfvars  # Staging config
    └── prod.tfvars     # Production config

Generate CI/CD Pipeline

bash
devops-flow generate-pipeline \
  --type github-actions \
  --language python \
  --deploy-strategy blue-green \
  --output .github/workflows/
Generated GitHub Actions workflow:
yaml
name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  PYTHON_VERSION: '3.11'
  AWS_REGION: us-east-1

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install dependencies
        run: |
          pip install ruff mypy
      - name: Run linters
        run: |
          ruff check .
          mypy .

  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install dependencies
        run: |
          pip install -r requirements.txt pytest pytest-cov
      - name: Run tests
        run: |
          pytest --cov=. --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3

  security:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Security scan
        run: |
          pip install bandit
          bandit -r . -f json -o security-report.json
      - name: Upload security report
        uses: actions/upload-artifact@v3
        with:
          name: security-report
          path: security-report.json

  build:
    needs: [lint, test, security]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build Docker image
        run: |
          docker build -t registry.example.com/app:${{ github.sha }} .
      - name: Push to registry
        run: |
          docker push registry.example.com/app:${{ github.sha }}

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy to staging
        run: |
          aws ecs update-service \
            --cluster staging-cluster \
            --service app-service \
            --force-new-deployment

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy blue-green
        run: |
          # Deploy to green environment
          ./scripts/deploy-green.sh
          # Run smoke tests
          ./scripts/smoke-tests.sh
          # Switch traffic to green
          ./scripts/switch-traffic.sh
          # Keep blue for rollback

Generate Kubernetes Configuration

bash
devops-flow generate-k8s \
  --spec specs/SPEC-API-V1.md \
  --replicas 3 \
  --output k8s/
Generated Kubernetes manifests:

k8s/deployment.yaml
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  labels:
    app: api
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        version: v1
    spec:
      containers:
        - name: api
          image: registry.example.com/api:latest
          ports:
            - containerPort: 8000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5

k8s/service.yaml
yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer

k8s/hpa.yaml
yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Generate Deployment Scripts from REQ

bash
devops-flow generate-deployment-scripts \
  --req docs/07_REQ/REQ-NN.md \
  --spec docs/09_SPEC/SPEC-NN.yaml \
  --output scripts/
Generated shell scripts structure:
scripts/
├── setup.sh              # Initial environment setup
├── install.sh            # Application installation
├── deploy.sh             # Main deployment orchestration
├── rollback.sh           # Rollback to previous version
├── health-check.sh       # Health verification
└── cleanup.sh            # Cleanup old versions
Script Generation Logic:
  • Parse REQ Section 9.5.3 for script requirements
  • Parse SPEC deployment section for technical details
  • Apply script standards (Bash 4.0+, error handling, logging)
  • Reference cloud provider from REQ @adr tags
  • Use environment-specific configurations from REQ 9.5.2
Example generated script (setup.sh):
bash
#!/bin/bash
set -euo pipefail

# Setup environment for deployment

LOG_FILE="logs/deployment_$(date +%Y%m%d_%H%M%S).log"
mkdir -p logs

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}

log "Starting environment setup..."

# Install dependencies
if [ ! -f .tool-versions ]; then
  log "Installing Python dependencies..."
  pip install -r requirements.txt
fi

# Configure environment variables
if [ -f .env.deployment ]; then
  log "Loading deployment environment variables..."
  export $(grep -v '^#' .env.deployment | xargs)
fi

log "Environment setup complete"
exit 0

Example generated script (deploy.sh):
bash
#!/bin/bash
set -euo pipefail

# Main deployment orchestration script

LOG_FILE="logs/deployment_$(date +%Y%m%d_%H%M%S).log"
ENVIRONMENT="${1:-staging}"

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}

# Step 1: Setup
log "Running setup..."
./scripts/setup.sh

# Step 2: Install
log "Installing application..."
./scripts/install.sh --env "$ENVIRONMENT"

# Step 3: Deploy
log "Deploying application..."
if [ "$ENVIRONMENT" = "production" ]; then
  ./scripts/deploy-prod.sh
else
  ./scripts/deploy-staging.sh
fi

# Step 4: Health check
log "Running health check..."
if ./scripts/health-check.sh --env "$ENVIRONMENT"; then
  log "Deployment successful"
else
  log "Deployment failed, initiating rollback..."
  ./scripts/rollback.sh --env "$ENVIRONMENT"
  exit 1
fi

Example generated script (health-check.sh):
bash
#!/bin/bash
set -euo pipefail

# Health verification script

HEALTH_URL="${1:-http://localhost:8000/health/live}"
TIMEOUT=60
RETRIES=3

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

log "Starting health check..."

for i in $(seq 1 $RETRIES); do
  log "Attempt $i of $RETRIES..."
  # "|| true" keeps a failed curl (e.g. connection refused) from
  # aborting the retry loop under "set -e".
  RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" --max-time $TIMEOUT "$HEALTH_URL" || true)

  if [ "$RESPONSE" = "200" ]; then
    log "Health check passed"
    exit 0
  fi

  log "Health check failed, sleeping before retry..."
  sleep 5
done

log "Health check failed after $RETRIES attempts"
exit 1

Generate Ansible Playbooks from REQ

bash
devops-flow generate-ansible-playbooks \
  --req docs/07_REQ/REQ-NN.md \
  --spec docs/09_SPEC/SPEC-NN.yaml \
  --output ansible/
Generated Ansible playbooks structure:
ansible/
├── provision_infra.yml         # Infrastructure provisioning
├── configure_instances.yml      # Instance configuration
├── deploy_app.yml              # Application deployment
├── configure_monitoring.yml     # Monitoring setup
├── configure_security.yml       # Security hardening
└── backup_restore.yml          # Backup/restore procedures
Playbook Generation Logic:
  • Parse REQ Section 9.5.4 for playbook requirements
  • Parse Section 9.5.1 for infrastructure configuration
  • Apply Ansible standards (2.9+, modular roles, idempotency)
  • Reference cloud provider from REQ @adr tags
  • Use environment-specific variables from REQ 9.5.2
Example generated playbook (provision_infra.yml):
yaml
---
- name: Provision Infrastructure
  hosts: localhost
  gather_facts: no
  vars_files:
    - "environments/{{ target_env }}.yml"

  tasks:
    - name: Create VPC
      ec2_vpc_net:
        name: "{{ vpc_name }}"
        cidr_block: "{{ vpc_cidr }}"
        region: "{{ aws_region }}"
        tags:
          Project: "{{ project_name }}"
          Environment: "{{ target_env }}"
          ManagedBy: "Ansible"

    - name: Create security groups
      ec2_security_group:
        name: "{{ security_group_name }}"
        description: "Security group for {{ application_name }}"
        vpc_id: "{{ vpc.vpc_id }}"
        rules:
          - proto: tcp
            from_port: 80
            to_port: 80
            cidr_ip: 0.0.0.0/0
          - proto: tcp
            from_port: 443
            to_port: 443
            cidr_ip: 0.0.0.0/0
        region: "{{ aws_region }}"
        tags:
          Project: "{{ project_name }}"
          Environment: "{{ target_env }}"

    - name: Create RDS instance
      rds:
        db_name: "{{ db_name }}"
        engine: postgres
        engine_version: "{{ db_version }}"
        instance_type: "{{ db_instance_class }}"
        allocated_storage: "{{ db_storage_gb }}"
        username: "{{ db_username }}"
        password: "{{ db_password }}"
        vpc_security_group_ids:
          - "{{ security_group.group_id }}"
        subnet_group_name: "{{ db_subnet_group }}"
        backup_retention_period: "{{ backup_retention_days }}"
        multi_az: true
        region: "{{ aws_region }}"
        tags:
          Project: "{{ project_name }}"
          Environment: "{{ target_env }}"
          ManagedBy: "Ansible"
Example generated playbook (deploy_app.yml):
yaml
---
- name: Deploy Application
  hosts: app_servers
  gather_facts: yes
  become: yes
  vars_files:
    - "environments/{{ target_env }}.yml"

  tasks:
    - name: Ensure application directory exists
      file:
        path: "{{ app_directory }}"
        state: directory
        mode: '0755'
        owner: "{{ app_user }}"
        group: "{{ app_group }}"

    - name: Copy application code
      synchronize:
        src: "{{ app_source_directory }}/"
        dest: "{{ app_directory }}/"
        delete: yes
        recursive: yes

    - name: Install Python dependencies
      pip:
        requirements: "{{ app_directory }}/requirements.txt"
        virtualenv: "{{ app_venv }}"
        state: present

    - name: Configure application
      template:
        src: "templates/{{ target_env }}_config.yml"
        dest: "{{ app_directory }}/config.yml"
        owner: "{{ app_user }}"
        group: "{{ app_group }}"
        mode: '0640'

    - name: Restart application service
      systemd:
        name: "{{ app_service_name }}"
        state: restarted
        daemon_reload: yes
      notify: Run Health Check

    - name: Wait for application to be ready
      wait_for:
        port: 8000
        host: "{{ inventory_hostname }}"
        timeout: 300

  handlers:
    - name: Run Health Check
      uri:
        url: "http://localhost:8000/health/ready"
        method: GET
        status_code: 200
      register: health_check


Infrastructure Templates

AWS Infrastructure (Terraform)

main.tf
hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket = "terraform-state-bucket"
    key    = "infrastructure/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "Terraform"
    }
  }
}

# VPC Module
module "vpc" {
  source = "./modules/vpc"

  vpc_cidr             = var.vpc_cidr
  availability_zones   = var.availability_zones
  public_subnet_cidrs  = var.public_subnet_cidrs
  private_subnet_cidrs = var.private_subnet_cidrs
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "${var.project_name}-${var.environment}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# Application Load Balancer
resource "aws_lb" "main" {
  name               = "${var.project_name}-${var.environment}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = module.vpc.public_subnet_ids

  enable_deletion_protection = var.environment == "production"
}

# RDS Database
resource "aws_db_instance" "main" {
  identifier     = "${var.project_name}-${var.environment}-db"
  engine         = "postgres"
  engine_version = "15.3"
  instance_class = var.db_instance_class

  allocated_storage     = var.db_allocated_storage
  max_allocated_storage = var.db_max_allocated_storage
  storage_encrypted     = true

  db_name  = var.db_name
  username = var.db_username
  password = random_password.db_password.result

  vpc_security_group_ids = [aws_security_group.db.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = var.environment == "production" ? 30 : 7
  skip_final_snapshot     = var.environment != "production"

  tags = {
    Name = "${var.project_name}-${var.environment}-db"
  }
}

Docker Configuration

Dockerfile
dockerfile
FROM python:3.11-slim AS base

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8000/health')"

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

# Multi-stage build for a smaller production image
FROM base AS production
ENV ENVIRONMENT=production
RUN pip install --no-cache-dir gunicorn
CMD ["gunicorn", "main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000"]

Docker Compose (Local Development)

```yaml
# docker-compose.yml
version: '3.8'

services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
      target: base
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:password@db:5432/appdb
      - REDIS_URL=redis://redis:6379/0
      - ENVIRONMENT=development
    volumes:
      - .:/app
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload

  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: appdb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api

volumes:
  postgres_data:
  redis_data:
```

---

Deployment Strategies

Blue-Green Deployment

```bash
#!/bin/bash
# deploy-blue-green.sh

set -e

BLUE_ENV="production-blue"
GREEN_ENV="production-green"
CURRENT_ENV=$(get_active_environment)

if [ "$CURRENT_ENV" == "$BLUE_ENV" ]; then
  TARGET_ENV="$GREEN_ENV"
  OLD_ENV="$BLUE_ENV"
else
  TARGET_ENV="$BLUE_ENV"
  OLD_ENV="$GREEN_ENV"
fi

echo "Deploying to $TARGET_ENV (current: $OLD_ENV)"

# Deploy to target environment
deploy_to_environment "$TARGET_ENV"

# Run smoke tests
if ! run_smoke_tests "$TARGET_ENV"; then
  echo "Smoke tests failed, rolling back"
  exit 1
fi

# Switch traffic
switch_load_balancer "$TARGET_ENV"

# Monitor for 5 minutes
monitor_environment "$TARGET_ENV" 300

# If all good, keep old environment for quick rollback
echo "Deployment successful. Old environment $OLD_ENV kept for rollback."
```
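The blue/green toggle at the top of the script can also be expressed as a small pure function, which is easier to unit-test than shell branching. A minimal sketch (the function and its defaults are illustrative, not part of the script above):

```python
def select_target(current_env: str,
                  blue: str = "production-blue",
                  green: str = "production-green") -> tuple[str, str]:
    """Return (target_env, old_env): deploy to whichever color is idle."""
    if current_env == blue:
        return green, blue
    return blue, green
```

The idle environment always receives the new release; the previously active one is left untouched so a rollback is just switching the load balancer back.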

Canary Deployment

```yaml
# k8s/canary-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api        # matches both stable and canary pods
  ports:
    - port: 80
      targetPort: 8000
---
# Stable version (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: api
      version: stable
  template:
    metadata:
      labels:
        app: api
        version: stable
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v1.0.0
---
# Canary version (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
      version: canary
  template:
    metadata:
      labels:
        app: api
        version: canary
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v1.1.0
```
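With a plain Kubernetes Service (no service mesh), the 90/10 split comes purely from pod counts: the Service load-balances across every pod matching `app: api`, so the canary's traffic share is its fraction of total replicas. A quick sketch of that arithmetic (illustrative; assumes roughly uniform per-pod load balancing):

```python
def replica_split(total: int, canary_fraction: float) -> tuple[int, int]:
    """Return (stable_replicas, canary_replicas) approximating the
    requested canary traffic share under per-pod load balancing."""
    if not 0.0 < canary_fraction < 1.0:
        raise ValueError("canary_fraction must be strictly between 0 and 1")
    canary = max(1, round(total * canary_fraction))  # always run at least one canary pod
    return total - canary, canary

# 10 total replicas at a 10% canary share -> 9 stable, 1 canary,
# matching the two Deployments above.
```

Finer-grained splits (e.g. 1% canary without 100 replicas) need weighted routing from an ingress controller or service mesh rather than replica counts.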

---

Monitoring & Observability

Prometheus Configuration

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'api-service'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: api

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - /etc/prometheus/alerts/*.yml
```

Alert Rules

```yaml
# alerts/api-alerts.yml
groups:
  - name: api-alerts
    interval: 30s
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "{{ $labels.instance }} has error rate {{ $value }}"

      - alert: HighLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "95th percentile latency is {{ $value }}s"

      - alert: PodDown
        expr: up{job="api-service"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Pod is down"
          description: "{{ $labels.instance }} has been down for 2 minutes"
```
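Note that `rate(http_requests_total{status=~"5.."}[5m])` is a per-second rate of 5xx responses (counter increase divided by the window), not an error percentage, so the `0.05` threshold means roughly one error every 20 seconds. The underlying arithmetic, sketched (PromQL's real `rate()` additionally handles counter resets and extrapolation):

```python
def per_second_rate(count_now: float, count_earlier: float,
                    window_seconds: float) -> float:
    """Approximate PromQL rate(): counter increase over the window, per second.

    Assumes the counter did not reset within the window.
    """
    return (count_now - count_earlier) / window_seconds

# 30 new 5xx responses over a 5-minute (300 s) window:
# 30 / 300 = 0.1 errors/sec -> breaches the 0.05 HighErrorRate threshold
```

To alert on an error *ratio* instead, divide the 5xx rate by the total request rate in the expression.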

---

Security Configuration

Network Security

```hcl
# security-groups.tf

# ALB Security Group
resource "aws_security_group" "alb" {
  name_prefix = "${var.project_name}-alb-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS from internet"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Application Security Group
resource "aws_security_group" "app" {
  name_prefix = "${var.project_name}-app-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 8000
    to_port         = 8000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
    description     = "From ALB"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Database Security Group
resource "aws_security_group" "db" {
  name_prefix = "${var.project_name}-db-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
    description     = "From application"
  }
}
```
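These three groups form a strict chain: the internet reaches only the ALB (443), the ALB reaches only the application tier (8000), and only the application tier reaches the database (5432). A toy model of that intent, usable in a policy unit test (illustrative only, not derived from real AWS state):

```python
# Permitted ingress source for each tier, mirroring the security groups above
INGRESS_SOURCE = {
    "alb": "internet",  # 443 from 0.0.0.0/0
    "app": "alb",       # 8000 from the ALB security group
    "db": "app",        # 5432 from the application security group
}

def ingress_allowed(src: str, dst: str) -> bool:
    """True only when src is the single permitted ingress source for dst."""
    return INGRESS_SOURCE.get(dst) == src
```

In particular `ingress_allowed("internet", "db")` is false: the database is never directly exposed, and all access is mediated by the application tier.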

---

Tool Access

Required tools:
  • Read: Read SPEC documents and ADRs
  • Write: Generate infrastructure and pipeline files
  • Bash: Execute Terraform, Docker, kubectl commands
  • Grep: Search for configuration patterns

Required software:
  • Terraform / OpenTofu
  • Docker / Podman
  • kubectl / helm
  • aws-cli / gcloud / az-cli
  • Ansible (optional)

Integration Points

With doc-flow
  • Extract infrastructure requirements from SPEC documents
  • Validate ADR compliance in infrastructure code
  • Generate deployment documentation

With security-audit
  • Security scanning of infrastructure code
  • Vulnerability assessment of containers
  • Compliance validation

With test-automation
  • Integration with CI/CD for automated testing
  • Deployment smoke tests
  • Infrastructure validation tests

With analytics-flow
  • Deployment metrics and trends
  • Infrastructure cost tracking
  • Performance monitoring integration

Best Practices

  1. Infrastructure as Code: All infrastructure versioned in Git
  2. Immutable infrastructure: Replace, don't modify
  3. Environment parity: Dev/staging/prod consistency
  4. Secret management: Never commit secrets
  5. Monitoring from day one: Observability built-in
  6. Automated rollbacks: Fast failure recovery
  7. Cost optimization: Tag resources, monitor spending
  8. Security by default: Least privilege, encryption
  9. Documentation: Runbooks for common operations
  10. Disaster recovery: Regular backup testing

Success Criteria

  • Zero manual infrastructure provisioning
  • Deployment time < 15 minutes
  • Rollback time < 5 minutes
  • Zero-downtime deployments
  • Infrastructure drift detection automated
  • Security compliance 100%
  • Cost variance < 10% from budget
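Automated drift detection is commonly built on `terraform plan -detailed-exitcode`, which exits 0 when state matches the configuration, 1 on error, and 2 when changes are pending. A sketch of the interpretation step (the subprocess invocation itself is omitted):

```python
def interpret_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a drift verdict.

    0 = no changes, 1 = plan error, 2 = changes pending (drift or new config).
    """
    verdicts = {0: "in-sync", 1: "error", 2: "drift-detected"}
    return verdicts.get(code, "unknown")
```

A scheduled CI job can run the plan against each environment, feed the exit code through this mapping, and alert on "drift-detected".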

Notes

  • Generated configurations require review before production use
  • Cloud provider credentials must be configured separately
  • State management (Terraform) requires backend configuration
  • Multi-region deployments require additional configuration
  • Cost estimation available from terraform plan output (e.g., via Infracost or Terraform Cloud)