Senior Cloud Architect

Expert-level cloud architecture and infrastructure design.

Core Competencies

  • Multi-cloud architecture
  • AWS, GCP, Azure platforms
  • Cloud-native design patterns
  • Cost optimization
  • Security and compliance
  • Migration strategies
  • Disaster recovery
  • Infrastructure automation

Cloud Platform Comparison

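| Service | AWS | GCP | Azure |
|---|---|---|---|
| Compute | EC2, ECS, EKS | Compute Engine (GCE), GKE | Virtual Machines, AKS |
| Serverless | Lambda | Cloud Functions | Azure Functions |
| Storage | S3 | Cloud Storage | Blob Storage |
| Database | RDS, DynamoDB | Cloud SQL, Spanner | Azure SQL Database, Cosmos DB |
| ML | SageMaker | Vertex AI | Azure ML |
| CDN | CloudFront | Cloud CDN | Azure CDN |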

AWS Architecture

Well-Architected Framework

Pillars:
  1. Operational Excellence
    • Infrastructure as Code
    • Monitoring and observability
    • Incident response
    • Continuous improvement
  2. Security
    • Identity and access management
    • Data protection
    • Infrastructure protection
    • Incident response
  3. Reliability
    • Fault tolerance
    • Disaster recovery
    • Change management
    • Failure testing
  4. Performance Efficiency
    • Right-sizing resources
    • Monitoring performance
    • Trade-off decisions
    • Keeping current
  5. Cost Optimization
    • Cost awareness
    • Right-sizing
    • Reserved capacity
    • Efficient resources
  6. Sustainability
    • Region selection
    • Efficient algorithms
    • Hardware utilization
    • Data management

Reference Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Route 53 (DNS)                       │
└─────────────────────────────┬───────────────────────────────┘
┌─────────────────────────────▼───────────────────────────────┐
│                    CloudFront (CDN)                         │
│                    WAF (Web Application Firewall)           │
└─────────────────────────────┬───────────────────────────────┘
┌─────────────────────────────▼───────────────────────────────┐
│                Application Load Balancer                    │
└──────────┬───────────────────────────────────┬──────────────┘
           │                                   │
┌──────────▼──────────┐             ┌──────────▼──────────┐
│   ECS/EKS Cluster   │             │   ECS/EKS Cluster   │
│   (AZ-a)            │             │   (AZ-b)            │
└──────────┬──────────┘             └──────────┬──────────┘
           │                                   │
┌──────────▼───────────────────────────────────▼──────────┐
│                    ElastiCache (Redis)                  │
└─────────────────────────────┬───────────────────────────┘
┌─────────────────────────────▼───────────────────────────┐
│                    RDS Multi-AZ                         │
│                    (Primary + Standby)                  │
└─────────────────────────────────────────────────────────┘
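The two clusters in AZ-a and AZ-b exist so that losing one Availability Zone does not take the application down. A back-of-the-envelope sketch of that effect follows. It assumes independent failures and uses purely illustrative availability figures, not AWS SLA numbers. The helper names `serial` and `parallel` are hypothetical.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Availability of a chain where every component must be up."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Availability of redundant replicas where any single one suffices.

    Assumes independent failures, which is optimistic when replicas share
    dependencies such as a region-wide control plane.
    """
    return 1 - prod(1 - a for a in availabilities)


# Illustrative figures only; substitute measured or SLA-backed values.
single_az_app = 0.995
two_az_app = parallel(single_az_app, single_az_app)

# Treats every tier as a hard dependency of each request.
stack = serial(
    0.9999,      # Route 53 + CloudFront/WAF (hypothetical)
    0.9999,      # Application Load Balancer (hypothetical)
    two_az_app,  # ECS/EKS clusters in AZ-a and AZ-b
    0.9995,      # ElastiCache (hypothetical)
    0.9995,      # RDS Multi-AZ (hypothetical)
)

print(f"app tier, one AZ:  {single_az_app:.4%}")
print(f"app tier, two AZs: {two_az_app:.4%}")
print(f"whole stack:       {stack:.4%}")
```

With these numbers, the second AZ lifts the application tier from 99.5% to roughly 99.9975%. The whole stack (about 99.88%) is then bounded by the other tiers in the chain, which is why each of them needs its own redundancy as well.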

Terraform AWS Module

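```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.project}-${var.environment}"
  cidr = var.vpc_cidr

  azs              = ["${var.region}a", "${var.region}b", "${var.region}c"]
  private_subnets  = var.private_subnets
  public_subnets   = var.public_subnets
  database_subnets = var.database_subnets # also creates the DB subnet group used by module.rds

  enable_nat_gateway   = true
  single_nat_gateway   = var.environment != "production" # one shared NAT gateway outside production
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = local.common_tags
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "${var.project}-${var.environment}"
  cluster_version = "1.28" # 1.28 has left EKS standard support; choose a currently supported version

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Keep the public API endpoint only if operators need it, and restrict it
  # with cluster_endpoint_public_access_cidrs in production.
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  eks_managed_node_groups = {
    main = {
      instance_types = var.node_instance_types
      min_size       = var.node_min_size
      max_size       = var.node_max_size
      desired_size   = var.node_desired_size
    }
  }

  tags = local.common_tags
}

# Security group referenced by module.rds: PostgreSQL reachable only from inside the VPC.
module "security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 5.0"

  name        = "${var.project}-${var.environment}-db"
  description = "PostgreSQL access from within the VPC"
  vpc_id      = module.vpc.vpc_id

  ingress_with_cidr_blocks = [
    {
      from_port   = 5432
      to_port     = 5432
      protocol    = "tcp"
      description = "PostgreSQL from VPC"
      cidr_blocks = module.vpc.vpc_cidr_block
    },
  ]

  tags = local.common_tags
}

module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier = "${var.project}-${var.environment}"

  engine               = "postgres"
  engine_version       = "15"
  family               = "postgres15"
  major_engine_version = "15"
  instance_class       = var.db_instance_class

  allocated_storage     = var.db_allocated_storage
  max_allocated_storage = var.db_max_allocated_storage # enables storage autoscaling up to this size

  db_name  = var.db_name
  username = var.db_username
  port     = 5432
  # No password here: module v6 defaults manage_master_user_password to true,
  # which keeps the master password in Secrets Manager.

  multi_az               = var.environment == "production"
  db_subnet_group_name   = module.vpc.database_subnet_group
  vpc_security_group_ids = [module.security_group.security_group_id]

  backup_retention_period = var.environment == "production" ? 30 : 7
  skip_final_snapshot     = var.environment != "production"

  tags = local.common_tags
}
```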
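The VPC module receives `var.vpc_cidr` and the subnet lists as plain strings. A subnet outside the VPC range, or one overlapping another subnet, therefore only surfaces as an AWS API error during `terraform apply`. A pre-flight check with Python's standard `ipaddress` module can catch this earlier. The function name and the sample CIDRs are hypothetical, and the third list stands for a database subnet list if one is added.

```python
import ipaddress
from itertools import combinations


def validate_subnets(vpc_cidr: str, *subnet_lists: list) -> list:
    """Return problems found in a VPC subnet plan; an empty list means it looks valid."""
    vpc = ipaddress.ip_network(vpc_cidr)  # raises ValueError on malformed CIDRs
    subnets = [ipaddress.ip_network(cidr) for group in subnet_lists for cidr in group]
    problems = []

    # Every subnet must sit inside the VPC range.
    for net in subnets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is outside VPC {vpc}")

    # No two subnets (across all of the lists) may overlap.
    for a, b in combinations(subnets, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b}")

    return problems


# Hypothetical values shaped like var.vpc_cidr and the subnet variables.
print(validate_subnets(
    "10.0.0.0/16",
    ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"],        # private subnets
    ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"],  # public subnets
    ["10.0.201.0/24", "10.0.202.0/24", "10.0.203.0/24"],  # database subnets (hypothetical)
))
```

An empty list means no containment or overlap problems were found. AWS enforces further rules that this sketch does not check, such as IPv4 subnet sizes between /16 and /28.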

Cost Optimization

Reserved vs On-Demand vs Spot

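| Type | Discount vs On-Demand | Commitment | Use case |
|---|---|---|---|
| On-Demand | 0% | None | Variable workloads |
| Reserved Instances | 30-72% | 1 or 3 years | Steady-state workloads |
| Savings Plans | 30-72% | 1 or 3 years | Flexible compute (EC2, Fargate, Lambda) |
| Spot | 60-90% | None, but instances can be reclaimed with a 2-minute warning | Fault-tolerant workloads |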
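The discounts only pay off if the capacity would otherwise be running. Reserved Instances and Savings Plans bill for every hour of the term, whether or not that hour is used. A minimal sketch of the break-even follows. It assumes a flat hourly rate and a 730-hour month (the convention the AWS Pricing Calculator uses). The $0.10 rate and 40% discount are hypothetical.

```python
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate: float, utilization: float, discount: float = 0.0,
                 committed: bool = False) -> float:
    """Monthly cost of one instance.

    utilization: fraction of hours the workload actually needs the instance (0..1).
    committed: True for Reserved Instances / Savings Plans, which bill every hour.
    """
    billed_hours = HOURS_PER_MONTH if committed else HOURS_PER_MONTH * utilization
    return hourly_rate * (1 - discount) * billed_hours


def breakeven_utilization(discount: float) -> float:
    """Utilization above which a commitment at `discount` beats On-Demand."""
    return 1 - discount


rate = 0.10  # hypothetical On-Demand $/hour
for util in (0.4, 0.8):
    on_demand = monthly_cost(rate, util)
    reserved = monthly_cost(rate, util, discount=0.40, committed=True)
    print(f"utilization {util:.0%}: on-demand ${on_demand:.2f}, reserved ${reserved:.2f}")
print(f"break-even at 40% discount: {breakeven_utilization(0.40):.0%}")
```

At 40% utilization the commitment costs more ($43.80 versus $29.20 on demand), while at 80% it is cheaper ($43.80 versus $58.40). With a discount d, committing wins only when the instance would run more than 1 - d of the time. Spot follows the uncommitted formula, with interruption risk taking the place of a commitment.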

Cost Optimization Strategies

Right-sizing:
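```python
from datetime import datetime, timedelta, timezone

import boto3


def analyze_utilization(instance_id: str, days: int = 14) -> str:
    """Recommend a right-sizing action from an EC2 instance's CPU utilization.

    Only CPU is checked: EC2 publishes no memory metrics unless the
    CloudWatch agent is installed.
    """
    cloudwatch = boto3.client('cloudwatch')
    end = datetime.now(timezone.utc)  # CloudWatch timestamps are UTC

    metrics = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # hourly points; one call returns at most 1,440 (60 days)
        Statistics=['Average', 'Maximum'],
    )

    datapoints = metrics['Datapoints']
    if not datapoints:
        # Stopped or newly launched instances have nothing to analyze.
        return 'insufficient_data'

    avg_cpu = sum(p['Average'] for p in datapoints) / len(datapoints)
    max_cpu = max(p['Maximum'] for p in datapoints)

    # Heuristic thresholds (percent CPU); tune them per workload.
    if avg_cpu < 10 and max_cpu < 30:
        return 'downsize'
    elif avg_cpu > 80:
        return 'upsize'
    else:
        return 'optimal'
```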
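The function above makes the CloudWatch call and the sizing decision in one place, so its thresholds cannot be tested without AWS access. One option, sketched here as an assumption rather than part of the original tooling, is to move the decision into a pure function. It operates on datapoints shaped like the `Datapoints` list that `get_metric_statistics` returns. The function name and default thresholds mirror the original but are otherwise hypothetical.

```python
from typing import Iterable, Mapping


def classify_cpu(datapoints: Iterable[Mapping[str, float]],
                 low_avg: float = 10.0, low_max: float = 30.0,
                 high_avg: float = 80.0) -> str:
    """Apply the right-sizing thresholds to CloudWatch-style datapoints.

    Each datapoint needs 'Average' and 'Maximum' keys (percent CPU).
    """
    points = list(datapoints)
    if not points:
        return 'insufficient_data'

    avg_cpu = sum(p['Average'] for p in points) / len(points)
    max_cpu = max(p['Maximum'] for p in points)

    if avg_cpu < low_avg and max_cpu < low_max:
        return 'downsize'
    if avg_cpu > high_avg:
        return 'upsize'
    return 'optimal'


# Synthetic datapoints for illustration.
idle = [{'Average': 3.0, 'Maximum': 12.0}, {'Average': 5.0, 'Maximum': 20.0}]
busy = [{'Average': 85.0, 'Maximum': 99.0}, {'Average': 90.0, 'Maximum': 100.0}]
print(classify_cpu(idle), classify_cpu(busy))
```

The CloudWatch wrapper then only fetches `metrics['Datapoints']` and passes them to `classify_cpu`. Threshold changes can be unit-tested without credentials.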
Cost Allocation Tags:
```yaml
required_tags:
  - Environment: production|staging|development
  - Project: project-name
  - Owner: team-name
  - CostCenter: cost-center-id

automation:
  - Untagged resources alert after 24 hours
  - Auto-terminate development resources after 7 days
  - Weekly cost reports by tag
```
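The automation rules above need code behind them. Below is a minimal sketch of the 24-hour untagged-resource check. It assumes you already have each resource's tags as a dictionary and its creation time as a timezone-aware datetime. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REQUIRED_TAGS = {"Environment", "Project", "Owner", "CostCenter"}
ALLOWED_ENVIRONMENTS = {"production", "staging", "development"}
UNTAGGED_GRACE = timedelta(hours=24)


def tag_violations(tags: dict) -> list:
    """List policy violations for one resource's tags (empty list = compliant)."""
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - set(tags))]
    env = tags.get("Environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid Environment value: {env}")
    return problems


def should_alert(tags: dict, created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once a non-compliant resource is older than the 24-hour grace period."""
    if now is None:
        now = datetime.now(timezone.utc)
    return bool(tag_violations(tags)) and now - created_at >= UNTAGGED_GRACE
```

AWS tag keys are case-sensitive, so a resource tagged `environment` would be reported as missing its `Environment` tag.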

Cost Dashboard

┌─────────────────────────────────────────────────────────────┐
│                    Monthly Cost Summary                     │
├─────────────────────────────────────────────────────────────┤
│  Total: $45,231     vs Last Month: +5%                      │
│                                                             │
│  By Service:                   By Environment:              │
│  ├── EC2: $18,500 (41%)        ├── Production: $38,000      │
│  ├── RDS: $12,000 (27%)        ├── Staging: $4,500          │
│  ├── S3: $3,200 (7%)           └── Development: $2,731      │
│  ├── Lambda: $1,800 (4%)                                    │
│  └── Other: $9,731 (22%)       Savings Opportunity: $8,200  │
│                                                             │
│  Recommendations:                                           │
│  • Convert 12 instances to Reserved (save $4,200/mo)        │
│  • Delete 5 unused EBS volumes (save $180/mo)               │
│  • Resize 8 over-provisioned instances (save $1,800/mo)     │
└─────────────────────────────────────────────────────────────┘
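The per-service and per-environment splits in a dashboard like this can come from the Cost Explorer GetCostAndUsage API. Below is a sketch under that assumption. The fetch step groups by the SERVICE dimension using boto3's `ce` client. The summarizing step is kept separate so it can be checked against the dashboard figures without AWS access. The function names are hypothetical. Real responses use full service names such as "Amazon Elastic Compute Cloud - Compute" rather than the short labels in the sample.

```python
from collections import defaultdict
from datetime import date


def fetch_costs_by_service(start: date, end: date) -> dict:
    """Query Cost Explorer for unblended cost per service (End is exclusive)."""
    import boto3  # imported lazily so summarize() runs without the AWS SDK

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )


def summarize(response: dict) -> dict:
    """Map service -> (total cost, whole-percent share of the total), largest first."""
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            # Cost Explorer returns amounts as strings.
            totals[group["Keys"][0]] += float(group["Metrics"]["UnblendedCost"]["Amount"])

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {service: (cost, round(100 * cost / grand_total)) for service, cost in ranked}


# Hand-built response in Cost Explorer's shape, using the dashboard's figures
# with shortened service names.
sample = {"ResultsByTime": [{"Groups": [
    {"Keys": [name], "Metrics": {"UnblendedCost": {"Amount": amount, "Unit": "USD"}}}
    for name, amount in [("EC2", "18500"), ("RDS", "12000"), ("S3", "3200"),
                         ("Lambda", "1800"), ("Other", "9731")]
]}]}

for service, (cost, share) in summarize(sample).items():
    print(f"{service:<7} ${cost:>9,.0f} ({share}%)")
```

Rounding each share independently can make the percentages sum to 99 or 101, so treat the dollar figures as authoritative. To get the per-environment column, group by `{"Type": "TAG", "Key": "Environment"}` instead. This only works after Environment has been activated as a cost allocation tag.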

Disaster Recovery

DR Strategies

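| Strategy | RTO | RPO | Cost |
|---|---|---|---|
| Backup & Restore | Hours | Hours | $ |
| Pilot Light | Minutes | Minutes | $$ |
| Warm Standby | Minutes | Seconds | $$$ |
| Multi-Site Active | Seconds | Near-zero | $$$$ |

RTO (recovery time objective) is how long restoring service may take after a disaster. RPO (recovery point objective) is how much recent data, measured in time, may be lost.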
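Choosing a strategy amounts to picking the cheapest row whose worst-case RTO and RPO still meet the business targets. The sketch below encodes the table that way. The numeric bounds are illustrative assumptions standing in for "Hours", "Minutes", "Seconds", and "Near-zero"; they are not published AWS figures.

```python
from datetime import timedelta
from typing import NamedTuple


class DRStrategy(NamedTuple):
    name: str
    max_rto: timedelta  # assumed worst-case time to restore service
    max_rpo: timedelta  # assumed worst-case window of lost data


# Cheapest first, mirroring the Cost column. Bounds are illustrative stand-ins
# for "Hours", "Minutes", "Seconds" and "Near-zero".
STRATEGIES = [
    DRStrategy("Backup & Restore", timedelta(hours=24), timedelta(hours=24)),
    DRStrategy("Pilot Light", timedelta(minutes=30), timedelta(minutes=15)),
    DRStrategy("Warm Standby", timedelta(minutes=10), timedelta(seconds=30)),
    DRStrategy("Multi-Site Active", timedelta(seconds=30), timedelta(seconds=1)),
]


def choose_strategy(rto_target: timedelta, rpo_target: timedelta) -> str:
    """Return the cheapest strategy whose assumed RTO and RPO both meet the targets."""
    for strategy in STRATEGIES:
        if strategy.max_rto <= rto_target and strategy.max_rpo <= rpo_target:
            return strategy.name
    raise ValueError("no strategy in the table meets these targets")


# A service that may be down for 4 hours and lose at most 1 hour of data:
print(choose_strategy(rto_target=timedelta(hours=4), rpo_target=timedelta(hours=1)))
```

For a 4-hour RTO and 1-hour RPO, this picks Pilot Light. Tightening the RTO to 5 minutes skips past Warm Standby straight to Multi-Site Active, which shows how quickly cost climbs once recovery must be near-instant.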

Multi-Region Architecture

┌───────────────────────────────────────────────────────────┐
│                   Global Load Balancer                    │
│                  (Route 53 / Cloud DNS)                   │
└──────────────┬──────────────────────────────┬─────────────┘
               │                              │
┌──────────────▼─────────────┐ ┌──────────────▼─────────────┐
│      Primary Region        │ │      Secondary Region      │
│      (us-east-1)           │ │      (us-west-2)           │
│                            │ │                            │
│  ┌──────────────────────┐  │ │  ┌──────────────────────┐  │
│  │   Application Layer  │  │ │  │   Application Layer  │  │
│  │   (Active)           │  │ │  │   (Standby/Active)   │  │
│  └──────────┬───────────┘  │ │  └──────────┬───────────┘  │
│             │              │ │             │              │
│  ┌──────────▼───────────┐  │ │  ┌──────────▼───────────┐  │
│  │   Database           │──┼─┼──│   Database           │  │
│  │   (Primary)          │  │ │  │   (Read Replica)     │  │
│  └──────────────────────┘  │ │  └──────────────────────┘  │
└──────────────┬─────────────┘ └────────────────────────────┘
               │ Cross-Region Replication
    ┌──────────▼───────────┐
    │     S3 Backup        │
    │   (Multi-Region)     │
    └──────────────────────┘
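With DNS-based failover, the global entry point in this diagram is typically a pair of Route 53 failover records. Route 53 answers with the primary while its health check passes and switches to the secondary when it fails. The sketch below builds the ChangeBatch for boto3's `change_resource_record_sets`. The record name, load balancer DNS names, hosted zone IDs, and health-check ID are placeholders, and the helper name is hypothetical.

```python
def failover_change_batch(record_name: str, primary: dict, secondary: dict,
                          health_check_id: str) -> dict:
    """Build an UPSERT ChangeBatch for a PRIMARY/SECONDARY alias record pair.

    primary / secondary: dicts with the load balancer's 'dns_name' and its
    canonical 'zone_id' (the ALB's own hosted zone, not your domain's zone).
    """
    def record(role: str, target: dict, extra: dict) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": role.lower(),
                "Failover": role,  # "PRIMARY" or "SECONDARY"
                "AliasTarget": {
                    "HostedZoneId": target["zone_id"],
                    "DNSName": target["dns_name"],
                    "EvaluateTargetHealth": True,
                },
                **extra,
            },
        }

    return {
        "Comment": "Active/passive failover between regions",
        "Changes": [
            record("PRIMARY", primary, {"HealthCheckId": health_check_id}),
            record("SECONDARY", secondary, {}),
        ],
    }


batch = failover_change_batch(
    "app.example.com.",
    primary={"dns_name": "primary-alb.us-east-1.elb.amazonaws.com", "zone_id": "ZPRIMARYALB"},
    secondary={"dns_name": "secondary-alb.us-west-2.elb.amazonaws.com", "zone_id": "ZSECONDARYALB"},
    health_check_id="hc-placeholder",
)
# Applying it requires AWS credentials and your hosted zone ID:
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="ZYOURZONE", ChangeBatch=batch)
```

DNS failover only redirects traffic. Promoting the us-west-2 read replica to a writable primary is a separate step (for RDS, boto3's `promote_read_replica`). If both regions serve traffic (the Standby/Active case), latency-based or weighted records fit better than failover records.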

Backup Strategy

```yaml
backup_policy:
  database:
    frequency: continuous
    retention: 35 days
    cross_region: true
    encryption: aws/rds

  application_data:
    frequency: daily
    retention: 90 days
    versioning: enabled
    lifecycle:
      - transition_to_ia: 30 days
      - transition_to_glacier: 90 days
      - expiration: 365 days

  configuration:
    frequency: on_change
    retention: unlimited
    storage: git + s3
```
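The application_data lifecycle maps onto a single S3 lifecycle rule. The sketch below builds the configuration dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. The function and bucket names are hypothetical. The NoncurrentVersionExpiration entry is an added assumption: because versioning is enabled, an expiration action only places a delete marker on the current object, and older versions would otherwise keep accruing storage cost. Its 90-day value reuses the policy's retention figure as a guess at the intent.

```python
def application_data_lifecycle(ia_days: int = 30, glacier_days: int = 90,
                               expire_days: int = 365, noncurrent_days: int = 90) -> dict:
    """Build a LifecycleConfiguration mirroring backup_policy.application_data."""
    # S3 rejects STANDARD_IA transitions earlier than 30 days; keep the tiers ordered.
    if not 30 <= ia_days < glacier_days < expire_days:
        raise ValueError("expected 30 <= ia_days < glacier_days < expire_days")

    return {
        "Rules": [
            {
                "ID": "application-data-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
                # With versioning enabled, expiration only adds a delete marker;
                # this also removes superseded versions (assumed retention).
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


config = application_data_lifecycle()
# To apply (needs AWS credentials; the bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-application-data", LifecycleConfiguration=config)
```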

Security Architecture

Network Security

┌─────────────────────────────────────────────────────────────┐
│                           VPC                               │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                    Public Subnet                      │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │   NAT GW    │  │     ALB     │  │   Bastion     │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                   Private Subnet                      │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │   App Tier  │  │   App Tier  │  │   App Tier    │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                   Data Subnet                         │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │     RDS     │  │    Redis    │  │  Elasticsearch│  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
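The three subnets imply a default-deny model in which each tier accepts traffic only from the tier directly above it. The sketch below expresses that rule as a pre-deployment check for proposed security-group rules. The tier names and ports are assumptions: HTTPS at the ALB, 8080 for the app tier, PostgreSQL 5432, Redis 6379, and HTTPS for the search cluster. The model covers ingress only.

```python
# Allowed (source tier, destination tier, TCP port) ingress flows; anything else is denied.
ALLOWED_FLOWS = {
    ("internet", "public", 443),  # clients -> ALB over HTTPS
    ("public", "private", 8080),  # ALB -> app tier (assumed application port)
    ("public", "private", 22),    # bastion -> app hosts over SSH
    ("private", "data", 5432),    # app -> RDS PostgreSQL
    ("private", "data", 6379),    # app -> Redis
    ("private", "data", 443),     # app -> search cluster over HTTPS (assumed)
}


def is_allowed(source: str, destination: str, port: int) -> bool:
    """True if the tier model permits this ingress flow."""
    return (source, destination, port) in ALLOWED_FLOWS


def violations(proposed_rules: list) -> list:
    """Return the proposed (source, destination, port) rules the model forbids."""
    return [rule for rule in proposed_rules if not is_allowed(*rule)]


proposed = [
    ("private", "data", 5432),    # fine: app tier to database
    ("public", "data", 5432),     # skips the app tier
    ("internet", "private", 22),  # exposes SSH on app hosts
]
print(violations(proposed))
```

Real security groups can be narrower than this tier-level model. For example, SSH into the app tier can be allowed only from the bastion's security group rather than the whole public subnet.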

IAM Best Practices

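aws:SourceIp holds the caller's public IP address and is not present for requests that arrive through a VPC endpoint, so a private range such as 10.0.0.0/8 will not match it. For private-range restrictions use aws:VpcSourceIp, which is present only when the request is made through a VPC endpoint:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LeastPrivilegeExample",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/uploads/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Team": "engineering"
        },
        "IpAddress": {
          "aws:VpcSourceIp": ["10.0.0.0/8"]
        }
      }
    }
  ]
}
```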
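Condition keys are easy to get subtly wrong, so it helps to test a policy before attaching it. IAM's SimulateCustomPolicy API evaluates a policy document against an action, a resource, and supplied context values without deploying anything. The wrapper below is a sketch around boto3's `simulate_custom_policy`. The function name and the injectable `client` parameter (which lets it run against a stub) are assumptions, and the resource ARN in the usage comment is a placeholder.

```python
import json


def simulate(policy: dict, action: str, resource_arn: str, context: dict, client=None) -> str:
    """Return IAM's decision: 'allowed', 'explicitDeny' or 'implicitDeny'.

    context maps a condition key to (type, value), for example
    {"aws:PrincipalTag/Team": ("string", "engineering")}.
    """
    if client is None:
        import boto3  # only needed for real calls against IAM
        client = boto3.client("iam")

    response = client.simulate_custom_policy(
        PolicyInputList=[json.dumps(policy)],
        ActionNames=[action],
        ResourceArns=[resource_arn],
        ContextEntries=[
            {"ContextKeyName": key, "ContextKeyType": key_type, "ContextKeyValues": [value]}
            for key, (key_type, value) in context.items()
        ],
    )
    # One action and one resource were requested, so there is one result.
    return response["EvaluationResults"][0]["EvalDecision"]


# Example (requires AWS credentials). Add the IP condition key your policy uses,
# with type "ip", e.g. ("ip", "10.1.2.3"):
# decision = simulate(policy, "s3:PutObject",
#                     "arn:aws:s3:::my-bucket/uploads/report.csv",
#                     {"aws:PrincipalTag/Team": ("string", "engineering")})
```

Leaving out a context key that a condition references is itself a useful test: the condition evaluates to false and the simulator reports implicitDeny.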

Reference Materials

  • references/aws_patterns.md - AWS architecture patterns
  • references/gcp_patterns.md - GCP architecture patterns
  • references/multi_cloud.md - Multi-cloud strategies
  • references/cost_optimization.md - Cost optimization guide

Scripts

```bash
# Infrastructure cost analyzer
python scripts/cost_analyzer.py --account production --period monthly

# DR validation
python scripts/dr_test.py --region us-west-2 --type failover

# Security audit
python scripts/security_audit.py --framework cis --output report.html

# Resource inventory
python scripts/inventory.py --accounts all --format csv
```