senior-cloud-architect
Senior Cloud Architect
Expert-level cloud architecture and infrastructure design.
Core Competencies
- Multi-cloud architecture
- AWS, GCP, Azure platforms
- Cloud-native design patterns
- Cost optimization
- Security and compliance
- Migration strategies
- Disaster recovery
- Infrastructure automation
Cloud Platform Comparison
| Service | AWS | GCP | Azure |
|---|---|---|---|
| Compute | EC2, ECS, EKS | GCE, GKE | VMs, AKS |
| Serverless | Lambda | Cloud Functions | Azure Functions |
| Storage | S3 | Cloud Storage | Blob Storage |
| Database | RDS, DynamoDB | Cloud SQL, Spanner | Azure SQL Database, Cosmos DB |
| ML | SageMaker | Vertex AI | Azure ML |
| CDN | CloudFront | Cloud CDN | Azure CDN |
AWS Architecture
Well-Architected Framework
Pillars:
- Operational Excellence
  - Infrastructure as Code
  - Monitoring and observability
  - Incident response
  - Continuous improvement
- Security
  - Identity and access management
  - Data protection
  - Infrastructure protection
  - Incident response
- Reliability
  - Fault tolerance
  - Disaster recovery
  - Change management
  - Failure testing
- Performance Efficiency
  - Right-sizing resources
  - Monitoring performance
  - Trade-off decisions
  - Keeping current
- Cost Optimization
  - Cost awareness
  - Right-sizing
  - Reserved capacity
  - Efficient resources
- Sustainability
  - Region selection
  - Efficient algorithms
  - Hardware utilization
  - Data management
Reference Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                       Route 53 (DNS)                        │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                      CloudFront (CDN)                       │
│               WAF (Web Application Firewall)                │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                  Application Load Balancer                  │
└──────────┬───────────────────────────────────┬──────────────┘
           │                                   │
┌──────────▼──────────┐             ┌──────────▼──────────┐
│   ECS/EKS Cluster   │             │   ECS/EKS Cluster   │
│       (AZ-a)        │             │       (AZ-b)        │
└──────────┬──────────┘             └──────────┬──────────┘
           │                                   │
┌──────────▼───────────────────────────────────▼──────────┐
│                   ElastiCache (Redis)                   │
└─────────────────────────────┬───────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────┐
│                      RDS Multi-AZ                       │
│                   (Primary + Standby)                   │
└─────────────────────────────────────────────────────────┘
```
Terraform AWS Module
```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.project}-${var.environment}"
  cidr = var.vpc_cidr

  azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
  private_subnets = var.private_subnets
  public_subnets  = var.public_subnets
  # Assumption: var.database_subnets is not in the original example. Without
  # database subnets the module creates no database subnet group, and the
  # rds module below would have nothing to reference.
  database_subnets = var.database_subnets

  enable_nat_gateway   = true
  single_nat_gateway   = var.environment != "production" # one shared NAT gateway outside production
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = local.common_tags
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "${var.project}-${var.environment}"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets # worker nodes stay in private subnets

  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  eks_managed_node_groups = {
    main = {
      instance_types = var.node_instance_types
      min_size       = var.node_min_size
      max_size       = var.node_max_size
      desired_size   = var.node_desired_size
    }
  }

  tags = local.common_tags
}

module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier = "${var.project}-${var.environment}"

  engine               = "postgres"
  engine_version       = "15"
  family               = "postgres15"
  major_engine_version = "15"
  instance_class       = var.db_instance_class

  allocated_storage     = var.db_allocated_storage
  max_allocated_storage = var.db_max_allocated_storage # upper bound for storage autoscaling

  db_name  = var.db_name
  username = var.db_username
  port     = 5432

  multi_az             = var.environment == "production"
  db_subnet_group_name = module.vpc.database_subnet_group
  # module.security_group is assumed to be defined elsewhere in the configuration.
  vpc_security_group_ids = [module.security_group.security_group_id]

  backup_retention_period = var.environment == "production" ? 30 : 7 # days
  skip_final_snapshot     = var.environment != "production"

  tags = local.common_tags
}
```
Cost Optimization
Reserved vs On-Demand vs Spot
| Type | Discount | Commitment | Use Case |
|---|---|---|---|
| On-Demand | 0% | None | Variable workloads |
| Reserved | 30-72% | 1-3 years | Steady-state |
| Savings Plans | 30-72% | 1-3 years | Flexible compute |
| Spot | 60-90% | None | Fault-tolerant |
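To make the table concrete, the sketch below estimates the monthly cost range of one always-on instance under each purchase option. Only the discount ranges come from the table; the hourly rate, the 730-hour month, and the names `DISCOUNT_RANGES` and `monthly_cost_range` are illustrative assumptions, not AWS prices or APIs.

```python
# Discount ranges from the table above, as (low, high) fractions.
DISCOUNT_RANGES = {
    "on_demand": (0.00, 0.00),
    "reserved": (0.30, 0.72),
    "savings_plans": (0.30, 0.72),
    "spot": (0.60, 0.90),
}

HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


def monthly_cost_range(on_demand_hourly: float, purchase_type: str) -> tuple[float, float]:
    """Return (best_case, worst_case) monthly cost for one always-on instance."""
    low, high = DISCOUNT_RANGES[purchase_type]
    base = on_demand_hourly * HOURS_PER_MONTH
    # The highest discount gives the best case, the lowest the worst case.
    return (round(base * (1 - high), 2), round(base * (1 - low), 2))


if __name__ == "__main__":
    hourly = 0.10  # hypothetical on-demand rate in USD per hour
    for purchase_type in DISCOUNT_RANGES:
        best, worst = monthly_cost_range(hourly, purchase_type)
        print(f"{purchase_type:14s} ${best:7.2f} - ${worst:7.2f}")
```

The ranges are wide, so a real comparison would plug in the actual quoted rate for the instance family, term, and payment option rather than the table's bounds.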
Cost Optimization Strategies
Right-sizing:
```python
from datetime import datetime, timedelta, timezone

import boto3


def analyze_utilization(instance_id: str, days: int = 14) -> str:
    """Recommend a right-sizing action from CPU utilization.

    Memory is not covered: EC2 publishes no memory metric to CloudWatch
    unless the CloudWatch agent is installed.
    """
    cloudwatch = boto3.client('cloudwatch')
    end_time = datetime.now(timezone.utc)

    metrics = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=end_time - timedelta(days=days),
        EndTime=end_time,
        Period=3600,  # one datapoint per hour
        Statistics=['Average', 'Maximum'],
    )

    datapoints = metrics['Datapoints']
    if not datapoints:
        # No data (stopped instance, wrong ID or region): avoid dividing by zero.
        return 'insufficient-data'

    avg_cpu = sum(p['Average'] for p in datapoints) / len(datapoints)
    max_cpu = max(p['Maximum'] for p in datapoints)

    if avg_cpu < 10 and max_cpu < 30:
        return 'downsize'
    elif avg_cpu > 80:
        return 'upsize'
    else:
        return 'optimal'
```
Cost Allocation Tags:
```yaml
required_tags:
  - Environment: production|staging|development
  - Project: project-name
  - Owner: team-name
  - CostCenter: cost-center-id
automation:
  - Untagged resources alert after 24 hours
  - Auto-terminate development resources after 7 days
  - Weekly cost reports by tag
```
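A tagging policy like this is only useful if something checks it. The following is a minimal sketch of such a check for a single resource's tags. The `Environment` values come from the policy above; treating the other three tags as "any non-empty value" is an assumption, since the policy only shows placeholders, and `tag_violations` is a hypothetical helper name.

```python
import re

# Required tag keys and the pattern each value must match.
# Assumption: Project, Owner and CostCenter accept any non-empty value,
# because the policy above only gives placeholder values for them.
REQUIRED_TAGS = {
    "Environment": re.compile(r"^(production|staging|development)$"),
    "Project": re.compile(r"^.+$"),
    "Owner": re.compile(r"^.+$"),
    "CostCenter": re.compile(r"^.+$"),
}


def tag_violations(tags: dict[str, str]) -> list[str]:
    """Return the problems found in one resource's tags (empty list if compliant)."""
    problems = []
    for key, pattern in REQUIRED_TAGS.items():
        value = tags.get(key)
        if value is None:
            problems.append(f"missing tag: {key}")
        elif not pattern.match(value):
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


if __name__ == "__main__":
    # "prod" is not one of the allowed Environment values.
    print(tag_violations({"Environment": "prod", "Project": "checkout"}))
```

Feeding this from a resource inventory and alerting on non-empty results would cover the "untagged resources alert" item; the 24-hour grace period would need the resource creation time as well.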
Cost Dashboard
```
┌─────────────────────────────────────────────────────────────┐
│                    Monthly Cost Summary                     │
├─────────────────────────────────────────────────────────────┤
│  Total: $45,231                vs Last Month: +5%           │
│                                                             │
│  By Service:                   By Environment:              │
│  ├── EC2: $18,500 (41%)        ├── Production: $38,000      │
│  ├── RDS: $12,000 (27%)        ├── Staging: $4,500          │
│  ├── S3: $3,200 (7%)           └── Development: $2,731      │
│  ├── Lambda: $1,800 (4%)                                    │
│  └── Other: $9,731 (21%)       Savings Opportunity: $8,200  │
│                                                             │
│  Recommendations:                                           │
│  • Convert 12 instances to Reserved (save $4,200/mo)        │
│  • Delete 5 unused EBS volumes (save $180/mo)               │
│  • Resize 8 over-provisioned instances (save $1,800/mo)     │
└─────────────────────────────────────────────────────────────┘
```
Disaster Recovery
DR Strategies
| Strategy | RTO | RPO | Cost |
|---|---|---|---|
| Backup & Restore | Hours | Hours | $ |
| Pilot Light | Minutes | Minutes | $$ |
| Warm Standby | Minutes | Seconds | $$$ |
| Multi-Site Active | Seconds | Near-zero | $$$$ |
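One way to use this table is to start from the recovery targets and pick the cheapest strategy that meets them. The sketch below does that under loud assumptions: the table only gives orders of magnitude ("Hours", "Minutes", "Seconds"), so the numeric bounds here are illustrative placeholders, and `DrStrategy` and `cheapest_strategy` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrStrategy:
    name: str
    rto_seconds: int  # assumed upper bound on recovery time
    rpo_seconds: int  # assumed upper bound on the data-loss window
    cost_tier: int    # number of "$" signs in the table


# Assumption: "Hours" = 24 h, "Minutes" = 60 min, "Seconds" = 60 s,
# "Near-zero" = 1 s. Replace with measured values from DR tests.
STRATEGIES = [
    DrStrategy("Backup & Restore", 24 * 3600, 24 * 3600, 1),
    DrStrategy("Pilot Light", 3600, 3600, 2),
    DrStrategy("Warm Standby", 3600, 60, 3),
    DrStrategy("Multi-Site Active", 60, 1, 4),
]


def cheapest_strategy(max_rto_seconds: int, max_rpo_seconds: int) -> Optional[DrStrategy]:
    """Pick the lowest-cost strategy whose assumed RTO and RPO fit the targets."""
    candidates = [
        s for s in STRATEGIES
        if s.rto_seconds <= max_rto_seconds and s.rpo_seconds <= max_rpo_seconds
    ]
    return min(candidates, key=lambda s: s.cost_tier, default=None)


if __name__ == "__main__":
    # Targets: recover within 4 hours, lose at most 5 minutes of data.
    choice = cheapest_strategy(4 * 3600, 5 * 60)
    print(choice.name if choice else "no strategy meets the targets")
```

A `None` result means the targets are tighter than any listed strategy can promise under these bounds, which is a signal to revisit the targets or the architecture.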
Multi-Region Architecture
```
┌────────────────────────────────────────────────────────────┐
│                    Global Load Balancer                    │
│                   (Route 53 / Cloud DNS)                   │
└──────────────┬───────────────────────────────┬─────────────┘
               │                               │
┌──────────────▼─────────────┐  ┌──────────────▼─────────────┐
│       Primary Region       │  │      Secondary Region      │
│        (us-east-1)         │  │        (us-west-2)         │
│                            │  │                            │
│  ┌──────────────────────┐  │  │  ┌──────────────────────┐  │
│  │  Application Layer   │  │  │  │  Application Layer   │  │
│  │       (Active)       │  │  │  │   (Standby/Active)   │  │
│  └──────────┬───────────┘  │  │  └──────────┬───────────┘  │
│             │              │  │             │              │
│  ┌──────────▼───────────┐  │  │  ┌──────────▼───────────┐  │
│  │       Database       │──┼──┼──│       Database       │  │
│  │      (Primary)       │  │  │  │    (Read Replica)    │  │
│  └──────────────────────┘  │  │  └──────────────────────┘  │
└────────────────────────────┘  └────────────────────────────┘
              │
              │ Cross-Region Replication
              ▼
   ┌──────────────────────┐
   │      S3 Backup       │
   │    (Multi-Region)    │
   └──────────────────────┘
```
Backup Strategy
```yaml
backup_policy:
  database:
    frequency: continuous
    retention: 35 days
    cross_region: true
    encryption: aws/rds
  application_data:
    frequency: daily
    retention: 90 days
    versioning: enabled
    lifecycle:
      - transition_to_ia: 30 days
      - transition_to_glacier: 90 days
      - expiration: 365 days
  configuration:
    frequency: on_change
    retention: unlimited
    storage: git + s3
```
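The `application_data` lifecycle rules above read as a simple age-based state machine. A minimal sketch, assuming each threshold applies once an object reaches that age in days; the stage labels and the function name `storage_stage_for_age` are illustrative, not S3 storage class identifiers.

```python
def storage_stage_for_age(age_days: int) -> str:
    """Map an object's age to its stage under the application_data lifecycle."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= 365:   # expiration: 365 days
        return "expired"
    if age_days >= 90:    # transition_to_glacier: 90 days
        return "glacier"
    if age_days >= 30:    # transition_to_ia: 30 days
        return "infrequent_access"
    return "standard"


if __name__ == "__main__":
    for age in (0, 30, 90, 365):
        print(age, storage_stage_for_age(age))
```

Checking the thresholds from highest to lowest keeps the function correct even if a later edit changes one of the day counts.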
Security Architecture
Network Security
```
┌─────────────────────────────────────────────────────────────┐
│                             VPC                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     Public Subnet                     │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │   NAT GW    │  │     ALB     │  │    Bastion    │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                    Private Subnet                     │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │  App Tier   │  │  App Tier   │  │   App Tier    │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────▼───────────────────────────┐  │
│  │                      Data Subnet                      │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────┐  │  │
│  │  │     RDS     │  │    Redis    │  │ Elasticsearch │  │  │
│  │  └─────────────┘  └─────────────┘  └───────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```
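The three tiers in the diagram each need their own CIDR blocks. As a rough sketch of how a VPC range could be carved up, the function below uses Python's standard `ipaddress` module to hand out one equally sized block per tier per availability zone. The /24 block size, the three-AZ default, and the name `plan_subnets` are assumptions for illustration.

```python
import ipaddress
from itertools import islice

TIERS = ("public", "private", "data")  # the three layers in the diagram above


def plan_subnets(vpc_cidr: str, az_count: int = 3, new_prefix: int = 24) -> dict[str, list[str]]:
    """Split a VPC CIDR into one subnet per tier per availability zone."""
    network = ipaddress.ip_network(vpc_cidr)
    needed = len(TIERS) * az_count
    # subnets() yields equal-size blocks in address order; take only what we need.
    blocks = list(islice(network.subnets(new_prefix=new_prefix), needed))
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{new_prefix} subnets")
    return {
        tier: [str(b) for b in blocks[i * az_count:(i + 1) * az_count]]
        for i, tier in enumerate(TIERS)
    }


if __name__ == "__main__":
    for tier, cidrs in plan_subnets("10.0.0.0/16").items():
        print(f"{tier:8s} {cidrs}")
```

The resulting lists could feed subnet variables in an infrastructure-as-code setup; real layouts often size the tiers differently, for example larger private blocks for application nodes.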
IAM Best Practices
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LeastPrivilegeExample",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/uploads/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Team": "engineering"
        },
        "IpAddress": {
          "aws:SourceIp": ["10.0.0.0/8"]
        }
      }
    }
  ]
}
```
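To see how the two condition operators in this policy combine, the sketch below evaluates them locally against a request context. It is a deliberately simplified model that handles only `StringEquals` and `IpAddress` and requires every condition to match; it is not how IAM itself is implemented, and `condition_matches` is a hypothetical helper.

```python
import ipaddress

# Condition block copied from the policy above.
CONDITION = {
    "StringEquals": {"aws:PrincipalTag/Team": "engineering"},
    "IpAddress": {"aws:SourceIp": ["10.0.0.0/8"]},
}


def condition_matches(condition: dict, context: dict[str, str]) -> bool:
    """Check a request context against StringEquals and IpAddress operators only."""
    for key, expected in condition.get("StringEquals", {}).items():
        if context.get(key) != expected:
            return False
    for key, cidrs in condition.get("IpAddress", {}).items():
        value = context.get(key)
        if value is None:
            return False  # a missing key cannot satisfy the operator
        address = ipaddress.ip_address(value)
        if not any(address in ipaddress.ip_network(cidr) for cidr in cidrs):
            return False
    return True


if __name__ == "__main__":
    request = {"aws:PrincipalTag/Team": "engineering", "aws:SourceIp": "10.1.2.3"}
    print(condition_matches(CONDITION, request))
```

One caveat when adapting the policy itself: AWS does not populate `aws:SourceIp` for requests that arrive through a VPC endpoint, where `aws:VpcSourceIp` applies instead, so a private range such as 10.0.0.0/8 may need that key in practice.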
Reference Materials
- references/aws_patterns.md - AWS architecture patterns
- references/gcp_patterns.md - GCP architecture patterns
- references/multi_cloud.md - Multi-cloud strategies
- references/cost_optimization.md - Cost optimization guide
Scripts
```bash
# Infrastructure cost analyzer
python scripts/cost_analyzer.py --account production --period monthly

# DR validation
python scripts/dr_test.py --region us-west-2 --type failover

# Security audit
python scripts/security_audit.py --framework cis --output report.html

# Resource inventory
python scripts/inventory.py --accounts all --format csv
```