cloud-infrastructure

Cloud Infrastructure

Comprehensive cloud infrastructure skill covering multi-cloud architecture, Infrastructure as Code, cost optimization, and production deployment patterns.

When to Use This Skill

  • Designing cloud architecture for new applications
  • Implementing Infrastructure as Code (Terraform, CloudFormation, Pulumi)
  • Cost optimization and resource right-sizing
  • Multi-region and high-availability deployments
  • Cloud migration planning
  • Security and compliance implementation
  • Auto-scaling and performance optimization

Cloud Architecture Patterns

Compute Patterns

| Pattern | AWS | Azure | GCP | Use Case |
|---|---|---|---|---|
| Serverless | Lambda | Functions | Cloud Functions | Event-driven, variable load |
| Containers | ECS/EKS | AKS | GKE | Microservices, consistent environments |
| VMs | EC2 | Virtual Machines | Compute Engine | Legacy apps, full control |
| Batch | Batch | Batch | Batch | Large-scale processing |

Storage Patterns

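| Type | AWS | Azure | GCP | Use Case |
|---|---|---|---|---|
| Object | S3 | Blob Storage | Cloud Storage | Static files, backups |
| Block | EBS | Managed Disks | Persistent Disk | Database storage |
| File | EFS | Azure Files | Filestore | Shared file systems |
| Archive | S3 Glacier | Blob Storage Archive tier | Cloud Storage Coldline/Archive classes | Long-term retention |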

Database Patterns

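| Type | AWS | Azure | GCP | Use Case |
|---|---|---|---|---|
| Relational | RDS, Aurora | Azure SQL Database | Cloud SQL | ACID transactions |
| NoSQL | DynamoDB | Cosmos DB | Firestore | Flexible schema |
| Cache | ElastiCache | Azure Cache for Redis | Memorystore | Sessions, caching |
| Data Warehouse | Redshift | Synapse Analytics | BigQuery | Analytics |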

Infrastructure as Code

Terraform Best Practices

Project Structure:
infrastructure/
├── modules/
│   ├── networking/
│   ├── compute/
│   └── database/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── prod/
├── main.tf
├── variables.tf
├── outputs.tf
└── versions.tf
State Management:
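  • Use remote state (S3, Azure Blob Storage, GCS)
  • Enable state locking (DynamoDB table or S3-native lockfile for the S3 backend, blob lease for Azure; the GCS backend locks natively)
  • Separate state per environment
  • Never commit state files: they can contain secrets in plaintext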
Module Design:
  • Single responsibility per module
  • Expose minimal required variables
  • Document inputs/outputs
  • Version modules with git tags
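
The state-management rules map directly onto the environments/ layout above. A minimal sketch in Python, assuming the S3 backend: it generates one backend.tf.json (Terraform's JSON configuration syntax) per environment, each with its own state key. Backend blocks cannot use input variables, which is one reason to generate them per environment. The bucket name and region are hypothetical, and `use_lockfile` (S3-native locking) assumes a recent Terraform release (1.10 or later); older setups pass `dynamodb_table` instead.

```python
import json
from pathlib import Path

# Hypothetical values: replace with your own state bucket and region.
STATE_BUCKET = "example-tf-state"
REGION = "us-east-1"
ENVIRONMENTS = ("dev", "staging", "prod")


def backend_config(env: str) -> dict:
    """Build an S3 backend block in Terraform JSON syntax for one environment.

    Each environment gets its own state key, so a mistake in dev
    cannot overwrite prod state.
    """
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    return {
        "terraform": {
            "backend": {
                "s3": {
                    "bucket": STATE_BUCKET,
                    "key": f"{env}/terraform.tfstate",
                    "region": REGION,
                    "encrypt": True,
                    # S3-native state locking (Terraform 1.10+); older
                    # versions use a "dynamodb_table" argument instead.
                    "use_lockfile": True,
                }
            }
        }
    }


def write_backends(root: Path) -> list[Path]:
    """Write environments/<env>/backend.tf.json under `root` for every environment."""
    written = []
    for env in ENVIRONMENTS:
        target = root / "environments" / env / "backend.tf.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(backend_config(env), indent=2) + "\n")
        written.append(target)
    return written
```

Calling `write_backends(Path("infrastructure"))` once produces the three backend files; after that, `terraform init` inside each environment directory picks up its own state.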

Cost Optimization

Compute Savings:
  • Reserved Instances (1- or 3-year commitment): typically 30-60% below on-demand pricing
  • Spot/Preemptible instances: typically 60-90% below on-demand, for workloads that tolerate interruption
  • Right-sizing: Match instance size to actual usage
  • Auto-scaling: Scale down during low usage
Storage Savings:
  • Lifecycle policies: Auto-transition to cheaper tiers
  • Compression: Reduce storage footprint
  • Deduplication: Eliminate redundant data
  • Delete unused resources: Orphaned volumes, snapshots
Network Savings:
  • Use CDN for static content
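  • Keep chatty traffic within one region (and one AZ where availability allows), since cross-AZ and cross-region transfer is billed
  • Use private endpoints to keep service traffic off NAT gateways and the public internet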
  • Compress API responses
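
The compute savings above compound when a fleet mixes pricing models. A minimal sketch that estimates monthly spend for such a mix; the hourly price is hypothetical, and the default discounts (40% reserved, 70% spot) are illustrative picks from the ranges listed, not quoted rates.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # 24 * 365 / 12, the usual approximation in cloud pricing


@dataclass
class Fleet:
    instances: int
    on_demand_hourly: float  # hypothetical on-demand price per instance-hour


def monthly_cost(fleet: Fleet, reserved_share: float = 0.0, spot_share: float = 0.0,
                 reserved_discount: float = 0.40, spot_discount: float = 0.70) -> float:
    """Estimate monthly compute spend for a fleet split across pricing models.

    Shares are fractions of the fleet; whatever is left runs on-demand.
    """
    if min(reserved_share, spot_share) < 0 or reserved_share + spot_share > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    on_demand_share = 1 - reserved_share - spot_share
    full_price = fleet.instances * fleet.on_demand_hourly * HOURS_PER_MONTH
    return full_price * (
        on_demand_share
        + reserved_share * (1 - reserved_discount)
        + spot_share * (1 - spot_discount)
    )
```

For 10 instances at a hypothetical $0.10/hour, running 60% of the fleet reserved and 20% on spot brings the estimate from $730 to about $452.60 a month. Only the spot share needs to be interruptible.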

High Availability Patterns

Multi-AZ Deployment

  • Deploy across 2-3 availability zones
  • Use load balancers for distribution
  • Database replication across AZs
  • Automatic failover configuration
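
Spreading across zones only helps if the surviving zones can carry the load. A minimal sketch, assuming replicas are placed round-robin: it computes capacity left after the fullest zone fails and the replica count needed to stay at a target through one zone outage. Zone names are examples.

```python
from collections import Counter


def spread_replicas(replicas: int, zones: list[str]) -> Counter:
    """Assign replicas to zones round-robin, so zone counts differ by at most one."""
    if not zones:
        raise ValueError("at least one zone is required")
    return Counter(zones[i % len(zones)] for i in range(replicas))


def capacity_after_zone_loss(placement: Counter) -> int:
    """Replicas still serving if the most heavily loaded zone goes down."""
    if not placement:
        return 0
    return sum(placement.values()) - max(placement.values())


def replicas_for_zone_failure(required: int, zone_count: int) -> int:
    """Smallest round-robin replica count that keeps `required` replicas up
    after losing any one zone.

    Losing the fullest zone removes ceil(n / z) replicas, so we need
    n - ceil(n / z) >= required, i.e. n >= ceil(required * z / (z - 1)).
    """
    if zone_count < 2:
        raise ValueError("surviving a zone failure needs at least two zones")
    return -(-required * zone_count // (zone_count - 1))  # integer ceiling division
```

With three zones, keeping 5 replicas available through a zone outage takes 8 replicas in total: 6 or 7 would drop to 4 when the fullest zone fails.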

Multi-Region Deployment

  • Active-active or active-passive topology
  • DNS-based routing (Amazon Route 53, Azure Traffic Manager)
  • Data replication strategy
  • Disaster recovery procedures
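
The two topologies reduce to two routing decisions that DNS services make from health checks: priority (failover) routing for active-passive, and weighted routing for active-active. Route 53 calls these failover and weighted routing policies; Traffic Manager calls them priority and weighted routing methods. The sketch below models only that decision; region names and weights are hypothetical, and real DNS routing adds TTLs, health-check intervals, and client-side caching.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    healthy: bool
    weight: int = 1  # used only in active-active mode


def active_passive(regions: list[Region]) -> str:
    """Return the first healthy region in priority order (primary first)."""
    for region in regions:
        if region.healthy:
            return region.name
    raise RuntimeError("no healthy region available")


def active_active(regions: list[Region], rng: random.Random) -> str:
    """Pick a healthy region at random, in proportion to its weight."""
    candidates = [r for r in regions if r.healthy and r.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return rng.choices(candidates, weights=[r.weight for r in candidates])[0].name
```

Unhealthy regions simply drop out of both decisions, which is what makes failover automatic; whether the secondary can absorb the full load is a capacity question the routing layer cannot answer.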

Resilience Patterns

  • Circuit breakers for external dependencies
  • Retry with exponential backoff
  • Bulkhead isolation
  • Graceful degradation
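
The first two patterns fit in a few dozen lines. A minimal single-threaded sketch: retries use exponential backoff with "full jitter" (a random wait between zero and the current backoff ceiling), and the breaker fails fast after repeated failures, then lets a trial call through once a cool-down has passed. The parameter defaults and the choice of retryable exceptions are assumptions; `sleep` and `clock` are injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    base: float = 0.1,
    cap: float = 5.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `call`, retrying transient errors with exponential backoff and full jitter.

    Before retry n (0-based) it sleeps a random time in [0, min(cap, base * 2**n)].
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    """Open after `threshold` consecutive failures; once `reset_after` seconds
    have passed, let the next call through as a trial (half-open) and close
    again if it succeeds. Not thread-safe; a shared breaker needs a lock."""

    def __init__(self, threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

Retries belong inside the breaker's protected call only if the combined latency is acceptable; otherwise a tripped breaker is the cue for graceful degradation, such as serving cached data.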

Security Best Practices

Identity & Access

  • Principle of least privilege
  • Use IAM roles, not long-term credentials
  • Enable MFA for privileged accounts
  • Regular access reviews
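
Least privilege is easiest to see in a concrete policy. A minimal sketch, assuming AWS IAM JSON policies: build a policy that grants read access to a single S3 prefix, plus a deliberately simple check that flags wildcard grants. The bucket and prefix names are hypothetical, and the check ignores NotAction, NotResource, and conditions, so it is no substitute for a tool such as IAM Access Analyzer policy validation.

```python
def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """IAM policy allowing list and read access to one S3 prefix and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


def wildcard_findings(policy: dict) -> list[str]:
    """Flag Allow statements whose action is '*' or 'service:*', or whose resource is '*'."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given without a list
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        for action in [actions] if isinstance(actions, str) else actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: broad action {action!r}")
        for resource in [resources] if isinstance(resources, str) else resources:
            if resource == "*":
                findings.append(f"statement {i}: unrestricted resource")
    return findings
```

Attaching a policy like this to an IAM role, rather than to a user with long-term access keys, covers the first two bullets at once.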

Network Security

  • VPC/VNet isolation
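  • Security groups as stateful, instance-level firewalls
  • Private subnets for backend services
  • VPN or dedicated links (AWS Direct Connect, Azure ExpressRoute, Google Cloud Interconnect) for hybrid connectivity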
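
Public/private subnet separation starts with address planning. A minimal sketch using Python's standard ipaddress module: carve a VPC CIDR into one public and one private subnet per zone. The 10.0.0.0/16 range, the /20 subnet size, and the zone names are assumptions; route tables, NAT, and security groups are out of scope.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, zones: list[str],
                 new_prefix: int = 20) -> dict[tuple[str, str], str]:
    """Split a VPC CIDR into one public and one private subnet per zone.

    Public subnets are allocated first, then private ones, in zone order.
    """
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan: dict[tuple[str, str], str] = {}
    for tier in ("public", "private"):
        for zone in zones:
            block = next(blocks, None)
            if block is None:
                raise ValueError(
                    f"{vpc_cidr} cannot hold {2 * len(zones)} subnets of /{new_prefix}"
                )
            plan[(tier, zone)] = str(block)
    return plan
```

For three zones in 10.0.0.0/16, the public subnets are 10.0.0.0/20 through 10.0.32.0/20 and the private ones 10.0.48.0/20 through 10.0.80.0/20; everything from 10.0.96.0 upward stays free for later tiers or zones.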

Data Protection

  • Encryption at rest (KMS)
  • Encryption in transit (TLS)
  • Key rotation policies
  • Backup and recovery testing
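
A key rotation policy needs something that checks it. A minimal sketch, assuming you can export a key inventory as key ID mapped to creation (or last-rotation) time: it lists keys older than the policy allows, oldest first. The inventory shape, key IDs, and the 365-day limit used in the test are hypothetical; managed services such as AWS KMS can also rotate keys automatically.

```python
from datetime import datetime, timedelta


def keys_due_for_rotation(last_rotated: dict[str, datetime], max_age: timedelta,
                          now: datetime) -> list[str]:
    """Return IDs of keys whose last rotation is older than `max_age`, oldest first.

    Use timezone-aware datetimes throughout so ages compare correctly.
    """
    overdue = [(ts, key_id) for key_id, ts in last_rotated.items() if now - ts > max_age]
    return [key_id for _, key_id in sorted(overdue)]
```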

Monitoring & Observability

Key Metrics

  • CPU, Memory, Disk utilization
  • Network throughput and latency
  • Error rates and types
  • Cost per service/team
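
Cost per service or team depends on consistent cost-allocation tags. A minimal sketch that totals billing line items by a tag and keeps untagged spend visible instead of silently dropping it; the line-item shape (`cost`, `tags`) is hypothetical, not any provider's billing export schema.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag: str = "team") -> dict[str, float]:
    """Total spend per value of a cost-allocation tag, largest first.

    Items missing the tag are grouped under "untagged" so the gap stays visible.
    """
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag) or "untagged"
        totals[owner] += item["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

A growing "untagged" total is itself worth tracking, since it measures how much spend no team owns.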

Alerting Strategy

  • Set thresholds based on baselines
  • Alert on symptoms, not causes
  • A runbook for each alert
  • Defined escalation paths
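
Baseline thresholds and symptom-based alerting combine naturally: measure a user-facing symptom such as p99 latency, derive the threshold from a baseline window, and require several consecutive breaches before paging. A minimal sketch; mean plus k standard deviations is one common choice among several, k = 3 and three consecutive readings are assumptions, and metrics with strong daily seasonality need per-time-of-day baselines instead.

```python
import statistics


def baseline_threshold(baseline: list[float], k: float = 3.0) -> float:
    """Threshold = mean of the baseline window + k sample standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    return statistics.fmean(baseline) + k * statistics.stdev(baseline)


def should_alert(recent: list[float], threshold: float, consecutive: int = 3) -> bool:
    """Fire only if the last `consecutive` readings all exceed the threshold,
    so a single spike does not page anyone."""
    window = recent[-consecutive:]
    return len(window) == consecutive and all(value > threshold for value in window)
```

With made-up p99 latency baselines of 200, 210, 190, 205, and 195 ms, the threshold comes out at roughly 223.7 ms.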

Reference Files

  • references/terraform_patterns.md
    - IaC patterns and examples
  • references/cost_optimization.md
    - Detailed cost reduction strategies

Integration with Other Skills

  • security-engineering - For security architecture
  • network-engineering - For network design
  • performance - For optimization strategies
  • devops-runbooks - For operational procedures