cloud-infrastructure
Cloud Infrastructure
Comprehensive cloud infrastructure skill covering multi-cloud architecture, Infrastructure as Code, cost optimization, and production deployment patterns.
When to Use This Skill
- Designing cloud architecture for new applications
- Implementing Infrastructure as Code (Terraform, CloudFormation, Pulumi)
- Cost optimization and resource right-sizing
- Multi-region and high-availability deployments
- Cloud migration planning
- Security and compliance implementation
- Auto-scaling and performance optimization
Cloud Architecture Patterns
Compute Patterns
| Pattern | AWS | Azure | GCP | Use Case |
|---|---|---|---|---|
| Serverless | Lambda | Functions | Cloud Functions | Event-driven, variable load |
| Containers | ECS/EKS | AKS | GKE | Microservices, consistent env |
| VMs | EC2 | Virtual Machines | Compute Engine | Legacy apps, full control |
| Batch | Batch | Batch | Batch | Large-scale processing |
Storage Patterns
| Type | AWS | Azure | GCP | Use Case |
|---|---|---|---|---|
| Object | S3 | Blob Storage | Cloud Storage | Static files, backups |
| Block | EBS | Managed Disks | Persistent Disk | Database storage |
| File | EFS | Azure Files | Filestore | Shared file systems |
| Archive | S3 Glacier | Blob Storage Archive tier | Coldline / Archive | Long-term retention |
Database Patterns
| Type | AWS | Azure | GCP | Use Case |
|---|---|---|---|---|
| Relational | RDS, Aurora | SQL Database | Cloud SQL | ACID transactions |
| NoSQL | DynamoDB | Cosmos DB | Firestore | Flexible schema |
| Cache | ElastiCache | Cache for Redis | Memorystore | Session, caching |
| Data Warehouse | Redshift | Synapse | BigQuery | Analytics |
Infrastructure as Code
Terraform Best Practices
Project Structure:
infrastructure/
├── modules/
│ ├── networking/
│ ├── compute/
│ └── database/
├── environments/
│ ├── dev/
│ ├── staging/
│ └── prod/
├── main.tf
├── variables.tf
├── outputs.tf
└── versions.tf
State Management:
- Use remote state (S3, Azure Blob, GCS)
- Enable state locking (DynamoDB, Blob lease)
- Separate state per environment
- Never commit state files
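The state-management points above can be sketched as a small helper that renders one backend settings file per environment, so each environment gets its own state object and a lock table. This is a minimal sketch, assuming an S3 backend with DynamoDB locking; the bucket, table, and region names are hypothetical, and the exact locking setting may differ between Terraform versions.

```python
def render_backend_config(env: str, bucket: str, region: str, lock_table: str) -> str:
    """Render S3 backend settings for one environment as `name = value` lines."""
    settings = {
        "bucket": bucket,
        "key": f"{env}/terraform.tfstate",  # one state object per environment
        "region": region,
        "dynamodb_table": lock_table,       # table used for state locking
        "encrypt": "true",
    }
    lines = []
    for name, value in settings.items():
        # Booleans are written bare; everything else is quoted.
        rendered = value if value in ("true", "false") else f'"{value}"'
        lines.append(f"{name} = {rendered}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for env in ("dev", "staging", "prod"):
        print(f"# environments/{env}/backend.hcl")
        print(render_backend_config(env, "example-tf-state", "us-east-1", "example-tf-locks"))
```

Each rendered file could then be passed to `terraform init -backend-config=<file>` from the matching environment directory, which keeps the state locations out of the shared configuration.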
Module Design:
- Single responsibility per module
- Expose minimal required variables
- Document inputs/outputs
- Version modules with git tags
Cost Optimization
成本优化
Compute Savings:
- Reserved Instances or committed-use discounts (1- or 3-year commitment): roughly 30-60% savings, depending on provider, term, and payment option
- Spot/Preemptible instances: 60-90% savings for interruptible workloads
- Right-sizing: Match instance size to actual usage
- Auto-scaling: Scale down during low usage
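To make the savings ranges above concrete, here is a minimal arithmetic sketch. The hourly rate, instance count, and the 730-hour month are illustrative assumptions, not published prices; only the discount percentages come from the list above.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12 months


def monthly_cost(hourly_rate: float, instance_count: int, discount: float = 0.0) -> float:
    """Monthly cost of always-on instances after applying a fractional discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return hourly_rate * HOURS_PER_MONTH * instance_count * (1.0 - discount)


if __name__ == "__main__":
    rate, count = 0.10, 10  # hypothetical: $0.10/hour, 10 instances
    scenarios = [
        ("on-demand", 0.0),
        ("reserved, low end", 0.30),
        ("reserved, high end", 0.60),
        ("spot, low end", 0.60),
        ("spot, high end", 0.90),
    ]
    for label, discount in scenarios:
        print(f"{label:>20}: ${monthly_cost(rate, count, discount):,.2f}/month")
```

Running the same function over real usage data is a quick way to see which workloads justify a commitment and which are better left on demand or moved to interruptible capacity.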
Storage Savings:
- Lifecycle policies: Auto-transition to cheaper tiers
- Compression: Reduce storage footprint
- Deduplication: Eliminate redundant data
- Delete unused resources: Orphaned volumes, snapshots
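A lifecycle policy of the kind mentioned above might look like the following sketch, which builds a rule in the shape used by S3 lifecycle configurations. The prefix and day thresholds are assumptions chosen for illustration; the code only constructs and prints the document and does not call any cloud API.

```python
import json


def build_lifecycle_rule(prefix: str, ia_after_days: int = 30,
                         archive_after_days: int = 90,
                         expire_after_days: int = 365) -> dict:
    """Build one lifecycle rule: infrequent access, then archive, then delete."""
    if not ia_after_days < archive_after_days < expire_after_days:
        raise ValueError("day thresholds must be strictly increasing")
    return {
        "ID": f"tier-down-{prefix.strip('/') or 'all'}",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


if __name__ == "__main__":
    # Hypothetical prefix; with boto3 this dict could be passed as the
    # LifecycleConfiguration argument of put_bucket_lifecycle_configuration.
    config = {"Rules": [build_lifecycle_rule("logs/")]}
    print(json.dumps(config, indent=2))
```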
Network Savings:
- Use CDN for static content
- Optimize data transfer paths
- Use private endpoints
- Compress API responses
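As an illustration of the last point, the sketch below gzip-compresses a JSON response body when it exceeds a size threshold. The function name, the 1 KiB threshold, and the sample payload are hypothetical; in practice this is often handled by the web framework, load balancer, or CDN rather than by hand.

```python
import gzip
import json


def compress_json_response(payload, min_size: int = 1024):
    """Serialize payload to JSON and gzip it when the body is large enough to benefit."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if len(body) >= min_size:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return body, headers


if __name__ == "__main__":
    payload = {"items": [{"id": i, "status": "ok"} for i in range(500)]}
    raw_size = len(json.dumps(payload).encode("utf-8"))
    body, headers = compress_json_response(payload)
    print(f"raw: {raw_size} bytes, sent: {len(body)} bytes, headers: {headers}")
```

Small bodies are left uncompressed because the gzip framing overhead can outweigh the savings.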
High Availability Patterns
Multi-AZ Deployment
- Deploy across 2-3 availability zones
- Use load balancers for distribution
- Database replication across AZs
- Automatic failover configuration
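One way to reason about the first point is to spread instances evenly across zones and check how much capacity survives the loss of the busiest zone. The following is a minimal sketch; the zone names and instance count are hypothetical.

```python
def spread_across_zones(instance_count: int, zones: list) -> dict:
    """Distribute instances as evenly as possible; earlier zones take any remainder."""
    if not zones:
        raise ValueError("at least one zone is required")
    base, extra = divmod(instance_count, len(zones))
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


def capacity_after_zone_loss(placement: dict) -> int:
    """Instances remaining if the most heavily loaded zone fails."""
    return sum(placement.values()) - max(placement.values())


if __name__ == "__main__":
    placement = spread_across_zones(7, ["zone-a", "zone-b", "zone-c"])
    print(placement)
    print("surviving instances after worst-case zone loss:",
          capacity_after_zone_loss(placement))
```

Sizing the fleet so that the surviving capacity still covers peak load is what makes the automatic failover useful in practice.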
Multi-Region Deployment
- Active-active or active-passive
- DNS-based routing (Route53, Traffic Manager)
- Data replication strategy
- Disaster recovery procedures
Resilience Patterns
- Circuit breakers for external dependencies
- Retry with exponential backoff
- Bulkhead isolation
- Graceful degradation
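The first two patterns above can be sketched together: a retry helper with exponential backoff and jitter, and a simple circuit breaker that stops calling a dependency after repeated failures. This is an illustrative sketch, not a production library; the class and function names, thresholds, and delays are all assumptions.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call func, retrying failures with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # retrying an open circuit would defeat its purpose
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

A caller might combine the two as `retry_with_backoff(lambda: breaker.call(fetch))`, where `fetch` is a hypothetical function wrapping the external dependency. The jitter spreads retries out so that many clients recovering at once do not hit the dependency in lockstep.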
Security Best Practices
Identity & Access
- Principle of least privilege
- Use IAM roles, not long-term credentials
- Enable MFA for privileged accounts
- Regular access reviews
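To illustrate least privilege, the sketch below builds an AWS-style IAM policy document that grants read access to a single bucket prefix, plus a small check that flags wildcard grants during an access review. The bucket name, prefix, and helper names are hypothetical, and the wildcard check is deliberately simplistic.

```python
import json


def read_only_bucket_policy(bucket: str, prefix: str = "") -> dict:
    """Policy document allowing only object reads under one bucket prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "ReadObjectsOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
        }],
    }


def find_wildcards(policy: dict) -> list:
    """Return human-readable findings for Allow statements with broad wildcards."""
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        sid = stmt.get("Sid", "<no Sid>")
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"{sid}: wildcard action")
        if "*" in resources:
            findings.append(f"{sid}: wildcard resource")
    return findings


if __name__ == "__main__":
    policy = read_only_bucket_policy("example-reports", "monthly/")
    print(json.dumps(policy, indent=2))
    print("findings:", find_wildcards(policy))
```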
Network Security
- VPC/VNet isolation
- Security groups as firewalls
- Private subnets for backend services
- VPN or Direct Connect for hybrid connectivity
Data Protection
- Encryption at rest (KMS)
- Encryption in transit (TLS)
- Key rotation policies
- Backup and recovery testing
Monitoring & Observability
Key Metrics
- CPU, Memory, Disk utilization
- Network throughput and latency
- Error rates and types
- Cost per service/team
Alerting Strategy
- Set thresholds based on baselines
- Alert on symptoms, not causes
- Runbooks for each alert
- Escalation paths defined
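Setting thresholds from baselines can be sketched as follows: derive a threshold from historical samples (mean plus a multiple of the standard deviation) and alert only when several consecutive readings exceed it. The multiplier of 3, the three-reading window, and the sample latencies are assumptions for illustration; real baselines often also account for seasonality.

```python
import statistics


def baseline_threshold(samples, k: float = 3.0) -> float:
    """Threshold at mean + k sample standard deviations of the baseline."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    return statistics.fmean(samples) + k * statistics.stdev(samples)


def should_alert(recent, threshold: float, min_breaches: int = 3) -> bool:
    """Alert only if the last min_breaches readings all exceed the threshold."""
    window = recent[-min_breaches:]
    return len(window) == min_breaches and all(v > threshold for v in window)


if __name__ == "__main__":
    baseline_latency_ms = [100, 110, 90, 105, 95]  # hypothetical p95 latencies
    threshold = baseline_threshold(baseline_latency_ms)
    print(f"threshold: {threshold:.1f} ms")
    print("alert:", should_alert([98, 130, 131, 129], threshold))
```

Requiring consecutive breaches trades a little detection delay for fewer pages on single-sample spikes.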
Reference Files
- references/terraform_patterns.md - IaC patterns and examples
- references/cost_optimization.md - Detailed cost reduction strategies
Integration with Other Skills
- security-engineering - For security architecture
- network-engineering - For network design
- performance - For optimization strategies
- devops-runbooks - For operational procedures