| Pillar | Focus | Key Question |
|---|---|---|
| Operational Excellence | Run and monitor systems | How do we operate effectively? |
| Security | Protect information and systems | How do we protect data and resources? |
| Reliability | Recover from failures | How do we ensure workload availability? |
| Performance Efficiency | Use resources effectively | How do we meet performance requirements? |
| Cost Optimization | Avoid unnecessary costs | How do we achieve cost-effective outcomes? |
| Sustainability | Minimize environmental impact | How do we reduce carbon footprint? |
digraph review_flow {
  "Architecture review needed" [shape=doublecircle];
  "Identify workload scope" [shape=box];
  "Review each pillar systematically" [shape=box];
  "Document findings per pillar" [shape=box];
  "Prioritize improvements" [shape=box];
  "Create action plan" [shape=box];
  "All pillars reviewed?" [shape=diamond];
  "Complete" [shape=doublecircle];

  "Architecture review needed" -> "Identify workload scope";
  "Identify workload scope" -> "Review each pillar systematically";
  "Review each pillar systematically" -> "Document findings per pillar";
  "Document findings per pillar" -> "All pillars reviewed?";
  "All pillars reviewed?" -> "Review each pillar systematically" [label="no"];
  "All pillars reviewed?" -> "Prioritize improvements" [label="yes"];
  "Prioritize improvements" -> "Create action plan";
  "Create action plan" -> "Complete";
}

| Issue | Solution |
|---|---|
| Manual deployments | Implement CI/CD with CloudFormation/CDK/Terraform |
| No visibility into system health | Add CloudWatch dashboards, metrics, alarms |
| Operational procedures outdated | Regular runbook reviews, post-incident learning |
| Slow incident response | Create automated remediation with Lambda/Systems Manager |
// ❌ DANGEROUS: Hardcoded credentials
const AWS = require('aws-sdk');
const s3 = new AWS.S3({
  accessKeyId: 'AKIAIOSFODNN7EXAMPLE',
  secretAccessKey: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
});

// ✅ CORRECT: Use IAM roles
const AWS = require('aws-sdk');
const s3 = new AWS.S3(); // No keys passed: credentials come from the attached IAM role

// CDK: Lambda function with IAM role
import * as lambda from 'aws-cdk-lib/aws-lambda';
const fn = new lambda.Function(this, 'MyFunction', {
  // IAM role with least privilege (myRole is defined elsewhere)
  role: myRole,
  // ...
});

Region
├── AZ-1: Application + Database
├── AZ-2: Application + Database (standby)
└── AZ-3: Application + Database (standby)

Primary Region              Secondary Region
├── Active workload         ├── Standby/Active
├── Database (primary)      ├── Database (replica)
└── Route 53 health check monitoring

| Data Type | Solution | RPO | RTO |
|---|---|---|---|
| RDS | Automated backups + snapshots | < 5 min | < 30 min |
| DynamoDB | Point-in-time recovery | Seconds | Minutes |
| S3 | Versioning + cross-region replication | Minutes (replication is asynchronous) | Immediate |
| EBS | Snapshots via AWS Backup | Hours | Hours |
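
The RPO and RTO figures above only matter relative to a workload's own recovery targets. The following is a minimal TypeScript sketch of that comparison. The `BackupOption` and `RecoveryTargets` types, the `meetsTargets` helper, and the minute values are hypothetical and chosen for illustration (for example, "Hours" for EBS is assumed to be 240 minutes); replace them with measured figures.

```typescript
// Hypothetical types and values for illustration; this is not an AWS API.
interface BackupOption {
  name: string;
  rpoMinutes: number; // worst-case window of lost data
  rtoMinutes: number; // worst-case time to restore service
}

interface RecoveryTargets {
  rpoMinutes: number;
  rtoMinutes: number;
}

// An option qualifies only if both its RPO and its RTO are within the targets.
function meetsTargets(option: BackupOption, targets: RecoveryTargets): boolean {
  return (
    option.rpoMinutes <= targets.rpoMinutes &&
    option.rtoMinutes <= targets.rtoMinutes
  );
}

// Assumed figures loosely based on the table above.
const options: BackupOption[] = [
  { name: "RDS automated backups", rpoMinutes: 5, rtoMinutes: 30 },
  { name: "EBS snapshots via AWS Backup", rpoMinutes: 240, rtoMinutes: 240 },
];

const targets: RecoveryTargets = { rpoMinutes: 15, rtoMinutes: 60 };

const qualifying = options
  .filter((option) => meetsTargets(option, targets))
  .map((option) => option.name);

console.log(qualifying);
```

With these assumed numbers, only the RDS option stays within a 15-minute RPO and 60-minute RTO, which is the kind of gap a reliability review should record as a finding.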
Client → CloudFront (edge cache)
       → API Gateway
       → Lambda
       → ElastiCache (data cache)
       → DynamoDB/RDS

| Use Case | Recommended Service |
|---|---|
| Relational, complex queries | RDS (PostgreSQL/MySQL) |
| High throughput, simple queries | DynamoDB |
| Graph relationships | Neptune |
| Search and analytics | OpenSearch |
| Time-series data | Timestream |
| In-memory cache | ElastiCache (Redis/Memcached) |
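
One way to keep this selection guidance consistent across reviews is to encode it as a lookup. The sketch below is illustrative only: the `UseCase` labels and the `recommend` helper are hypothetical names, and the service strings are copied from the table above.

```typescript
// Hypothetical use-case labels; the service names come from the table above.
type UseCase =
  | "relational"
  | "high-throughput"
  | "graph"
  | "search"
  | "time-series"
  | "cache";

// Record<UseCase, string> makes the compiler reject a missing use case.
const recommendedService: Record<UseCase, string> = {
  relational: "RDS (PostgreSQL/MySQL)",
  "high-throughput": "DynamoDB",
  graph: "Neptune",
  search: "OpenSearch",
  "time-series": "Timestream",
  cache: "ElastiCache (Redis/Memcached)",
};

function recommend(useCase: UseCase): string {
  return recommendedService[useCase];
}

console.log(recommend("graph")); // "Neptune"
```

A lookup like this is only a starting point; real workloads often combine several of these services, as the caching path above does.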
| Strategy | Implementation | Potential Savings |
|---|---|---|
| Right-sizing | Use Compute Optimizer recommendations | 20-40% |
| Reserved Instances | 1-year or 3-year commitments | Up to 72% |
| Savings Plans | Flexible compute commitments | 30-70% |
| Spot Instances | Fault-tolerant workloads | 50-90% |
| S3 Intelligent-Tiering | Automatic storage class optimization | 40-60% |
| Auto Scaling | Scale resources with demand | 30-50% |
| Lambda instead of EC2 | For appropriate workloads | Varies |
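
To turn these percentage ranges into a rough figure for a specific workload, multiply them by the current monthly spend on the affected resources. The following is a minimal sketch: the `estimateSavings` helper and the 1000 USD spend are hypothetical, and the ranges are copied from three rows of the table above.

```typescript
// Savings ranges from the table above, expressed as whole percentages.
interface PercentRange {
  minPercent: number;
  maxPercent: number;
}

interface AmountRange {
  min: number;
  max: number;
}

const savingsByStrategy: Record<string, PercentRange> = {
  "Right-sizing": { minPercent: 20, maxPercent: 40 },
  "Spot Instances": { minPercent: 50, maxPercent: 90 },
  "Auto Scaling": { minPercent: 30, maxPercent: 50 },
};

// Returns the estimated monthly savings in the same currency as monthlySpend.
function estimateSavings(strategy: string, monthlySpend: number): AmountRange {
  const range = savingsByStrategy[strategy];
  if (!range) {
    throw new Error(`Unknown strategy: ${strategy}`);
  }
  return {
    min: (monthlySpend * range.minPercent) / 100,
    max: (monthlySpend * range.maxPercent) / 100,
  };
}

// Assumed spend of 1000 USD per month on the resources being right-sized.
const estimate = estimateSavings("Right-sizing", 1000);
console.log(estimate); // { min: 200, max: 400 }
```

The ranges are not additive: applying several strategies to the same resources compounds on the reduced spend, so summing the percentages would overstate the result.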
// CDK Example: Set up budget alerts
import * as budgets from 'aws-cdk-lib/aws-budgets';

new budgets.CfnBudget(this, 'MonthlyBudget', {
  budget: {
    budgetType: 'COST',
    timeUnit: 'MONTHLY',
    budgetLimit: {
      amount: 1000,
      unit: 'USD',
    },
  },
  notificationsWithSubscribers: [{
    notification: {
      notificationType: 'ACTUAL',
      comparisonOperator: 'GREATER_THAN',
      threshold: 80, // Alert at 80%
    },
    subscribers: [{
      subscriptionType: 'EMAIL',
      address: 'team@example.com',
    }],
  }],
});

| Priority | Criteria |
|---|---|
| High | Security vulnerabilities, critical availability risks, major cost waste |
| Medium | Performance issues, moderate cost optimization, operational improvements |
| Low | Nice-to-haves, future considerations, minor optimizations |

| Anti-Pattern | Issue | Better Approach |
|---|---|---|
| Single AZ deployment | No fault tolerance | Multi-AZ architecture |
| No IaC | Manual config, drift | CloudFormation/CDK/Terraform |
| Hardcoded secrets | Security vulnerability | Secrets Manager/Parameter Store |
| No monitoring | Blind operation | CloudWatch dashboards + alarms |
| No backups | Data loss risk | Automated backup strategy |
| Over-provisioning | Cost waste | Right-sizing + Auto Scaling |
| No cost tracking | Budget overruns | Tags + Budgets + Cost Explorer |
| Monolithic architecture | Hard to scale | Microservices or serverless |

| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| "Sustainability doesn't apply to this workload" | Every workload consumes resources and energy | Review all 6 pillars, even if findings are minimal |
| Skipping current state documentation | Can't measure improvement without baseline | Always document "Current State" before recommendations |
| Generic recommendations | Not actionable or specific to this workload | Provide specific AWS services, code examples, priorities |
| No prioritization | Everything seems equally important | Use HIGH/MEDIUM/LOW risk levels, create phased plan |
| Forgetting about trade-offs | Optimizing one pillar at expense of others | Explicitly call out trade-offs (e.g., multi-region cost vs reliability) |
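
Several of the mistakes above (skipped pillars, missing baselines, no prioritization) can be caught mechanically if findings are recorded in a consistent shape. The sketch below shows one possible structure. The `Finding` type, its field names, and the `missingPillars` and `prioritize` helpers are hypothetical, and the sample findings are invented for illustration.

```typescript
// Hypothetical data model for review findings; field names are illustrative.
const PILLARS = [
  "Operational Excellence",
  "Security",
  "Reliability",
  "Performance Efficiency",
  "Cost Optimization",
  "Sustainability",
] as const;

type Pillar = (typeof PILLARS)[number];
type Risk = "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  pillar: Pillar;
  currentState: string; // baseline, recorded before any recommendation
  recommendation: string; // specific service or change
  risk: Risk;
  tradeOff?: string; // e.g. added cost in exchange for reliability
}

// Pillars with no findings yet; the review is incomplete until this is empty.
function missingPillars(findings: Finding[]): Pillar[] {
  const covered = new Set<Pillar>(findings.map((f) => f.pillar));
  return PILLARS.filter((pillar) => !covered.has(pillar));
}

// Orders findings HIGH, then MEDIUM, then LOW without mutating the input.
function prioritize(findings: Finding[]): Finding[] {
  const rank: Record<Risk, number> = { HIGH: 0, MEDIUM: 1, LOW: 2 };
  return [...findings].sort((a, b) => rank[a.risk] - rank[b.risk]);
}

// Invented sample findings.
const findings: Finding[] = [
  {
    pillar: "Cost Optimization",
    currentState: "No budgets configured",
    recommendation: "Add AWS Budgets alerts",
    risk: "MEDIUM",
  },
  {
    pillar: "Security",
    currentState: "Access keys hardcoded in source",
    recommendation: "Use IAM roles",
    risk: "HIGH",
  },
];

console.log(prioritize(findings).map((f) => f.pillar));
console.log(missingPillars(findings));
```

Making `currentState` a required field forces a baseline to exist for every recommendation, and a non-empty result from `missingPillars` signals that the review loop should continue.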