azure-well-architected-framework

🚨 CRITICAL GUIDELINES

Windows File Path Requirements

MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).
Examples:
  • ❌ WRONG:
    D:/repos/project/file.tsx
  • ✅ CORRECT:
    D:\repos\project\file.tsx
This applies to:
  • Edit tool file_path parameter
  • Write tool file_path parameter
  • All file operations on Windows systems
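
Where a path arrives with forward slashes (for example from Git output or a cross-platform script), it can be converted before being passed to a tool. A minimal sketch, assuming a POSIX shell such as Git Bash is available; `to_windows_path` is a hypothetical helper name and not part of any tool:

```bash
# Hypothetical helper: replace every forward slash with a backslash.
to_windows_path() {
  printf '%s\n' "$1" | tr '/' '\\'
}

to_windows_path 'D:/repos/project/file.tsx'   # prints D:\repos\project\file.tsx
```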

Documentation Guidelines

NEVER create new documentation files unless explicitly requested by the user.
  • Priority: Update existing README.md files rather than creating new documentation
  • Repository cleanliness: Keep repository root clean - only README.md unless user requests otherwise
  • Style: Documentation should be concise, direct, and professional - avoid AI-generated tone
  • User preference: Only create additional .md files when user specifically asks for documentation

Azure Well-Architected Framework

The Azure Well-Architected Framework is a set of guiding tenets for building high-quality cloud solutions. It consists of five pillars of architectural excellence.

Overview

Purpose: Help architects and engineers build secure, high-performing, resilient, and efficient infrastructure for applications.
The Five Pillars:
  1. Reliability
  2. Security
  3. Cost Optimization
  4. Operational Excellence
  5. Performance Efficiency

Pillar 1: Reliability

Definition: The ability of a system to recover from failures and continue to function.
Key Principles:
  • Design for failure
  • Use availability zones and regions
  • Implement redundancy
  • Monitor and respond to failures
  • Test disaster recovery
Best Practices:
Availability Zones:
```bash
# Deploy a VM into a specific availability zone
# (repeat with a different --name and --zone to spread instances across zones)
az vm create \
  --resource-group MyRG \
  --name MyVM \
  --zone 1 \
  --image Ubuntu2204 \
  --size Standard_D2s_v3

# Availability SLAs:
# - Single VM (Premium SSD): 99.9%
# - Availability Set (two or more VMs): 99.95%
# - Availability Zones (two or more VMs across zones): 99.99%
```

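The SLA percentages above are easier to compare as downtime budgets. The sketch below converts an availability percentage into allowed downtime for a 30-day month and multiplies the SLAs of components that must all be up. The function names are illustrative, and the 30-day month and the serial-dependency model are simplifying assumptions:

```bash
# Allowed downtime in minutes per 30-day month for an availability percentage.
monthly_downtime_minutes() {
  awk -v sla="$1" 'BEGIN { printf "%.2f\n", (100 - sla) / 100 * 30 * 24 * 60 }'
}

# Composite availability (percent) when every listed component must be up.
composite_sla() {
  awk 'BEGIN {
    p = 1
    for (i = 1; i < ARGC; i++) p *= ARGV[i] / 100
    printf "%.4f\n", p * 100
  }' "$@"
}

monthly_downtime_minutes 99.9    # prints 43.20
monthly_downtime_minutes 99.99   # prints 4.32
composite_sla 99.99 99.95        # prints 99.9400
```

Moving from 99.9% to 99.99% cuts the monthly downtime budget from about 43 minutes to about 4 minutes, while chaining a 99.99% tier to a 99.95% tier yields roughly 99.94% overall, lower than either component alone.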

**Backup and Disaster Recovery:**
```bash
# Enable Azure Backup
az backup protection enable-for-vm \
  --resource-group MyRG \
  --vault-name MyVault \
  --vm MyVM \
  --policy-name DefaultPolicy

# Recovery Point Objective (RPO): how much data loss is acceptable
# Recovery Time Objective (RTO): how long the system can be down
```

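RPO bounds the gap between recovery points: with backups every N hours, up to N hours of data can be lost in the worst case. A minimal check under that assumption (`meets_rpo` is a hypothetical helper, and both arguments are whole hours):

```bash
# Succeeds (exit 0) when the backup interval does not exceed the RPO.
meets_rpo() {
  backup_interval_hours=$1
  rpo_hours=$2
  [ "$backup_interval_hours" -le "$rpo_hours" ]
}

# A daily backup cannot satisfy a 4-hour RPO.
if meets_rpo 24 4; then
  echo "RPO met"
else
  echo "RPO not met: back up at least every 4 hours"
fi
```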

**Health Probes:**
- Application Gateway health probes
- Load Balancer probes
- Traffic Manager endpoint monitoring

Pillar 2: Security

Definition: Protecting applications and data from threats.
Key Principles:
  • Defense in depth
  • Least privilege access
  • Secure the network
  • Protect data at rest and in transit
  • Monitor and audit
Best Practices:
Identity and Access:
```bash
# Use managed identities (no credentials in code)
az vm identity assign \
  --resource-group MyRG \
  --name MyVM

# RBAC assignment (Contributor is an example; prefer the narrowest role that fits)
az role assignment create \
  --assignee "<principal-id>" \
  --role "Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/MyRG"
```

**Network Security:**
- Use Network Security Groups (NSGs)
- Implement Azure Firewall or Application Gateway WAF
- Use Private Endpoints for PaaS services
- Enable DDoS Protection Standard for public-facing apps

**Data Protection:**
```bash
# Encryption at rest is automatic for most services
# Require TLS 1.2+ for data in transit

# Azure Storage: enforce HTTPS and a minimum TLS version
az storage account update \
  --name mystorageaccount \
  --resource-group MyRG \
  --min-tls-version TLS1_2 \
  --https-only true
```

**Security Monitoring:**
```bash
# Enable Microsoft Defender for Cloud
az security pricing create \
  --name VirtualMachines \
  --tier Standard

# Enable Microsoft Sentinel (formerly Azure Sentinel).
# Requires the sentinel CLI extension; verify the subcommand against the installed version.
az sentinel onboard \
  --resource-group MyRG \
  --workspace-name MyWorkspace
```

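Least privilege depends on scope as well as role: the same role is far broader at subscription scope than on a single resource group. The sketch below classifies a scope string by its path shape so that broad assignments can be flagged for review. `scope_level` is a hypothetical helper, the subscription ID is a placeholder, and the patterns assume the standard Azure resource ID layout:

```bash
# Classify an Azure RBAC scope string by how broad it is.
# The most specific patterns come first because * also matches slashes.
scope_level() {
  case "$1" in
    /subscriptions/*/resourceGroups/*/providers/*) echo "resource" ;;
    /subscriptions/*/resourceGroups/*)             echo "resource-group" ;;
    /subscriptions/*)                              echo "subscription" ;;
    /providers/Microsoft.Management/managementGroups/*) echo "management-group" ;;
    *)                                             echo "unknown" ;;
  esac
}

sub="/subscriptions/00000000-0000-0000-0000-000000000000"   # placeholder ID
scope_level "$sub"                        # prints subscription
scope_level "$sub/resourceGroups/MyRG"    # prints resource-group
```

A review script could warn whenever a broad role such as Contributor or Owner is paired with a `subscription` or `management-group` result.
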
Pillar 3: Cost Optimization

Definition: Managing costs to maximize the value delivered.
Key Principles:
  • Plan and estimate costs
  • Provision with optimization
  • Use monitoring and analytics
  • Maximize efficiency of cloud spend
Best Practices:
Right-Sizing:
```bash
# Use Azure Advisor recommendations
az advisor recommendation list \
  --category Cost \
  --output table

# Common optimizations:
# 1. Shut down dev/test VMs when not in use
# 2. Use Azure Hybrid Benefit for Windows/SQL
# 3. Purchase reservations for consistent workloads
# 4. Use autoscaling to match demand
```

**Reserved Instances:**
- 1-year or 3-year commitment
- Save up to 72% vs pay-as-you-go
- Available for VMs, SQL Database, Cosmos DB, Synapse, Storage

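To see what an "up to 72%" discount means in practice, the sketch below applies a discount to a monthly pay-as-you-go price and derives the break-even utilization. The price of 500 is a made-up figure, the function names are illustrative, and the model assumes the reservation is billed for every hour whether or not the resource runs:

```bash
# Monthly cost after applying a percentage discount to a pay-as-you-go price.
discounted_cost() {
  awk -v price="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", price * (100 - pct) / 100 }'
}

# Utilization (percent of hours running) above which the reservation is cheaper.
break_even_utilization() {
  awk -v pct="$1" 'BEGIN { printf "%d\n", 100 - pct }'
}

discounted_cost 500 72       # prints 140.00
break_even_utilization 72    # prints 28
```

Even at the maximum discount, a reservation only pays off when the workload runs more than about 28% of the time, which is why reservations suit consistent workloads rather than dev/test machines that are shut down most of the day.
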
**Azure Hybrid Benefit:**
```bash
# Apply Windows license to VM
az vm update \
  --resource-group MyRG \
  --name MyVM \
  --license-type Windows_Server

# SQL Server Hybrid Benefit
az sql vm create \
  --resource-group MyRG \
  --name MySQLVM \
  --license-type AHUB
```

**Cost Management:**
```bash
# Create budget
az consumption budget create \
  --budget-name MyBudget \
  --category cost \
  --amount 1000 \
  --time-grain monthly \
  --start-date 2025-01-01 \
  --end-date 2025-12-31

# Set up alerts at 80%, 100%, 120% of budget
```

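The alert thresholds mentioned above are percentages of the budget amount. A small sketch that turns them into absolute values for the 1000 budget used in the example (`budget_thresholds` is an illustrative name, not an Azure CLI command):

```bash
# Print the absolute spend at which each percentage threshold is crossed.
budget_thresholds() {
  amount=$1
  shift
  for pct in "$@"; do
    awk -v a="$amount" -v p="$pct" 'BEGIN { printf "%d%% -> %.2f\n", p, a * p / 100 }'
  done
}

budget_thresholds 1000 80 100 120
# 80% -> 800.00
# 100% -> 1000.00
# 120% -> 1200.00
```

The 120% threshold only makes sense as a notification: a budget does not stop spending once the amount is exceeded.
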
Pillar 4: Operational Excellence

Definition: Operations processes that keep a system running in production.
Key Principles:
  • Automate operations
  • Monitor and gain insights
  • Refine operations procedures
  • Anticipate failure
  • Stay current with updates
Best Practices:
Infrastructure as Code:
```bash
# Use ARM, Bicep, or Terraform
# Version control all infrastructure
# Implement CI/CD for infrastructure

# Example: Bicep deployment
az deployment group create \
  --resource-group MyRG \
  --template-file main.bicep \
  --parameters @parameters.json
```

**Monitoring and Alerting:**
```bash
# Application Insights for apps
az monitor app-insights component create \
  --app MyApp \
  --location eastus \
  --resource-group MyRG

# Log Analytics for infrastructure
az monitor log-analytics workspace create \
  --resource-group MyRG \
  --workspace-name MyWorkspace

# Create alerts
az monitor metrics alert create \
  --name HighCPU \
  --resource-group MyRG \
  --scopes "<vm-id>" \
  --condition "avg Percentage CPU > 80" \
  --description "CPU usage is above 80%"
```

**DevOps Practices:**
- Continuous Integration/Continuous Deployment (CI/CD)
- Blue-green deployments
- Canary releases
- Feature flags
- Automated testing

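The `avg Percentage CPU > 80` condition fires on the average over the evaluation window, not on individual samples. The following sketch mimics that evaluation locally for a list of samples; `avg_exceeds` is a hypothetical helper and the sample values are invented:

```bash
# Succeeds (exit 0) when the average of the samples is strictly above the threshold.
# Usage: avg_exceeds THRESHOLD SAMPLE [SAMPLE...]
avg_exceeds() {
  threshold=$1
  shift
  awk -v t="$threshold" 'BEGIN {
    if (ARGC < 2) exit 2                       # no samples supplied
    for (i = 1; i < ARGC; i++) sum += ARGV[i]
    exit !((sum / (ARGC - 1)) > t)
  }' "$@"
}

# Sustained load: the average is about 86.7, so the alert would fire.
avg_exceeds 80 75 90 95 && echo "HighCPU would fire"

# A single spike: the average is about 68.3, so the alert stays quiet.
avg_exceeds 80 50 60 95 || echo "HighCPU would not fire"
```
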
Pillar 5: Performance Efficiency

Definition: The ability of a system to adapt to changes in load.
Key Principles:
  • Scale horizontally
  • Choose the right resources
  • Monitor performance
  • Optimize network and data access
Best Practices:
Scaling:
```bash
# Horizontal scaling (preferred)
# VM Scale Sets
az vmss create \
  --resource-group MyRG \
  --name MyVMSS \
  --image Ubuntu2204 \
  --instance-count 3 \
  --vm-sku Standard_D2s_v3

# Autoscaling: set the instance bounds (scale rules are added separately)
az monitor autoscale create \
  --resource-group MyRG \
  --resource MyVMSS \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name MyAutoscale \
  --min-count 2 \
  --max-count 10
```

**Caching:**
- Azure Cache for Redis
- Azure CDN for static content
- Application-level caching

**Data Access:**
- Use indexes on databases
- Implement caching strategies
- Use CDN for global content delivery
- Optimize queries (SQL, Cosmos DB)

**Networking:**
```bash
# Use Azure Front Door for global apps
az afd profile create \
  --profile-name MyFrontDoor \
  --resource-group MyRG \
  --sku Premium_AzureFrontDoor

# Features:
# - Global load balancing
# - CDN capabilities
# - Web Application Firewall
# - SSL offloading
# - Caching
```

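The `--min-count` and `--max-count` values bound whatever instance count the scaling logic asks for. A minimal sketch of that clamping, assuming a hypothetical per-instance capacity (`desired_instances` and its arguments are illustrative and not part of the Azure CLI):

```bash
# Instances needed for a load, clamped to the range [min, max].
# Usage: desired_instances LOAD CAPACITY_PER_INSTANCE MIN MAX
desired_instances() {
  awk -v load="$1" -v per="$2" -v min="$3" -v max="$4" 'BEGIN {
    n = int(load / per)
    if (n * per < load) n++      # round up to cover the remainder
    if (n < min) n = min
    if (n > max) n = max
    print n
  }'
}

desired_instances 450 100 2 10    # prints 5
desired_instances 50 100 2 10     # prints 2 (floor applies)
desired_instances 5000 100 2 10   # prints 10 (ceiling applies)
```

The floor keeps redundancy during quiet periods, and the ceiling caps spend during spikes, which is where the performance and cost pillars meet.
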
Assessment and Tools

Azure Well-Architected Review:
- Self-assessment tool in Azure Portal
- Generates recommendations per pillar
- Provides actionable guidance

**Azure Advisor:**
```bash
# Get recommendations
az advisor recommendation list --output table

# Categories:
# - Reliability (High Availability)
# - Security
# - Performance
# - Cost
# - Operational Excellence
```

Implementation Checklist

Reliability:
  • Deploy across availability zones
  • Implement backup strategy
  • Define RTO and RPO
  • Test disaster recovery
  • Implement health monitoring
Security:
  • Enable Microsoft Entra ID (formerly Azure AD) authentication
  • Implement RBAC (least privilege)
  • Encrypt data at rest and in transit
  • Enable Microsoft Defender for Cloud
  • Implement network segmentation (NSGs, Firewall)
  • Use Key Vault for secrets
Cost Optimization:
  • Right-size resources
  • Purchase reservations for predictable workloads
  • Enable autoscaling
  • Use Azure Hybrid Benefit
  • Implement budget alerts
  • Review Azure Advisor cost recommendations
Operational Excellence:
  • Implement Infrastructure as Code
  • Set up CI/CD pipelines
  • Enable comprehensive monitoring
  • Create operational runbooks
  • Implement automated alerting
  • Use tags for resource organization
Performance Efficiency:
  • Choose appropriate resource SKUs
  • Implement autoscaling
  • Use caching (Redis, CDN)
  • Optimize database queries
  • Implement load balancing
  • Monitor performance metrics

Common Patterns

Highly Available Web Application:
  • Application Gateway (WAF enabled)
  • App Service (Premium tier, multiple instances)
  • Azure SQL Database (Zone-redundant)
  • Azure Cache for Redis
  • Application Insights
  • Azure Front Door (global distribution)
Mission-Critical Application:
  • Multi-region deployment
  • Traffic Manager or Front Door (global routing)
  • Availability Zones in each region
  • Geo-redundant storage (GRS or RA-GRS)
  • Automated backups with geo-replication
  • Comprehensive monitoring and alerting
Cost-Optimized Dev/Test:
  • Auto-shutdown for VMs
  • B-series (burstable) VMs
  • Dev/Test pricing tiers
  • Shared App Service plans
  • Azure DevTest Labs

Key Takeaways

  1. Balance the Pillars: Trade-offs exist between pillars (e.g., cost vs. reliability)
  2. Continuous Improvement: Architecture is not static; revisit it regularly
  3. Measure and Monitor: Use data to drive decisions
  4. Automation: Automate repetitive tasks to improve reliability and reduce costs
  5. Security First: Integrate security into every layer of architecture
The Well-Architected Framework provides a consistent approach to evaluating architectures and implementing designs that scale over time.