azure-infra-engineer
Azure Infrastructure Engineer
Purpose
Provides Microsoft Azure cloud expertise specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF) implementations. Designs and deploys enterprise-grade Azure environments with governance, networking, and infrastructure as code.
When to Use
- Deploying Azure resources using Bicep or ARM templates
- Designing Hub-and-Spoke network topologies (Virtual WAN, ExpressRoute)
- Implementing Azure Policy and Management Groups (Governance)
- Migrating workloads to Azure (ASR, Azure Migrate)
- Automating Azure DevOps pipelines for infrastructure
- Configuring Microsoft Entra ID (formerly Azure Active Directory) RBAC and PIM
2. Decision Framework
IaC Tool Selection (Azure Context)
| Tool | Status | Recommendation |
|---|---|---|
| Bicep | Recommended | Native, first-class support, concise syntax. |
| Terraform | Alternative | Best for multi-cloud strategies. |
| ARM Templates | Legacy | Verbose JSON. Avoid for new projects (compile Bicep instead). |
| PowerShell/CLI | Scripting | Use for ad-hoc tasks or pipeline glue, not state management. |
Networking Architecture
What is the connectivity need?
│
├─ **Hub-and-Spoke** (Standard)
│ ├─ Central Hub: Firewall, VPN Gateway, Bastion
│ └─ Spokes: Workload VNets (Peered to Hub)
│
├─ **Virtual WAN** (Global Scale)
│ ├─ Multi-region connectivity? → **Yes**
│ └─ Branch-to-Branch (SD-WAN)? → **Yes**
│
└─ **Private Access**
  ├─ PaaS Services? → **Private Link / Private Endpoints**
  └─ Service Endpoints? → Legacy (Use Private Link where possible)
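
The Hub-and-Spoke branch of the tree above can be sketched in Bicep as a pair of peerings. This is a minimal illustration rather than a full topology: the VNet names `vnet-hub` and `vnet-spoke-app1` and the `hubHasGateway` switch are hypothetical, and both VNets are assumed to already exist in the target resource group.

```bicep
// Hypothetical names; both VNets are assumed to exist in this resource group.
param hubVnetName string = 'vnet-hub'
param spokeVnetName string = 'vnet-spoke-app1'

// Set to true only if the hub already hosts a VPN or ExpressRoute gateway.
param hubHasGateway bool = false

resource hub 'Microsoft.Network/virtualNetworks@2023-04-01' existing = {
  name: hubVnetName
}

resource spoke 'Microsoft.Network/virtualNetworks@2023-04-01' existing = {
  name: spokeVnetName
}

// Hub side of the peering: optionally shares the hub gateway with the spoke.
resource hubToSpoke 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings@2023-04-01' = {
  parent: hub
  name: 'peer-hub-to-${spokeVnetName}'
  properties: {
    remoteVirtualNetwork: {
      id: spoke.id
    }
    allowVirtualNetworkAccess: true
    allowForwardedTraffic: true
    allowGatewayTransit: hubHasGateway
  }
}

// Spoke side of the peering: optionally routes on-premises traffic via the hub gateway.
resource spokeToHub 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings@2023-04-01' = {
  parent: spoke
  name: 'peer-${spokeVnetName}-to-hub'
  properties: {
    remoteVirtualNetwork: {
      id: hub.id
    }
    allowVirtualNetworkAccess: true
    allowForwardedTraffic: true
    useRemoteGateways: hubHasGateway
  }
  dependsOn: [
    hubToSpoke
  ]
}
```

A peering must be created on both sides before traffic flows, which is why the sketch declares two resources.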
Governance Strategy (CAF)
- Management Groups: Hierarchy for policy inheritance (Root > Geo > Landing Zones).
- Azure Policy: Use the "Deny" effect to block non-compliant resources (e.g., allow deployments only in the East US region).
- RBAC: Least privilege access via Entra ID Groups.
- Blueprints: Rapid deployment of compliant environments (being replaced by Template Specs + Deployment Stacks).
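
As a rough sketch of the Azure Policy item above, the built-in "Allowed locations" definition can be assigned at management group scope. The assignment name and the default region list are assumptions chosen for illustration; the definition ID is the one for the built-in policy.

```bicep
targetScope = 'managementGroup'

@description('Regions in which resources may be created (assumption: East US only).')
param allowedLocations array = [
  'eastus'
]

// Built-in "Allowed locations" policy definition, which uses the Deny effect.
var allowedLocationsDefinitionId = '/providers/Microsoft.Authorization/policyDefinitions/e56962a6-4747-49cd-b67b-bf8b01975c4c'

resource allowedLocationsAssignment 'Microsoft.Authorization/policyAssignments@2022-06-01' = {
  name: 'allowed-locations' // hypothetical assignment name
  properties: {
    displayName: 'Allowed locations'
    policyDefinitionId: allowedLocationsDefinitionId
    enforcementMode: 'Default'
    parameters: {
      listOfAllowedLocations: {
        value: allowedLocations
      }
    }
  }
}
```

Assuming the file is saved as `policy.bicep`, it could be deployed with `az deployment mg create --management-group-id <id> --location eastus --template-file policy.bicep`.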
Red Flags → Escalate to `security-engineer`:
- Public access enabled on Storage Accounts or SQL Databases
- Management Ports (RDP/SSH) open to internet
- Subscription Owner permissions granted to individual users (use Contributor, with PIM for time-bound elevation)
- No cost controls/budgets configured
4. Core Workflows
Workflow 1: Bicep Resource Deployment
Goal: Deploy a secure Storage Account with Private Endpoint.
Steps:
- Define Bicep Module (`storage.bicep`)

  ```bicep
  param location string = resourceGroup().location
  param name string

  resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' = {
    name: name
    location: location
    sku: {
      name: 'Standard_LRS'
    }
    kind: 'StorageV2'
    properties: {
      minimumTlsVersion: 'TLS1_2'
      supportsHttpsTrafficOnly: true
      publicNetworkAccess: 'Disabled' // Secure by default
    }
  }

  output id string = stg.id
  ```

- Main Deployment (`main.bicep`)

  ```bicep
  module storage './modules/storage.bicep' = {
    name: 'deployStorage'
    params: {
      name: 'stappprod001'
    }
  }
  ```

- Deploy via CLI

  ```bash
  az deployment group create --resource-group rg-prod --template-file main.bicep
  ```
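
The goal mentions a Private Endpoint, which the storage module itself does not declare. A minimal sketch of that missing piece might look like the module below. The parameter names `storageAccountId` and `subnetId` and the `pep-` name prefix are assumptions, and the subnet is assumed to exist already.

```bicep
param location string = resourceGroup().location
param name string             // storage account name, reused in the endpoint name
param storageAccountId string // the `id` output of storage.bicep
param subnetId string         // hypothetical: resource ID of an existing subnet

resource pe 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: 'pep-${name}'
  location: location
  properties: {
    subnet: {
      id: subnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'blob'
        properties: {
          privateLinkServiceId: storageAccountId
          // Target sub-resource; 'blob' covers the Blob service only.
          groupIds: [
            'blob'
          ]
        }
      }
    ]
  }
}
```

Name resolution for the endpoint also needs a private DNS zone (`privatelink.blob.core.windows.net`) linked to the VNet, which this sketch leaves out.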
Workflow 3: Landing Zone Setup (CAF)
Goal: Establish the foundational hierarchy.
Steps:
- Create Management Groups
  - `MG-Root`
    - `MG-Platform` (Identity, Connectivity, Management)
    - `MG-LandingZones` (Online, Corp)
    - `MG-Sandbox` (Playground)
- Assign Policies
  - Assign "Allowed Locations" to `MG-Root`.
  - Assign "Enable Azure Monitor" to `MG-LandingZones`.
- Deploy Hub Network
  - Deploy VNet in connectivity subscription.
  - Deploy Azure Firewall and VPN Gateway.
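
The first step above could be expressed as a tenant-scoped Bicep file along these lines. It is a sketch that creates only the top-level groups named in the workflow; the nested groups (Identity, Connectivity, and so on) would follow the same pattern with a different parent.

```bicep
targetScope = 'tenant'

// Top-level group; without an explicit parent it is placed under the tenant root group.
resource mgRoot 'Microsoft.Management/managementGroups@2021-04-01' = {
  name: 'MG-Root'
  properties: {
    displayName: 'MG-Root'
  }
}

var childGroups = [
  'MG-Platform'
  'MG-LandingZones'
  'MG-Sandbox'
]

// Children of MG-Root, created in a loop so the hierarchy stays in one place.
resource children 'Microsoft.Management/managementGroups@2021-04-01' = [for mgName in childGroups: {
  name: mgName
  properties: {
    displayName: mgName
    details: {
      parent: {
        id: mgRoot.id
      }
    }
  }
}]
```

Assuming the file is saved as `mg.bicep`, `az deployment tenant create --location eastus --template-file mg.bicep` would deploy it, provided the caller has permission to deploy at tenant scope.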
5. Anti-Patterns & Gotchas
❌ Anti-Pattern 1: "ClickOps"
What it looks like:
- Creating resources manually in the Azure Portal.
Why it fails:
- Unrepeatable.
- Configuration drift.
- Disaster recovery becomes slow and error-prone (no code to redeploy from).
Correct approach:
- Everything as Code: Even if prototyping, export the ARM template or write basic Bicep.
❌ Anti-Pattern 2: One Giant Resource Group
What it looks like:
- `rg-production` contains VNets, VMs, Databases, and Web Apps for 5 different projects.
Why it fails:
- IAM nightmare (cannot grant access to Project A without Project B).
- Tagging and cost analysis become difficult.
- Risk of accidental deletion.
Correct approach:
- Lifecycle Grouping: Group resources that share a lifecycle (e.g., `rg-network`, `rg-app1-prod`, `rg-app1-dev`).
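
One way to sketch lifecycle grouping is a subscription-scoped Bicep file that creates one resource group per lifecycle. The default region is an assumption; the group names are the ones used in the example above.

```bicep
targetScope = 'subscription'

param location string = 'eastus' // assumption: a single region for all groups

// One resource group per lifecycle rather than one shared rg-production.
var resourceGroupNames = [
  'rg-network'
  'rg-app1-prod'
  'rg-app1-dev'
]

resource rgs 'Microsoft.Resources/resourceGroups@2022-09-01' = [for rgName in resourceGroupNames: {
  name: rgName
  location: location
}]
```

With separate groups, role assignments and deletions can be scoped to a single application and environment.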
❌ Anti-Pattern 3: Ignoring Naming Conventions
What it looks like:
- `myvm1`, `test-storage`, `sql-server`.
Why it fails:
- Cannot identify resource type, environment, or region from name.
- Name collisions (Storage accounts must be globally unique).
Correct approach:
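- CAF Naming Standard: `[Resource Type]-[Workload]-[Environment]-[Region]-[Instance]`
- Example: `stmyappprodeus001` (Storage Account, MyApp, Prod, East US, 001). Storage account names allow only lowercase letters and digits, so the hyphens are dropped; resource types that permit hyphens keep them (e.g., `vnet-myapp-prod-eus-001`).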
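
The naming pattern can be centralized in Bicep variables so every module derives names the same way. The sketch below uses hypothetical parameter names and defaults, and strips hyphens for the storage account because storage account names accept only lowercase letters and digits (3 to 24 characters).

```bicep
param workload string = 'myapp'
param env string = 'prod'
param regionCode string = 'eus'
param instance string = '001'

// Shared suffix: [Workload]-[Environment]-[Region]-[Instance]
var suffix = '${workload}-${env}-${regionCode}-${instance}'

// Hyphenated form for resource types that allow it.
var vnetName = 'vnet-${suffix}' // vnet-myapp-prod-eus-001

// Storage accounts: same components, hyphens removed and lowercased.
var storageName = toLower(replace('st-${suffix}', '-', '')) // stmyappprodeus001

output names object = {
  virtualNetwork: vnetName
  storageAccount: storageName
}
```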
7. Quality Checklist
Governance:
- Naming: Resources follow CAF naming conventions.
- Tagging: Resources tagged with `CostCenter`, `Environment`, `Owner`.
- Policies: Azure Policy enforces compliance (e.g., allowed SKUs).
Security:
- Network: No public IPs on backend resources (VMs, DBs).
- Identity: Managed Identities used instead of Service Principals/Keys where possible.
- Encryption: CMK (Customer Managed Keys) enabled for sensitive data.
Reliability:
- Availability Zones: Critical resources deployed zone-redundant (e.g., ZRS for storage).
- Backup: Azure Backup enabled for VMs and SQL.
- Locks: Resource Locks (`CanNotDelete`) on critical production resources.
Cost:
- Sizing: Resources right-sized based on metrics.
- Reservations: Reserved Instances purchased for steady workloads.
- Cleanup: Unused resources (orphaned disks/NICs) deleted.
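
The Locks item in the checklist can be illustrated with a short Bicep sketch that attaches a `CanNotDelete` lock to an existing storage account. The parameter name and lock name are hypothetical.

```bicep
param storageAccountName string // hypothetical: name of an existing production account

resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
  name: storageAccountName
}

// The lock is an extension resource scoped to the storage account.
resource lock 'Microsoft.Authorization/locks@2020-05-01' = {
  name: 'lock-${storageAccountName}'
  scope: stg
  properties: {
    level: 'CanNotDelete'
    notes: 'Critical production resource; remove the lock before deleting.'
  }
}
```

A `CanNotDelete` lock still allows reads and updates, so routine configuration changes continue to work.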
Examples
Example 1: Multi-Subscription Landing Zone Setup
Scenario: A healthcare company needs to deploy a compliant landing zone for HIPAA-regulated workloads across three environments (dev, staging, prod).
Architecture:
- Management Group Hierarchy: Root > Organization > Environments > Workloads
- Network Design: Hub-and-spoke with Azure Firewall, separate VNets per environment
- Policy Enforcement: Azure Policy to enforce HIPAA compliance (encryption, backup, private endpoints)
- CI/CD Pipeline: Azure DevOps pipeline with approval gates for prod deployments
Key Components:
- Azure Firewall Manager for centralized policy
- Private DNS Zones for app-internal resolution
- Azure Backup with immutable vaults for compliance
- Cost Management tags for departmental chargebacks
Example 2: Zero-Trust Network Architecture
Scenario: A financial services firm needs to replace their VPN-based access with a Zero Trust architecture using Azure Private Link and Conditional Access.
Implementation:
- Private Endpoints: All PaaS services accessed via Private Endpoints (SQL, Storage, Key Vault)
- Identity-Based Access: Conditional Access policies requiring compliant device and MFA
- Micro-segmentation: NSG rules denying all traffic by default, allowing only required flows
- Monitoring: Microsoft Sentinel (formerly Azure Sentinel) for security analytics and anomaly detection
Security Controls:
- Microsoft Entra Conditional Access with device compliance
- Just-In-Time VM access for administration
- Microsoft Defender for Cloud threat protection
- Comprehensive audit logging to Log Analytics
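
The micro-segmentation item above can be sketched as an NSG with one explicit allow rule and an explicit deny-all rule. The explicit deny matters because the default NSG rules allow traffic within the VNet. The NSG name, subnet prefixes, and the SQL port are illustrative assumptions.

```bicep
param location string = resourceGroup().location
param appSubnetPrefix string = '10.1.1.0/24' // hypothetical application subnet
param dbSubnetPrefix string = '10.1.2.0/24'  // hypothetical database subnet

resource nsg 'Microsoft.Network/networkSecurityGroups@2023-04-01' = {
  name: 'nsg-db' // hypothetical name
  location: location
  properties: {
    securityRules: [
      {
        // Allow only the required flow: application subnet to SQL.
        name: 'Allow-App-To-Sql'
        properties: {
          priority: 100
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceAddressPrefix: appSubnetPrefix
          sourcePortRange: '*'
          destinationAddressPrefix: dbSubnetPrefix
          destinationPortRange: '1433'
        }
      }
      {
        // Lowest-priority custom rule: overrides the default AllowVnetInBound rule.
        name: 'Deny-All-Inbound'
        properties: {
          priority: 4096
          direction: 'Inbound'
          access: 'Deny'
          protocol: '*'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '*'
        }
      }
    ]
  }
}
```

The deny-all rule also blocks Azure Load Balancer health probes, so a subnet behind a load balancer would need an additional allow rule for the `AzureLoadBalancer` service tag.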
Example 3: Cost-Optimized Dev/Test Environment
Scenario: A software company wants to reduce their Azure dev/test environment costs by 60% while maintaining developer productivity.
Optimization Strategy:
- Auto-Shutdown: Dev VMs auto-shutdown evenings and weekends via Automation Runbooks
- Reserved Capacity: Prod-like dev environments use Reserved Instances
- Dev-Optimized SKUs: Development uses Dev/Test SKUs where available
- Tagging and Governance: Required tags for cost allocation, orphaned resource cleanup
Cost Savings Results:
- 65% reduction in dev/test compute costs
- Automated cleanup of unused resources saving $2K/month
- Reserved Instance savings for stable environments
- Developer productivity maintained with auto-start capabilities
Best Practices
Infrastructure as Code
- Everything as Code: Every resource defined in Bicep, never manual portal changes
- Module Library: Create reusable Bicep modules for common patterns
- Parameter Files: Separate parameter files per environment (dev, staging, prod)
- GitOps Workflow: Infrastructure changes via PR and approval process
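- State Management: Bicep keeps no state file (Azure Resource Manager is the source of truth); use Deployment Stacks to track managed resources, or a remote backend when using Terraform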
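
The Parameter Files practice can be illustrated with a `.bicepparam` file per environment. The sketch assumes a file named `main.prod.bicepparam` (a hypothetical name) and assumes `main.bicep` declares a `name` parameter instead of hard-coding the storage account name.

```bicep
// main.prod.bicepparam (hypothetical file name)
// Assumes main.bicep declares: param name string
using './main.bicep'

param name = 'stappprod001'
```

A matching `main.dev.bicepparam` would set a different value, and the deployment command would select the file, for example `az deployment group create --resource-group rg-prod --template-file main.bicep --parameters main.prod.bicepparam`.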
Networking Excellence
- Hub-and-Spoke Default: Standard architecture for most workloads
- Private by Default: All PaaS access via Private Endpoints
- DNS Planning: Private DNS Zones with VNet links, avoid host file modifications
- Firewall Integration: Centralized threat protection with Azure Firewall
- Hybrid Connectivity: ExpressRoute for production, with site-to-site VPN as the backup path
Security Hardening
- Least Privilege: RBAC with specific roles, avoid Subscription Owner
- Managed Identities: Prefer over Service Principals with secrets
- Secrets Management: Key Vault for all secrets, never environment variables
- Encryption Everywhere: CMK for sensitive data, TLS 1.2+ everywhere
- Network Isolation: NSG rules denying by default, allow-listing required traffic
Cost Management
- Right-Sizing: Regular review of actual utilization vs allocated size
- Reservation Planning: Identify stable workloads for Reserved Instances
- Auto-Shutdown: Dev/test resources off during off-hours
- Tagging Strategy: Required tags for cost center, environment, owner
- Budget Alerts: Budget thresholds with alerts at 50%, 75%, 90%
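
The Budget Alerts item could be sketched as a subscription-scoped budget with notifications at 50%, 75%, and 90% of the monthly amount. The budget name, amount, and contact address are placeholders, and the start date is left as a required parameter because the service expects the first day of a month.

```bicep
targetScope = 'subscription'

param amount int = 1000 // hypothetical monthly budget in the billing currency

@description('First day of a month, e.g. 2025-07-01.')
param startDate string

param contactEmails array = [
  'finops@example.com' // placeholder address
]

resource budget 'Microsoft.Consumption/budgets@2021-10-01' = {
  name: 'budget-monthly' // hypothetical name
  properties: {
    category: 'Cost'
    amount: amount
    timeGrain: 'Monthly'
    timePeriod: {
      startDate: startDate
    }
    // One notification per threshold, each a percentage of the budget amount.
    notifications: {
      actual50: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 50
        contactEmails: contactEmails
      }
      actual75: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 75
        contactEmails: contactEmails
      }
      actual90: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 90
        contactEmails: contactEmails
      }
    }
  }
}
```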
Governance and Compliance
- Policy as Guardrails: Azure Policy for prevention, not just detection
- Management Groups: Hierarchy reflecting organizational structure
- Blueprint Usage: Template Specs and Deployment Stacks (the successors to Azure Blueprints) for standard compliant environments
- Monitoring Strategy: Centralized logging to Log Analytics workspace
- Automation: Runbooks for routine operational tasks