azure-infra-engineer

Azure Infrastructure Engineer

Purpose

Provides Microsoft Azure cloud expertise specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF) implementations. Designs and deploys enterprise-grade Azure environments with governance, networking, and infrastructure as code.

When to Use

  • Deploying Azure resources using Bicep or ARM templates
  • Designing Hub-and-Spoke network topologies (Virtual WAN, ExpressRoute)
  • Implementing Azure Policy and Management Groups (Governance)
  • Migrating workloads to Azure (Azure Site Recovery, Azure Migrate)
  • Automating infrastructure deployment with Azure DevOps pipelines
  • Configuring Azure RBAC and Privileged Identity Management (PIM) with Microsoft Entra ID (formerly Azure Active Directory)


2. Decision Framework

IaC Tool Selection (Azure Context)

| Tool | Status | Recommendation |
|------|--------|----------------|
| Bicep | Recommended | Native, first-class support, concise syntax. |
| Terraform | Alternative | Best for multi-cloud strategies. |
| ARM Templates | Legacy | Verbose JSON. Avoid for new projects (author in Bicep, which compiles to ARM JSON). |
| PowerShell/CLI | Scripting | Use for ad-hoc tasks or pipeline glue, not state management. |

Networking Architecture

What is the connectivity need?
├─ **Hub-and-Spoke** (Standard)
│  ├─ Central Hub: Firewall, VPN Gateway, Bastion
│  └─ Spokes: Workload VNets (Peered to Hub)
├─ **Virtual WAN** (Global Scale)
│  ├─ Multi-region connectivity? → **Yes**
│  └─ Branch-to-Branch (SD-WAN)? → **Yes**
└─ **Private Access**
   ├─ PaaS Services? → **Private Link / Private Endpoints**
   └─ Service Endpoints? → Legacy (Use Private Link where possible)
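The decision tree above can be expressed as a small function. This is a minimal sketch with a hypothetical helper and hypothetical input names (`multi_region`, `branch_to_branch`, `paas_access`). It encodes only the branches of the tree and does not call any Azure API.

```python
from dataclasses import dataclass


@dataclass
class ConnectivityNeeds:
    """Hypothetical answers to the questions in the decision tree."""
    multi_region: bool = False      # Multi-region connectivity required?
    branch_to_branch: bool = False  # Branch-to-branch (SD-WAN) transit required?
    paas_access: bool = False       # Do workloads consume PaaS services privately?


def choose_network_design(needs: ConnectivityNeeds) -> list[str]:
    """Return the building blocks the decision tree points to."""
    # Either global-scale question answered "yes" points to Virtual WAN;
    # otherwise the standard Hub-and-Spoke topology applies.
    if needs.multi_region or needs.branch_to_branch:
        design = ["Virtual WAN"]
    else:
        design = ["Hub-and-Spoke"]
    # Private access to PaaS is layered on top of either topology.
    if needs.paas_access:
        design.append("Private Endpoints")
    return design


print(choose_network_design(ConnectivityNeeds(paas_access=True)))
```

Keeping private access as an additive step, rather than a third topology, mirrors the tree: Private Link applies whichever hub design is chosen.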

Governance Strategy (CAF)

  1. Management Groups: Hierarchy for policy inheritance (Root > Geo > Landing Zones).
  2. Azure Policy: "Deny" non-compliant resources (e.g., allow only the East US region).
  3. RBAC: Least privilege access via Entra ID Groups.
  4. Blueprints: Rapid deployment of compliant environments (Azure Blueprints is deprecated; Microsoft recommends Template Specs and Deployment Stacks instead).
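The sketch below shows how management-group inheritance and the Deny effect combine for an "allowed locations" rule. It models the path from the root to a landing zone as a list of scopes, and each scope may carry a hypothetical allowed-locations assignment. A resource is denied if any scope on its path excludes its region. This is a simplified model with no exemptions, other effects, or policy parameters. It is not the Azure Policy engine, and the `MG-Geo-US` name is hypothetical.

```python
from typing import Optional

# Hypothetical path Root > Geo > Landing Zones: (scope, allowed locations).
# None means there is no "Allowed locations" assignment at that scope.
path: list[tuple[str, Optional[set[str]]]] = [
    ("MG-Root", {"eastus", "eastus2"}),
    ("MG-Geo-US", None),
    ("MG-LandingZones", {"eastus"}),
]


def evaluate_location(scopes, location: str) -> tuple[str, Optional[str]]:
    """Return ("Deny", scope) for the first inherited assignment that excludes
    the location, or ("Allow", None) if every assignment permits it."""
    for scope, allowed in scopes:
        # Assignments are inherited, so every scope on the path applies;
        # a child can narrow a parent's allowed list but never widen it.
        if allowed is not None and location not in allowed:
            return "Deny", scope
    return "Allow", None


for region in ("westeurope", "eastus2", "eastus"):
    print(region, evaluate_location(path, region))
```

In this model, `eastus2` passes the root assignment but is still denied at `MG-LandingZones`. This is why broad guardrails belong high in the hierarchy and stricter ones lower down.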
Red Flags → Escalate to security-engineer:
  • Public network access enabled on Storage Accounts or SQL Databases
  • Management ports (RDP/SSH) open to the internet
  • Subscription Owner permissions granted to individual users (use Contributor plus just-in-time elevation through PIM instead)
  • No cost controls or budgets configured
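The red flags above lend themselves to an automated pre-review. This minimal sketch checks a hypothetical, simplified inventory made of plain dictionaries, not an Azure SDK type. Where the data would come from, for example an Azure Resource Graph export, is an assumption. The field names (`public_access`, `inbound_rules`, `principal_type`) are illustrative only.

```python
MANAGEMENT_PORTS = {22, 3389}                      # SSH and RDP
INTERNET_SOURCES = {"*", "Internet", "0.0.0.0/0"}  # sources treated as "the internet"


def find_red_flags(inventory: dict) -> list[str]:
    """Return findings that should be escalated to security-engineer."""
    findings = []
    for res in inventory.get("resources", []):
        # Storage Accounts and SQL Databases must not allow public network access.
        if res["type"] in {"storage", "sql"} and res.get("public_access", False):
            findings.append(f"{res['name']}: public access enabled")
        # Inbound allow rules exposing RDP/SSH to the internet.
        for rule in res.get("inbound_rules", []):
            if (rule["access"] == "Allow" and rule["source"] in INTERNET_SOURCES
                    and rule["port"] in MANAGEMENT_PORTS):
                findings.append(f"{res['name']}: port {rule['port']} open to internet")
    # Subscription Owner held directly by an individual user.
    for ra in inventory.get("role_assignments", []):
        if ra["role"] == "Owner" and ra["scope"] == "subscription" and ra["principal_type"] == "User":
            findings.append(f"{ra['principal']}: Subscription Owner granted to a user")
    if not inventory.get("budgets"):
        findings.append("no budgets configured")
    return findings


example = {
    "resources": [
        {"name": "stdata", "type": "storage", "public_access": True},
        {"name": "nsg-web", "type": "nsg",
         "inbound_rules": [{"source": "Internet", "port": 3389, "access": "Allow"}]},
    ],
    "role_assignments": [
        {"principal": "alice@contoso.com", "role": "Owner",
         "scope": "subscription", "principal_type": "User"},
    ],
    "budgets": [],
}
for finding in find_red_flags(example):
    print(finding)
```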


4. Core Workflows

Workflow 1: Bicep Resource Deployment

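Goal: Deploy a secure Storage Account with a Private Endpoint.
Steps:
  1. Define the Bicep module (modules/storage.bicep)
    bicep
    param location string = resourceGroup().location
    param name string
    param subnetId string // Resource ID of the subnet that hosts the Private Endpoint
    resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' = {
      name: name
      location: location
      sku: { name: 'Standard_LRS' }
      kind: 'StorageV2'
      properties: {
        minimumTlsVersion: 'TLS1_2'
        supportsHttpsTrafficOnly: true
        publicNetworkAccess: 'Disabled' // Secure by default: reachable only via Private Endpoint
      }
    }
    
    resource pe 'Microsoft.Network/privateEndpoints@2023-04-01' = { // Blob endpoint; Private DNS zone linked separately
      name: 'pe-${name}-blob'
      location: location
      properties: {
        subnet: { id: subnetId }
        privateLinkServiceConnections: [
          {
            name: 'blob'
            properties: {
              privateLinkServiceId: stg.id
              groupIds: [ 'blob' ]
            }
          }
        ]
      }
    }
    output id string = stg.id
  2. Main deployment (main.bicep)
    bicep
    param subnetId string // Existing spoke subnet for the Private Endpoint
    
    module storage './modules/storage.bicep' = {
      name: 'deployStorage'
      params: {
        name: 'stappprod001'
        subnetId: subnetId
      }
    }
  3. Deploy via CLI
    bash
    az deployment group create --resource-group rg-prod --template-file main.bicep --parameters subnetId=<subnet-resource-id>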
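The compiled template can be linted locally before deployment. This sketch assumes the JSON produced by `az bicep build --file main.bicep`, which writes `main.json`. It walks nested module deployments recursively and checks the storage settings from step 1. It also checks the storage-account naming rule: 3 to 24 characters, lowercase letters and digits only. The function names and the sample template are hypothetical.

```python
import json
import re

STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")

# Property values the storage module is expected to set.
EXPECTED = {
    "minimumTlsVersion": "TLS1_2",
    "supportsHttpsTrafficOnly": True,
    "publicNetworkAccess": "Disabled",
}


def valid_storage_account_name(name: str) -> bool:
    """Storage account names: 3-24 chars, lowercase letters and digits only."""
    return STORAGE_NAME.fullmatch(name) is not None


def load_template(path: str) -> dict:
    """Read a compiled ARM template such as main.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_resources(template: dict):
    """Yield every resource, descending into nested (module) deployments."""
    resources = template.get("resources", [])
    # languageVersion 2.0 templates key resources by symbolic name.
    if isinstance(resources, dict):
        resources = resources.values()
    for res in resources:
        yield res
        nested = res.get("properties", {}).get("template")
        if isinstance(nested, dict):
            yield from iter_resources(nested)


def storage_violations(template: dict) -> list[str]:
    """Return one message per expected storage setting that is missing or wrong."""
    problems = []
    for res in iter_resources(template):
        if res.get("type") != "Microsoft.Storage/storageAccounts":
            continue
        props = res.get("properties", {})
        for key, expected in EXPECTED.items():
            if props.get(key) != expected:
                problems.append(f"{res.get('name')}: {key} should be {expected!r}")
    return problems


# Hypothetical shape of a module nested inside main.json.
sample = {"resources": [{
    "type": "Microsoft.Resources/deployments",
    "name": "deployStorage",
    "properties": {"template": {"resources": [{
        "type": "Microsoft.Storage/storageAccounts",
        "name": "[parameters('name')]",
        "properties": dict(EXPECTED),
    }]}},
}]}
print(storage_violations(sample), valid_storage_account_name("stappprod001"))
```

With the real output, the same check runs as `storage_violations(load_template("main.json"))`. Templates are only a static view, so `az deployment group what-if` remains the authoritative preview.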


Workflow 3: Landing Zone Setup (CAF)

Goal: Establish the foundational hierarchy.
Steps:
  1. Create Management Groups
    • MG-Root
      • MG-Platform (Identity, Connectivity, Management)
      • MG-LandingZones (Online, Corp)
      • MG-Sandbox (Playground)
  2. Assign Policies
    • Assign "Allowed Locations" to MG-Root.
    • Assign "Enable Azure Monitor" to MG-LandingZones.
  3. Deploy Hub Network
    • Deploy VNet in connectivity subscription.
    • Deploy Azure Firewall and VPN Gateway.
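Step 1 must create every parent before its children. This sketch encodes the hierarchy as a dictionary and emits `az account management-group create` commands in breadth-first, parent-first order. Several details are assumptions:
  • The names are used directly as group IDs.
  • The child IDs (MG-Identity, MG-Online, and so on) are hypothetical, derived from the descriptions above.
  • MG-Root is created without `--parent`, so it lands under the tenant root group.

```python
from collections import deque

# Parent -> children. Child IDs under Platform/LandingZones are hypothetical.
HIERARCHY = {
    "MG-Root": ["MG-Platform", "MG-LandingZones", "MG-Sandbox"],
    "MG-Platform": ["MG-Identity", "MG-Connectivity", "MG-Management"],
    "MG-LandingZones": ["MG-Online", "MG-Corp"],
}


def creation_commands(root: str, hierarchy: dict[str, list[str]]) -> list[str]:
    """Return az CLI commands that create every group after its parent."""
    # No --parent for the top group: it is placed under the tenant root group.
    commands = [f"az account management-group create --name {root}"]
    queue = deque([root])
    while queue:  # breadth-first walk guarantees parent-before-child order
        parent = queue.popleft()
        for child in hierarchy.get(parent, []):
            commands.append(
                f"az account management-group create --name {child} --parent {parent}"
            )
            queue.append(child)
    return commands


for cmd in creation_commands("MG-Root", HIERARCHY):
    print(cmd)
```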


5. Anti-Patterns & Gotchas

❌ Anti-Pattern 1: "ClickOps"

What it looks like:
  • Creating resources manually in the Azure Portal.
Why it fails:
  • Unrepeatable.
  • Configuration drift.
  • Disaster recovery is slow and error-prone (no code to redeploy from).
Correct approach:
  • Everything as Code: Even when prototyping, export the ARM template or write basic Bicep.
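Configuration drift is the gap between the properties declared in code and what is actually deployed. For a real deployment, `az deployment group what-if` previews that difference. The sketch below only illustrates the idea on plain dictionaries, and the property values are hypothetical.

```python
def diff_properties(desired: dict, actual: dict, prefix: str = "") -> list[str]:
    """Recursively compare declared vs deployed properties and describe drift."""
    changes = []
    for key in sorted(desired.keys() | actual.keys()):
        path = f"{prefix}{key}"
        if key not in actual:
            changes.append(f"missing: {path}")
        elif key not in desired:
            # Present in Azure but not in code, e.g. a setting added in the portal.
            changes.append(f"unmanaged: {path}")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            changes.extend(diff_properties(desired[key], actual[key], path + "."))
        elif desired[key] != actual[key]:
            changes.append(f"changed: {path} {desired[key]!r} -> {actual[key]!r}")
    return changes


desired = {"minimumTlsVersion": "TLS1_2", "publicNetworkAccess": "Disabled"}
actual = {"minimumTlsVersion": "TLS1_2", "publicNetworkAccess": "Enabled",
          "tags": {"createdBy": "portal"}}
for line in diff_properties(desired, actual):
    print(line)
```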

❌ Anti-Pattern 2: One Giant Resource Group

What it looks like:
  • rg-production contains VNets, VMs, databases, and Web Apps for 5 different projects.
Why it fails:
  • IAM nightmare: a role assigned at the resource-group scope covers every project in it, so you cannot grant access to Project A without also granting it to Project B.
  • Tagging and cost analysis become difficult.
  • Risk of accidental deletion: deleting the group removes every project's resources at once.
Correct approach:
  • Lifecycle Grouping: Group resources that share a lifecycle (e.g., rg-network, rg-app1-prod, rg-app1-dev).
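Lifecycle grouping can be sketched as a mapping from each resource's workload and environment to a resource group. Shared networking is placed in a group of its own. The `rg-<workload>-<env>` pattern follows the examples above. The resource records and the `shared` flag are hypothetical.

```python
from collections import defaultdict


def resource_group_for(resource: dict) -> str:
    """Pick a resource group name that matches the resource's lifecycle."""
    # Shared platform networking changes independently of any single app.
    if resource.get("shared") and resource["kind"] == "network":
        return "rg-network"
    return f"rg-{resource['workload']}-{resource['env']}"


def plan_groups(resources: list[dict]) -> dict[str, list[str]]:
    """Group resource names by their target resource group."""
    plan = defaultdict(list)
    for res in resources:
        plan[resource_group_for(res)].append(res["name"])
    return dict(plan)


resources = [
    {"name": "vnet-hub", "kind": "network", "shared": True},
    {"name": "app-app1-prod", "kind": "web", "workload": "app1", "env": "prod"},
    {"name": "sql-app1-prod", "kind": "database", "workload": "app1", "env": "prod"},
    {"name": "app-app1-dev", "kind": "web", "workload": "app1", "env": "dev"},
]
print(plan_groups(resources))
```

Because the dev and prod resources of app1 land in separate groups, each group can be given its own RBAC assignments, locks, and cost view.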

❌ Anti-Pattern 3: Ignoring Naming Conventions

What it looks like:
  • myvm1, test-storage, sql-server.
Why it fails:
  • Cannot identify resource type, environment, or region from the name.
  • Name collisions (Storage accounts must be globally unique).
Correct approach:
  • CAF Naming Standard: [Resource Type]-[Workload]-[Environment]-[Region]-[Instance]
  • Example: vm-myapp-prod-eus-001 (Virtual Machine, MyApp, Prod, East US, 001).
  • Exception: Storage account names allow only lowercase letters and digits (3 to 24 characters), so the hyphens are dropped: stmyappprodeus001.
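The naming convention can be applied programmatically. This sketch builds a name from the five components. It drops the hyphens for storage accounts, whose names allow only lowercase letters and digits (3 to 24 characters). The abbreviation table is a small excerpt of the CAF abbreviations. The `eus` region code and the `caf_name` helper are illustrative assumptions.

```python
import re

# Small excerpt of CAF resource-type abbreviations.
ABBREVIATIONS = {
    "storage_account": "st",
    "virtual_machine": "vm",
    "virtual_network": "vnet",
    "sql_server": "sql",
}


def caf_name(resource_type: str, workload: str, env: str, region: str, instance: int) -> str:
    """Build [Resource Type]-[Workload]-[Environment]-[Region]-[Instance]."""
    parts = [ABBREVIATIONS[resource_type], workload, env, region, f"{instance:03d}"]
    name = "-".join(p.lower() for p in parts)
    if resource_type == "storage_account":
        # Storage accounts: lowercase letters and digits only, 3-24 characters.
        name = name.replace("-", "")
        if not re.fullmatch(r"[a-z0-9]{3,24}", name):
            raise ValueError(f"invalid storage account name: {name}")
    return name


print(caf_name("virtual_machine", "myapp", "prod", "eus", 1))  # vm-myapp-prod-eus-001
print(caf_name("storage_account", "myapp", "prod", "eus", 1))  # stmyappprodeus001
```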


7. Quality Checklist

Governance:
  • Naming: Resources follow CAF naming conventions.
  • Tagging: Resources tagged with CostCenter, Environment, Owner.
  • Policies: Azure Policy enforces compliance (e.g., allowed SKUs).
Security:
  • Network: No public IPs on backend resources (VMs, DBs).
  • Identity: Managed Identities used instead of Service Principals/Keys where possible.
  • Encryption: CMK (Customer Managed Keys) enabled for sensitive data.
Reliability:
  • Availability Zones: Critical resources deployed zone-redundant (e.g., ZRS for Storage Accounts).
  • Backup: Azure Backup enabled for VMs and SQL.
  • Locks: Resource Locks (CanNotDelete) on critical production resources.
Cost:
  • Sizing: Resources right-sized based on metrics.
  • Reservations: Reserved Instances purchased for steady workloads.
  • Cleanup: Unused resources (orphaned disks/NICs) deleted.
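Parts of the Governance checklist can be verified automatically. This sketch checks the three required tags against a hypothetical resource list, where each entry is a name plus a tags dictionary. The source of that list is an assumption. Azure Policy can also require tags at deployment time, which is the preventive counterpart to this after-the-fact check.

```python
REQUIRED_TAGS = ("CostCenter", "Environment", "Owner")


def missing_tags(resources: list[dict], required=REQUIRED_TAGS) -> dict[str, list[str]]:
    """Map each non-compliant resource name to the required tags it lacks."""
    report = {}
    for res in resources:
        # Azure tag names are case-insensitive, so compare lowercased names.
        tags = {k.lower(): v for k, v in (res.get("tags") or {}).items()}
        # An empty or whitespace-only value counts as missing.
        absent = [t for t in required if not str(tags.get(t.lower(), "")).strip()]
        if absent:
            report[res["name"]] = absent
    return report


resources = [
    {"name": "vm-myapp-prod-eus-001",
     "tags": {"CostCenter": "1234", "Environment": "prod", "Owner": "team-a"}},
    {"name": "stmyappprodeus001", "tags": {"environment": "prod"}},
    {"name": "disk-orphaned-01", "tags": None},
]
print(missing_tags(resources))
```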

Examples

Example 1: Multi-Subscription Landing Zone Setup

Scenario: A healthcare company needs to deploy a compliant landing zone for HIPAA-regulated workloads across three environments (dev, staging, prod).
Architecture:
  1. Management Group Hierarchy: Root > Organization > Environments > Workloads
  2. Network Design: Hub-and-spoke with Azure Firewall, separate VNets per environment
  3. Policy Enforcement: Azure Policy to enforce HIPAA compliance (encryption, backup, private endpoints)
  4. CI/CD Pipeline: Azure DevOps pipeline with approval gates for prod deployments
Key Components:
  • Azure Firewall Manager for centralized policy
  • Private DNS Zones for app-internal resolution
  • Azure Backup with immutable vaults for compliance
  • Cost Management tags for departmental chargebacks

Example 2: Zero-Trust Network Architecture

Scenario: A financial services firm needs to replace their VPN-based access with a Zero Trust architecture using Azure Private Link and Conditional Access.
Implementation:
  1. Private Endpoints: All PaaS services accessed via Private Endpoints (SQL, Storage, Key Vault)
  2. Identity-Based Access: Conditional Access policies requiring a compliant device and MFA
  3. Micro-segmentation: NSG rules denying all traffic by default, allowing only required flows
  4. Monitoring: Microsoft Sentinel (formerly Azure Sentinel) for security analytics and anomaly detection
Security Controls:
  • Microsoft Entra Conditional Access (formerly Azure AD) with device compliance
  • Just-In-Time VM access for administration
  • Microsoft Defender for Cloud threat protection
  • Comprehensive audit logging to Log Analytics
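Micro-segmentation relies on how NSGs evaluate rules. Rules are processed in priority order, lower number first, and processing stops at the first match. The sketch below models inbound evaluation with a custom deny-all rule at priority 4096, placed after specific allow rules. It deliberately simplifies matching: exact source labels, single ports, no protocols, CIDR ranges, or service tags, and it omits Azure's built-in default rules. The rule names and source labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int        # custom rules use 100-4096; lower numbers are evaluated first
    name: str
    source: str          # simplified: an exact label, or "*" for any source
    port: Optional[int]  # None means any destination port
    access: str          # "Allow" or "Deny"


def evaluate(rules: list[Rule], source: str, port: int) -> tuple[str, str]:
    """Return (access, rule name) for the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source in ("*", source) and rule.port in (None, port):
            return rule.access, rule.name  # first match wins; later rules are ignored
    # Simplification: a real NSG would fall through to Azure's default rules here.
    return "Deny", "no-match"


rules = [
    Rule(100, "allow-appgw-https", "AppGatewaySubnet", 443, "Allow"),
    Rule(200, "allow-bastion-ssh", "AzureBastionSubnet", 22, "Allow"),
    Rule(4096, "deny-all-inbound", "*", None, "Deny"),
]
print(evaluate(rules, "AppGatewaySubnet", 443))
print(evaluate(rules, "Internet", 22))
```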

Example 3: Cost-Optimized Dev/Test Environment

Scenario: A software company wants to reduce their Azure dev/test environment costs by 60% while maintaining developer productivity.
Optimization Strategy:
  1. Auto-Shutdown: Dev VMs shut down automatically on evenings and weekends via Automation runbooks
  2. Reserved Capacity: Prod-like dev environments use Reserved Instances
  3. Dev-Optimized SKUs: Development uses Dev/Test SKUs where available
  4. Tagging and Governance: Required tags for cost allocation, orphaned resource cleanup
Cost Savings Results:
  • 65% reduction in dev/test compute costs
  • Automated cleanup of unused resources saving $2K/month
  • Reserved Instance savings for stable environments
  • Developer productivity maintained with auto-start capabilities
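The auto-shutdown saving can be sanity-checked with quick arithmetic. Pay-as-you-go VM compute is billed per running hour. A VM that runs only during working hours is therefore billed for a fraction of the week's 168 hours. This sketch assumes a 12-hour weekday window, an illustrative choice and not a figure from the scenario. Under that assumption, the compute-hour saving is about 64%, the same order as the reported reduction. Disks and other always-on resources keep billing while VMs are deallocated, so real savings will differ.

```python
HOURS_PER_WEEK = 7 * 24  # 168
HOURS_PER_MONTH = 730    # common approximation used for monthly pricing


def compute_hours_saved(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of weekly compute hours avoided by running only on a schedule."""
    running = hours_per_day * days_per_week
    if not 0 <= running <= HOURS_PER_WEEK:
        raise ValueError("schedule exceeds the hours in a week")
    return 1 - running / HOURS_PER_WEEK


def monthly_savings(hourly_rate: float, vm_count: int,
                    hours_per_day: float, days_per_week: int) -> float:
    """Approximate monthly compute savings for a fleet of identical VMs."""
    return (hourly_rate * vm_count * HOURS_PER_MONTH
            * compute_hours_saved(hours_per_day, days_per_week))


print(f"{compute_hours_saved(12, 5):.1%}")  # 64.3%
```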

Best Practices

Infrastructure as Code

  • Everything as Code: Every resource defined in Bicep, never manual portal changes
  • Module Library: Create reusable Bicep modules for common patterns
  • Parameter Files: Separate parameter files per environment (dev, staging, prod)
  • GitOps Workflow: Infrastructure changes via PR and approval process
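  • State Management: Bicep needs no state file (Azure Resource Manager is the source of truth; preview changes with what-if); with Terraform, keep state in a remote backend such as an Azure Storage account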
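Separate parameter files per environment can be generated from a single table of values. This sketch writes files in the ARM deployment-parameters JSON format, using the deploymentParameters schema URL. The parameter names (`name`, `location`), their values, and the `main.<env>.parameters.json` file naming are hypothetical. Bicep's native `.bicepparam` format is an alternative not shown here.

```python
import json
from pathlib import Path

SCHEMA = "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"

# Hypothetical per-environment values.
ENVIRONMENTS = {
    "dev": {"name": "stappdev001", "location": "eastus"},
    "staging": {"name": "stappstg001", "location": "eastus"},
    "prod": {"name": "stappprod001", "location": "eastus"},
}


def parameter_file(values: dict) -> dict:
    """Wrap plain values in the ARM parameter-file structure."""
    return {
        "$schema": SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {key: {"value": val} for key, val in values.items()},
    }


def write_parameter_files(out_dir: str) -> list[Path]:
    """Write main.<env>.parameters.json for every environment."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for env, values in ENVIRONMENTS.items():
        path = target / f"main.{env}.parameters.json"
        path.write_text(json.dumps(parameter_file(values), indent=2), encoding="utf-8")
        written.append(path)
    return written


print(json.dumps(parameter_file(ENVIRONMENTS["prod"]), indent=2))
```

Each generated file can then be passed to a deployment with `--parameters @<file>`. Each environment's values then live in version control and flow through the same PR review as the templates.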

Networking Excellence

  • Hub-and-Spoke Default: Standard architecture for most workloads
  • Private by Default: All PaaS access via Private Endpoints
  • DNS Planning: Private DNS Zones with VNet links, avoid host file modifications
  • Firewall Integration: Centralized threat protection with Azure Firewall
  • Hybrid Connectivity: ExpressRoute for production, site-to-site VPN as a secondary (backup) path
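DNS planning for Private Endpoints mostly means creating the right `privatelink` zones and linking them to the VNets that need to resolve them. This sketch maps a few Private Link sub-resources (group IDs) to their Azure public-cloud zone names. The table is a partial excerpt, sovereign clouds use different suffixes, and `zones_to_create` is a hypothetical helper.

```python
# Partial excerpt: Private Link group ID -> private DNS zone (Azure public cloud).
PRIVATE_DNS_ZONES = {
    "blob": "privatelink.blob.core.windows.net",
    "file": "privatelink.file.core.windows.net",
    "vault": "privatelink.vaultcore.azure.net",
    "sqlServer": "privatelink.database.windows.net",
}


def zones_to_create(group_ids: list[str]) -> list[str]:
    """Return the distinct private DNS zones needed for the given endpoints."""
    unknown = [g for g in group_ids if g not in PRIVATE_DNS_ZONES]
    if unknown:
        raise KeyError(f"no zone mapping for: {', '.join(unknown)}")
    # One zone per service type, shared by all endpoints of that type and
    # linked to every VNet that must resolve it (typically the hub).
    return sorted({PRIVATE_DNS_ZONES[g] for g in group_ids})


print(zones_to_create(["blob", "vault", "blob"]))
```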

Security Hardening

  • Least Privilege: RBAC with specific roles, avoid Subscription Owner
  • Managed Identities: Prefer over Service Principals with secrets
  • Secrets Management: Key Vault for all secrets; never hard-code secrets in code, templates, or plain-text environment variables
  • Encryption Everywhere: CMK for sensitive data, TLS 1.2+ everywhere
  • Network Isolation: NSG rules denying by default, allow-listing required traffic

Cost Management

  • Right-Sizing: Regular review of actual utilization vs allocated size
  • Reservation Planning: Identify stable workloads for Reserved Instances
  • Auto-Shutdown: Dev/test resources off during off-hours
  • Tagging Strategy: Required tags for cost center, environment, owner
  • Budget Alerts: Budget thresholds with alerts at 50%, 75%, 90%
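The 50%, 75%, and 90% thresholds come down to comparing actual spend against the budget. Azure Cost Management budgets perform this evaluation themselves. This sketch only shows the arithmetic, with a hypothetical budget and spend.

```python
THRESHOLDS = (0.50, 0.75, 0.90)


def crossed_thresholds(spend: float, budget: float, thresholds=THRESHOLDS) -> list[float]:
    """Return the alert thresholds that current spend has reached or passed."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend / budget
    return [t for t in thresholds if used >= t]


print(crossed_thresholds(spend=8_000, budget=10_000))  # [0.5, 0.75]
```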

Governance and Compliance

  • Policy as Guardrails: Azure Policy for prevention, not just detection
  • Management Groups: Hierarchy reflecting organizational structure
  • Standard Environments: Template Specs and Deployment Stacks for standard compliant environments (Azure Blueprints is deprecated)
  • Monitoring Strategy: Centralized logging to Log Analytics workspace
  • Automation: Runbooks for routine operational tasks