infra-engineer

Infrastructure Engineering Skill

Comprehensive guide for modern infrastructure engineering covering DevOps practices, multi-cloud platforms (AWS, Azure, GCP, Cloudflare), FinOps cost optimization, and DevSecOps security practices.

When to Use This Skill

Use this skill when:
  • DevOps: Setting up CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins), implementing GitOps workflows (ArgoCD, Flux)
  • AWS: Deploying to EC2, Lambda, ECS, EKS, managing S3, RDS, using CloudFormation/CDK
  • Azure: Working with Azure VMs, App Service, AKS, Azure Functions, Storage Accounts
  • GCP: Managing Compute Engine, GKE, Cloud Run, Cloud Storage, App Engine
  • Cloudflare: Deploying Workers, R2 storage, D1 databases, Pages applications
  • Kubernetes: Managing clusters, deployments, services, ingress, Helm charts, operators
  • Docker: Containerizing applications, multi-stage builds, Docker Compose, registries
  • FinOps: Analyzing cloud costs, optimizing spend, reserved instances, spot instances, rightsizing
  • DevSecOps: Security scanning (SAST/DAST), vulnerability management, secrets management, compliance
  • IaC: Terraform, CloudFormation, Pulumi, configuration management
  • Monitoring: Setting up observability, logging, metrics, alerting, distributed tracing

Platform Selection Guide

When to Use AWS

Best For:
  • General-purpose cloud computing at scale
  • Mature ecosystem with 200+ services
  • Enterprise workloads with compliance requirements
  • Hybrid cloud with AWS Outposts
  • Extensive third-party integrations
  • Advanced networking and security controls
Key Services:
  • EC2 (virtual machines, flexible compute)
  • Lambda (serverless functions, event-driven)
  • ECS/EKS (container orchestration)
  • S3 (object storage, industry standard)
  • RDS (managed relational databases)
  • DynamoDB (NoSQL, global tables)
  • CloudFormation/CDK (infrastructure as code)
  • IAM (identity and access management)
  • VPC (virtual private cloud networking)
Cost Profile: Pay-as-you-go, reserved instances (up to 72% discount), savings plans, spot instances (up to 90% discount)

When to Use Azure

Best For:
  • Microsoft-centric organizations (.NET, Active Directory)
  • Hybrid cloud scenarios (Azure Arc, Stack)
  • Enterprise agreements with Microsoft
  • Windows Server and SQL Server workloads
  • Integration with Microsoft 365 and Dynamics
  • Strong compliance certifications (90+ certifications)
Key Services:
  • Virtual Machines (Windows/Linux compute)
  • App Service (PaaS for web apps)
  • AKS (managed Kubernetes)
  • Azure Functions (serverless compute)
  • Storage Accounts (Blob, File, Queue, Table)
  • SQL Database (managed SQL Server)
  • Microsoft Entra ID, formerly Azure Active Directory (identity management)
  • ARM Templates/Bicep (infrastructure as code)
Cost Profile: Pay-as-you-go, reserved instances, Azure Hybrid Benefit for Windows/SQL Server licenses

When to Use Cloudflare

Best For:
  • Edge-first applications with global distribution
  • Ultra-low latency requirements (<50ms)
  • Static sites with serverless functions
  • Zero egress cost scenarios (R2 storage)
  • WebSocket/real-time applications (Durable Objects)
  • AI/ML at the edge (Workers AI)
Key Products:
  • Workers (serverless functions)
  • R2 (object storage, S3-compatible)
  • D1 (SQLite database with global replication)
  • KV (key-value store)
  • Pages (static hosting + functions)
  • Durable Objects (stateful compute)
  • Browser Rendering (headless browser automation)
Cost Profile: Pay-per-request, generous free tier, zero egress fees

When to Use Kubernetes

Best For:
  • Container orchestration at scale
  • Microservices architectures with 10+ services
  • Multi-cloud and hybrid deployments
  • Self-healing and auto-scaling workloads
  • Complex deployment strategies (blue/green, canary)
  • Service mesh architectures (Istio, Linkerd)
  • Stateful applications with operators
Key Features:
  • Declarative configuration (YAML manifests)
  • Automated rollouts and rollbacks
  • Service discovery and load balancing
  • Self-healing (restarts failed containers)
  • Horizontal pod autoscaling
  • Secret and configuration management
  • Storage orchestration
  • Batch job execution
Managed Options: EKS (AWS), AKS (Azure), GKE (GCP), managed k8s providers
Cost Profile: Cluster management fees + node costs (optimize with spot instances, cluster autoscaling)
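
Declarative configuration means describing the desired state in a manifest and letting the cluster converge on it. The sketch below writes a minimal Deployment manifest; the name `web`, the `nginx:1.27` image, the replica count, and the resource figures are illustrative assumptions, not values from this guide.

```bash
# Write a minimal Deployment manifest (all names and values are illustrative).
mkdir -p k8s-sketch
cat > k8s-sketch/deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
EOF

# With cluster access, apply it and watch the rollout:
#   kubectl apply -f k8s-sketch/deployment.yaml
#   kubectl rollout status deployment/web
```

If a pod from this Deployment fails, the controller replaces it to keep three replicas running, which is the self-healing behavior listed above.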

When to Use Docker

Best For:
  • Local development consistency
  • Microservices architectures
  • Multi-language stack applications
  • Traditional VPS/VM deployments
  • Foundation for Kubernetes workloads
  • CI/CD build environments
  • Database containerization (dev/test)
Key Capabilities:
  • Application isolation and portability
  • Multi-stage builds for optimization
  • Docker Compose for multi-container apps
  • Volume management for data persistence
  • Network configuration and service discovery
  • Cross-platform compatibility (amd64, arm64)
  • BuildKit for improved build performance
Cost Profile: Infrastructure cost only (compute + storage), no orchestration overhead
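
As a rough illustration of Docker Compose for a multi-container app with a persistent volume, the sketch below writes a Compose file for an app plus a PostgreSQL database. The service names, port, credentials, and `DATABASE_URL` variable are hypothetical and suitable for local development only.

```bash
# Write a two-service Compose file (names and credentials are placeholders).
mkdir -p compose-sketch
cat > compose-sketch/compose.yaml <<'EOF'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app
    depends_on:
      - db
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
EOF

# With Docker installed and a Dockerfile next to the Compose file:
#   docker compose -f compose-sketch/compose.yaml up --build
```

The `app` service reaches the database at hostname `db` because Compose places both services on a shared network and resolves service names.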

When to Use Google Cloud

Best For:
  • Enterprise-scale applications
  • Data analytics and ML pipelines (BigQuery, Vertex AI)
  • Hybrid/multi-cloud deployments
  • Kubernetes at scale (GKE)
  • Managed databases (Cloud SQL, Firestore, Spanner)
  • Complex IAM and compliance requirements
Key Services:
  • Compute Engine (VMs)
  • GKE (managed Kubernetes)
  • Cloud Run (containerized serverless)
  • App Engine (PaaS)
  • Cloud Storage (object storage)
  • Cloud SQL (managed databases)
Cost Profile: Pay-as-you-go pricing, sustained use discounts, committed use discounts

Quick Start

AWS Lambda Function

```bash
# Install AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install

# Configure credentials
aws configure

# Create Lambda function with SAM
sam init --runtime python3.11
sam build && sam deploy --guided
```

See: `references/aws-lambda.md`

AWS EKS Kubernetes Cluster

```bash
# Install eksctl
brew install eksctl  # or curl download

# Create cluster
eksctl create cluster \
  --name my-cluster \
  --region us-west-2 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 1 \
  --nodes-max 4
```

See: `references/kubernetes-basics.md`

Azure Deployment

```bash
# Install Azure CLI (see Microsoft's installation instructions for your OS)

# Login and create resources
az login
az group create --name myResourceGroup --location eastus

# az webapp create needs an App Service plan; the plan name and SKU are placeholders
az appservice plan create --resource-group myResourceGroup \
  --name myAppServicePlan --sku B1 --is-linux

# Valid runtime strings can be listed with: az webapp list-runtimes --os-type linux
az webapp create --resource-group myResourceGroup \
  --plan myAppServicePlan --name myapp --runtime "NODE:18-lts"
```

See: `references/azure-basics.md`

Cloudflare Workers

```bash
# Install Wrangler CLI
npm install -g wrangler

# Create and deploy Worker
wrangler init my-worker
cd my-worker
wrangler deploy
```

See: `references/cloudflare-workers-basics.md`

Kubernetes Deployment

```bash
# Create deployment
kubectl create deployment nginx --image=nginx:latest
kubectl expose deployment nginx --port=80 --type=LoadBalancer

# Apply from manifest
kubectl apply -f deployment.yaml

# Check status
kubectl get pods,services,deployments
```

See: `references/kubernetes-basics.md`

Docker Container

```bash
# Create Dockerfile (quoted heredoc delimiter keeps the contents literal)
cat > Dockerfile <<'EOF'
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --production flag
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
EOF

# Build and run
docker build -t myapp .
docker run -p 3000:3000 myapp
```

See: `references/docker-basics.md`

Reference Navigation

AWS (Amazon Web Services)

  • aws-overview.md
    - AWS fundamentals, account setup, IAM basics
  • aws-ec2.md
    - EC2 instances, AMIs, security groups, auto-scaling
  • aws-lambda.md
    - Serverless functions, SAM, event sources, layers
  • aws-ecs-eks.md
    - Container orchestration, ECS vs EKS, Fargate
  • aws-s3-rds.md
    - S3 storage, RDS databases, backup strategies
  • aws-cloudformation.md
    - Infrastructure as code, CDK, best practices
  • aws-networking.md
    - VPC, subnets, security groups, load balancers

Azure (Microsoft Azure)

  • azure-basics.md
    - Azure fundamentals, subscriptions, resource groups
  • azure-compute.md
    - VMs, App Service, AKS, Azure Functions
  • azure-storage.md
    - Storage Accounts, Blob, Files, managed disks

Cloudflare Platform

  • cloudflare-platform.md
    - Edge computing overview, key components
  • cloudflare-workers-basics.md
    - Getting started, handler types, basic patterns
  • cloudflare-workers-advanced.md
    - Advanced patterns, performance, optimization
  • cloudflare-workers-apis.md
    - Runtime APIs, bindings, integrations
  • cloudflare-r2-storage.md
    - R2 object storage, S3 compatibility, best practices
  • cloudflare-d1-kv.md
    - D1 SQLite database, KV store, use cases
  • browser-rendering.md
    - Puppeteer/Playwright automation on Cloudflare

Kubernetes & Container Orchestration

  • kubernetes-basics.md
    - Core concepts, pods, deployments, services
  • kubernetes-advanced.md
    - StatefulSets, operators, custom resources
  • kubernetes-networking.md
    - Ingress, service mesh, network policies
  • helm-charts.md
    - Package management, charts, repositories

Docker Containerization

  • docker-basics.md
    - Core concepts, Dockerfile, images, containers
  • docker-compose.md
    - Multi-container apps, networking, volumes
  • docker-security.md
    - Image scanning, secrets, best practices

Google Cloud Platform

  • gcloud-platform.md
    - GCP overview, gcloud CLI, authentication
  • gcloud-services.md
    - Compute Engine, GKE, Cloud Run, App Engine

CI/CD & GitOps

  • cicd-github-actions.md
    - GitHub Actions workflows, runners, secrets
  • cicd-gitlab.md
    - GitLab CI/CD pipelines, artifacts, caching
  • gitops-argocd.md
    - ArgoCD setup, app of apps pattern, sync policies
  • gitops-flux.md
    - Flux controllers, GitOps toolkit, multi-tenancy

FinOps (Cost Optimization)

  • finops-basics.md
    - Cost optimization principles, FinOps lifecycle
  • finops-aws.md
    - AWS cost optimization, RI, savings plans, spot
  • finops-azure.md
    - Azure cost management, reservations, hybrid benefit
  • finops-gcp.md
    - GCP cost optimization, committed use, sustained use
  • finops-tools.md
    - Cost analysis tools, Kubecost, CloudHealth, Infracost

DevSecOps (Security)

  • devsecops-basics.md
    - Security best practices, shift-left security
  • devsecops-scanning.md
    - SAST, DAST, SCA, container scanning
  • secrets-management.md
    - Vault, AWS Secrets Manager, sealed secrets
  • compliance.md
    - SOC2, HIPAA, PCI-DSS, audit logging

Infrastructure as Code

  • terraform-basics.md
    - Terraform fundamentals, providers, state
  • terraform-advanced.md
    - Modules, workspaces, remote state
  • cloudformation-basics.md
    - CloudFormation templates, stacks, change sets

Utilities & Scripts

  • scripts/cloudflare-deploy.py
    - Automate Cloudflare Worker deployments
  • scripts/docker-optimize.py
    - Analyze and optimize Dockerfiles
  • scripts/cost-analyzer.py
    - Cloud cost analysis and reporting
  • scripts/security-scanner.py
    - Automated security scanning

Common Workflows

Multi-Cloud Architecture

  • Edge Layer: Cloudflare Workers (global routing, caching)
  • Compute Layer: AWS ECS/Lambda or Azure App Service (application logic)
  • Data Layer: AWS RDS or Azure SQL (persistent storage)
  • CDN/Storage: Cloudflare R2 or AWS S3 (static assets)
Benefits:
  • Best-of-breed services per layer
  • Geographic redundancy
  • Cost optimization across providers

AWS ECS Deployment with CI/CD

```yaml
# GitHub Actions workflow (outline of the deploy job, not literal workflow syntax)
name: Deploy to ECS
on: push
jobs:
  deploy:
    - Build Docker image
    - Push to ECR
    - Update ECS task definition
    - Deploy to ECS service
    - Wait for deployment stabilization
```

Kubernetes GitOps with ArgoCD

```yaml
# Git repository structure
/apps
  /production
    - deployment.yaml
    - service.yaml
    - ingress.yaml
  /staging
    - deployment.yaml

# ArgoCD syncs cluster state from Git
# Changes: Git commit → ArgoCD detects → Auto-sync to cluster
```
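
To make the sync loop concrete, the sketch below writes an ArgoCD `Application` manifest that points at the `apps/production` path and enables automated sync. The repository URL, application name, and target namespace are hypothetical; ArgoCD is assumed to be installed in the `argocd` namespace.

```bash
# Write an ArgoCD Application for the production path (repo URL is a placeholder).
mkdir -p argocd-sketch
cat > argocd-sketch/production-app.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-repo.git
    targetRevision: main
    path: apps/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made in the cluster
    syncOptions:
      - CreateNamespace=true
EOF

# With cluster access:
#   kubectl apply -n argocd -f argocd-sketch/production-app.yaml
```

With `automated` sync enabled, a commit that changes `apps/production` is picked up and applied without a manual sync step.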

Multi-Stage Docker Build

```dockerfile
# Build stage
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER node
CMD ["node", "dist/server.js"]
```

FinOps Cost Optimization Workflow

1. Discovery: Identify untagged resources
2. Analysis: Right-size instances (CPU/memory utilization)
3. Optimization:
   - Convert to reserved instances (predictable workloads)
   - Use spot instances (fault-tolerant workloads)
   - Schedule start/stop (dev environments)
4. Monitoring: Set budget alerts, track savings
5. Governance: Enforce tagging policies
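
The analysis step can be sketched as a simple filter over utilization data. The example below assumes a hypothetical CSV export of average CPU and memory utilization per instance; the file name, column layout, instance IDs, and the 20% / 40% thresholds are all illustrative rather than recommended values.

```bash
# Hypothetical utilization export: instance id, average CPU %, average memory %.
mkdir -p finops-sketch
cat > finops-sketch/utilization.csv <<'EOF'
instance_id,avg_cpu_pct,avg_mem_pct
i-aaa111,4.2,18.0
i-bbb222,63.5,71.2
i-ccc333,11.9,55.0
i-ddd444,8.0,12.5
EOF

# Print instances whose CPU and memory averages are both below the thresholds.
# Usage: rightsizing_candidates FILE [CPU_MAX] [MEM_MAX]
rightsizing_candidates() {
  local file="$1" cpu_max="${2:-20}" mem_max="${3:-40}"
  awk -F, -v cpu="$cpu_max" -v mem="$mem_max" \
    'NR > 1 && $2 < cpu && $3 < mem { print $1 }' "$file"
}

rightsizing_candidates finops-sketch/utilization.csv
# prints:
#   i-aaa111
#   i-ddd444
```

Requiring both metrics to be low avoids flagging instances such as `i-ccc333`, which is idle on CPU but still uses more than half of its memory.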

DevSecOps Security Pipeline

1. Code Commit
2. SAST Scan: SonarQube, Semgrep (static code analysis)
3. Dependency Check: Snyk, Trivy (vulnerability scanning)
4. Build: Docker image
5. Container Scan: Trivy, Grype (image vulnerabilities)
6. DAST Scan: OWASP ZAP (runtime security testing)
7. Deploy: Only if all scans pass
8. Runtime Protection: Falco, AWS GuardDuty
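
The "deploy only if all scans pass" rule amounts to running the stages in order and stopping at the first non-zero exit code. A minimal sketch of that gate follows; `run_gates` is a hypothetical helper name, and the commented scanner invocations assume Semgrep and Trivy are installed and that the image is tagged `myapp:ci`.

```bash
# run_gates: run each command string in order and stop at the first failure.
run_gates() {
  local step
  for step in "$@"; do
    echo "gate: $step"
    if ! bash -c "$step"; then
      echo "FAILED: $step" >&2
      return 1
    fi
  done
  echo "all gates passed"
}

# Stand-in commands so the sketch runs without any scanner installed.
run_gates "echo sast ok" "echo dependency scan ok" "echo image scan ok"

# With the scanners installed, the gates might look like this:
#   run_gates \
#     "semgrep scan --config auto --error" \
#     "trivy fs --exit-code 1 --severity HIGH,CRITICAL ." \
#     "docker build -t myapp:ci ." \
#     "trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:ci" \
#   && echo "safe to deploy"
```

Each scanner is configured to exit non-zero on findings, so a single failing stage prevents every later stage, including the deploy, from running.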

Terraform Infrastructure Deployment

1. Write: Define infrastructure in .tf files
2. Init: terraform init (download providers)
3. Plan: terraform plan (preview changes)
4. Apply: terraform apply (create/update resources)
5. State: Store state in S3 with DynamoDB locking
6. Modules: Reuse common patterns across environments
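
The state step can be sketched as an S3 backend block with a DynamoDB lock table. The bucket name, state key, region, and table name below are hypothetical, and both the bucket and the table are assumed to exist before `terraform init` runs.

```bash
# Write a backend configuration that stores state in S3 and locks via DynamoDB.
mkdir -p terraform-sketch
cat > terraform-sketch/backend.tf <<'EOF'
terraform {
  required_version = ">= 1.5.0"

  backend "s3" {
    bucket         = "example-terraform-state"   # hypothetical bucket
    key            = "production/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "example-terraform-locks"   # hypothetical lock table
    encrypt        = true
  }
}
EOF

# With Terraform installed and AWS credentials configured:
#   terraform -chdir=terraform-sketch init
#   terraform -chdir=terraform-sketch plan -out=tfplan
#   terraform -chdir=terraform-sketch apply tfplan
```

Saving the plan to a file and applying that file ensures the changes applied are the ones that were reviewed.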

Best Practices

DevOps

  • CI/CD: Automate testing and deployment, use feature flags for progressive rollouts
  • GitOps: Declarative infrastructure, Git as single source of truth, automated sync
  • Monitoring: Implement observability (logs, metrics, traces), set up alerting
  • Incident Management: Runbooks, postmortems, blameless culture
  • Automation: Infrastructure as code, configuration management, self-service platforms

Security (DevSecOps)

  • Shift Left: Security scanning early in pipeline (SAST, dependency checks)
  • Secrets Management: Use Vault, AWS Secrets Manager, or sealed secrets (never in code/Git)
  • Container Security: Run as non-root, minimal base images, regular scanning
  • Network Security: Zero-trust architecture, service mesh, network policies
  • Access Control: Least privilege IAM, MFA, temporary credentials
  • Compliance: Audit logging, encryption at rest/transit, regular security reviews
  • Runtime Protection: Security monitoring, intrusion detection, automated response

Cost Optimization (FinOps)

  • Tagging: Enforce resource tagging for cost allocation and tracking
  • Rightsizing: Analyze utilization, downsize over-provisioned resources
  • Reserved Capacity: Purchase RI/savings plans for predictable workloads (up to 72% discount)
  • Spot/Preemptible: Use for fault-tolerant workloads (up to 90% discount)
  • Scheduling: Auto-stop dev/test environments during off-hours
  • Storage Optimization: Lifecycle policies, archive to cheaper tiers, delete orphaned resources
  • Monitoring: Budget alerts, cost anomaly detection, chargeback/showback
  • Governance: Approval workflows for expensive resources, quota management
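
To put the discount ceilings in perspective, the sketch below compares the monthly cost of one instance at different discount levels. The $0.10 hourly rate and the 730-hour month are assumptions for illustration, and the 72% and 90% figures are the "up to" maximums quoted above, not typical results.

```bash
# effective_cost HOURLY_RATE DISCOUNT_PCT [HOURS]: cost after applying the discount.
effective_cost() {
  local rate="$1" discount="$2" hours="${3:-730}"
  awk -v r="$rate" -v d="$discount" -v h="$hours" \
    'BEGIN { printf "%.2f\n", r * h * (1 - d / 100) }'
}

on_demand_rate=0.10   # hypothetical USD per hour

effective_cost "$on_demand_rate" 0    # on-demand:             73.00
effective_cost "$on_demand_rate" 72   # reserved, max discount: 20.44
effective_cost "$on_demand_rate" 90   # spot, max discount:      7.30
```

Real savings depend on commitment term, payment option, instance family, and spot market conditions, so the output is an upper bound on what the discounts can deliver.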

Kubernetes

  • Resource Management: Set requests/limits, use horizontal pod autoscaling
  • High Availability: Multi-zone clusters, pod disruption budgets, anti-affinity rules
  • Security: RBAC, Pod Security Standards (enforced by Pod Security Admission, which replaced PodSecurityPolicy as of Kubernetes 1.25), network policies, admission controllers
  • Observability: Prometheus metrics, distributed tracing, centralized logging
  • GitOps: ArgoCD/Flux for declarative deployments, automatic drift correction
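
As one way to express the autoscaling and availability practices above, the sketch below writes a HorizontalPodAutoscaler and a PodDisruptionBudget. It assumes a hypothetical Deployment named `web` whose pods carry the label `app: web`; the replica bounds and the 70% CPU target are illustrative.

```bash
# Write an HPA and a PodDisruptionBudget for a hypothetical "web" Deployment.
mkdir -p k8s-sketch
cat > k8s-sketch/autoscaling.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
EOF

# With cluster access:
#   kubectl apply -f k8s-sketch/autoscaling.yaml
```

CPU utilization targets are computed against container requests, so the HPA only works as intended when the Deployment sets CPU requests.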

Performance

  • Compute: Auto-scaling, load balancing, multi-region for low latency
  • Caching: CDN, in-memory caching (Redis/Memcached), edge computing
  • Storage: Choose appropriate tier (SSD vs HDD), enable caching, CDN for static assets
  • Containers: Multi-stage builds, minimal images, layer caching
  • Databases: Connection pooling, read replicas, query optimization, indexing

Development

  • Local Development: Docker Compose for consistent environments, dev containers
  • Testing: Unit, integration, end-to-end tests in CI/CD pipeline
  • Infrastructure as Code: Terraform/CloudFormation for repeatability
  • Documentation: Architecture diagrams, runbooks, API documentation
  • Version Control: Git for code and infrastructure, semantic versioning

Decision Matrix

| Category | Need | Choose |
|----------|------|--------|
| Compute | Sub-50ms latency globally | Cloudflare Workers |
| Compute | Serverless functions (AWS ecosystem) | AWS Lambda |
| Compute | Serverless functions (Azure ecosystem) | Azure Functions |
| Compute | Containerized workloads (managed) | AWS ECS/Fargate, Azure AKS, GCP Cloud Run |
| Compute | Kubernetes at scale | AWS EKS, Azure AKS, GCP GKE |
| Compute | VMs with full control | AWS EC2, Azure VMs, GCP Compute Engine |
| Storage | Object storage (S3-compatible) | AWS S3, Cloudflare R2 (zero egress), Azure Blob |
| Storage | Block storage for VMs | AWS EBS, Azure Managed Disks, GCP Persistent Disk |
| Storage | File storage (NFS/SMB) | AWS EFS, Azure Files, GCP Filestore |
| Database | Managed SQL (AWS) | AWS RDS (PostgreSQL, MySQL, SQL Server) |
| Database | Managed SQL (Azure) | Azure SQL Database |
| Database | Managed SQL (GCP) | Cloud SQL |
| Database | NoSQL key-value | AWS DynamoDB, Azure Cosmos DB, Cloudflare KV |
| Database | Global SQL (edge reads) | Cloudflare D1, AWS Aurora Global |
| CI/CD & GitOps | GitHub-integrated CI/CD | GitHub Actions |
| CI/CD & GitOps | Self-hosted CI/CD | GitLab CI/CD, Jenkins |
| CI/CD & GitOps | Kubernetes GitOps | ArgoCD, Flux |
| Cost Optimization | Predictable workloads | Reserved Instances, Savings Plans |
| Cost Optimization | Fault-tolerant workloads | Spot Instances (AWS), Preemptible VMs (GCP) |
| Cost Optimization | Dev/test environments | Auto-scheduling, budget alerts |
| Security | Secrets management | HashiCorp Vault, AWS Secrets Manager, Azure Key Vault |
| Security | Container scanning | Trivy, Snyk, AWS ECR scanning |
| Security | SAST/DAST | SonarQube, Semgrep, OWASP ZAP |
| Special Use Cases | Static site + edge functions | Cloudflare Pages, AWS Amplify |
| Special Use Cases | WebSocket/real-time | Cloudflare Durable Objects, AWS API Gateway WebSocket |
| Special Use Cases | ML/AI pipelines | AWS SageMaker, GCP Vertex AI, Azure ML |
| Special Use Cases | Browser automation | Cloudflare Browser Rendering, AWS Lambda + Puppeteer |

Resources

Cloud Providers

Container & Orchestration

CI/CD & GitOps

Infrastructure as Code

Security & Compliance

FinOps & Cost Optimization

Implementation Checklist

AWS Lambda Deployment

  • Install AWS CLI and SAM CLI
  • Configure AWS credentials (access key, secret key)
  • Create Lambda function with SAM template
  • Configure IAM role and policies
  • Test locally with `sam local invoke`
  • Deploy with `sam deploy`
  • Set up CloudWatch monitoring and alarms
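
For the SAM template step, a minimal sketch might look like the following. The function name `HelloFunction`, the `/hello` route, the handler module, and the memory and timeout values are hypothetical; `sam init` generates a fuller project layout than this.

```bash
# Write a minimal SAM template and a Python handler (all names are illustrative).
mkdir -p sam-sketch/src
cat > sam-sketch/template.yaml <<'EOF'
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  HelloFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.handler
      Runtime: python3.11
      MemorySize: 128
      Timeout: 10
      Events:
        Api:
          Type: HttpApi
          Properties:
            Path: /hello
            Method: get
EOF

cat > sam-sketch/src/app.py <<'EOF'
def handler(event, context):
    """Return a fixed response for any invocation."""
    return {"statusCode": 200, "body": "hello"}
EOF

# With the SAM CLI and Docker installed, from inside sam-sketch/:
#   sam build
#   sam local invoke HelloFunction
#   sam deploy --guided
```

SAM creates an execution role for the function by default; the IAM step in the checklist is where that role would be narrowed or extended with specific policies.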

AWS EKS Kubernetes Cluster

  • Install kubectl, eksctl, aws-cli
  • Configure AWS credentials
  • Create EKS cluster with eksctl
  • Configure kubectl context
  • Install cluster autoscaler
  • Set up Helm for package management
  • Deploy applications with kubectl/Helm
  • Configure ingress controller (ALB/NGINX)

Azure Deployment

  • Install Azure CLI
  • Login with `az login`
  • Create resource group
  • Deploy App Service or AKS
  • Configure continuous deployment
  • Set up monitoring with Application Insights

Kubernetes on Any Cloud

  • Install kubectl and helm
  • Connect to cluster (update kubeconfig)
  • Create namespaces for environments
  • Apply RBAC policies
  • Deploy applications (deployments, services)
  • Configure ingress for external access
  • Set up monitoring (Prometheus, Grafana)
  • Implement GitOps with ArgoCD/Flux

CI/CD Pipeline (GitHub Actions)

  • Create .github/workflows/deploy.yml
  • Configure secrets (cloud credentials, API keys)
  • Add build and test jobs
  • Add container build and push to registry
  • Add deployment job to cloud platform
  • Set up branch protection rules
  • Enable status checks and notifications
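
A workflow covering the build, push, and deploy items could be sketched as below for an ECS target. The secret name `AWS_DEPLOY_ROLE_ARN`, the region, the `myapp` repository and container name, the cluster and service names, and the `task-definition.json` file are all hypothetical; OIDC role assumption is one credential option among several.

```bash
# Write a deploy workflow for ECS (resource names and the secret are placeholders).
mkdir -p .github/workflows
cat > .github/workflows/deploy.yml <<'EOF'
name: Deploy to ECS
on:
  push:
    branches: [main]

permissions:
  id-token: write   # required for OIDC role assumption
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
          aws-region: us-west-2

      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        env:
          IMAGE: ${{ steps.ecr.outputs.registry }}/myapp:${{ github.sha }}
        run: |
          docker build -t "$IMAGE" .
          docker push "$IMAGE"

      - id: render
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: myapp
          image: ${{ steps.ecr.outputs.registry }}/myapp:${{ github.sha }}

      - uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.render.outputs.task-definition }}
          service: myapp-service
          cluster: myapp-cluster
          wait-for-service-stability: true
EOF
```

Tagging the image with the commit SHA ties each deployed task definition to the exact commit that produced it, which simplifies rollbacks.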

FinOps Cost Optimization

  • Implement resource tagging strategy
  • Enable cost allocation tags
  • Set up budget alerts
  • Analyze resource utilization (CloudWatch, Azure Monitor)
  • Identify rightsizing opportunities
  • Purchase reserved instances for predictable workloads
  • Configure auto-scaling and scheduling
  • Regular cost reviews and optimization

DevSecOps Security

  • Add SAST scanning to CI/CD (SonarQube, Semgrep)
  • Add dependency scanning (Snyk, Trivy)
  • Implement container image scanning
  • Set up secrets management (Vault, cloud provider)
  • Configure security groups and network policies
  • Enable audit logging
  • Implement security monitoring and alerting
  • Regular vulnerability assessments

Cloudflare Workers

  • Install Wrangler CLI
  • Create Worker project
  • Configure wrangler.toml (bindings, routes)
  • Test locally with `wrangler dev`
  • Deploy with `wrangler deploy`
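
The `wrangler.toml` step might look like the sketch below, which declares one route, one KV namespace binding, and one R2 bucket binding. The worker name, route pattern, zone, binding names, bucket name, and the KV namespace ID placeholder are all hypothetical.

```bash
# Write a wrangler.toml with a route and two bindings, plus a trivial Worker.
mkdir -p worker-sketch/src
cat > worker-sketch/wrangler.toml <<'EOF'
name = "my-worker"
main = "src/index.js"
compatibility_date = "2024-01-01"

# Route the Worker to part of a zone (pattern and zone are placeholders).
routes = [
  { pattern = "example.com/api/*", zone_name = "example.com" }
]

[[kv_namespaces]]
binding = "CACHE"
id = "<your-kv-namespace-id>"

[[r2_buckets]]
binding = "ASSETS"
bucket_name = "example-assets"
EOF

cat > worker-sketch/src/index.js <<'EOF'
export default {
  // Bindings declared in wrangler.toml arrive on env (env.CACHE, env.ASSETS).
  async fetch(request, env) {
    return new Response("hello from the edge");
  },
};
EOF

# From inside worker-sketch/, with Wrangler installed:
#   wrangler dev
#   wrangler deploy
```

Each `binding` value becomes a property on the `env` argument of the handler, which is how the Worker reaches KV and R2 without embedding credentials.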

Docker

  • Write Dockerfile with multi-stage builds
  • Create .dockerignore file
  • Test build locally
  • Push to registry (ECR, ACR, GCR, Docker Hub)
  • Deploy to target platform
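
For the `.dockerignore` item, a starting point for a Node.js project like the ones in this guide's Dockerfile examples might be the following. The entries are assumptions about a typical project layout and should be adjusted to the actual repository.

```bash
# Write a .dockerignore that keeps local artifacts and secrets out of the build context.
mkdir -p docker-sketch
cat > docker-sketch/.dockerignore <<'EOF'
node_modules
npm-debug.log
dist
.git
.env
Dockerfile
.dockerignore
EOF
```

Excluding `node_modules` and `dist` forces the image to install and build inside the container, and excluding `.env` keeps local secrets out of image layers.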