infra-engineer
Infrastructure Engineering Skill
Comprehensive guide for modern infrastructure engineering covering DevOps practices, multi-cloud platforms (AWS, Azure, GCP, Cloudflare), FinOps cost optimization, and DevSecOps security practices.
When to Use This Skill
Use this skill when:
- DevOps: Setting up CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins), implementing GitOps workflows (ArgoCD, Flux)
- AWS: Deploying to EC2, Lambda, ECS, EKS, managing S3, RDS, using CloudFormation/CDK
- Azure: Working with Azure VMs, App Service, AKS, Azure Functions, Storage Accounts
- GCP: Managing Compute Engine, GKE, Cloud Run, Cloud Storage, App Engine
- Cloudflare: Deploying Workers, R2 storage, D1 databases, Pages applications
- Kubernetes: Managing clusters, deployments, services, ingress, Helm charts, operators
- Docker: Containerizing applications, multi-stage builds, Docker Compose, registries
- FinOps: Analyzing cloud costs, optimizing spend, reserved instances, spot instances, rightsizing
- DevSecOps: Security scanning (SAST/DAST), vulnerability management, secrets management, compliance
- IaC: Terraform, CloudFormation, Pulumi, configuration management
- Monitoring: Setting up observability, logging, metrics, alerting, distributed tracing
Platform Selection Guide
When to Use AWS
Best For:
- General-purpose cloud computing at scale
- Mature ecosystem with 200+ services
- Enterprise workloads with compliance requirements
- Hybrid cloud with AWS Outposts
- Extensive third-party integrations
- Advanced networking and security controls
Key Services:
- EC2 (virtual machines, flexible compute)
- Lambda (serverless functions, event-driven)
- ECS/EKS (container orchestration)
- S3 (object storage, industry standard)
- RDS (managed relational databases)
- DynamoDB (NoSQL, global tables)
- CloudFormation/CDK (infrastructure as code)
- IAM (identity and access management)
- VPC (virtual private cloud networking)
Cost Profile: Pay-as-you-go, reserved instances (up to 72% discount), savings plans, spot instances (up to 90% discount)
When to Use Azure
Best For:
- Microsoft-centric organizations (.NET, Active Directory)
- Hybrid cloud scenarios (Azure Arc, Azure Stack)
- Enterprise agreements with Microsoft
- Windows Server and SQL Server workloads
- Integration with Microsoft 365 and Dynamics
- Broad compliance coverage (90+ certifications)
Key Services:
- Virtual Machines (Windows/Linux compute)
- App Service (PaaS for web apps)
- AKS (managed Kubernetes)
- Azure Functions (serverless compute)
- Storage Accounts (Blob, File, Queue, Table)
- SQL Database (managed SQL Server)
- Microsoft Entra ID, formerly Azure Active Directory (identity management)
- ARM Templates/Bicep (infrastructure as code)
Cost Profile: Pay-as-you-go, reserved instances, Azure Hybrid Benefit for Windows/SQL Server licenses
When to Use Cloudflare
Best For:
- Edge-first applications with global distribution
- Ultra-low latency requirements (<50ms)
- Static sites with serverless functions
- Zero egress cost scenarios (R2 storage)
- WebSocket/real-time applications (Durable Objects)
- AI/ML at the edge (Workers AI)
Key Products:
- Workers (serverless functions)
- R2 (object storage, S3-compatible)
- D1 (SQLite database with global replication)
- KV (key-value store)
- Pages (static hosting + functions)
- Durable Objects (stateful compute)
- Browser Rendering (headless browser automation)
Cost Profile: Pay-per-request, generous free tier, zero egress fees
When to Use Kubernetes
Best For:
- Container orchestration at scale
- Microservices architectures with 10+ services
- Multi-cloud and hybrid deployments
- Self-healing and auto-scaling workloads
- Complex deployment strategies (blue/green, canary)
- Service mesh architectures (Istio, Linkerd)
- Stateful applications with operators
Key Features:
- Declarative configuration (YAML manifests)
- Automated rollouts and rollbacks
- Service discovery and load balancing
- Self-healing (restarts failed containers)
- Horizontal pod autoscaling
- Secret and configuration management
- Storage orchestration
- Batch job execution
Managed Options: EKS (AWS), AKS (Azure), GKE (GCP), and other managed Kubernetes providers
Cost Profile: Cluster management fees + node costs (optimize with spot instances, cluster autoscaling)
When to Use Docker
Best For:
- Local development consistency
- Microservices architectures
- Multi-language stack applications
- Traditional VPS/VM deployments
- Foundation for Kubernetes workloads
- CI/CD build environments
- Database containerization (dev/test)
Key Capabilities:
- Application isolation and portability
- Multi-stage builds for optimization
- Docker Compose for multi-container apps
- Volume management for data persistence
- Network configuration and service discovery
- Cross-platform compatibility (amd64, arm64)
- BuildKit for improved build performance
Cost Profile: Infrastructure cost only (compute + storage), no orchestration overhead
When to Use Google Cloud
Best For:
- Enterprise-scale applications
- Data analytics and ML pipelines (BigQuery, Vertex AI)
- Hybrid/multi-cloud deployments
- Kubernetes at scale (GKE)
- Managed databases (Cloud SQL, Firestore, Spanner)
- Complex IAM and compliance requirements
Key Services:
- Compute Engine (VMs)
- GKE (managed Kubernetes)
- Cloud Run (containerized serverless)
- App Engine (PaaS)
- Cloud Storage (object storage)
- Cloud SQL (managed databases)
Cost Profile: Pay-as-you-go, sustained use discounts, committed use discounts
Quick Start
AWS Lambda Function
```bash
# Install AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install

# Configure credentials
aws configure

# Create Lambda function with SAM
sam init --runtime python3.11
# Run the next command from the project directory that `sam init` created
sam build && sam deploy --guided
```
See: `references/aws-lambda.md`
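`sam init` scaffolds a handler for you. As a rough idea of what the function body looks like, here is a minimal sketch of a Python handler. The `lambda_handler(event, context)` signature is the usual Python Lambda entry point; the response shape assumes an API Gateway proxy integration, and the `name` query parameter is hypothetical.

```python
import json


def lambda_handler(event, context):
    """Return a JSON greeting; assumes an API Gateway proxy-style event."""
    # queryStringParameters may be missing or None when no query string is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # "name" is a hypothetical parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Because the handler is a plain function, it can be called directly in a unit test with a dict as the event before it is ever deployed.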
AWS EKS Kubernetes Cluster
```bash
# Install eksctl
brew install eksctl # or curl download

# Create cluster
eksctl create cluster \
  --name my-cluster \
  --region us-west-2 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 1 \
  --nodes-max 4
```
See: `references/kubernetes-basics.md`
Azure Deployment
```bash
# Install Azure CLI
curl -L https://aka.ms/InstallAzureCli | bash

# Login and create resources
az login
az group create --name myResourceGroup --location eastus
# az webapp create needs an App Service plan; "myPlan" and the B1 SKU are placeholders
az appservice plan create --name myPlan --resource-group myResourceGroup \
  --sku B1 --is-linux
# Check `az webapp list-runtimes` for the runtime strings currently supported
az webapp create --resource-group myResourceGroup --plan myPlan \
  --name myapp --runtime "NODE:18-lts"
```
See: `references/azure-basics.md`
Cloudflare Workers
```bash
# Install Wrangler CLI
npm install -g wrangler

# Create and deploy Worker
wrangler init my-worker
cd my-worker
wrangler deploy
```
See: `references/cloudflare-workers-basics.md`
Kubernetes Deployment
```bash
# Create deployment
kubectl create deployment nginx --image=nginx:latest
kubectl expose deployment nginx --port=80 --type=LoadBalancer

# Apply from manifest
kubectl apply -f deployment.yaml

# Check status
kubectl get pods,services,deployments
```
See: `references/kubernetes-basics.md`
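The `kubectl apply -f` step expects a manifest file. kubectl accepts JSON as well as YAML, so a manifest can be generated with nothing but the Python standard library. The sketch below mirrors the `nginx` example; the replica count, the `app: nginx` label, and the function name are assumptions made for illustration.

```python
import json


def nginx_deployment(replicas: int = 2, image: str = "nginx:latest") -> dict:
    """Build an apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": "nginx"}  # label chosen for illustration
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "nginx", "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "nginx",
                            "image": image,
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


manifest_json = json.dumps(nginx_deployment(), indent=2)
```

Saving `manifest_json` to a file such as `deployment.json` and running `kubectl apply -f deployment.json` should then create the Deployment.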
Docker Container
```bash
# Create Dockerfile
cat > Dockerfile <<EOF
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --production flag
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
EOF

# Build and run
docker build -t myapp .
docker run -p 3000:3000 myapp
```
See: `references/docker-basics.md`
Reference Navigation
AWS (Amazon Web Services)
- `aws-overview.md` - AWS fundamentals, account setup, IAM basics
- `aws-ec2.md` - EC2 instances, AMIs, security groups, auto-scaling
- `aws-lambda.md` - Serverless functions, SAM, event sources, layers
- `aws-ecs-eks.md` - Container orchestration, ECS vs EKS, Fargate
- `aws-s3-rds.md` - S3 storage, RDS databases, backup strategies
- `aws-cloudformation.md` - Infrastructure as code, CDK, best practices
- `aws-networking.md` - VPC, subnets, security groups, load balancers
Azure (Microsoft Azure)
- `azure-basics.md` - Azure fundamentals, subscriptions, resource groups
- `azure-compute.md` - VMs, App Service, AKS, Azure Functions
- `azure-storage.md` - Storage Accounts, Blob, Files, managed disks
Cloudflare Platform
- `cloudflare-platform.md` - Edge computing overview, key components
- `cloudflare-workers-basics.md` - Getting started, handler types, basic patterns
- `cloudflare-workers-advanced.md` - Advanced patterns, performance, optimization
- `cloudflare-workers-apis.md` - Runtime APIs, bindings, integrations
- `cloudflare-r2-storage.md` - R2 object storage, S3 compatibility, best practices
- `cloudflare-d1-kv.md` - D1 SQLite database, KV store, use cases
- `browser-rendering.md` - Puppeteer/Playwright automation on Cloudflare
Kubernetes & Container Orchestration
- `kubernetes-basics.md` - Core concepts, pods, deployments, services
- `kubernetes-advanced.md` - StatefulSets, operators, custom resources
- `kubernetes-networking.md` - Ingress, service mesh, network policies
- `helm-charts.md` - Package management, charts, repositories
Docker Containerization
- `docker-basics.md` - Core concepts, Dockerfile, images, containers
- `docker-compose.md` - Multi-container apps, networking, volumes
- `docker-security.md` - Image scanning, secrets, best practices
Google Cloud Platform
- `gcloud-platform.md` - GCP overview, gcloud CLI, authentication
- `gcloud-services.md` - Compute Engine, GKE, Cloud Run, App Engine
CI/CD & GitOps
- `cicd-github-actions.md` - GitHub Actions workflows, runners, secrets
- `cicd-gitlab.md` - GitLab CI/CD pipelines, artifacts, caching
- `gitops-argocd.md` - ArgoCD setup, app of apps pattern, sync policies
- `gitops-flux.md` - Flux controllers, GitOps toolkit, multi-tenancy
FinOps (Cost Optimization)
- `finops-basics.md` - Cost optimization principles, FinOps lifecycle
- `finops-aws.md` - AWS cost optimization, RI, savings plans, spot
- `finops-azure.md` - Azure cost management, reservations, hybrid benefit
- `finops-gcp.md` - GCP cost optimization, committed use, sustained use
- `finops-tools.md` - Cost analysis tools, Kubecost, CloudHealth, Infracost
DevSecOps (Security)
- `devsecops-basics.md` - Security best practices, shift-left security
- `devsecops-scanning.md` - SAST, DAST, SCA, container scanning
- `secrets-management.md` - Vault, AWS Secrets Manager, sealed secrets
- `compliance.md` - SOC2, HIPAA, PCI-DSS, audit logging
Infrastructure as Code
- `terraform-basics.md` - Terraform fundamentals, providers, state
- `terraform-advanced.md` - Modules, workspaces, remote state
- `cloudformation-basics.md` - CloudFormation templates, stacks, change sets
Utilities & Scripts
- `scripts/cloudflare-deploy.py` - Automate Cloudflare Worker deployments
- `scripts/docker-optimize.py` - Analyze and optimize Dockerfiles
- `scripts/cost-analyzer.py` - Cloud cost analysis and reporting
- `scripts/security-scanner.py` - Automated security scanning
Common Workflows
Multi-Cloud Architecture
```yaml
# Edge Layer: Cloudflare Workers (global routing, caching)
# Compute Layer: AWS ECS/Lambda or Azure App Service (application logic)
# Data Layer: AWS RDS or Azure SQL (persistent storage)
# CDN/Storage: Cloudflare R2 or AWS S3 (static assets)

Benefits:
- Best-of-breed services per layer
- Geographic redundancy
- Cost optimization across providers
```
AWS ECS Deployment with CI/CD
```yaml
# GitHub Actions workflow (outline only: each step still needs a `run` or `uses` entry)
name: Deploy to ECS
on: push
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Build Docker image
      - name: Push to ECR
      - name: Update ECS task definition
      - name: Deploy to ECS service
      - name: Wait for deployment stabilization
```
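The final step, waiting for stabilization, is a polling loop in practice (the AWS CLI exposes it as `aws ecs wait services-stable`). The sketch below shows the idea in isolation and is deliberately simplified. `get_counts` is a hypothetical callable returning the running and desired task counts; it stands in for a real ECS API call, and the timeout and interval defaults are assumptions.

```python
import time
from typing import Callable, Tuple


def wait_until_stable(
    get_counts: Callable[[], Tuple[int, int]],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until running == desired; return False once timeout_s has elapsed."""
    waited = 0.0
    while True:
        running, desired = get_counts()
        if running == desired:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A pipeline would typically fail the job when the function returns `False`.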
Kubernetes GitOps with ArgoCD
```yaml
# Git repository structure
/apps
  /production
    - deployment.yaml
    - service.yaml
    - ingress.yaml
  /staging
    - deployment.yaml

# ArgoCD syncs cluster state from Git
# Changes: Git commit → ArgoCD detects → Auto-sync to cluster
```
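Conceptually, the sync loop compares the desired state in Git with the live state in the cluster and acts on the difference. The toy sketch below illustrates that comparison only; it is not ArgoCD's implementation, and resources are modeled as a hypothetical mapping of name to spec.

```python
def diff_state(desired: dict, live: dict) -> dict:
    """Return which resources would be created, updated, or pruned."""
    return {
        "create": sorted(k for k in desired if k not in live),
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        "prune": sorted(k for k in live if k not in desired),
    }


# Hypothetical example data: Git declares two resources, the cluster has drifted.
desired = {"deployment/web": {"replicas": 3}, "service/web": {"port": 80}}
live = {"deployment/web": {"replicas": 2}, "ingress/old": {"host": "example.com"}}
plan = diff_state(desired, live)
```

Here `plan` lists `service/web` to create, `deployment/web` to update, and `ingress/old` to prune. Whether the prune step actually runs depends on the sync policy configured for the application.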
Multi-Stage Docker Build
```dockerfile
# Build stage
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production stage
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER node
CMD ["node", "dist/server.js"]
```
FinOps Cost Optimization Workflow
```yaml
# 1. Discovery: Identify untagged resources
# 2. Analysis: Right-size instances (CPU/memory utilization)
# 3. Optimization:
#    - Convert to reserved instances (predictable workloads)
#    - Use spot instances (fault-tolerant workloads)
#    - Schedule start/stop (dev environments)
# 4. Monitoring: Set budget alerts, track savings
# 5. Governance: Enforce tagging policies
```
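Steps 1 and 2 of this workflow lend themselves to a small script. The sketch below works on an in-memory inventory rather than a real cloud API; the `Instance` type, the required tag set, and the 20% CPU threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    avg_cpu_pct: float  # average CPU utilization over the review window
    tags: dict = field(default_factory=dict)


REQUIRED_TAGS = {"owner", "cost-center"}  # hypothetical tagging policy
DOWNSIZE_BELOW_PCT = 20.0                 # hypothetical rightsizing threshold


def review(instances):
    """Split a fleet into untagged resources and rightsizing candidates."""
    untagged = [i.name for i in instances if not REQUIRED_TAGS.issubset(i.tags)]
    oversized = [i.name for i in instances if i.avg_cpu_pct < DOWNSIZE_BELOW_PCT]
    return {"untagged": untagged, "rightsize": oversized}
```

In a real setup the inventory would come from the provider's billing or monitoring data, and memory utilization would be weighed alongside CPU.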
DevSecOps Security Pipeline
```yaml
# 1. Code Commit
# 2. SAST Scan: SonarQube, Semgrep (static code analysis)
# 3. Dependency Check: Snyk, Trivy (vulnerability scanning)
# 4. Build: Docker image
# 5. Container Scan: Trivy, Grype (image vulnerabilities)
# 6. DAST Scan: OWASP ZAP (runtime security testing)
# 7. Deploy: Only if all scans pass
# 8. Runtime Protection: Falco, AWS GuardDuty
```
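Step 7 is a gate: the pipeline aggregates scanner findings and blocks the deploy when any finding is severe enough. A minimal sketch of that decision follows. The finding format (a dict with a `severity` key) and the severity scale are assumptions; real scanners each have their own report schema.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def gate(findings, fail_on="high"):
    """Return True (deploy allowed) only if no finding reaches fail_on."""
    threshold = SEVERITY_ORDER.index(fail_on)
    blocking = [
        f for f in findings
        if SEVERITY_ORDER.index(f["severity"]) >= threshold
    ]
    return len(blocking) == 0
```

A CI job could call `gate` on the merged findings and exit non-zero when it returns `False`, which stops the deploy stage from running.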
Terraform Infrastructure Deployment
```hcl
# 1. Write: Define infrastructure in .tf files
# 2. Init: terraform init (download providers)
# 3. Plan: terraform plan (preview changes)
# 4. Apply: terraform apply (create/update resources)
# 5. State: Store state in S3 with DynamoDB locking
# 6. Modules: Reuse common patterns across environments
```
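For step 5, Terraform also reads configuration written in JSON from `*.tf.json` files, so a remote-state backend block can be generated programmatically. The sketch below builds an S3 backend with a DynamoDB lock table. The bucket, key, region, and table names are placeholders, and recent Terraform releases offer other locking options, so check the backend documentation for your version.

```python
import json


def s3_backend(bucket: str, key: str, region: str, lock_table: str) -> dict:
    """Build a Terraform JSON-syntax backend block for remote state in S3."""
    return {
        "terraform": {
            "backend": {
                "s3": {
                    "bucket": bucket,
                    "key": key,
                    "region": region,
                    "dynamodb_table": lock_table,  # table used for state locking
                    "encrypt": True,
                }
            }
        }
    }


config = s3_backend(
    bucket="example-tf-state",        # placeholder names throughout
    key="prod/terraform.tfstate",
    region="us-west-2",
    lock_table="example-tf-locks",
)
backend_json = json.dumps(config, indent=2)  # e.g. save as backend.tf.json
```

Generating one such file per environment with a different `key` is one way to keep state files separate while sharing a bucket.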
Best Practices
DevOps
- CI/CD: Automate testing and deployment, use feature flags for progressive rollouts
- GitOps: Declarative infrastructure, Git as single source of truth, automated sync
- Monitoring: Implement observability (logs, metrics, traces), set up alerting
- Incident Management: Runbooks, postmortems, blameless culture
- Automation: Infrastructure as code, configuration management, self-service platforms
Security (DevSecOps)
- Shift Left: Security scanning early in pipeline (SAST, dependency checks)
- Secrets Management: Use Vault, AWS Secrets Manager, or sealed secrets (never in code/Git)
- Container Security: Run as non-root, minimal base images, regular scanning
- Network Security: Zero-trust architecture, service mesh, network policies
- Access Control: Least privilege IAM, MFA, temporary credentials
- Compliance: Audit logging, encryption at rest/transit, regular security reviews
- Runtime Protection: Security monitoring, intrusion detection, automated response
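To make the secrets rule concrete: application code should receive secrets at runtime, for example through environment variables populated by the secrets manager, and fail fast when one is absent. A minimal sketch, assuming environment-variable injection; the helper name `require_secret` is hypothetical.

```python
import os


def require_secret(name: str) -> str:
    """Read a secret from the environment and fail fast if it is absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value
```

Calling `require_secret("DB_PASSWORD")` at startup (the variable name is only an example) surfaces a misconfigured deployment immediately instead of at the first database call.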
Cost Optimization (FinOps)
- Tagging: Enforce resource tagging for cost allocation and tracking
- Rightsizing: Analyze utilization, downsize over-provisioned resources
- Reserved Capacity: Purchase RI/savings plans for predictable workloads (up to 72% discount)
- Spot/Preemptible: Use for fault-tolerant workloads (up to 90% discount)
- Scheduling: Auto-stop dev/test environments during off-hours
- Storage Optimization: Lifecycle policies, archive to cheaper tiers, delete orphaned resources
- Monitoring: Budget alerts, cost anomaly detection, chargeback/showback
- Governance: Approval workflows for expensive resources, quota management
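The discount ceilings quoted above translate into monthly figures with simple arithmetic. The sketch below compares the three purchasing models for an always-on fleet. The hourly rate is a made-up number rather than a real price, and the discounts are the "up to" maxima, so actual savings will usually be smaller.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12 months


def monthly_cost(hourly_rate: float, discount: float = 0.0, count: int = 1) -> float:
    """Monthly cost for `count` always-on instances after a fractional discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return round(hourly_rate * (1.0 - discount) * HOURS_PER_MONTH * count, 2)


ON_DEMAND_RATE = 0.10  # hypothetical USD per hour, not a real price

on_demand = monthly_cost(ON_DEMAND_RATE, count=10)
reserved_best = monthly_cost(ON_DEMAND_RATE, discount=0.72, count=10)
spot_best = monthly_cost(ON_DEMAND_RATE, discount=0.90, count=10)
```

Running the same comparison with real rates and realistic discount levels gives a quick first estimate before committing to a reservation.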
Kubernetes
- Resource Management: Set requests/limits, use horizontal pod autoscaling
- High Availability: Multi-zone clusters, pod disruption budgets, anti-affinity rules
- Security: RBAC, Pod Security Standards (PodSecurityPolicy was removed in Kubernetes 1.25), network policies, admission controllers
- Observability: Prometheus metrics, distributed tracing, centralized logging
- GitOps: ArgoCD/Flux for declarative deployments, automatic drift correction
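Horizontal pod autoscaling follows a simple ratio documented by Kubernetes: the desired replica count is the current count scaled by the ratio of the observed metric to its target, rounded up. The sketch below implements only that core rule; the real controller also applies a tolerance band and the configured minimum and maximum replica bounds.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)
```

For example, 4 replicas averaging 90% CPU against a 60% target scale out to 6, while 3 replicas at 50% against a 100% target scale in to 2.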
Performance
- Compute: Auto-scaling, load balancing, multi-region for low latency
- Caching: CDN, in-memory caching (Redis/Memcached), edge computing
- Storage: Choose appropriate tier (SSD vs HDD), enable caching, CDN for static assets
- Containers: Multi-stage builds, minimal images, layer caching
- Databases: Connection pooling, read replicas, query optimization, indexing
Development
- Local Development: Docker Compose for consistent environments, dev containers
- Testing: Unit, integration, end-to-end tests in CI/CD pipeline
- Infrastructure as Code: Terraform/CloudFormation for repeatability
- Documentation: Architecture diagrams, runbooks, API documentation
- Version Control: Git for code and infrastructure, semantic versioning
Decision Matrix
| Need | Choose |
|---|---|
| Compute | |
| Sub-50ms latency globally | Cloudflare Workers |
| Serverless functions (AWS ecosystem) | AWS Lambda |
| Serverless functions (Azure ecosystem) | Azure Functions |
| Containerized workloads (managed) | AWS ECS/Fargate, Azure AKS, GCP Cloud Run |
| Kubernetes at scale | AWS EKS, Azure AKS, GCP GKE |
| VMs with full control | AWS EC2, Azure VMs, GCP Compute Engine |
| Storage | |
| Object storage | AWS S3, Cloudflare R2 (S3-compatible, zero egress fees), Azure Blob Storage |
| Block storage for VMs | AWS EBS, Azure Managed Disks, GCP Persistent Disk |
| File storage (NFS/SMB) | AWS EFS, Azure Files, GCP Filestore |
| Database | |
| Managed SQL (AWS) | AWS RDS (PostgreSQL, MySQL, SQL Server) |
| Managed SQL (Azure) | Azure SQL Database |
| Managed SQL (GCP) | Cloud SQL |
| NoSQL key-value | AWS DynamoDB, Azure Cosmos DB, Cloudflare KV |
| Global SQL (edge reads) | Cloudflare D1, AWS Aurora Global |
| CI/CD & GitOps | |
| GitHub-integrated CI/CD | GitHub Actions |
| Self-hosted CI/CD | GitLab CI/CD, Jenkins |
| Kubernetes GitOps | ArgoCD, Flux |
| Cost Optimization | |
| Predictable workloads | Reserved Instances, Savings Plans |
| Fault-tolerant workloads | Spot Instances (AWS), Preemptible VMs (GCP) |
| Dev/test environments | Auto-scheduling, budget alerts |
| Security | |
| Secrets management | HashiCorp Vault, AWS Secrets Manager, Azure Key Vault |
| Container scanning | Trivy, Snyk, AWS ECR scanning |
| SAST/DAST | SonarQube, Semgrep, OWASP ZAP |
| Special Use Cases | |
| Static site + edge functions | Cloudflare Pages, AWS Amplify |
| WebSocket/real-time | Cloudflare Durable Objects, AWS API Gateway WebSocket |
| ML/AI pipelines | AWS SageMaker, GCP Vertex AI, Azure ML |
| Browser automation | Cloudflare Browser Rendering, AWS Lambda + Puppeteer |
Resources
Cloud Providers
- AWS Docs: https://docs.aws.amazon.com
- Azure Docs: https://docs.microsoft.com/azure
- GCP Docs: https://cloud.google.com/docs
- Cloudflare Docs: https://developers.cloudflare.com
Container & Orchestration
- Docker Docs: https://docs.docker.com
- Kubernetes Docs: https://kubernetes.io/docs
- Helm: https://helm.sh/docs
CI/CD & GitOps
- GitHub Actions: https://docs.github.com/actions
- GitLab CI: https://docs.gitlab.com/ee/ci/
- ArgoCD: https://argo-cd.readthedocs.io
- Flux: https://fluxcd.io/docs
Infrastructure as Code
- Terraform: https://developer.hashicorp.com/terraform
- AWS CDK: https://docs.aws.amazon.com/cdk
- Pulumi: https://www.pulumi.com/docs
Security & Compliance
- OWASP: https://owasp.org
- CIS Benchmarks: https://www.cisecurity.org/cis-benchmarks
- HashiCorp Vault: https://developer.hashicorp.com/vault
FinOps & Cost Optimization
- FinOps Foundation: https://www.finops.org
- AWS Cost Optimization: https://aws.amazon.com/pricing/cost-optimization
- Kubecost: https://www.kubecost.com
Implementation Checklist
AWS Lambda Deployment
- Install AWS CLI and SAM CLI
- Configure AWS credentials (access key, secret key)
- Create Lambda function with SAM template
- Configure IAM role and policies
- Test locally with `sam local invoke`
- Deploy with `sam deploy`
- Set up CloudWatch monitoring and alarms
AWS EKS Kubernetes Cluster
- Install kubectl, eksctl, aws-cli
- Configure AWS credentials
- Create EKS cluster with eksctl
- Configure kubectl context
- Install cluster autoscaler
- Set up Helm for package management
- Deploy applications with kubectl/Helm
- Configure ingress controller (ALB/NGINX)
Azure Deployment
- Install Azure CLI
- Login with `az login`
- Create resource group
- Deploy App Service or AKS
- Configure continuous deployment
- Set up monitoring with Application Insights
Kubernetes on Any Cloud
- Install kubectl and helm
- Connect to cluster (update kubeconfig)
- Create namespaces for environments
- Apply RBAC policies
- Deploy applications (deployments, services)
- Configure ingress for external access
- Set up monitoring (Prometheus, Grafana)
- Implement GitOps with ArgoCD/Flux
CI/CD Pipeline (GitHub Actions)
- Create .github/workflows/deploy.yml
- Configure secrets (cloud credentials, API keys)
- Add build and test jobs
- Add container build and push to registry
- Add deployment job to cloud platform
- Set up branch protection rules
- Enable status checks and notifications
FinOps Cost Optimization
- Implement resource tagging strategy
- Enable cost allocation tags
- Set up budget alerts
- Analyze resource utilization (CloudWatch, Azure Monitor)
- Identify rightsizing opportunities
- Purchase reserved instances for predictable workloads
- Configure auto-scaling and scheduling
- Regular cost reviews and optimization
DevSecOps Security
- Add SAST scanning to CI/CD (SonarQube, Semgrep)
- Add dependency scanning (Snyk, Trivy)
- Implement container image scanning
- Set up secrets management (Vault, cloud provider)
- Configure security groups and network policies
- Enable audit logging
- Implement security monitoring and alerting
- Regular vulnerability assessments
Cloudflare Workers
- Install Wrangler CLI
- Create Worker project
- Configure wrangler.toml (bindings, routes)
- Test locally with `wrangler dev`
- Deploy with `wrangler deploy`
Docker
- Write Dockerfile with multi-stage builds
- Create .dockerignore file
- Test build locally
- Push to registry (ECR, ACR, GCR, Docker Hub)
- Deploy to target platform