platform


Platform Engineering


Build reliable, observable, cost-efficient infrastructure.

Quick Reference


The 2026 Platform Stack


| Layer | Tool | Purpose |
|---|---|---|
| IaC | OpenTofu / Pulumi | Infrastructure definition |
| GitOps | Argo CD / Flux | Continuous deployment |
| Control Plane | Crossplane | Kubernetes-native infra |
| Observability | OpenTelemetry | Unified telemetry |
| Service Mesh | Istio Ambient / Cilium | mTLS, traffic management |
| Cost | FinOps Framework | Cloud optimization |

Infrastructure as Code


OpenTofu (Terraform-compatible, open-source):
hcl
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}
Pulumi (infrastructure in general-purpose programming languages):
typescript
import * as aws from "@pulumi/aws";

const server = new aws.ec2.Instance("web", {
  ami: "ami-0c55b159cbfafe1f0",
  instanceType: "t3.micro",
  tags: { Name: "web-server" },
});

export const publicIp = server.publicIp;

GitOps with Argo CD


yaml
# Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/repo
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
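When several environments share the same repository layout, an Application manifest like the one above can be generated rather than copied by hand. The following is a minimal Python sketch, not part of the original setup: the function name `application_manifest`, the `staging` overlay, and the convention that the target namespace equals the overlay name are all assumptions made for illustration. It emits JSON, which `kubectl apply -f` accepts in place of YAML.

```python
import json


def application_manifest(name: str, env: str, repo_url: str) -> dict:
    """Build an Argo CD Application as a dict for one environment overlay.

    Assumption: overlays live under k8s/overlays/<env> and the target
    namespace has the same name as the overlay.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"{name}-{env}", "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "targetRevision": "HEAD",
                "path": f"k8s/overlays/{env}",
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": env,
            },
            # Same automated sync settings as the hand-written manifest.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    # "staging" is a hypothetical second overlay used only for illustration.
    for env in ("staging", "production"):
        manifest = application_manifest("my-app", env, "https://github.com/org/repo")
        print(json.dumps(manifest, indent=2))
```

Keeping the generator this small means the Git repository remains the source of truth: the generated files are committed and Argo CD syncs them like any other manifest.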

Kubernetes Patterns


Gateway API (replacing Ingress):
yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
    - name: main-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-service
          port: 8080
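A detail of the HTTPRoute above that is easy to miss: `PathPrefix` matching in Gateway API compares whole path elements, so `/api` covers `/api/v1` but not `/apiv2`. The Python sketch below illustrates that rule under a simplifying assumption: empty path elements are dropped, so repeated slashes collapse, which a real gateway implementation may treat differently. The function name `path_prefix_matches` is illustrative only.

```python
def path_prefix_matches(prefix: str, path: str) -> bool:
    """Return True if `path` falls under `prefix`, compared element by element."""
    # Split on "/" and drop empty elements so "/api" and "/api/" are equivalent.
    prefix_parts = [p for p in prefix.split("/") if p]
    path_parts = [p for p in path.split("/") if p]
    # The request path matches when its leading elements equal the prefix elements.
    return path_parts[: len(prefix_parts)] == prefix_parts


if __name__ == "__main__":
    for candidate in ("/api", "/api/", "/api/v1/users", "/apiv2", "/"):
        print(candidate, path_prefix_matches("/api", candidate))
```

A plain string `startswith` check would wrongly route `/apiv2` to `api-service`, which is why the comparison works on elements rather than characters.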
Istio Ambient Mode (sidecar-less service mesh):
yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio.io/dataplane-mode: ambient # Enable ambient mesh

OpenTelemetry Setup


python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Initialize
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://collector:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Use
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("my-operation"):
    do_work()

CI/CD Pipeline (GitHub Actions)


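yaml
name: Deploy
on:
  push:
    branches: [main]

permissions:
  contents: write # push the manifest commit
  packages: write # push the image to ghcr.io

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

      - name: Update manifests
        run: |
          cd k8s/overlays/production
          kustomize edit set image app=ghcr.io/${{ github.repository }}:${{ github.sha }}
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git commit -am "Deploy ${{ github.sha }}"
          git push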
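The "Update manifests" step is where the pipeline hands over to GitOps: `kustomize edit set image` rewrites the `images` entry in the overlay's kustomization, and the resulting commit is what Argo CD deploys. The Python sketch below mimics that edit on an in-memory dict to show what changes in Git. It is an illustration under assumptions, not how kustomize is implemented; the helper name `set_image` and the `../../base` resource path are hypothetical.

```python
def set_image(kustomization: dict, name: str, image_ref: str) -> dict:
    """Return a copy of `kustomization` with the image override for `name` set."""
    # Split "registry/repo:tag" on the last colon. If what follows contains a
    # slash, the colon belonged to a registry port (e.g. "localhost:5000/app"),
    # so the reference carries no tag.
    repo, sep, tag = image_ref.rpartition(":")
    if not sep or "/" in tag:
        repo, tag = image_ref, ""

    entry = {"name": name, "newName": repo}
    if tag:
        entry["newTag"] = tag

    # Replace any existing override for the same image name, keep the others.
    others = [i for i in kustomization.get("images", []) if i.get("name") != name]
    return {**kustomization, "images": others + [entry]}


if __name__ == "__main__":
    overlay = {"resources": ["../../base"]}  # hypothetical overlay content
    print(set_image(overlay, "app", "ghcr.io/org/repo:3f2a9c1"))
```

Because every deploy is just a one-line change to `newTag`, a rollback is a `git revert` of that commit.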

FinOps Framework


Phase 1: INFORM (visibility)
  • Tag everything: team, environment, cost-center
  • Use cloud cost explorers
  • Target: 95%+ cost allocation accuracy
Phase 2: OPTIMIZE (action)
  • Rightsize instances (most are overprovisioned)
  • Use spot/preemptible for stateless workloads
  • Reserved instances for baseline capacity
  • Target: 20-30% cost reduction
Phase 3: OPERATE (governance)
  • Budget alerts at 80% threshold
  • Cost metrics in CI/CD gates
  • Regular FinOps reviews
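Two of the targets above lend themselves to a quick calculation: the 95% allocation target from the INFORM phase and the 80% budget alert from the OPERATE phase. The Python sketch below shows one way to compute them. It assumes "allocation accuracy" means the share of spend carried by resources that have all three tags, which is one reasonable reading rather than a definition taken from the FinOps Framework; `LineItem`, `allocation_accuracy`, and `budget_alert` are illustrative names, and the figures are made up.

```python
from dataclasses import dataclass, field

REQUIRED_TAGS = ("team", "environment", "cost-center")


@dataclass
class LineItem:
    """One billing line: its cost and the tags on the underlying resource."""
    cost: float
    tags: dict = field(default_factory=dict)


def allocation_accuracy(items: list) -> float:
    """Share of total spend on resources that carry every required tag."""
    total = sum(item.cost for item in items)
    if total == 0:
        return 1.0  # nothing spent, so nothing is unallocated
    tagged = sum(
        item.cost
        for item in items
        if all(item.tags.get(tag) for tag in REQUIRED_TAGS)
    )
    return tagged / total


def budget_alert(spend: float, budget: float, threshold: float = 0.80) -> bool:
    """True once spend reaches the alert threshold (80% of budget by default)."""
    return spend >= budget * threshold


if __name__ == "__main__":
    # Made-up billing data for illustration only.
    items = [
        LineItem(900.0, {"team": "web", "environment": "production", "cost-center": "cc-1"}),
        LineItem(100.0, {"team": "web"}),  # missing two of the required tags
    ]
    print(f"allocation accuracy: {allocation_accuracy(items):.0%}")
    print("budget alert:", budget_alert(spend=850.0, budget=1000.0))
```

Weighting by cost rather than by resource count keeps attention on the untagged resources that matter most to the bill.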

Security Baseline


yaml
# Tetragon policy (eBPF runtime enforcement)
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-shell
spec:
  kprobes:
    - call: "sys_execve"
      selectors:
        - matchBinaries:
            - operator: "In"
              values: ["/bin/sh", "/bin/bash"]
          matchNamespaces:
            - namespace: production
          action: Block

Agents


  • platform-engineer - GitOps, IaC, Kubernetes, observability
  • data-engineer - Pipelines, ETL, data infrastructure
  • finops-engineer - Cloud cost optimization, FinOps framework

Deep Dives


  • references/gitops-patterns.md
  • references/kubernetes-gateway.md
  • references/opentelemetry.md
  • references/finops-framework.md

Examples


  • examples/argo-cd-setup/
  • examples/pulumi-aws/
  • examples/otel-stack/