Envoy Gateway Production Deployment

Deployment Modes

Single Tenant (Default)

  • One GatewayClass per Envoy Gateway controller
  • Simplest model; suitable for most single-team deployments
  • All Gateways share the same controller and Envoy Proxy fleet

Multi-Tenant

  • Deploy separate Envoy Gateway controllers per tenant namespace
  • Each controller must have a unique controller name in its GatewayClass
  • Provides strong tenant isolation at the control plane level
  • Install via separate Helm releases, each with a distinct --set config.envoyGateway.gateway.controllerName=... value
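
As a rough sketch of the per-tenant install described above, the script below prints one Helm command per tenant. The tenant names (marketing, product), the release names, and the controller-name suffix are hypothetical; only the --set key and the chart reference come from this guide. The echo keeps it a dry run.

```shell
#!/bin/sh
# Sketch: one Envoy Gateway release per tenant namespace, each with its own
# controller name. Tenant names and the naming scheme are assumptions.

render_tenant_install() {
  tenant="$1"
  # echo keeps this a dry run; drop it to execute the install for real.
  echo helm upgrade --install "eg-${tenant}" oci://docker.io/envoyproxy/gateway-helm \
    --version v1.7.0 \
    -n "${tenant}" \
    --create-namespace \
    --set "config.envoyGateway.gateway.controllerName=gateway.envoyproxy.io/${tenant}-gatewayclass-controller"
}

for tenant in marketing product; do
  render_tenant_install "$tenant"
done
```

Each tenant's GatewayClass would then set spec.controllerName to the same value passed to its release, so that only that tenant's controller reconciles it.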

Gateway Namespace Mode

  • Envoy Proxy pods deploy in the Gateway's namespace instead of the controller namespace
  • Provides stronger workload isolation: proxy runs alongside the application
  • Enables JWT authentication between proxy and controller for hardened communication
  • Enable by setting envoyGateway.provider.kubernetes.deploy.type: GatewayNamespace
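
A minimal sketch of how this setting might look as Helm values follows. It assumes the chart exposes the EnvoyGateway configuration under config.envoyGateway (as the controllerName flag earlier suggests) and that the deploy type value is GatewayNamespace; both are worth checking against the API reference for the pinned version. The file name is arbitrary.

```shell
#!/bin/sh
# Sketch: Helm values that place Envoy Proxy pods in each Gateway's namespace.
# The deploy type value is an assumption to verify for your chart version.
cat > gateway-namespace-values.yaml <<'EOF'
config:
  envoyGateway:
    provider:
      type: Kubernetes
      kubernetes:
        deploy:
          type: GatewayNamespace
EOF

# Pass the file to the pinned install, for example:
#   helm upgrade --install eg oci://docker.io/envoyproxy/gateway-helm \
#     --version v1.7.0 -n envoy-gateway-system --create-namespace \
#     -f gateway-namespace-values.yaml
```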

Merged Gateways

  • Merge listeners from multiple Gateway resources into a single Envoy Proxy fleet
  • All merged Gateways share a single IP address / load balancer
  • Useful for consolidating ingress when teams own different Gateways but share infrastructure
  • Enable by setting mergeGateways: true on the EnvoyProxy resource referenced by the GatewayClass parametersRef
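
The wiring between the GatewayClass and the EnvoyProxy resource can be sketched as below. The resource names (merged-proxy-config, merged-eg) and the namespace are illustrative assumptions, and the controllerName shown is the project's default, which would differ in a multi-tenant install.

```shell
#!/bin/sh
# Sketch: an EnvoyProxy with mergeGateways enabled, referenced from a
# GatewayClass via parametersRef. Resource names are hypothetical.
cat > merged-gateways.yaml <<'EOF'
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: merged-proxy-config
  namespace: envoy-gateway-system
spec:
  mergeGateways: true
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: merged-eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: merged-proxy-config
    namespace: envoy-gateway-system
EOF

# Apply once reviewed:
#   kubectl apply -f merged-gateways.yaml
```

Gateways that set gatewayClassName: merged-eg would then share one proxy fleet, so their listeners need distinct port, protocol, and hostname combinations.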

Performance Tuning

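  • Connection timeouts: set explicitly in ClientTrafficPolicy and BackendTrafficPolicy. Never rely on Envoy defaults.
    • timeout.http.requestReceivedTimeout (ClientTrafficPolicy): maximum time for the client to send a complete request
    • timeout.http.idleTimeout (ClientTrafficPolicy): close client connections idle longer than this
    • timeout.http.requestTimeout (BackendTrafficPolicy): maximum time to receive the complete response from the backend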
  • HTTP/2 max concurrent streams: limit to 100 to prevent a single connection from monopolizing resources
  • Buffer limits: set to 32 KiB for both listener and cluster buffers to cap memory under load
    • Configure via EnvoyProxy spec.bootstrap or EnvoyPatchPolicy
  • Resource requests/limits: always set CPU and memory on Envoy Proxy pods via EnvoyProxy spec.provider.kubernetes.envoyDeployment.container.resources
  • Horizontal scaling: use HPA on the Envoy Proxy Deployment; scale on CPU utilization (target 60-70%)
  • Keep-alive: enable TCP keep-alive on backend connections to avoid connection resets through cloud load balancers
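
The tuning items above might translate into manifests along the following lines. This is a sketch under assumptions: the Gateway name (eg), HTTPRoute name (backend), policy names, namespaces, timeout durations, keep-alive intervals, and resource sizes are all illustrative, and the field names should be checked against the API reference for the installed version. Only the 100-stream limit and the 60-70% CPU target come from the list above.

```shell
#!/bin/sh
# Sketch: client/backend traffic policies plus proxy sizing and autoscaling.
# All names and numeric values except maxConcurrentStreams and the CPU
# target are assumptions for illustration.
cat > performance-tuning.yaml <<'EOF'
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: client-tuning
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: eg
  timeout:
    http:
      requestReceivedTimeout: 10s
      idleTimeout: 60s
  http2:
    maxConcurrentStreams: 100
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: backend-tuning
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend
  timeout:
    http:
      requestTimeout: 15s
  tcpKeepalive:
    idleTime: 60s
    interval: 30s
    probes: 3
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: proxy-sizing
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        container:
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
      envoyHpa:
        minReplicas: 2
        maxReplicas: 10
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 65
EOF

# Apply once reviewed:
#   kubectl apply -f performance-tuning.yaml
```

The EnvoyProxy resource only takes effect when a GatewayClass (or Gateway) references it, so in practice these settings would live in the same EnvoyProxy as any other proxy configuration.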

Observability

Access Logging

  • Configure via EnvoyProxy spec.telemetry.accessLog
  • Sinks: File (stdout/path) or OpenTelemetry (gRPC collector)
  • Use structured JSON format for machine parsing
  • Include at minimum: method, path, response code, duration, upstream host
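
A structured JSON access log carrying the minimum fields listed above could be sketched as follows. The EnvoyProxy name, the JSON key names, and the choice of /dev/stdout as the file sink are assumptions; the values are standard Envoy access-log command operators.

```shell
#!/bin/sh
# Sketch: JSON access logs to stdout with method, path, response code,
# duration, and upstream host. Resource name and JSON keys are hypothetical.
cat > access-log.yaml <<'EOF'
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: access-log-config
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: JSON
            json:
              method: "%REQ(:METHOD)%"
              path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
              response_code: "%RESPONSE_CODE%"
              duration: "%DURATION%"
              upstream_host: "%UPSTREAM_HOST%"
          sinks:
            - type: File
              file:
                path: /dev/stdout
EOF

# Apply once reviewed:
#   kubectl apply -f access-log.yaml
```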

Metrics

  • Expose Prometheus metrics via EnvoyProxy spec.telemetry.metrics
  • Scrape from Envoy Proxy pods on the metrics port (default 19001, path /stats/prometheus)
  • Key metrics: envoy_http_downstream_rq_total, envoy_http_downstream_rq_xx, envoy_cluster_upstream_rq_time
  • Enable Envoy Gateway controller metrics for control plane health
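
For a quick manual check that the key metrics are being emitted, a small filter over Prometheus text-format output can help. This sketch assumes the proxy serves metrics at /stats/prometheus on port 19001; the function name and the pod placeholder are hypothetical.

```shell
#!/bin/sh
# Sketch: keep only the key Envoy metrics from Prometheus text-format input
# read on stdin. Histogram series (_bucket, _sum, _count) match by prefix.
filter_key_metrics() {
  grep -E '^envoy_(http_downstream_rq_total|http_downstream_rq_xx|cluster_upstream_rq_time)'
}

# Example use against a port-forwarded proxy pod (pod name is a placeholder):
#   kubectl -n envoy-gateway-system port-forward pod/<envoy-pod> 19001:19001 &
#   curl -s http://localhost:19001/stats/prometheus | filter_key_metrics
```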

Tracing

  • Configure distributed tracing via EnvoyProxy spec.telemetry.tracing
  • Export to an OpenTelemetry collector (gRPC or HTTP)
  • Set an appropriate sampling rate: 1-10% in production, 100% in staging
  • Propagate trace context headers (traceparent, tracestate)
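
A production-style tracing configuration might look like the sketch below. It assumes an OpenTelemetry collector reachable as a Service named otel-collector in a monitoring namespace on the usual OTLP gRPC port 4317; those names, the EnvoyProxy name, and the 5% rate (picked from the 1-10% range above) are illustrative.

```shell
#!/bin/sh
# Sketch: export traces to an OpenTelemetry collector with 5% sampling.
# Collector Service name/namespace and the resource name are hypothetical.
cat > tracing.yaml <<'EOF'
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: tracing-config
  namespace: envoy-gateway-system
spec:
  telemetry:
    tracing:
      samplingRate: 5
      provider:
        type: OpenTelemetry
        backendRefs:
          - name: otel-collector
            namespace: monitoring
            port: 4317
EOF

# Apply once reviewed:
#   kubectl apply -f tracing.yaml
```

A staging copy of the same manifest would set samplingRate: 100.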

Operations

Installation

bash
helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.7.0 \
  -n envoy-gateway-system \
  --create-namespace
  • Always pin Helm chart versions: never use latest or omit --version
  • Use helm upgrade --install for idempotent deployments
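
Both rules can be combined in a small wrapper, sketched here. The function name install_eg and the HELM override variable are hypothetical conveniences for this example; the Helm arguments mirror the install command above.

```shell
#!/bin/sh
# Sketch: refuse unpinned chart versions, then run an idempotent install.
# install_eg and the HELM variable are illustrative, not part of any tool.
install_eg() {
  version="${1:-}"
  case "$version" in
    ""|latest)
      echo "error: pin an explicit chart version (got '${version}')" >&2
      return 1
      ;;
  esac
  # HELM defaults to the real binary; set HELM="echo helm" for a dry run.
  ${HELM:-helm} upgrade --install eg oci://docker.io/envoyproxy/gateway-helm \
    --version "$version" \
    -n envoy-gateway-system \
    --create-namespace
}

# Example:
#   install_eg v1.7.0
```

Because helm upgrade --install installs the release when it is absent and upgrades it otherwise, the same call is safe to run repeatedly from a pipeline.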

GitOps

  • Manage all Gateway API resources via GitOps (ArgoCD, Flux)
  • Store Gateway, Route, and Policy manifests in version control
  • Implement mandatory PR reviews for all gateway configuration changes
  • Use SCM branch protection rules on the main branch
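
With Argo CD, the manifests kept in version control could be synced by an Application like the one sketched below. The repository URL, path, and Application name are placeholders, and Flux users would express the same idea with a GitRepository and Kustomization instead.

```shell
#!/bin/sh
# Sketch: an Argo CD Application that syncs gateway manifests from Git.
# Repository URL, path, and names are hypothetical placeholders.
cat > gateway-config-app.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gateway-config
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/gateway-config.git
    targetRevision: main
    path: gateways/production
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF

# Apply once reviewed:
#   kubectl apply -f gateway-config-app.yaml
```

Tracking the protected main branch means a change reaches the cluster only after it has passed the mandatory PR review.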

Upgrade Strategy

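  • Apply the new chart's CRDs first, then upgrade the Envoy Gateway controller and verify CRD compatibility (helm upgrade does not update CRDs shipped in a chart's crds/ directory)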
  • Test upgrades in a staging environment that mirrors production topology
  • Review release notes for breaking changes in CRD schemas or default behavior
  • Back up CRD instances before upgrading (kubectl get <resource> -o yaml)
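
The backup step can be scripted roughly as follows. The function name, the KUBECTL override, and the selection of resource kinds are assumptions for illustration; extend the list with whichever Gateway API and Envoy Gateway kinds the cluster actually uses.

```shell
#!/bin/sh
# Sketch: dump instances of selected Gateway API and Envoy Gateway kinds to
# one YAML file per kind. The kind list is an illustrative subset.
backup_gateway_resources() {
  dir="$1"
  mkdir -p "$dir" || return 1
  for kind in \
    gatewayclasses.gateway.networking.k8s.io \
    gateways.gateway.networking.k8s.io \
    httproutes.gateway.networking.k8s.io \
    envoyproxies.gateway.envoyproxy.io \
    clienttrafficpolicies.gateway.envoyproxy.io \
    backendtrafficpolicies.gateway.envoyproxy.io
  do
    # KUBECTL defaults to the real binary; override it for a dry run.
    ${KUBECTL:-kubectl} get "$kind" --all-namespaces -o yaml \
      > "$dir/$kind.yaml" || return 1
  done
}

# Example:
#   backup_gateway_resources "eg-backup-$(date +%Y%m%d)"
```

Fully qualified resource names avoid ambiguity on clusters where another project also defines a gateways resource.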