Envoy Gateway Production Deployment

Deployment Modes

Single Tenant (Default)

One GatewayClass per Envoy Gateway controller
Simplest model; suitable for most single-team deployments
All Gateways share the same controller and Envoy Proxy fleet
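
As a minimal sketch of this default layout, the manifests below define one GatewayClass and one Gateway that uses it. The resource name `eg`, the `default` namespace, and the listener settings are illustrative assumptions; the controller name is the one a standard Envoy Gateway install is expected to use.

```yaml
# One GatewayClass bound to the single Envoy Gateway controller.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg                      # illustrative name
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
# Every Gateway that references this class is served by the same
# controller and the same Envoy Proxy fleet.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg                      # illustrative name
  namespace: default            # illustrative namespace
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      protocol: HTTP
      port: 80
```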

Multi-Tenant

Deploy separate Envoy Gateway controllers per tenant namespace
Each controller must have a unique controller name in its GatewayClass
Provides strong tenant isolation at the control plane level
Install via separate Helm releases, each with a distinct `--set config.envoyGateway.gateway.controllerName=...` value
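
The pairing between a tenant's Helm release and its GatewayClass might look like the sketch below. The tenant name `marketing`, the GatewayClass name, and the controller name suffix are hypothetical; the only requirement taken from the text above is that `controllerName` here matches the value passed to that tenant's Helm release.

```yaml
# GatewayClass for a hypothetical "marketing" tenant.
# controllerName must equal the value given to the tenant's Helm release via
#   --set config.envoyGateway.gateway.controllerName=...
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg-marketing            # hypothetical name
spec:
  controllerName: gateway.envoyproxy.io/marketing-gatewayclass-controller
```

A second tenant would repeat this with its own release, namespace, and a different controller name, so that each controller reconciles only its own GatewayClass.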

Gateway Namespace Mode

Envoy Proxy pods deploy in the Gateway's namespace instead of the controller namespace
Provides stronger workload isolation: proxy runs alongside the application
Enables JWT authentication between proxy and controller for hardened communication
Enable with `envoyGateway.provider.kubernetes.deploy.type: GatewayNamespace`
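
Expressed as a Helm values file, the setting could be sketched as follows. This assumes the chart exposes the EnvoyGateway configuration under `config.envoyGateway` (as the `--set` flag in the multi-tenant section suggests) and that the deploy type value is spelled `GatewayNamespace`; verify both against the chart version you pin.

```yaml
# Helm values fragment (hypothetical file name: values-gateway-namespace.yaml)
config:
  envoyGateway:
    provider:
      type: Kubernetes
      kubernetes:
        deploy:
          # Assumed value: run Envoy Proxy pods in each Gateway's namespace
          # instead of the controller namespace.
          type: GatewayNamespace
```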

Merged Gateways

Merge listeners from multiple Gateway resources into a single Envoy Proxy fleet
All merged Gateways share a single IP address / load balancer
Useful for consolidating ingress when teams own different Gateways but share infrastructure
Enable with `mergeGateways: true` on the EnvoyProxy resource referenced by the GatewayClass `parametersRef`
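
A minimal sketch of that wiring is shown below: an EnvoyProxy resource with `mergeGateways` enabled, and a GatewayClass pointing at it through `parametersRef`. The resource names and the `envoy-gateway-system` namespace are illustrative assumptions.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: merged-proxy-config       # illustrative name
  namespace: envoy-gateway-system
spec:
  # Listeners from all Gateways of the class are merged into one proxy fleet.
  mergeGateways: true
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: merged-eg                 # illustrative name
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: merged-proxy-config
    namespace: envoy-gateway-system
```

Because the merged Gateways share one proxy fleet, their listeners generally need distinct port, protocol, and hostname combinations to avoid conflicts.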

Performance Tuning

Connection timeouts: set them explicitly in ClientTrafficPolicy and BackendTrafficPolicy rather than relying on Envoy defaults.
- `timeout.http.requestReceivedTimeout` (ClientTrafficPolicy): total time allowed for the client to send a complete request
- `timeout.http.idleTimeout` (ClientTrafficPolicy): close connections idle longer than this
HTTP/2 max concurrent streams: limit to 100 so that a single connection cannot monopolize resources
Buffer limits: set both listener and cluster buffers to 32 KiB to cap memory under load
- Configure via EnvoyProxy `spec.bootstrap` or EnvoyPatchPolicy
Resource requests/limits: always set CPU and memory on Envoy Proxy pods via EnvoyProxy `spec.provider.kubernetes.envoyDeployment.container.resources`
Horizontal scaling: use an HPA on the Envoy Proxy Deployment and scale on CPU utilization (target 60-70%)
Keep-alive: enable TCP keep-alive on backend connections to avoid connection resets through cloud load balancers
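
The settings above could be sketched as three resources. All names (`eg`, `backend`, `tuned-proxy`), namespaces, durations, and resource sizes are placeholder assumptions rather than recommendations from the guide. The field names reflect the `v1alpha1` policy APIs as I understand them, including `requestReceivedTimeout` for the client-side request timeout; check them against the CRD version you have installed.

```yaml
# Client-facing limits: request/idle timeouts and HTTP/2 stream cap.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: ClientTrafficPolicy
metadata:
  name: client-limits             # hypothetical name
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: eg                    # hypothetical Gateway
  timeout:
    http:
      requestReceivedTimeout: 10s # placeholder value
      idleTimeout: 60s            # placeholder value
  http2:
    maxConcurrentStreams: 100
---
# Backend-facing limits: upstream timeout and TCP keep-alive.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: backend-limits            # hypothetical name
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend               # hypothetical HTTPRoute
  timeout:
    http:
      requestTimeout: 15s         # placeholder value
  tcpKeepalive:
    probes: 3
    idleTime: 60s
    interval: 10s
---
# Proxy pod sizing and autoscaling.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: tuned-proxy               # hypothetical name
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        container:
          resources:
            requests:
              cpu: 500m           # placeholder sizing
              memory: 512Mi
            limits:
              cpu: "2"
              memory: 1Gi
      envoyHpa:
        minReplicas: 2
        maxReplicas: 10
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 65   # within the 60-70% target
```

The EnvoyProxy resource only takes effect once a GatewayClass or Gateway references it, so it would be attached the same way as any other proxy configuration.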

Observability

Access Logging

Configure via EnvoyProxy `spec.telemetry.accessLog`
Sinks: File (stdout/path) or OpenTelemetry (gRPC collector)
Use structured JSON format for machine parsing
Include at minimum: method, path, response code, duration, upstream host
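
One way to express a structured JSON access log with the minimum fields listed above is sketched below. The resource name and the JSON key names are illustrative assumptions; the values are standard Envoy access log command operators.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: access-log-config         # illustrative name
  namespace: envoy-gateway-system
spec:
  telemetry:
    accessLog:
      settings:
        - format:
            type: JSON
            json:
              # Key names are arbitrary; pick ones your log pipeline expects.
              method: "%REQ(:METHOD)%"
              path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
              response_code: "%RESPONSE_CODE%"
              duration_ms: "%DURATION%"
              upstream_host: "%UPSTREAM_HOST%"
          sinks:
            # File sink writing to stdout so the container runtime collects it.
            - type: File
              file:
                path: /dev/stdout
```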

Metrics

Expose Prometheus metrics via EnvoyProxy `spec.telemetry.metrics`
Scrape Envoy Proxy pods on the stats port (default 19001, path `/stats/prometheus`)
Key metrics: `envoy_http_downstream_rq_total`, `envoy_http_downstream_rq_xx`, `envoy_cluster_upstream_rq_time`
Enable Envoy Gateway controller metrics for control plane health
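
For a Prometheus server that uses Kubernetes service discovery, a scrape job for the proxy pods might be sketched as below. This is a Prometheus configuration fragment, not an Envoy Gateway resource. It assumes the proxy pods carry the label `app.kubernetes.io/name=envoy` and serve metrics at `/stats/prometheus` on port 19001; confirm both in your cluster before relying on it. The job name is hypothetical.

```yaml
scrape_configs:
  - job_name: envoy-proxy                 # hypothetical job name
    metrics_path: /stats/prometheus       # assumed metrics path
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods labelled as Envoy proxies (assumed label).
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: envoy
        action: keep
      # Point the scrape at the pod IP on port 19001.
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        replacement: "${1}:19001"
        target_label: __address__
        action: replace
```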

Tracing

Configure distributed tracing via EnvoyProxy `spec.telemetry.tracing`
Export to an OpenTelemetry collector (gRPC or HTTP)
Set an appropriate sampling rate: 1-10% in production, 100% in staging
Propagate trace context headers (`traceparent`, `tracestate`)
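
A production-style tracing configuration could be sketched as follows. The collector Service name `otel-collector`, its `monitoring` namespace, and port 4317 (the conventional OTLP gRPC port) are assumptions about the environment; the 5% sampling rate is simply one value inside the 1-10% range suggested above.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: tracing-config            # illustrative name
  namespace: envoy-gateway-system
spec:
  telemetry:
    tracing:
      # Percentage of requests sampled; use 100 in staging.
      samplingRate: 5
      provider:
        type: OpenTelemetry
        backendRefs:
          - name: otel-collector  # hypothetical collector Service
            namespace: monitoring # hypothetical namespace
            port: 4317
```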

Operations

Installation

```bash
helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.7.0 \
  -n envoy-gateway-system \
  --create-namespace
```

Always pin Helm chart versions: never use `latest` or omit `--version`
Use `helm upgrade --install` for idempotent deployments

GitOps

Manage all Gateway API resources via GitOps (ArgoCD, Flux)
Store Gateway, Route, and Policy manifests in version control
Implement mandatory PR reviews for all gateway configuration changes
Use SCM branch protection rules on the main branch
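
With Argo CD, the workflow above might be wired up with an Application like the sketch below, which syncs a directory of Gateway, Route, and Policy manifests from the protected main branch. The repository URL, path, Application name, and destination namespace are all hypothetical; Flux users would express the same idea with a GitRepository and Kustomization pair instead.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gateway-config                  # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    # Hypothetical repository holding Gateway, Route, and Policy manifests.
    repoURL: https://git.example.com/platform/gateway-config.git
    targetRevision: main                # the branch protected by PR review
    path: gateways/production           # hypothetical directory
  destination:
    server: https://kubernetes.default.svc
    namespace: default                  # hypothetical target namespace
  syncPolicy:
    automated:
      prune: true                       # delete resources removed from Git
      selfHeal: true                    # revert manual drift in the cluster
```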

Upgrade Strategy

Upgrade Envoy Gateway controller first, then verify CRD compatibility
Test upgrades in a staging environment that mirrors production topology
Review release notes for breaking changes in CRD schemas or default behavior
Back up CRD instances before upgrading (`kubectl get -o yaml`)
