k8s-hpa-cost-tuning


Kubernetes HPA Cost & Scale-Down Tuning

Mode selection (mandatory)

Declare a mode before executing this skill. All reasoning, thresholds, and recommendations depend on this choice.

```text
mode = audit | incident
```

If no mode is provided, refuse to run and request clarification.

When to use

`mode = audit` — Periodic cost-savings audit

Run on a schedule (weekly or bi-weekly) to:
  • Detect over-reservation early
  • Validate that scale-down and node consolidation still work
  • Identify safe opportunities to reduce cluster cost
This mode assumes no active incident and prioritizes stability-preserving recommendations.

`mode = incident` — Post-incident scaling analysis

Run after a production incident or anomaly, attaching:
  • Production logs
  • HPA events
  • Scaling timelines
This mode focuses on:
  • Explaining why scaling behaved the way it did
  • Distinguishing traffic-driven vs configuration-driven incidents
  • Preventing recurrence without overcorrecting
This skill assumes Datadog for observability and standard Kubernetes HPA + Cluster Autoscaler.

Core mental model

Kubernetes scaling is a three-layer system:
  1. HPA decides how many pods (based on usage / requests)
  2. Scheduler decides where pods go (based on requests + constraints)
  3. Cluster Autoscaler decides how many nodes exist (only when nodes can empty)
Cost optimization only works if all three layers can move downward.
Key takeaway: HPA decides quantity, scheduler decides placement, autoscaler decides cost. Scale-up can be aggressive; scale-down must be possible. If replicas drop but nodes do not, the scheduler is the bottleneck.
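
The "how many pods" decision in layer 1 follows the standard formula from the Kubernetes HPA documentation, which makes it easy to sanity-check replica counts by hand:

```text
desiredReplicas = ceil( currentReplicas * currentMetricValue / targetMetricValue )

Example: 10 replicas running at 90% CPU utilization with a 70% target:
ceil(10 * 90 / 70) = ceil(12.86) = 13 replicas
```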

Key Datadog metrics

The utility scripts query three metric families:
  • CPU used % — real utilization (`kubernetes.cpu.usage.total` / `node.cpu_allocatable`)
  • CPU requested % — reserved on paper (`kubernetes.cpu.requests` / `node.cpu_allocatable`)
  • Memory used vs requests — HPA-relevant ratio
CPU requested % must go down after scale-down for cost savings to be real. If memory usage stays above target, memory drives scale-up even when CPU is idle.
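
As a sketch, the first two ratios can be written as Datadog metric-query arithmetic. The tag name `kube_cluster_name` is an assumption and depends on your agent/tagging setup; this is not verified against the scripts themselves:

```text
# CPU used % of allocatable (sketch; usage.total is reported in
# nanocores, so a unit-conversion factor may be needed)
sum:kubernetes.cpu.usage.total{kube_cluster_name:<cluster>} /
  sum:node.cpu_allocatable{kube_cluster_name:<cluster>} * 100

# CPU requested % of allocatable
sum:kubernetes.cpu.requests{kube_cluster_name:<cluster>} /
  sum:node.cpu_allocatable{kube_cluster_name:<cluster>} * 100
```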

Scale-down as a first-class cost control

When scale-down is slow or blocked:
  • Replicas plateau
  • Pods remain evenly spread
  • Nodes never empty
  • Cluster Autoscaler cannot remove nodes
Result: permanent over-reservation.

Recommended HPA scale-down policy

```yaml
scaleDown:
  stabilizationWindowSeconds: 60
  selectPolicy: Max
  policies:
    - type: Percent
      value: 50
      periodSeconds: 30
```

Effects: fast reaction once load drops, predictable replica collapse, low flapping risk.
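
For context, this scaleDown block lives under `spec.behavior` in an `autoscaling/v2` HorizontalPodAutoscaler. A minimal sketch — the name `your-app` and the replica bounds are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: your-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: your-app
  minReplicas: 2        # placeholder bounds
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
      selectPolicy: Max
      policies:
        - type: Percent
          value: 50
          periodSeconds: 30
```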

Topology spread: critical cost lever

Topology spread must never prevent pod consolidation during scale-down.
Strict constraints block scheduler flexibility and freeze cluster size.

Anti-pattern (breaks cost optimization)

```yaml
maxSkew: 1
whenUnsatisfiable: DoNotSchedule
```

Pods cannot collapse onto fewer nodes. Nodes never drain. Reserved CPU/memory never decreases.

Recommended default (cost-safe)

```yaml
topologySpreadConstraints:
- topologyKey: kubernetes.io/hostname
  maxSkew: 2
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:        # required: skew is computed over matching pods only
    matchLabels:
      app: your-app
```

Strong preference for spreading while allowing bin-packing during scale-down and enabling node removal.

Strict isolation (AZ-level only)

When hard guarantees are required:

```yaml
topologySpreadConstraints:
- topologyKey: topology.kubernetes.io/zone
  maxSkew: 1
  whenUnsatisfiable: DoNotSchedule
  labelSelector:        # required: skew is computed over matching pods only
    matchLabels:
      app: your-app
```

Do not combine this with strict hostname-level spread.

Anti-affinity as a soft alternative

To avoid hot nodes without blocking scale-down:

```yaml
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app: your-app
```

Anti-affinity is advisory and cost-safe.

Resource requests tuning

  • Over-requesting CPU = slower scale-down
  • Over-requesting memory = unexpected scale-ups
Practical defaults:
  • targetCPUUtilizationPercentage: 70
  • targetMemoryUtilizationPercentage: 75–80
Adjust one knob at a time.
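
Note that `targetCPUUtilizationPercentage` is the legacy `autoscaling/v1` spelling; under `autoscaling/v2` the same defaults are expressed as resource metrics. A sketch:

```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # targetCPUUtilizationPercentage: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75    # lower end of the 75-80 range
```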

Validation loop

Run weekly (or after changes):
  1. Check HPA `current/target` values
  2. Compare CPU used % vs CPU requested %
  3. Observe replica collapse after load drops
  4. Verify nodes drain and disappear
  5. Re-check latency, errors, OOMs

Quick validation commands

```bash
kubectl -n <namespace> get hpa <deployment>
kubectl -n <namespace> describe hpa <deployment>
kubectl -n <namespace> top pod --containers
kubectl top node
kubectl -n <namespace> get pods -o wide | sort -k7   # column 7 = NODE: shows pod packing per node
```

Utility scripts

Both scripts require Datadog credentials:

```bash
export DD_API_KEY=...
export DD_APP_KEY=...
export DD_SITE=datadoghq.com   # optional, defaults to datadoghq.com
```

`audit-metrics.mjs` — Cost-savings discovery

Scan a cluster over a wide window (default 24 h) to find over-reservation and waste.

Cluster-wide audit

```bash
node scripts/audit-metrics.mjs --cluster <cluster>
```

With deployment deep-dive

```bash
node scripts/audit-metrics.mjs \
  --cluster <cluster> \
  --namespace <namespace> \
  --deployment <deployment>
```

Reports:

- **Cluster**: CPU/memory used %, requested %, and **waste %** (requested minus used)
- **Deployment** (when provided): CPU/memory usage vs requests, HPA replica range
- **Savings opportunities**: actionable recommendations based on thresholds

`incident-metrics.mjs` — Post-incident analysis

Collect metrics for a narrow incident window and get a tuning recommendation.

```bash
node scripts/incident-metrics.mjs \
  --cluster <cluster> \
  --namespace <namespace> \
  --deployment <deployment> \
  --from <ISO8601> \
  --to <ISO8601>
```
Reports:
  • Cluster: CPU used % and requested % of allocatable
  • Deployment: CPU/memory usage vs requests, unavailable %
  • HPA: current / desired / max replicas
  • Capacity planning: required allocatable cores for 80 % and 70 % reservation ceilings
  • Tuning order: step-by-step recommendation (one knob at a time)

Interpretation notes

  • Keep `limits.memory` unchanged unless OOMKills or near-limit memory usage are confirmed
  • Use `--out <path>` to save full JSON for deeper analysis or diffing across runs
  • Run `--help` on either script for all options (relative windows, custom HPA name, pretty JSON)