deco-site-scaling-tuning
# Deco Site Scaling Tuning
Analyze a site's Prometheus metrics to discover the optimal autoscaling parameters. This skill helps you find the CPU/concurrency threshold where latency degrades and recommends scaling configuration accordingly.
## When to Use This Skill
- A site is overscaled (too many pods for its traffic)
- A site oscillates between scaling up and down (panic mode loop)
- Need to switch scaling metric (concurrency vs CPU vs RPS)
- Need to find the right target value for a site
- After deploying scaling changes, to verify they're working
## Prerequisites

- `kubectl` access to the target cluster
- Prometheus accessible via port-forward (from `kube-prometheus-stack` in monitoring namespace)
- Python 3 for analysis scripts
- At least 6 hours of metric history for meaningful analysis
- For direct latency data: queue-proxy PodMonitor must be applied (see Step 0)
## Quick Start
0. ENABLE METRICS → Apply queue-proxy PodMonitor if not already done
1. PORT-FORWARD → kubectl port-forward prometheus-pod 19090:9090
2. COLLECT DATA → Run analysis scripts against Prometheus
3. ANALYZE → Find CPU threshold where latency degrades
4. RECOMMEND → Choose scaling metric and target
5. APPLY → Use deco-site-deployment skill to apply changes
6. VERIFY → Monitor for 1-2 hours after change
## Files in This Skill
| File | Purpose |
|---|---|
| (this document) | Overview, methodology, analysis procedures |
| `analysis-scripts.md` | Ready-to-use Python scripts for Prometheus queries |
## Step 0: Enable Queue-Proxy Metrics (one-time)
Queue-proxy runs as a sidecar on every Knative pod and exposes request latency histograms. These are critical for precise tuning but are not scraped by default.
Apply this PodMonitor:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: knative-queue-proxy
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    any: true
  selector:
    matchExpressions:
      - key: serving.knative.dev/revision
        operator: Exists
  podMetricsEndpoints:
    - port: http-usermetric
      path: /metrics
      interval: 15s
```

```bash
kubectl apply -f queue-proxy-podmonitor.yaml
```
Wait 2-3 hours for data to accumulate before running latency analysis.
**Metrics unlocked by this PodMonitor:**
- `revision_app_request_latencies_bucket` — request latency histogram (p50/p95/p99)
- `revision_app_request_latencies_sum` / `_count` — for avg latency
- `revision_app_request_count` — request rate by response code
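Once these metrics are scraped, latency can be pulled straight from Prometheus. A minimal sketch of an instant query for p95 latency per revision, assuming the Step 1 port-forward is active on 127.0.0.1:19090; the `instant_query` helper is illustrative, and label names such as `revision_name` should be verified against your Prometheus before relying on them:

```python
import json
import urllib.parse
import urllib.request

PROM = "http://127.0.0.1:19090"

# p95 request latency per revision over the last 6h, from the
# queue-proxy histogram unlocked by the PodMonitor above.
P95_QUERY = (
    "histogram_quantile(0.95, sum by (le, revision_name) "
    "(rate(revision_app_request_latencies_bucket[6h])))"
)

def instant_query(promql: str) -> list:
    """Run a Prometheus instant query and return the result vector."""
    url = f"{PROM}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    if body["status"] != "success":
        raise RuntimeError(body)
    return body["data"]["result"]

# Usage (with the port-forward active):
#   for series in instant_query(P95_QUERY):
#       print(series["metric"].get("revision_name"), series["value"][1])
```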
## Step 1: Establish Prometheus Connection
```bash
PROM_POD=$(kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n monitoring $PROM_POD 19090:9090 &
```

Verify:
```bash
curl -s "http://127.0.0.1:19090/api/v1/query?query=up" | jq '.status'
```

## Step 2: Collect Current State
Before analyzing, understand what the site is currently configured for.
### 2a. Read current autoscaler config

```bash
SITENAME="<sitename>"
NS="sites-${SITENAME}"

# Current revision annotations
kubectl get rev -n $NS -o json |
  jq '.items[] | select(.status.conditions[]?.status == "True" and .status.conditions[]?.type == "Active") | {name: .metadata.name, annotations: .metadata.annotations | with_entries(select(.key | startswith("autoscaling")))}'
```
```bash
# Global autoscaler defaults
kubectl get cm config-autoscaler -n knative-serving -o json | jq '.data | del(._example)'
```

### 2b. Current pod count and resources
```bash
kubectl get pods -n $NS --no-headers | wc -l
kubectl top pods -n $NS --no-headers | head -20
```

## Step 3: Run Analysis

Use the scripts in `analysis-scripts.md`. The analysis follows this methodology:

### Methodology: Finding the Optimal CPU Target
**Goal:** Find the CPU level at which latency starts to degrade. This is your scaling target — keep pods below this CPU to maintain good latency.

**Approach:**

- Collect CPU per pod, concurrency per pod, pod count, and (if available) request latency over 6-12 hours
- Bucket data by CPU range (0-200m, 200-300m, ..., 700m+)
- For each bucket, compute avg/p95 concurrency per pod
- Compute the "latency inflation factor" — how much concurrency increases beyond what the pod count reduction explains:
  `excess = (avg_conc_above_threshold / avg_conc_below_threshold) / (avg_pods_below / avg_pods_above)`
  - excess = 1.0 → concurrency increase fully explained by fewer pods (no latency degradation)
  - excess > 1.0 → latency is inflating concurrency (pods are slowing down)
  - The CPU level where excess crosses ~1.5x is your inflection point
- If queue-proxy latency is available, directly plot avg latency vs CPU — the hockey stick inflection is your target
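The excess computation above can be sketched in Python. This is a toy illustration, not one of the skill's bundled scripts; the data shape (tuples of CPU millicores, concurrency per pod, pod count) is an assumption:

```python
from statistics import mean

def excess_factor(samples, threshold_m):
    """Latency inflation factor for a candidate CPU threshold.

    samples: iterable of (cpu_millicores, concurrency_per_pod, pod_count).
    Returns None when one side of the threshold has no data.
    """
    below = [(c, p) for cpu, c, p in samples if cpu < threshold_m]
    above = [(c, p) for cpu, c, p in samples if cpu >= threshold_m]
    if not below or not above:
        return None
    conc_ratio = mean(c for c, _ in above) / mean(c for c, _ in below)
    pod_ratio = mean(p for _, p in below) / mean(p for _, p in above)
    return conc_ratio / pod_ratio

# Toy data: above 400m concurrency doubles while pods only drop 8 -> 6,
# so the 2x concurrency jump is not explained by fewer pods alone.
samples = [
    (250, 10, 8), (300, 11, 8),   # below threshold
    (450, 22, 6), (500, 20, 6),   # above threshold
]
print(excess_factor(samples, 400))  # ~1.5: right at the inflection cutoff
```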
### What to Look For
CPU vs Concurrency/pod:

```
Low CPU (0-200m)      → Low conc/pod   → Pods are idle (overprovisioned)
Medium CPU (200-400m) → Moderate conc  → Healthy range
★ INFLECTION ★        → Conc jumps     → Latency starting to degrade
High CPU (500m+)      → High conc/pod  → Pods overloaded, latency bad
```

The inflection point is where you want your scaling target.
### Decision Matrix
IMPORTANT: CPU target is in millicores (not percentage). E.g., `target: 400` means scale when CPU reaches 400m.

| Inflection CPU | Recommended metric | Target | Notes |
|---|---|---|---|
| < CPU request | CPU scaling | target = inflection value in millicores | Standard case |
| ~ CPU request | CPU scaling | target = CPU_request × 0.8 | Conservative |
| > CPU request (no limit) | CPU scaling | target = CPU_request × 0.8, increase CPU request | Need more CPU headroom |
| No clear inflection | Concurrency scaling | Keep current but tune target | CPU isn't the bottleneck |
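The matrix can be expressed as a small helper. A sketch only: the function name and the ±15% band for "near the CPU request" are assumptions, not part of this skill's scripts:

```python
def recommend(inflection_m, cpu_request_m, near_band=0.15):
    """Map an inflection point (millicores) to a scaling recommendation.

    inflection_m is None when no clear inflection was found.
    Returns (metric, target_millicores_or_None, note).
    """
    if inflection_m is None:
        return ("concurrency", None, "CPU isn't the bottleneck; tune the target")
    if abs(inflection_m - cpu_request_m) <= near_band * cpu_request_m:
        return ("cpu", int(cpu_request_m * 0.8), "conservative: near CPU request")
    if inflection_m < cpu_request_m:
        return ("cpu", inflection_m, "standard case")
    return ("cpu", int(cpu_request_m * 0.8), "also increase the CPU request")

print(recommend(400, 1000))   # clear inflection below the request
print(recommend(None, 1000))  # no inflection: keep concurrency scaling
```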
## Common Patterns
Pattern: CPU-bound app (Deno SSR)
- Baseline CPU: 200-300m (Deno runtime + V8 JIT)
- Inflection: 400-500m
- Recommendation: CPU scaling with target = inflection (e.g., 400 millicores)
Pattern: IO-bound app (mostly external API calls)
- CPU stays low even under high concurrency
- Inflection not visible in CPU
- Recommendation: Keep concurrency scaling, tune the target
Pattern: Oscillating (panic loop)
- Symptoms: pods cycle between min and max
- Cause: concurrency scaling + low target + `scale-down-delay` ratchet
- Fix: Switch to CPU scaling (breaks the latency→concurrency feedback loop)
## Step 4: Apply Changes
Use the `deco-site-deployment` skill to:

- Update the `state` secret with new scaling config
- Redeploy on both clouds

Example for CPU-based scaling (target is in millicores):

```bash
NEW_STATE=$(echo "$STATE" | jq '
  .scaling.metric = {
    "type": "cpu",
    "target": 400
  }
')
```

## Step 5: Verify After Change
Monitor for 1-2 hours after applying changes:
```bash
# Watch pod count stabilize
watch -n 10 "kubectl get pods -n sites-<sitename> --no-headers | wc -l"

# Check if panic mode triggers (should be N/A for HPA/CPU)
# HPA doesn't have panic mode — this is one of the advantages

# Verify HPA is active
kubectl get hpa -n sites-<sitename>

# Check HPA status
kubectl describe hpa -n sites-<sitename>
```

## Success Criteria
- Pod count stabilizes (no more oscillation)
- Avg CPU per pod stays below your target during normal traffic
- CPU crosses target only during genuine traffic spikes (and scales up proportionally)
- No panic mode events (HPA doesn't have panic mode)
- Latency stays acceptable (check with queue-proxy metrics if available)
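The first two criteria can be spot-checked over the same Prometheus port-forward. These query strings are illustrative sketches built on the standard cAdvisor and kube-state-metrics metric names, which may differ in your cluster:

```python
# Illustrative PromQL for the success criteria; <sitename> is a placeholder
# as elsewhere in this skill.
NS = "sites-<sitename>"

# Avg CPU per pod in millicores over 5m windows; should stay below the target.
AVG_CPU_MILLICORES = (
    f'1000 * avg(rate(container_cpu_usage_seconds_total{{'
    f'namespace="{NS}", container!=""}}[5m]))'
)

# Pod count; should be flat outside genuine traffic spikes.
POD_COUNT = f'count(kube_pod_info{{namespace="{NS}"}})'

print(AVG_CPU_MILLICORES)
print(POD_COUNT)
```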
## Rollback
If the new scaling is worse, revert by changing the `state` secret back to concurrency scaling:

```bash
NEW_STATE=$(echo "$STATE" | jq '
  .scaling.metric = {
    "type": "concurrency",
    "target": 15,
    "targetUtilizationPercentage": 70
  }
')
```
## Related Skills
- `deco-site-deployment` — Apply scaling changes and redeploy
- `deco-site-memory-debugging` — Debug memory issues on running pods
- `deco-incident-debugging` — Incident response and triage