Discover optimal autoscaling parameters for a Deco site by analyzing Prometheus metrics. Correlates CPU, concurrency, and latency to find the right scaling target and method.
```bash
npx skill4agent add decocms/deco-start deco-site-scaling-tuning
```

Requires: `kubectl`, `kube-prometheus-stack`

0. ENABLE METRICS → Apply queue-proxy PodMonitor if not already done
1. PORT-FORWARD → kubectl port-forward prometheus-pod 19090:9090
2. COLLECT DATA → Run analysis scripts against Prometheus
3. ANALYZE → Find CPU threshold where latency degrades
4. RECOMMEND → Choose scaling metric and target
5. APPLY → Use deco-site-deployment skill to apply changes
6. VERIFY → Monitor for 1-2 hours after change

| File | Purpose |
|---|---|
| | Overview, methodology, analysis procedures |
| analysis-scripts.md | Ready-to-use Python scripts for Prometheus queries |
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: knative-queue-proxy
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    any: true
  selector:
    matchExpressions:
      - key: serving.knative.dev/revision
        operator: Exists
  podMetricsEndpoints:
    - port: http-usermetric
      path: /metrics
      interval: 15s
```

```bash
kubectl apply -f queue-proxy-podmonitor.yaml
```
Wait 2-3 hours for data to accumulate before running the latency analysis. Key queue-proxy metrics: `revision_app_request_latencies_bucket` (with its `_sum` and `_count` series) and `revision_app_request_count`.

```bash
PROM_POD=$(kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n monitoring $PROM_POD 19090:9090 &

# Verify
curl -s "http://127.0.0.1:19090/api/v1/query?query=up" | jq '.status'
```

```bash
SITENAME="<sitename>"
NS="sites-${SITENAME}"

# Current revision annotations
kubectl get rev -n $NS -o json | \
  jq '.items[]
      | select(any(.status.conditions[]?; .type == "Active" and .status == "True"))
      | {name: .metadata.name,
         annotations: (.metadata.annotations | with_entries(select(.key | startswith("autoscaling"))))}'

# Global autoscaler defaults
kubectl get cm config-autoscaler -n knative-serving -o json | jq '.data | del(._example)'

# Current pod count
kubectl get pods -n $NS --no-headers | wc -l
```
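With Prometheus reachable on 127.0.0.1:19090, the latency side of the data collection can be sketched in Python. This is a minimal sketch, not the full analysis-scripts.md tooling; the `namespace_name` label and the 5m rate window are assumptions about your scrape config:

```python
import json
import urllib.parse
import urllib.request

PROM = "http://127.0.0.1:19090"

def p95_query(namespace: str, window: str = "5m") -> str:
    """Build a PromQL p95 query over the queue-proxy latency histogram.

    The `namespace_name` label is an assumption about the queue-proxy
    metric labels; adjust to match what your Prometheus actually scrapes.
    """
    return (
        "histogram_quantile(0.95, sum by (le) ("
        f'rate(revision_app_request_latencies_bucket{{namespace_name="{namespace}"}}[{window}])))'
    )

def query_prometheus(promql: str, base: str = PROM) -> list:
    """Run an instant query against the Prometheus HTTP API and return the result vector."""
    url = f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]

# Usage (with the port-forward from step 1 running):
#   rows = query_prometheus(p95_query("sites-<sitename>"))
```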
```bash
# Per-pod resource usage
kubectl top pods -n $NS --no-headers | head -20
```

Run the queries in analysis-scripts.md against Prometheus. The key signal is the excess-concurrency ratio around a candidate CPU threshold:

```
excess = (avg_conc_above_threshold / avg_conc_below_threshold) / (avg_pods_below / avg_pods_above)
```

CPU vs Concurrency/pod:

```
Low CPU (0-200m)      → Low conc/pod  → Pods are idle (overprovisioned)
Medium CPU (200-400m) → Moderate conc → Healthy range
★ INFLECTION ★        → Conc jumps    → Latency starting to degrade
High CPU (500m+)      → High conc/pod → Pods overloaded, latency bad
```

For example, an inflection around 400m maps to `target: 400`.

| Inflection CPU | Recommended metric | Target | Notes |
|---|---|---|---|
| < CPU request | CPU scaling | target = inflection value in millicores | Standard case |
| ~ CPU request | CPU scaling | target = CPU_request × 0.8 | Conservative |
| > CPU request (no limit) | CPU scaling | target = CPU_request × 0.8, increase CPU request | Need more CPU headroom |
| No clear inflection | Concurrency scaling | Keep current but tune target | CPU isn't the bottleneck |
A related knob is `scale-down-delay`: the Knative autoscaler waits this long before removing pods, which smooths flapping. Apply the change with the deco-site-deployment skill by updating the site `state`:

```bash
NEW_STATE=$(echo "$STATE" | jq '
  .scaling.metric = {
    "type": "cpu",
    "target": 400
  }
')
```

```bash
# Watch pod count stabilize
watch -n 10 "kubectl get pods -n sites-<sitename> --no-headers | wc -l"

# Check if panic mode triggers (should be N/A for HPA/CPU)
# HPA doesn't have panic mode; this is one of its advantages

# Verify HPA is active
kubectl get hpa -n sites-<sitename>
```
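To judge "stabilized" less subjectively during the 1-2 hour verification window, a small sketch (the 10s sampling interval and the standard-deviation tolerance are assumptions):

```python
import statistics
import subprocess
import time

def pod_count(namespace: str) -> int:
    """Count pods in the namespace via kubectl (assumes kubectl in PATH
    with access to the cluster)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "--no-headers"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len([line for line in out.splitlines() if line.strip()])

def is_stable(counts: list[int], max_stdev: float = 1.0) -> bool:
    """Treat the series as stable when pod counts barely vary."""
    return len(counts) >= 3 and statistics.pstdev(counts) <= max_stdev

# Usage:
#   samples = []
#   for _ in range(12):                  # ~2 minutes at 10s intervals
#       samples.append(pod_count("sites-<sitename>"))
#       time.sleep(10)
#   print("stable" if is_stable(samples) else "still fluctuating")
```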
```bash
# Check HPA status
kubectl describe hpa -n sites-<sitename>
```

If the analysis showed no clear CPU inflection, tune concurrency scaling instead:

```bash
NEW_STATE=$(echo "$STATE" | jq '
  .scaling.metric = {
    "type": "concurrency",
    "target": 15,
    "targetUtilizationPercentage": 70
  }
')
```

Related skills: deco-site-deployment, deco-site-memory-debugging, deco-incident-debugging