adaptive-metrics

Grafana Cloud Adaptive Metrics

Adaptive Metrics analyses your Prometheus metrics usage and suggests aggregation rules that reduce series count without breaking any queries. Rules pre-aggregate high-cardinality metrics into lower-cardinality forms before storage.
How it works:
  1. Adaptive Metrics scans your metric usage (dashboards, alerts, recording rules) over a lookback window
  2. It identifies labels that are never queried for a given metric
  3. It generates aggregation rules that drop those labels, reducing series count
  4. The original high-cardinality metric is still ingested but the aggregated form is what gets stored long-term
Billing: Grafana Cloud charges per Active Series (series that received a sample in the last hour). Adaptive Metrics reduces your Active Series count, directly reducing your bill.
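The effect of dropping an unqueried label can be pictured with a minimal sketch. It assumes a toy in-memory representation of series (a label dict plus one value) and a hypothetical helper `aggregate_sum`; it illustrates why removing a label shrinks the series count and is not how Grafana Cloud implements aggregation.

```python
from collections import defaultdict


def aggregate_sum(series, drop_labels):
    """Sum values across series after removing the labels in drop_labels."""
    out = defaultdict(float)
    for labels, value in series:
        # Series that differ only in dropped labels collapse onto the same key
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        out[kept] += value
    return dict(out)


# Hypothetical series: two instances, each reporting under two versions
series = [
    ({"job": "api", "instance": "a", "version": "1.0"}, 10.0),
    ({"job": "api", "instance": "a", "version": "1.1"}, 5.0),
    ({"job": "api", "instance": "b", "version": "1.0"}, 7.0),
    ({"job": "api", "instance": "b", "version": "1.1"}, 3.0),
]

aggregated = aggregate_sum(series, {"version"})
print(len(series), "->", len(aggregated))  # 4 -> 2
```

Four series become two because `version` no longer distinguishes them; the summed values per `job` and `instance` are preserved.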

Step 1: Access Adaptive Metrics

In Grafana Cloud: Home > Adaptive Metrics (or via the app menu).
You need the Grafana Cloud Metrics plan. Adaptive Metrics is available on all paid plans.
Key views:
  • Overview - total series count, estimated savings from pending recommendations
  • Recommendations - auto-generated aggregation rules ready to apply
  • Rules - active rules and their effect
  • Usage analysis - which metrics are queried vs. unused

Step 2: Understand the recommendations

Recommendations are sorted by estimated series reduction (highest savings first).
Each recommendation shows:
  • Metric name - the metric being aggregated
  • Current series - series count before the rule
  • Projected series - series count after applying the rule
  • Labels to drop - labels that are never queried for this metric
  • Labels to keep - labels that appear in at least one query
  • Lookback period - how many days of query history were analysed
Review before applying:
```bash
# Check if any dashboards or alerts use the label being dropped
# Replace METRIC_NAME and LABEL_NAME with actual values
grep -r "METRIC_NAME" /path/to/dashboards/ --include="*.json" | grep "LABEL_NAME"
```

Or in Grafana: use **Explore > Metrics** to query the metric and check which labels are present and used.
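Because exported dashboard JSON is often a single long line, a line-based grep can match a whole file at once. A rough alternative, sketched below, walks the JSON and inspects each query expression separately. It assumes Prometheus queries are stored under `expr` keys in the dashboard JSON; the helper names are hypothetical, and the check is a plain substring match rather than a PromQL parse, so treat hits as candidates to review.

```python
import json
from pathlib import Path


def iter_exprs(node):
    """Yield every string stored under an "expr" key anywhere in a JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from iter_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_exprs(item)


def label_usage(dashboard, metric, label):
    """Return the query expressions that mention both the metric and the label."""
    return [e for e in iter_exprs(dashboard) if metric in e and label in e]


def scan_directory(dashboard_dir, metric, label):
    """Map each dashboard file to its matching expressions (files without hits are omitted)."""
    hits = {}
    for path in sorted(Path(dashboard_dir).rglob("*.json")):
        try:
            dashboard = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or non-JSON files
        found = label_usage(dashboard, metric, label)
        if found:
            hits[str(path)] = found
    return hits


# In-memory example with a hypothetical dashboard
example = {
    "panels": [
        {"targets": [{"expr": "sum by (version) (rate(http_requests_total[5m]))"}]},
        {"targets": [{"expr": "sum(rate(http_requests_total[5m]))"}]},
    ]
}
print(label_usage(example, "http_requests_total", "version"))
```

Only the first panel is reported, since the second query never references `version`.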

---

Step 3: Apply a recommendation

Via the UI:
  1. Go to Adaptive Metrics > Recommendations
  2. Review the recommended labels to keep/drop
  3. Click Apply on rules you want to enable
  4. Rules take effect within ~5 minutes
Via the API:
```bash
# List recommendations
curl -s -H "Authorization: Bearer <API_KEY>" \
  "https://adaptive-metrics.grafana.net/api/v1/recommendations" | \
  jq '.recommendations[] | {metric_name, current_series, projected_series, estimated_reduction_percent}'

# Apply a recommendation by ID
curl -s -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  "https://adaptive-metrics.grafana.net/api/v1/recommendations/<RECOMMENDATION_ID>/apply"
```
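Before applying recommendations in bulk, it can help to shortlist the ones worth the review effort. The sketch below works on an already-downloaded list of recommendations and assumes each entry carries the `metric_name`, `current_series`, and `projected_series` fields selected by the jq filter; the function names, the 50% threshold, and the sample payload are hypothetical.

```python
def reduction_percent(current_series, projected_series):
    """Percentage of series removed; 0.0 when there is nothing to reduce."""
    if current_series <= 0:
        return 0.0
    return 100.0 * (current_series - projected_series) / current_series


def shortlist(recommendations, min_percent=50.0):
    """Keep recommendations saving at least min_percent, largest absolute saving first."""
    picked = [
        r for r in recommendations
        if reduction_percent(r["current_series"], r["projected_series"]) >= min_percent
    ]
    return sorted(
        picked,
        key=lambda r: r["current_series"] - r["projected_series"],
        reverse=True,
    )


# Hypothetical payload shaped like the fields listed above
recommendations = [
    {"metric_name": "http_requests_total", "current_series": 12000, "projected_series": 3000},
    {"metric_name": "go_gc_duration_seconds", "current_series": 400, "projected_series": 380},
    {"metric_name": "process_cpu_seconds_total", "current_series": 900, "projected_series": 300},
]

for rec in shortlist(recommendations):
    pct = reduction_percent(rec["current_series"], rec["projected_series"])
    print(rec["metric_name"], round(pct, 1))
```

Sorting by absolute savings rather than percentage keeps small metrics with dramatic ratios from crowding out the ones that dominate the bill.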

---

Step 4: Create custom aggregation rules

If you already know which labels to drop, you can create rules directly without waiting for recommendations.
Rule format:
```yaml
# Aggregation rule: keep only job and instance labels for process_cpu_seconds_total
rules:
  - match_metric: process_cpu_seconds_total
    drop_labels:
      - version
      - go_version
      - service_name
    aggregations:
      - type: sum
        without: []  # empty = keep only the labels not in drop_labels
```

**Via the API:**

```bash
curl -s -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  "https://adaptive-metrics.grafana.net/api/v1/rules" \
  -d '{
    "rules": [
      {
        "metric_name": "process_cpu_seconds_total",
        "match_type": "MATCH_TYPE_EXACT",
        "drop_labels": ["version", "go_version"],
        "aggregations": [{"type": "AGGREGATION_TYPE_SUM"}]
      }
    ]
  }'
```

Aggregation types:

| Type | Use case |
|------|----------|
| `sum` | Counters, request counts, byte totals |
| `max` | Gauges where you want the worst case (e.g. CPU max across pods) |
| `min` | Gauges where you want the best case |
| `avg` | Rate metrics, averages |

For counters, always use `sum`. Averaging counters produces incorrect rates.
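A small worked example shows why averaging counters misleads. It is a sketch with made-up numbers: two pods expose a cumulative request counter, and the hypothetical helpers `combine` and `per_second_rate` stand in for the aggregation rule and a PromQL `rate()` over one interval.

```python
def per_second_rate(samples):
    """Rate between the first and last (timestamp, value) sample of a counter."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def combine(series_list, how):
    """Aggregate several time-aligned counter series point by point with sum or avg."""
    combined = []
    for points in zip(*series_list):
        values = [v for _, v in points]
        total = sum(values)
        combined.append((points[0][0], total if how == "sum" else total / len(values)))
    return combined


# Two pods serving requests; timestamps in seconds, values are cumulative counts
pod_a = [(0, 100), (60, 700)]  # 10 requests/s
pod_b = [(0, 50), (60, 350)]   # 5 requests/s

sum_rate = per_second_rate(combine([pod_a, pod_b], "sum"))
avg_rate = per_second_rate(combine([pod_a, pod_b], "avg"))
print(sum_rate)  # 15.0
print(avg_rate)  # 7.5
```

The summed series reports the true total of 15 requests/s. The averaged series reports 7.5, and that figure shifts whenever the number of pods changes, even if traffic stays constant.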

Step 5: Handle metrics with regex matching

Use regex rules to cover families of metrics with similar label patterns:
```bash
# Apply a rule to all metrics matching a pattern
curl -s -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  "https://adaptive-metrics.grafana.net/api/v1/rules" \
  -d '{
    "rules": [
      {
        "metric_name": "go_.*",
        "match_type": "MATCH_TYPE_REGEX",
        "drop_labels": ["go_version", "version", "service_instance_id"],
        "aggregations": [{"type": "AGGREGATION_TYPE_SUM"}]
      }
    ]
  }'
```

**Common label families safe to drop globally:**
- `version`, `app_version`, `go_version` - rarely queried in PromQL
- `service_instance_id`, `pod_uid`, `container_id` - ultra-high cardinality
- `git_commit`, `build_date` - static labels that inflate series for no query value
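Before submitting a regex rule, it is worth checking which metric names the pattern would actually select. The sketch below assumes the rule regex is fully anchored, as Prometheus does for label matchers and relabel rules; whether Adaptive Metrics anchors the same way is an assumption to verify. Python's `re` stands in for the service's regex engine, which is equivalent for a simple pattern like `go_.*`. The helper and the metric list are hypothetical.

```python
import re


def matching_metrics(pattern, metric_names, anchored=True):
    """Return the metric names a rule pattern would select."""
    regex = re.compile(pattern)
    match = regex.fullmatch if anchored else regex.search
    return [name for name in metric_names if match(name)]


names = [
    "go_goroutines",
    "go_gc_duration_seconds",
    "cargo_builds_total",
    "process_cpu_seconds_total",
]

anchored_hits = matching_metrics("go_.*", names)
unanchored_hits = matching_metrics("go_.*", names, anchored=False)
print(anchored_hits)
print(unanchored_hits)
```

With anchoring, only the two `go_` metrics match. Without it, `cargo_builds_total` also matches because it contains `go_`, which is the kind of surprise a dry run like this is meant to catch.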

---

Step 6: Identify unused metrics

Unused metrics (never queried in any dashboard, alert, or recording rule) can be dropped entirely.
In the UI: Adaptive Metrics > Usage analysis > "Unused metrics" tab
Via the API:
```bash
curl -s -H "Authorization: Bearer <API_KEY>" \
  "https://adaptive-metrics.grafana.net/api/v1/usage-analysis?filter=unused" | \
  jq '.metrics[] | {metric_name, series_count, last_queried}'
```
Before dropping a metric entirely:
  1. Confirm it is not used in any Grafana dashboard (search by metric name in dashboard JSON)
  2. Confirm it is not used in any Prometheus/Mimir alert rule or recording rule
  3. Check with the team that owns the service if the metric is part of an SLO
Drop unused metrics via remote_write filtering in Alloy:
```alloy
prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = "https://prometheus-prod-XX.grafana.net/api/prom/push"
    write_relabel_config {
      source_labels = ["__name__"]
      regex         = "unused_metric_name|another_unused_metric"
      action        = "drop"
    }
  }
}
```
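When the list of unused metrics grows, building the `regex` value by hand becomes error-prone. The sketch below generates the alternation from a list of names and checks it the way a relabel rule would apply it, with full anchoring. The helper names are hypothetical, and the validation covers only the classic Prometheus metric-name character set.

```python
import re

# Classic Prometheus metric names: letters, digits, underscores and colons,
# not starting with a digit
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def build_drop_regex(metric_names):
    """Join validated, de-duplicated metric names into one alternation."""
    unique = sorted(set(metric_names))
    invalid = [n for n in unique if not METRIC_NAME.fullmatch(n)]
    if invalid:
        raise ValueError(f"not valid metric names: {invalid}")
    return "|".join(unique)


def would_drop(drop_regex, metric_name):
    """Relabel regexes are fully anchored, so test with fullmatch."""
    return re.fullmatch(drop_regex, metric_name) is not None


drop_regex = build_drop_regex(["unused_metric_name", "another_unused_metric"])
print(drop_regex)
print(would_drop(drop_regex, "unused_metric_name"))        # True
print(would_drop(drop_regex, "unused_metric_name_total"))  # False
```

Rejecting names with unexpected characters avoids accidentally introducing regex metacharacters that would widen the drop rule beyond the intended metrics.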

Step 7: Adaptive Logs (companion product)

For log volume reduction, Adaptive Logs works the same way for Loki:
```bash
# Check log volume recommendations
curl -s -H "Authorization: Bearer <API_KEY>" \
  "https://adaptive-logs.grafana.net/api/v1/recommendations" | \
  jq '.recommendations[] | {stream_selector, estimated_reduction_percent}'
```

Log patterns: Adaptive Logs drops low-value log streams (e.g. debug logs from non-critical services), either during high-volume periods or permanently.

---

Step 8: Measure the impact

After applying rules, monitor the effect over 24-48 hours:
```promql
# Active Series count over time (visible in the Grafana Cloud Metrics Usage dashboard)
grafanacloud_instance_active_series

# Series reduction from Adaptive Metrics
grafanacloud_instance_active_series_dropped_by_aggregation_rules
```

In Grafana Cloud: **Home > Usage > Metrics** shows before/after series counts and the billing
impact of active rules.

**Expected timeline:**
- Rules take effect within ~5 minutes of creation
- The reduction shows up in the Active Series count, and therefore in billing, once the dropped series age out of the one-hour active window (usually within 1 hour)
- The original high-cardinality metric continues to be ingested, but only the aggregated series count toward billing
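To turn the two usage metrics into a single savings figure, a minimal calculation might look like the following. It assumes `grafanacloud_instance_active_series` reports the series remaining after aggregation and `grafanacloud_instance_active_series_dropped_by_aggregation_rules` reports the series removed, so their sum approximates the pre-aggregation count; the function name and the sample readings are hypothetical.

```python
def aggregation_savings(active_series, dropped_series):
    """Share of series removed by aggregation rules, as a percentage."""
    before = active_series + dropped_series
    if before <= 0:
        return 0.0
    return 100.0 * dropped_series / before


# Hypothetical readings of the two usage metrics
savings = aggregation_savings(active_series=60000, dropped_series=40000)
print(savings)  # 40.0
```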

---
