Grafana Cloud Adaptive Metrics

Adaptive Metrics analyses your Prometheus metrics usage and suggests aggregation rules that reduce series count without breaking any queries. Rules pre-aggregate high-cardinality metrics into lower-cardinality forms before storage.

How it works:

- Adaptive Metrics scans your metric usage (dashboards, alerts, recording rules) over a lookback window.
- It identifies labels that are never queried for a given metric.
- It generates aggregation rules that drop those labels, reducing series count.
- The original high-cardinality metric is still ingested, but only the aggregated form is stored long-term.
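The label-dropping step can be sketched in a few lines of Python. This is a minimal illustration, not the Adaptive Metrics implementation: it assumes each series is a dict with hypothetical `labels` and `value` keys, and that the set of queried labels has already been collected from dashboards, alerts and recording rules.

```python
from collections import defaultdict


def unqueried_labels(series, queried_labels):
    """Labels that appear on the metric's series but in no dashboard, alert or recording rule."""
    present = {name for s in series for name in s["labels"]}
    return present - set(queried_labels)


def aggregate(series, drop_labels):
    """Drop the given labels and sum the values of series that now share a label set."""
    out = defaultdict(float)
    for s in series:
        key = tuple(sorted((k, v) for k, v in s["labels"].items() if k not in drop_labels))
        out[key] += s["value"]
    return dict(out)


# Hypothetical series of one counter metric; only job and instance are ever queried.
series = [
    {"labels": {"job": "api", "instance": "a", "version": "1.2"}, "value": 10},
    {"labels": {"job": "api", "instance": "a", "version": "1.3"}, "value": 5},
    {"labels": {"job": "api", "instance": "b", "version": "1.3"}, "value": 7},
    {"labels": {"job": "api", "instance": "b", "version": "1.4"}, "value": 1},
]
drop = unqueried_labels(series, {"job", "instance"})
agg = aggregate(series, drop)
print(sorted(drop), len(series), "->", len(agg))  # ['version'] 4 -> 2
```

Summing is the right merge for this counter; Step 4 covers choosing other aggregation types.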

Billing: Grafana Cloud charges per Active Series (series that received a sample in the last hour). Adaptive Metrics reduces your Active Series count, directly reducing your bill.


Step 1: Access Adaptive Metrics


In Grafana Cloud: Home > Adaptive Metrics (or via the app menu).

Adaptive Metrics requires Grafana Cloud Metrics and is available on all paid plans.

Key views:

- Overview - total series count and estimated savings from pending recommendations
- Recommendations - auto-generated aggregation rules ready to apply
- Rules - active rules and their effect
- Usage analysis - which metrics are queried vs. unused


Step 2: Understand the recommendations


Recommendations are sorted by estimated series reduction (highest savings first).

Each recommendation shows:

- Metric name - the metric being aggregated
- Current series - series count before the rule
- Projected series - series count after applying the rule
- Labels to drop - labels that are never queried for this metric
- Labels to keep - labels that appear in at least one query
- Lookback period - how many days of query history were analysed
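To see why the list is ordered the way it is, here is a small Python sketch that ranks recommendations by absolute series saved and derives a reduction percentage. The input is hypothetical and uses the `current_series` and `projected_series` field names that the jq filter in Step 3 reads; the real response may carry more fields.

```python
def rank(recommendations):
    """Sort recommendations by series saved (largest first) and add a reduction percentage."""
    ranked = []
    for r in recommendations:
        saved = r["current_series"] - r["projected_series"]
        pct = 100.0 * saved / r["current_series"] if r["current_series"] else 0.0
        ranked.append({**r, "series_saved": saved, "reduction_percent": round(pct, 1)})
    return sorted(ranked, key=lambda rec: rec["series_saved"], reverse=True)


# Hypothetical recommendations with the fields the jq filter in Step 3 reads.
recs = [
    {"metric_name": "process_cpu_seconds_total", "current_series": 800, "projected_series": 200},
    {"metric_name": "go_goroutines", "current_series": 40000, "projected_series": 38000},
    {"metric_name": "http_requests_total", "current_series": 12000, "projected_series": 3000},
]
for r in rank(recs):
    print(f"{r['metric_name']}: {r['current_series']} -> {r['projected_series']} "
          f"(-{r['series_saved']} series, {r['reduction_percent']}%)")
```

Ranking by series saved rather than by percentage puts `go_goroutines` (5%, 2000 series) ahead of `process_cpu_seconds_total` (75%, 600 series), which matches billing per Active Series.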

Review before applying:

```bash
# Check if any dashboards or alerts use the label being dropped.
# Replace METRIC_NAME and LABEL_NAME with actual values; both paths are placeholders.
grep -r "METRIC_NAME" /path/to/dashboards/ --include="*.json" | grep "LABEL_NAME"
grep -r "METRIC_NAME" /path/to/rules/ --include="*.yaml" --include="*.yml" | grep "LABEL_NAME"
```

Or in Grafana: use **Explore > Metrics** to query the metric and check which labels are present and used.
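The `grep` check only matches lines that contain both strings. A slightly stricter Python sketch walks every dashboard JSON file, collects the PromQL stored under `expr` keys (where Prometheus panel targets keep their queries), and reports files in which a single query mentions both the metric and the label as whole identifiers. It is a heuristic under those assumptions: it does not expand dashboard variables and may still report a label name that only appears inside a string value.

```python
import json
import re
from pathlib import Path


def iter_exprs(node):
    """Yield every string stored under an "expr" key anywhere in a dashboard JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from iter_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_exprs(item)


def word(name):
    # Match the identifier only when it is not part of a longer metric or label name.
    return re.compile(rf"(?<![\w:]){re.escape(name)}(?![\w:])")


def dashboards_using(directory, metric, label):
    """Return dashboard files with at least one query that mentions both the metric and the label."""
    metric_re, label_re = word(metric), word(label)
    hits = []
    for path in sorted(Path(directory).rglob("*.json")):
        dashboard = json.loads(path.read_text(encoding="utf-8"))
        if any(metric_re.search(e) and label_re.search(e) for e in iter_exprs(dashboard)):
            hits.append(path)
    return hits


# Usage: print(dashboards_using("/path/to/dashboards", "METRIC_NAME", "LABEL_NAME"))
```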

---


Step 3: Apply a recommendation


Via the UI:

- Go to Adaptive Metrics > Recommendations
- Review the recommended labels to keep/drop
- Click Apply on the rules you want to enable
- Rules take effect within ~5 minutes

Via the API:

```bash
# List recommendations
curl -s -H "Authorization: Bearer <API_KEY>" \
  "https://adaptive-metrics.grafana.net/api/v1/recommendations" | \
  jq '.recommendations[] | {metric_name, current_series, projected_series, estimated_reduction_percent}'

# Apply a recommendation by ID
curl -s -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  "https://adaptive-metrics.grafana.net/api/v1/recommendations/<RECOMMENDATION_ID>/apply"
```

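For batch use, a Python sketch can select recommendations above a reduction threshold and build the same POST as the curl call, using only the standard library. The endpoint is the one given in this guide; the `id` field and the 50% threshold in the usage comment are hypothetical. The function builds the request without sending it, so the list can be reviewed first.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://adaptive-metrics.grafana.net/api/v1"  # endpoint used throughout this guide


def ids_to_apply(recommendations, min_percent):
    """IDs of recommendations whose estimated reduction meets a threshold.

    "id" is a hypothetical field name; check the real response for the identifier field.
    """
    return [r["id"] for r in recommendations if r["estimated_reduction_percent"] >= min_percent]


def apply_request(api_key, recommendation_id):
    """Build (but do not send) the POST that applies one recommendation."""
    rid = urllib.parse.quote(str(recommendation_id), safe="")
    return urllib.request.Request(
        f"{BASE_URL}/recommendations/{rid}/apply",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )


# Usage (sends real requests, so review the selected IDs first):
# for rid in ids_to_apply(recs, min_percent=50):
#     with urllib.request.urlopen(apply_request(API_KEY, rid)) as resp:
#         print(rid, resp.status)
```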
---


Step 4: Create custom aggregation rules


If you know which labels to drop without waiting for recommendations, create rules directly.

Rule format:

```yaml
# Aggregation rule for process_cpu_seconds_total: drop version, go_version and service_name
# (if a series carries only job, instance and these labels, job and instance remain)
rules:
  - match_metric: process_cpu_seconds_total
    drop_labels:
      - version
      - go_version
      - service_name
    aggregations:
      - type: sum
        without: []  # empty = keep only the labels not in drop_labels
```

**Via the API:**

```bash
curl -s -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  "https://adaptive-metrics.grafana.net/api/v1/rules" \
  -d '{
    "rules": [
      {
        "metric_name": "process_cpu_seconds_total",
        "match_type": "MATCH_TYPE_EXACT",
        "drop_labels": ["version", "go_version"],
        "aggregations": [{"type": "AGGREGATION_TYPE_SUM"}]
      }
    ]
  }'
```
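Hand-editing JSON inside a single-quoted shell string is error-prone. A small Python sketch that produces the same body as the example above: the field names and the two `match_type` values are taken from this guide, and the validation rules (non-empty `drop_labels`, known `match_type`) are assumptions of the sketch rather than documented API constraints.

```python
import json

MATCH_TYPES = {"MATCH_TYPE_EXACT", "MATCH_TYPE_REGEX"}  # the two values used in this guide


def rule_payload(metric_name, drop_labels, match_type="MATCH_TYPE_EXACT",
                 aggregation="AGGREGATION_TYPE_SUM"):
    """Return the JSON body for POST /api/v1/rules, in the shape of the curl example above."""
    if match_type not in MATCH_TYPES:
        raise ValueError(f"unknown match_type: {match_type}")
    if not drop_labels:
        raise ValueError("drop_labels must not be empty")
    rule = {
        "metric_name": metric_name,
        "match_type": match_type,
        "drop_labels": sorted(set(drop_labels)),  # de-duplicated, stable order
        "aggregations": [{"type": aggregation}],
    }
    return json.dumps({"rules": [rule]}, indent=2)


print(rule_payload("process_cpu_seconds_total", ["version", "go_version"]))
```

Save the printed body as `rule.json` and send it with the same curl command, replacing the inline `-d '...'` with `-d @rule.json`.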

Aggregation types:

| Type | Use case |
|------|----------|
| `sum` | Counters, request counts, byte totals |
| `max` | Gauges where you want the worst case (e.g. CPU max across pods) |
| `min` | Gauges where you want the best case |
| `avg` | Rate metrics, averages |

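For counters, always use `sum`. Averaging counters produces incorrect rates: the average of N counters grows at 1/N of their combined rate, and the divisor shifts whenever series appear or disappear.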
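A worked example of the counter rule in Python, with hypothetical scrape values and a simplified `rate` that assumes no counter resets (PromQL's `rate()` additionally handles resets and extrapolates to the window edges):

```python
def rate(samples, interval_s):
    """Per-second increase between consecutive samples (assumes no counter resets)."""
    return [(b - a) / interval_s for a, b in zip(samples, samples[1:])]


# Two pods' request counters scraped every 60 s (hypothetical values).
pod_a = [0, 600, 1200, 1800]    # 10 requests/s
pod_b = [0, 1200, 2400, 3600]   # 20 requests/s

summed = [a + b for a, b in zip(pod_a, pod_b)]
averaged = [(a + b) / 2 for a, b in zip(pod_a, pod_b)]

print(rate(summed, 60))    # [30.0, 30.0, 30.0]: the real total throughput
print(rate(averaged, 60))  # [15.0, 15.0, 15.0]: half of it
```

The summed series preserves total throughput; the averaged one reports half of it here, and the error factor changes whenever the number of pods changes.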


Step 5: Handle metrics with regex matching


Use regex rules to cover families of metrics with similar label patterns:

```bash
# Apply a rule to all metrics matching a pattern
curl -s -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  "https://adaptive-metrics.grafana.net/api/v1/rules" \
  -d '{"rules": [{"metric_name": "go_.*", "match_type": "MATCH_TYPE_REGEX",
       "drop_labels": ["go_version", "version", "service_instance_id"],
       "aggregations": [{"type": "AGGREGATION_TYPE_SUM"}]}]}'
```
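Before applying a regex rule, it helps to list which metric names the pattern actually covers. The sketch below assumes the pattern is matched against the whole metric name, as Prometheus label matchers and relabel rules are; this guide does not say how `MATCH_TYPE_REGEX` anchors patterns, so treat that as an assumption. The metric names are hypothetical.

```python
import re


def matching_metrics(pattern, metric_names):
    """Metric names covered by a rule pattern, assuming whole-name (anchored) matching."""
    compiled = re.compile(pattern)
    return sorted(name for name in metric_names if compiled.fullmatch(name))


names = ["go_goroutines", "go_gc_duration_seconds", "cargo_jobs_total", "process_cpu_seconds_total"]
print(matching_metrics("go_.*", names))
# ['go_gc_duration_seconds', 'go_goroutines']

# For comparison, an unanchored search also picks up 'cargo_jobs_total'.
print(sorted(n for n in names if re.search("go_.*", n)))
# ['cargo_jobs_total', 'go_gc_duration_seconds', 'go_goroutines']
```

Writing `^go_.*$`, or a tighter pattern such as `go_(gc|memstats)_.*`, keeps the rule's intent explicit whichever way the matching is anchored.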


**Common label families that are usually safe to drop globally (confirm with the checks in Step 2 first):**
- `version`, `app_version`, `go_version` - rarely queried in PromQL
- `service_instance_id`, `pod_uid`, `container_id` - ultra-high cardinality
- `git_commit`, `build_date` - change with every build, so each deploy adds a new set of series that is rarely queried

---



Step 6: Identify unused metrics


Unused metrics (never queried in any dashboard, alert, or recording rule) can be dropped entirely.

In the UI: Adaptive Metrics > Usage analysis > "Unused metrics" tab

Via the API:

```bash
curl -s -H "Authorization: Bearer <API_KEY>" \
  "https://adaptive-metrics.grafana.net/api/v1/usage-analysis?filter=unused" | \
  jq '.metrics[] | {metric_name, series_count, last_queried}'
```

Before dropping a metric entirely:

- Confirm it is not used in any Grafana dashboard (search by metric name in dashboard JSON)
- Confirm it is not used in any Prometheus/Mimir alert rule or recording rule
- Check with the team that owns the service whether the metric is part of an SLO

Drop unused metrics via remote_write filtering in Alloy:

```alloy
prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = "https://prometheus-prod-XX.grafana.net/api/prom/push"
    write_relabel_config {
      source_labels = ["__name__"]
      regex         = "unused_metric_name|another_unused_metric"
      action        = "drop"
    }
  }
}
```
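With more than a handful of unused metrics, the `regex` value is easier to generate than to type. A Python sketch, assuming the usage-analysis response has the shape the jq filter above reads (`metrics[].metric_name`); the `keep` set stands in for metrics a service owner confirmed are still needed, and the sample names are hypothetical.

```python
import re


def drop_regex(usage, keep=()):
    """Relabel regex matching exactly the unused metrics, minus names to keep.

    usage: parsed usage-analysis response,
           {"metrics": [{"metric_name": ..., "series_count": ..., "last_queried": ...}]}.
    keep:  metric names a service owner still needs (for example SLO inputs).
    """
    names = sorted({m["metric_name"] for m in usage["metrics"]} - set(keep))
    # Metric names use only [a-zA-Z0-9_:], so re.escape adds no backslashes here.
    return "|".join(re.escape(name) for name in names)


# Hypothetical response: two unused metrics plus one an owner asked to keep.
usage = {"metrics": [
    {"metric_name": "unused_metric_name", "series_count": 1200, "last_queried": None},
    {"metric_name": "another_unused_metric", "series_count": 300, "last_queried": None},
    {"metric_name": "checkout_slo_errors_total", "series_count": 40, "last_queried": None},
]}
pattern = drop_regex(usage, keep={"checkout_slo_errors_total"})
print(f'regex         = "{pattern}"')  # paste into the write_relabel_config block
```

Because relabel regexes are fully anchored, listing `unused_metric_name` does not also drop `unused_metric_name_total`.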


Step 7: Adaptive Logs (companion product)


For log volume reduction, Adaptive Logs applies the same approach to Loki:

```bash
# Check log volume recommendations
curl -s -H "Authorization: Bearer <API_KEY>" \
  "https://adaptive-logs.grafana.net/api/v1/recommendations" | \
  jq '.recommendations[] | {stream_selector, estimated_reduction_percent}'
```


Log pattern: drops low-value log streams (e.g. debug logs from non-critical services) during
high-volume periods or permanently.

---


Step 8: Measure the impact


After applying rules, monitor the effect over 24-48 hours:

```promql
# Active Series count over time (visible in the Grafana Cloud Metrics Usage dashboard)
grafanacloud_instance_active_series

# Series reduction from Adaptive Metrics
grafanacloud_instance_active_series_dropped_by_aggregation_rules
```


In Grafana Cloud: **Home > Usage > Metrics** shows before/after series counts and the billing
impact of active rules.

**Expected timeline:**
- Rules take effect within ~5 minutes of creation
- The Active Series count used for billing reflects the change within about 1 hour, once the pre-aggregation series age out of the one-hour active window
- The original high-cardinality metric continues to be ingested but doesn't count toward billing
  for the labels that were dropped
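Active Series counts are noisy, so comparing single points before and after a rule can mislead. A Python sketch that compares the mean before the rule with the mean after a settling period, assuming (from the billing definition above) that a series remains active for one hour after its last sample. The samples would come from `grafanacloud_instance_active_series`; the values here are hypothetical.

```python
from statistics import mean


def reduction_percent(samples, applied_at, settle_s=3600):
    """Percent drop in mean Active Series from before a rule to after it settles.

    samples:    (unix_seconds, active_series) points, e.g. from grafanacloud_instance_active_series.
    applied_at: unix time the rule took effect.
    settle_s:   points in [applied_at, applied_at + settle_s) are ignored, because a series
                counts as active for an hour after its last sample.
    """
    before = [v for t, v in samples if t < applied_at]
    after = [v for t, v in samples if t >= applied_at + settle_s]
    if not before or not after:
        raise ValueError("need samples both before the rule and after the settling window")
    return 100.0 * (mean(before) - mean(after)) / mean(before)


# Hypothetical 30-minute samples; the rule took effect at t=3600.
samples = [(0, 10000), (1800, 10000), (3600, 9000), (5400, 7000), (7200, 6000), (9000, 6000)]
print(reduction_percent(samples, applied_at=3600))  # 40.0
```

Without the settling window (`settle_s=0`) the same data reports only 30%, because the lingering pre-aggregation series are still counted.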

---

