adaptive-metrics
Grafana Cloud Adaptive Metrics
Adaptive Metrics analyses your Prometheus metrics usage and suggests aggregation rules that
reduce series count without breaking any queries. Rules pre-aggregate high-cardinality metrics
into lower-cardinality forms before storage.
How it works:
- Adaptive Metrics scans your metric usage (dashboards, alerts, recording rules) over a lookback window
- It identifies labels that are never queried for a given metric
- It generates aggregation rules that drop those labels, reducing series count
- The original high-cardinality metric is still ingested but the aggregated form is what gets stored long-term
Billing: Grafana Cloud charges per Active Series (series that received a sample in the last hour).
Adaptive Metrics reduces your Active Series count, directly reducing your bill.
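The aggregation step described above can be illustrated with a small, self-contained sketch. It is not how Adaptive Metrics is implemented; it only shows why dropping unqueried labels collapses several series into one. The label names (`pod`, `version`) and sample values are made up for illustration.

```python
from collections import defaultdict


def aggregate_by_dropping(series, drop_labels):
    """Sum the values of series that become identical once drop_labels are removed."""
    totals = defaultdict(float)
    for labels, value in series:
        # The remaining label set identifies the aggregated series.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        totals[kept] += value
    return dict(totals)


# Four raw series for one metric; `pod` and `version` are assumed to be unqueried.
raw_series = [
    ({"job": "api", "pod": "api-1", "version": "1.4.0"}, 10.0),
    ({"job": "api", "pod": "api-2", "version": "1.4.0"}, 12.0),
    ({"job": "api", "pod": "api-3", "version": "1.4.1"}, 7.0),
    ({"job": "worker", "pod": "worker-1", "version": "1.4.0"}, 3.0),
]

aggregated = aggregate_by_dropping(raw_series, drop_labels={"pod", "version"})
print(len(raw_series), "series ->", len(aggregated), "series")
```

Only the `job` label survives here, so the four input series reduce to one series per job.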
Step 1: Access Adaptive Metrics
In Grafana Cloud: Home > Adaptive Metrics (or via the app menu).
You need the Grafana Cloud Metrics plan. Adaptive Metrics is available on all paid plans.
Key views:
- Overview - total series count, estimated savings from pending recommendations
- Recommendations - auto-generated aggregation rules ready to apply
- Rules - active rules and their effect
- Usage analysis - which metrics are queried vs. unused
Step 2: Understand the recommendations
Recommendations are sorted by estimated series reduction (highest savings first).
Each recommendation shows:
- Metric name - the metric being aggregated
- Current series - series count before the rule
- Projected series - series count after applying the rule
- Labels to drop - labels that are never queried for this metric
- Labels to keep - labels that appear in at least one query
- Lookback period - how many days of query history were analysed
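As a rough illustration of how the current and projected series counts relate to the savings figure, the sketch below computes a reduction percentage and ranks recommendations by absolute series saved. The dictionary keys and all numbers are hypothetical; they are not taken from the Adaptive Metrics UI or API.

```python
def reduction_percent(current_series, projected_series):
    """Percentage of series a rule is projected to remove."""
    if current_series <= 0:
        raise ValueError("current_series must be positive")
    return 100.0 * (current_series - projected_series) / current_series


# Made-up numbers for illustration only.
recommendations = [
    {"metric": "process_cpu_seconds_total", "current": 4000, "projected": 3000},
    {"metric": "http_request_duration_seconds_bucket", "current": 120000, "projected": 30000},
]

# Highest absolute savings first, mirroring the sort order described above.
ranked = sorted(recommendations, key=lambda r: r["current"] - r["projected"], reverse=True)
for rec in ranked:
    pct = reduction_percent(rec["current"], rec["projected"])
    print(f'{rec["metric"]}: {pct:.1f}% fewer series')
```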
Review before applying:
```bash
# Check if any dashboards or alerts use the label being dropped
# Replace METRIC_NAME and LABEL_NAME with actual values
grep -r "METRIC_NAME" /path/to/dashboards/ --include="*.json" | grep "LABEL_NAME"
```
Or in Grafana: use **Explore > Metrics** to query the metric and check which labels are present and used.
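The `grep` check matches whole lines of JSON, so it can report a hit when the metric and the label appear in unrelated places. A slightly stricter variant, sketched below, looks only at query expressions. It assumes queries are stored under an `expr` key, as in typical exported Prometheus panels; the dashboard shown is hypothetical.

```python
import re


def iter_exprs(node):
    """Yield every string stored under an "expr" key anywhere in a dashboard tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from iter_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_exprs(item)


def queries_using_label(dashboard, metric, label):
    """Return the queries that mention both the metric and the label as whole words."""
    metric_re = re.compile(r"\b" + re.escape(metric) + r"\b")
    label_re = re.compile(r"\b" + re.escape(label) + r"\b")
    return [e for e in iter_exprs(dashboard) if metric_re.search(e) and label_re.search(e)]


# Hypothetical dashboard, shaped like an exported Grafana dashboard.
dashboard = {
    "title": "API overview",
    "panels": [
        {"targets": [{"expr": "sum by (job) (rate(http_requests_total[5m]))"}]},
        {"targets": [{"expr": "sum by (version) (rate(http_requests_total[5m]))"}]},
        {"targets": [{"expr": 'go_goroutines{go_version="go1.22"}'}]},
    ],
}

for expr in queries_using_label(dashboard, "http_requests_total", "version"):
    print("label still in use:", expr)
```

In practice the dictionary would come from `json.load` on each exported dashboard file. This is still a text match: labels referenced only through template variables would be missed, so treat an empty result as a hint rather than proof.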
---
Step 3: Apply a recommendation
Via the UI:
- Go to Adaptive Metrics > Recommendations
- Review the recommended labels to keep/drop
- Click Apply on rules you want to enable
- Rules take effect within ~5 minutes
Via the API:
```bash
# List recommendations
curl -s -H "Authorization: Bearer <API_KEY>" \
  "https://adaptive-metrics.grafana.net/api/v1/recommendations" | \
  jq '.recommendations[] | {metric_name, current_series, projected_series, estimated_reduction_percent}'
```
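When scripting this step, it can help to filter the list before applying anything. The sketch below parses a response shaped like the `jq` filter above implies (a top-level `recommendations` array with `metric_name` and `estimated_reduction_percent` fields) and keeps only entries above a threshold. The response shape is inferred from this guide, not verified against the API, and the sample data is made up.

```python
import json


def select_recommendations(response_text, min_reduction_percent):
    """Return recommendations at or above the threshold, largest reduction first."""
    payload = json.loads(response_text)
    chosen = [
        rec
        for rec in payload.get("recommendations", [])
        if rec.get("estimated_reduction_percent", 0) >= min_reduction_percent
    ]
    return sorted(chosen, key=lambda rec: rec["estimated_reduction_percent"], reverse=True)


# Made-up response body; field names follow the jq filter shown in this guide.
sample_response = json.dumps({
    "recommendations": [
        {"metric_name": "process_cpu_seconds_total", "current_series": 4000,
         "projected_series": 3600, "estimated_reduction_percent": 10.0},
        {"metric_name": "go_gc_duration_seconds", "current_series": 9000,
         "projected_series": 4500, "estimated_reduction_percent": 50.0},
        {"metric_name": "http_request_duration_seconds_bucket", "current_series": 120000,
         "projected_series": 30000, "estimated_reduction_percent": 75.0},
    ]
})

selected = select_recommendations(sample_response, min_reduction_percent=40)
for rec in selected:
    print(rec["metric_name"], rec["estimated_reduction_percent"])
```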
```bash
# Apply a recommendation by ID
curl -s -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  "https://adaptive-metrics.grafana.net/api/v1/recommendations/<RECOMMENDATION_ID>/apply"
```
---
Step 4: Create custom aggregation rules
If you know which labels to drop without waiting for recommendations, create rules directly.
Rule format:
```yaml
# Aggregation rule: keep only job and instance labels for process_cpu_seconds_total
rules:
  - match_metric: process_cpu_seconds_total
    drop_labels:
      - version
      - go_version
      - service_name
    aggregations:
      - type: sum
        without: []  # empty = keep only the labels not in drop_labels
```
**Via the API:**
```bash
curl -s -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  "https://adaptive-metrics.grafana.net/api/v1/rules" \
  -d '{
    "rules": [
      {
        "metric_name": "process_cpu_seconds_total",
        "match_type": "MATCH_TYPE_EXACT",
        "drop_labels": ["version", "go_version"],
        "aggregations": [{"type": "AGGREGATION_TYPE_SUM"}]
      }
    ]
  }'
```
Aggregation types:
| Type | Use case |
|---|---|
| `sum` | Counters, request counts, byte totals |
| `max` | Gauges where you want the worst-case (e.g. CPU max across pods) |
| `min` | Gauges where you want the best-case |
| `avg` | Rate metrics, averages |
For counters, always use `sum`. Averaging counters produces incorrect rates.
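The point about counters can be checked with a short numeric sketch. It compares the rate of a sum-aggregated counter and an avg-aggregated counter against the true combined rate of three pods. The pod names and values are invented, and counter resets are ignored for simplicity.

```python
def rate(before, after, seconds):
    """Per-second increase of a counter between two samples (no reset handling)."""
    return (after - before) / seconds


# Hypothetical counter readings for three pods, taken 60 seconds apart.
t0 = {"pod-a": 100.0, "pod-b": 250.0, "pod-c": 40.0}
t1 = {"pod-a": 160.0, "pod-b": 280.0, "pod-c": 70.0}
interval = 60.0

# Ground truth: add up the per-pod rates.
true_rate = sum(rate(t0[p], t1[p], interval) for p in t0)

# Rate of the series produced by a sum aggregation.
sum_rate = rate(sum(t0.values()), sum(t1.values()), interval)

# Rate of the series produced by an avg aggregation.
avg_rate = rate(sum(t0.values()) / len(t0), sum(t1.values()) / len(t1), interval)

print("true:", true_rate, "sum:", sum_rate, "avg:", round(avg_rate, 3))
```

With these numbers the summed series reproduces the true total rate, while the averaged series reports one third of it, because averaging divides the total by the number of series that were merged.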
Step 5: Handle metrics with regex matching
Use regex rules to cover families of metrics with similar label patterns:
```bash
# Apply a rule to all metrics matching a pattern
curl -s -X POST \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  "https://adaptive-metrics.grafana.net/api/v1/rules" \
  -d '{
    "rules": [
      {
        "metric_name": "go_.*",
        "match_type": "MATCH_TYPE_REGEX",
        "drop_labels": ["go_version", "version", "service_instance_id"],
        "aggregations": [{"type": "AGGREGATION_TYPE_SUM"}]
      }
    ]
  }'
```
**Common label families safe to drop globally:**
- `version`, `app_version`, `go_version` - rarely queried in PromQL
- `service_instance_id`, `pod_uid`, `container_id` - ultra-high cardinality
- `git_commit`, `build_date` - static labels that inflate series for no query value
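Before sending a regex rule, it is worth previewing which metric names the pattern would cover. The sketch below assumes the pattern is matched against the whole metric name, as Prometheus does for its own regex matchers; whether Adaptive Metrics anchors patterns the same way is an assumption here. Python's `re` syntax also differs from RE2 in some corners, so this is only an approximation. The metric names are examples.

```python
import re


def metrics_matching(pattern, metric_names):
    """Return the names that the pattern matches in full (anchored at both ends)."""
    compiled = re.compile(pattern)
    return [name for name in metric_names if compiled.fullmatch(name)]


names = [
    "go_goroutines",
    "go_gc_duration_seconds",
    "process_cpu_seconds_total",
    "cargo_build_total",
    "go_info",
]

matched = metrics_matching("go_.*", names)
# For comparison: an unanchored search also catches "cargo_build_total".
unanchored = [name for name in names if re.search("go_.*", name)]

print("anchored:  ", matched)
print("unanchored:", unanchored)
```

A preview like this also surfaces metrics that a broad pattern sweeps in unintentionally, which is a good reason to review the matched list before applying a rule to a whole family.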
---
Step 6: Identify unused metrics
Unused metrics (never queried in any dashboard, alert, or recording rule) can be dropped entirely.
In the UI: Adaptive Metrics > Usage analysis > "Unused metrics" tab
Via the API:
```bash
curl -s -H "Authorization: Bearer <API_KEY>" \
  "https://adaptive-metrics.grafana.net/api/v1/usage-analysis?filter=unused" | \
  jq '.metrics[] | {metric_name, series_count, last_queried}'
```
Before dropping a metric entirely:
- Confirm it is not used in any Grafana dashboard (search by metric name in dashboard JSON)
- Confirm it is not used in any Prometheus/Mimir alert rule or recording rule
- Check with the team that owns the service if the metric is part of an SLO
Drop unused metrics via remote_write filtering in Alloy:
```alloy
prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = "https://prometheus-prod-XX.grafana.net/api/prom/push"

    write_relabel_config {
      source_labels = ["__name__"]
      regex         = "unused_metric_name|another_unused_metric"
      action        = "drop"
    }
  }
}
```
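When the list of unused metrics is long, the `regex` value is easier to generate than to type. A minimal sketch, assuming the names come from a reviewed list (the two names below are the placeholders used in the Alloy example):

```python
import re


def build_drop_regex(metric_names):
    """Join metric names into one alternation, escaping any regex metacharacters."""
    unique = sorted(set(metric_names))
    if not unique:
        raise ValueError("at least one metric name is required")
    return "|".join(re.escape(name) for name in unique)


unused = ["unused_metric_name", "another_unused_metric", "unused_metric_name"]
drop_regex = build_drop_regex(unused)
print(drop_regex)

# Sanity check: the pattern matches the listed names in full and nothing longer.
assert re.fullmatch(drop_regex, "unused_metric_name")
assert not re.fullmatch(drop_regex, "unused_metric_name_total")
```

Duplicates are removed and the names are sorted so that the generated value is stable between runs, which keeps configuration diffs small.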
Step 7: Adaptive Logs (companion product)
For log volume reduction, Adaptive Logs works the same way for Loki:
```bash
# Check log volume recommendations
curl -s -H "Authorization: Bearer <API_KEY>" \
  "https://adaptive-logs.grafana.net/api/v1/recommendations" | \
  jq '.recommendations[] | {stream_selector, estimated_reduction_percent}'
```
Log pattern: Adaptive Logs drops low-value log streams (e.g. debug logs from non-critical services), either during high-volume periods or permanently.
---
Step 8: Measure the impact
After applying rules, monitor the effect over 24-48 hours:
```promql
# Active Series count over time (visible in Grafana Cloud Metrics Usage dashboard)
grafanacloud_instance_active_series

# Series reduction from adaptive metrics
grafanacloud_instance_active_series_dropped_by_aggregation_rules
```
In Grafana Cloud: **Home > Usage > Metrics** shows before/after series counts and the billing
impact of active rules.
**Expected timeline:**
- Rules take effect within ~5 minutes of creation
- Full billing impact visible after the next billing cycle (usually within 1 hour)
- The original high-cardinality metric continues to be ingested but doesn't count toward billing
for the labels that were dropped
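To turn the two metric values into a single figure, one option is to express the dropped series as a share of what the series count would have been without any rules. The sketch below assumes the two values are disjoint, i.e. the active series count is measured after aggregation and the dropped count covers only series removed by rules. That interpretation is an assumption, and the numbers are made up.

```python
def aggregation_impact(active_series, dropped_series):
    """Percentage of series removed by aggregation rules."""
    total_before = active_series + dropped_series
    if total_before <= 0:
        raise ValueError("series counts must add up to a positive number")
    return 100.0 * dropped_series / total_before


# Hypothetical readings taken from the two metrics after rules have settled.
active = 750_000
dropped = 250_000

print(f"{aggregation_impact(active, dropped):.1f}% of series removed by rules")
```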
---