cost-management


Grafana Cloud Cost Management

Cost Management & Billing Application

Access: My Account → Cost Management (or within your Grafana Cloud stack)
FOCUS-compliant (FinOps Open Cost and Usage Specification) billing dashboards showing:
  • Spending by signal type (metrics, logs, traces, profiles)
  • Month-over-month trends
  • Usage vs. quota tracking
  • Invoice download

Cost Attribution by Label

Tag your telemetry at ingestion to enable per-team cost reporting:
```alloy
// Add cost attribution labels in Alloy
prometheus.remote_write "cloud" {
  endpoint {
    url = sys.env("PROMETHEUS_URL")
    basic_auth {
      username = sys.env("PROM_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
  external_labels = {
    team    = "platform",
    project = "checkout-service",
    env     = "production",
  }
}

loki.write "cloud" {
  endpoint {
    url = sys.env("LOKI_URL")
    basic_auth {
      username = sys.env("LOKI_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
  external_labels = {
    team    = "platform",
    project = "checkout-service",
  }
}
```
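With `team` and `project` applied as external labels, per-team usage can be queried directly for chargeback reports. A minimal sketch (the label names match the config above; authoritative billing splits come from Cost Management itself):

```promql
# Active series per team (assumes the `team` external label above)
count by (team) ({__name__=~".+"})

# Drill into one project (label value from the example config above)
count by (team) ({project="checkout-service"})
```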

Usage Alerts

Set alerts before you hit quota or budget thresholds:
```yaml
# Alert when approaching metrics quota
groups:
  - name: grafana-cloud-usage
    rules:
      - alert: MetricsUsageHigh
        expr: grafana_cloud_metrics_active_series / grafana_cloud_metrics_limit > 0.8
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Grafana Cloud metrics usage >80% of quota"
      - alert: LogsIngestionHigh
        expr: increase(grafana_cloud_logs_bytes_ingested_total[24h]) > 50e9  # 50GB/day
        labels:
          severity: warning
        annotations:
          summary: "Grafana Cloud log ingestion >50GB today"
```

Adaptive Metrics (Reduce Cardinality)

Automatically identifies unused or high-cardinality metrics and generates aggregation rules.
View the recommendations, then apply aggregation rules. An example rule that drops high-cardinality labels from a metric:

```yaml
- match: "^http_request_duration_seconds.*"
  action: keep
  match_labels:
    - method
    - status_code
    - service
```

Drops `pod`, `container`, `instance`, `node`, reducing the series count from 10k → 50.


**Workflow:**
1. Go to **Grafana Cloud → Adaptive Metrics**
2. Review recommended aggregation rules (sorted by series reduction impact)
3. Test rules in "Preview" mode before applying
4. Apply rules — takes effect within 5 minutes
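A rule's impact can be estimated before applying it by comparing the current series count with the number of unique combinations of the kept labels. A sketch against the example metric above:

```promql
# Series currently produced by the metric family
count({__name__=~"http_request_duration_seconds.*"})

# Series that would remain after aggregating to the kept labels
count(count by (method, status_code, service) ({__name__=~"http_request_duration_seconds.*"}))
```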

Adaptive Logs (Reduce Log Volume)

Drop or sample log lines before ingestion using Loki's pipeline stages in Alloy:
```alloy
loki.process "filter_logs" {
  forward_to = [loki.write.cloud.receiver]

  // Drop health check logs (high volume, low value)
  stage.drop {
    expression = ".*GET /health.*"
  }

  // Drop debug logs in production
  stage.drop {
    source     = "level"
    expression = "debug"
  }

  // Sample verbose info logs (keep 10%). stage.sampling only takes a
  // rate, so scope it with stage.match (assumes a `level` label exists)
  stage.match {
    selector = "{level=\"info\"}"
    stage.sampling {
      rate = 0.1
    }
  }
}
```
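Before writing drop rules, it helps to know which streams dominate ingest volume. A LogQL sketch for finding candidates (the `namespace` and `app` label names are assumptions, matching the cost-monitoring queries later in this doc):

```logql
# Top 10 log producers by bytes over the last hour
topk(10, sum by (namespace, app) (bytes_over_time({namespace=~".+"}[1h])))
```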

Adaptive Traces (Reduce Trace Volume)

Use Alloy tail-based sampling to keep only important traces:
```alloy
otelcol.processor.tail_sampling "cost_control" {
  decision_wait = "10s"
  policy {
    name = "keep-errors"
    type = "status_code"
    status_code { status_codes = ["ERROR"] }
  }
  policy {
    name = "keep-slow"
    type = "latency"
    latency { threshold_ms = 1000 }
  }
  policy {
    name = "sample-rest"
    type = "probabilistic"
    probabilistic { sampling_percentage = 5 }
  }
  output {
    traces = [otelcol.exporter.otlp.cloud.input]
  }
}
```

Key Metrics for Cost Monitoring

```promql
# Active metric series (the billed unit for metrics)
grafana_cloud_metrics_active_series

# Series count by metric name (find high-cardinality sources)
topk(20, count by (__name__) ({__name__=~".+"}))

# Log bytes ingested per stream
sum(increase(loki_ingester_chunk_size_bytes_sum[24h])) by (namespace, app)

# Trace spans ingested
rate(tempo_distributor_spans_received_total[5m])
```

Optimization Checklist

  • Run Adaptive Metrics recommendations (typically reduces series 40-60%)
  • Drop health/readiness probe logs in the Alloy pipeline
  • Set a sampling rate for traces (5-10% is typical for most workloads)
  • Review the top-N high-cardinality metrics: `topk(20, count by (__name__) ({__name__=~".+"}))`
  • Add cost attribution labels (`team`, `project`) to all Alloy configs
  • Set usage alerts at 80% of quota
  • Review and clean up unused dashboards and data sources (they don't reduce cost but indicate stale collection)
  • Use recording rules to pre-aggregate expensive PromQL queries
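The recording-rule item can look like the following. A minimal sketch; the group, rule, and metric names are hypothetical examples:

```yaml
groups:
  - name: cost-precompute
    rules:
      # Evaluate the expensive rate() once per interval instead of on
      # every dashboard load (metric name is a hypothetical example)
      - record: service:http_requests:rate5m
        expr: sum by (service) (rate(http_request_duration_seconds_count[5m]))
```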

Understanding Grafana Cloud Pricing

| Signal | Billing Unit |
| --- | --- |
| Metrics | Active series (unique label combinations) |
| Logs | Bytes ingested |
| Traces | Spans ingested |
| Profiles | Bytes ingested |
| Synthetic Monitoring | Check executions |
| k6 | VUh (Virtual User hours) |