database-observability

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Grafana Cloud Database Observability

Grafana Cloud 数据库可观测性

Provides query-level insights (RED metrics, query samples, explain plans) for MySQL and PostgreSQL without application code changes. Generally Available as of April 2026.
无需修改应用代码即可提供针对 MySQL 和 PostgreSQL 的查询级洞察(RED 指标、查询样本、执行计划)。自 2026 年 4 月起正式可用。

Supported Databases

支持的数据库

DatabaseVariants
MySQLSelf-managed, RDS MySQL, Aurora MySQL, Cloud SQL MySQL, Azure Database for MySQL
PostgreSQLSelf-managed, RDS PostgreSQL, Aurora PostgreSQL, Cloud SQL PostgreSQL, Azure Database for PostgreSQL
数据库变体类型
MySQL自托管、RDS MySQL、Aurora MySQL、Cloud SQL MySQL、Azure Database for MySQL
PostgreSQL自托管、RDS PostgreSQL、Aurora PostgreSQL、Cloud SQL PostgreSQL、Azure Database for PostgreSQL

Prerequisites

前提条件

PostgreSQL

PostgreSQL

sql
-- 1. Enable pg_stat_statements in postgresql.conf
-- shared_preload_libraries = 'pg_stat_statements'
-- Then restart PostgreSQL

-- 2. Create monitoring user
CREATE USER grafana_monitoring WITH PASSWORD 'secret';
GRANT pg_monitor TO grafana_monitoring;
GRANT CONNECT ON DATABASE mydb TO grafana_monitoring;

-- 3. Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
sql
-- 1. 在postgresql.conf中启用pg_stat_statements
-- shared_preload_libraries = 'pg_stat_statements'
-- 然后重启PostgreSQL

-- 2. 创建监控用户
CREATE USER grafana_monitoring WITH PASSWORD 'secret';
GRANT pg_monitor TO grafana_monitoring;
GRANT CONNECT ON DATABASE mydb TO grafana_monitoring;

-- 3. 启用扩展
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

MySQL

MySQL

sql
-- Create monitoring user with least-privilege permissions
CREATE USER 'grafana_monitoring'@'%' IDENTIFIED BY 'secret';
GRANT SELECT, PROCESS, REPLICATION CLIENT ON *.* TO 'grafana_monitoring'@'%';
GRANT SELECT ON performance_schema.* TO 'grafana_monitoring'@'%';
FLUSH PRIVILEGES;
sql
-- 创建具有最小权限的监控用户
CREATE USER 'grafana_monitoring'@'%' IDENTIFIED BY 'secret';
GRANT SELECT, PROCESS, REPLICATION CLIENT ON *.* TO 'grafana_monitoring'@'%';
GRANT SELECT ON performance_schema.* TO 'grafana_monitoring'@'%';
FLUSH PRIVILEGES;

Alloy Configuration

Alloy 配置

PostgreSQL

PostgreSQL

alloy
database_observability.postgres "mydb" {
  data_source_name = "postgresql://grafana_monitoring:secret@localhost:5432/mydb?sslmode=disable"

  enable_collectors = ["pg_stat_statements", "query_samples", "schema_details"]

  forward_metrics_to = [prometheus.remote_write.cloud.receiver]
  forward_logs_to    = [loki.write.cloud.receiver]
}

prometheus.remote_write "cloud" {
  endpoint {
    url = sys.env("PROMETHEUS_URL")
    basic_auth {
      username = sys.env("PROMETHEUS_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
}

loki.write "cloud" {
  endpoint {
    url = sys.env("LOKI_URL")
    basic_auth {
      username = sys.env("LOKI_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
}
alloy
database_observability.postgres "mydb" {
  data_source_name = "postgresql://grafana_monitoring:secret@localhost:5432/mydb?sslmode=disable"

  enable_collectors = ["pg_stat_statements", "query_samples", "schema_details"]

  forward_metrics_to = [prometheus.remote_write.cloud.receiver]
  forward_logs_to    = [loki.write.cloud.receiver]
}

prometheus.remote_write "cloud" {
  endpoint {
    url = sys.env("PROMETHEUS_URL")
    basic_auth {
      username = sys.env("PROMETHEUS_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
}

loki.write "cloud" {
  endpoint {
    url = sys.env("LOKI_URL")
    basic_auth {
      username = sys.env("LOKI_USER")
      password = sys.env("GRAFANA_CLOUD_API_KEY")
    }
  }
}

MySQL

MySQL

alloy
database_observability.mysql "mydb" {
  data_source_name = "grafana_monitoring:secret@tcp(localhost:3306)/mydb"

  enable_collectors = ["query_samples", "explain_plans", "schema_details"]

  forward_metrics_to = [prometheus.remote_write.cloud.receiver]
  forward_logs_to    = [loki.write.cloud.receiver]
}
alloy
database_observability.mysql "mydb" {
  data_source_name = "grafana_monitoring:secret@tcp(localhost:3306)/mydb"

  enable_collectors = ["query_samples", "explain_plans", "schema_details"]

  forward_metrics_to = [prometheus.remote_write.cloud.receiver]
  forward_logs_to    = [loki.write.cloud.receiver]
}

Key Metrics

关键指标

promql
undefined
promql
undefined

Query rate by database

按数据库统计的查询速率

rate(db_query_total{db_instance="mydb"}[5m])
rate(db_query_total{db_instance="mydb"}[5m])

P95 query latency

P95 查询延迟

histogram_quantile(0.95, rate(db_query_duration_seconds_bucket[5m]))
histogram_quantile(0.95, rate(db_query_duration_seconds_bucket[5m]))

Error rate

错误率

rate(db_query_errors_total[5m]) / rate(db_query_total[5m])
rate(db_query_errors_total[5m]) / rate(db_query_total[5m])

Slow queries (over 1 second)

慢查询(超过1秒)

count(db_query_duration_seconds > 1) by (db_query_digest)
count(db_query_duration_seconds > 1) by (db_query_digest)

Active connections

活跃连接数

db_connections_active{db_instance="mydb"}
undefined
db_connections_active{db_instance="mydb"}
undefined

What You Get

您将获得的内容

Query Performance Dashboard (auto-provisioned):
  • Top queries by total time, call count, mean latency
  • Query samples with actual parameters and timing
  • Visual explain plans showing index usage, scan types, costs
  • RED metrics per query digest
Correlation with APM:
  • Link slow DB queries to the application traces that triggered them
  • db.statement
    ,
    db.system
    ,
    db.name
    OTel attributes connect DB spans to query samples
  • Drill from service latency spike → specific slow SQL query → explain plan
查询性能仪表板(自动配置):
  • 按总耗时、调用次数、平均延迟排序的Top查询
  • 包含实际参数和计时信息的查询样本
  • 展示索引使用情况、扫描类型、成本的可视化执行计划
  • 按查询摘要统计的RED指标
与APM的关联
  • 将慢数据库查询与触发它们的应用追踪关联起来
  • db.statement
    ,
    db.system
    ,
    db.name
    这些OTel属性将数据库跨度与查询样本关联
  • 从服务延迟峰值 → 特定慢SQL查询 → 执行计划进行深入排查

Alert Rules

告警规则

yaml
groups:
  - name: database-observability
    rules:
      - alert: SlowQueryDetected
        expr: histogram_quantile(0.95, rate(db_query_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 query latency > 1s on {{ $labels.db_instance }}"

      - alert: HighDBErrorRate
        expr: rate(db_query_errors_total[5m]) / rate(db_query_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "DB error rate > 5% on {{ $labels.db_instance }}"

      - alert: TooManyConnections
        expr: db_connections_active / db_connections_max > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "DB connection pool >80% on {{ $labels.db_instance }}"
yaml
groups:
  - name: database-observability
    rules:
      - alert: SlowQueryDetected
        expr: histogram_quantile(0.95, rate(db_query_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 query latency > 1s on {{ $labels.db_instance }}"

      - alert: HighDBErrorRate
        expr: rate(db_query_errors_total[5m]) / rate(db_query_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "DB error rate > 5% on {{ $labels.db_instance }}"

      - alert: TooManyConnections
        expr: db_connections_active / db_connections_max > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "DB connection pool >80% on {{ $labels.db_instance }}"

Setup Checklist

设置清单

  1. Enable
    pg_stat_statements
    (PostgreSQL) or Performance Schema (MySQL)
  2. Create least-privilege monitoring user
  3. Add
    database_observability.*
    block to Alloy config
  4. Verify metrics appear in Grafana Cloud → Database Observability
  5. Set up alerting on slow queries and error rates
  6. Enable trace correlation by ensuring app uses
    db.statement
    span attributes
  1. 启用
    pg_stat_statements
    (PostgreSQL)或 Performance Schema(MySQL)
  2. 创建具有最小权限的监控用户
  3. 在Alloy配置中添加
    database_observability.*
  4. 在Grafana Cloud → 数据库可观测性中验证指标是否显示
  5. 设置慢查询和错误率的告警规则
  6. 通过确保应用使用
    db.statement
    跨度属性来启用追踪关联