grafana-dashboard

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Grafana Dashboard

Overview

概述

Design and implement comprehensive Grafana dashboards with multiple visualization types, variables, and drill-down capabilities for operational monitoring.

设计并实现包含多种可视化类型、变量和下钻功能的全面Grafana仪表盘，用于业务运维监控。

When to Use

适用场景

Creating monitoring dashboards
Building operational insights
Visualizing time-series data
Creating drill-down dashboards
Sharing metrics with stakeholders

构建监控仪表盘
搭建业务运维洞察体系
可视化时间序列数据
创建可下钻的仪表盘
向利益相关者共享指标数据

Instructions

操作指南

1. Grafana Dashboard JSON

json

{
  "dashboard": {
    "title": "Application Performance",
    "description": "Real-time application metrics",
    "tags": ["production", "performance"],
    "timezone": "UTC",
    "refresh": "30s",
    "templating": {
      "list": [
        {
          "name": "datasource",
          "type": "datasource",
          "datasource": "prometheus"
        },
        {
          "name": "service",
          "type": "query",
          "datasource": "prometheus",
          "query": "label_values(requests_total, service)"
        }
      ]
    },
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "graph",
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "sum(rate(requests_total{service=\"$service\"}[5m]))",
            "legendFormat": "{{ method }}"
          }
        ],
        "yaxes": [
          {
            "format": "rps",
            "label": "Requests per Second"
          }
        ]
      },
      {
        "id": 2,
        "title": "Error Rate",
        "type": "graph",
        "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "sum(rate(requests_total{status_code=~\"5..\",service=\"$service\"}[5m])) / sum(rate(requests_total{service=\"$service\"}[5m]))",
            "legendFormat": "Error Rate"
          }
        ]
      },
      {
        "id": 3,
        "title": "Response Latency (p95)",
        "type": "graph",
        "gridPos": {"x": 0, "y": 8, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(request_duration_seconds_bucket{service=\"$service\"}[5m]))",
            "legendFormat": "p95"
          }
        ]
      },
      {
        "id": 4,
        "title": "Active Connections",
        "type": "stat",
        "gridPos": {"x": 12, "y": 8, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "sum(active_connections{service=\"$service\"})"
          }
        ]
      }
    ]
  }
}

json

{
  "dashboard": {
    "title": "Application Performance",
    "description": "Real-time application metrics",
    "tags": ["production", "performance"],
    "timezone": "UTC",
    "refresh": "30s",
    "templating": {
      "list": [
        {
          "name": "datasource",
          "type": "datasource",
          "datasource": "prometheus"
        },
        {
          "name": "service",
          "type": "query",
          "datasource": "prometheus",
          "query": "label_values(requests_total, service)"
        }
      ]
    },
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "graph",
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "sum(rate(requests_total{service=\"$service\"}[5m]))",
            "legendFormat": "{{ method }}"
          }
        ],
        "yaxes": [
          {
            "format": "rps",
            "label": "Requests per Second"
          }
        ]
      },
      {
        "id": 2,
        "title": "Error Rate",
        "type": "graph",
        "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "sum(rate(requests_total{status_code=~\"5..\",service=\"$service\"}[5m])) / sum(rate(requests_total{service=\"$service\"}[5m]))",
            "legendFormat": "Error Rate"
          }
        ]
      },
      {
        "id": 3,
        "title": "Response Latency (p95)",
        "type": "graph",
        "gridPos": {"x": 0, "y": 8, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(request_duration_seconds_bucket{service=\"$service\"}[5m]))",
            "legendFormat": "p95"
          }
        ]
      },
      {
        "id": 4,
        "title": "Active Connections",
        "type": "stat",
        "gridPos": {"x": 12, "y": 8, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "sum(active_connections{service=\"$service\"})"
          }
        ]
      }
    ]
  }
}

2. Grafana Provisioning Configuration

yaml

undefined

yaml

undefined

/etc/grafana/provisioning/dashboards/dashboards.yaml

apiVersion: 1

providers:

name: 'Dashboards' orgId: 1 folder: 'Production' type: file disableDeletion: false updateIntervalSeconds: 10 options: path: /var/lib/grafana/dashboards


```yaml

apiVersion: 1

providers:

name: 'Dashboards' orgId: 1 folder: 'Production' type: file disableDeletion: false updateIntervalSeconds: 10 options: path: /var/lib/grafana/dashboards


```yaml

/etc/grafana/provisioning/datasources/prometheus.yaml

apiVersion: 1

datasources:

name: Prometheus type: prometheus access: proxy orgId: 1 url: http://prometheus:9090 isDefault: true editable: true jsonData: timeInterval: '30s'

undefined

apiVersion: 1

datasources:

name: Prometheus type: prometheus access: proxy orgId: 1 url: http://prometheus:9090 isDefault: true editable: true jsonData: timeInterval: '30s'

undefined

3. Grafana Alert Configuration

yaml

undefined

yaml

undefined

/etc/grafana/provisioning/alerting/alerts.yaml

groups:

name: application_alerts interval: 1m rules:
- uid: alert_high_error_rate title: High Error Rate condition: B data:
  - refId: A model: expr: 'sum(rate(requests_total{status_code=~"5.."}[5m]))'
  - refId: B conditions:
    - evaluator: params: [0.05] type: gt query: params: [A, 5m, now] for: 5m annotations: description: 'Error rate is {{ $values.A }}' labels: severity: critical team: platform

undefined

groups:

name: application_alerts interval: 1m rules:
- uid: alert_high_error_rate title: High Error Rate condition: B data:
  - refId: A model: expr: 'sum(rate(requests_total{status_code=~"5.."}[5m]))'
  - refId: B conditions:
    - evaluator: params: [0.05] type: gt query: params: [A, 5m, now] for: 5m annotations: description: 'Error rate is {{ $values.A }}' labels: severity: critical team: platform

undefined

4. Grafana API Client

javascript

// grafana-api-client.js
const axios = require('axios');

class GrafanaClient {
  constructor(baseUrl, apiKey) {
    this.baseUrl = baseUrl;
    this.client = axios.create({
      baseURL: baseUrl,
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      }
    });
  }

  async createDashboard(dashboard) {
    const response = await this.client.post('/api/dashboards/db', {
      dashboard: dashboard,
      overwrite: true
    });
    return response.data;
  }

  async getDashboard(uid) {
    const response = await this.client.get(`/api/dashboards/uid/${uid}`);
    return response.data;
  }

  async createAlert(alert) {
    const response = await this.client.post('/api/alerts', alert);
    return response.data;
  }

  async listDashboards() {
    const response = await this.client.get('/api/search?query=');
    return response.data;
  }
}

module.exports = GrafanaClient;

javascript

// grafana-api-client.js
const axios = require('axios');

class GrafanaClient {
  constructor(baseUrl, apiKey) {
    this.baseUrl = baseUrl;
    this.client = axios.create({
      baseURL: baseUrl,
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      }
    });
  }

  async createDashboard(dashboard) {
    const response = await this.client.post('/api/dashboards/db', {
      dashboard: dashboard,
      overwrite: true
    });
    return response.data;
  }

  async getDashboard(uid) {
    const response = await this.client.get(`/api/dashboards/uid/${uid}`);
    return response.data;
  }

  async createAlert(alert) {
    const response = await this.client.post('/api/alerts', alert);
    return response.data;
  }

  async listDashboards() {
    const response = await this.client.get('/api/search?query=');
    return response.data;
  }
}

module.exports = GrafanaClient;

5. Docker Compose Setup

yaml

version: '3.8'
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD:-admin}
      GF_USERS_ALLOW_SIGN_UP: 'false'
      GF_SERVER_ROOT_URL: http://grafana.example.com
    volumes:
      - ./provisioning:/etc/grafana/provisioning
      - grafana_storage:/var/lib/grafana
    depends_on:
      - prometheus

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_storage:/prometheus

volumes:
  grafana_storage:
  prometheus_storage:

yaml

version: '3.8'
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD:-admin}
      GF_USERS_ALLOW_SIGN_UP: 'false'
      GF_SERVER_ROOT_URL: http://grafana.example.com
    volumes:
      - ./provisioning:/etc/grafana/provisioning
      - grafana_storage:/var/lib/grafana
    depends_on:
      - prometheus

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_storage:/prometheus

volumes:
  grafana_storage:
  prometheus_storage:

Best Practices

最佳实践

✅ DO

✅ 建议

Use meaningful dashboard titles
Add documentation panels
Implement row-based organization
Use variables for flexibility
Set appropriate refresh intervals
Include runbook links in alerts
Test alerts before deploying
Use consistent color schemes
Version control dashboard JSON

使用有意义的仪表盘标题
添加文档面板
基于行进行布局组织
使用变量提升灵活性
设置合适的刷新间隔
在告警中包含运行手册链接
部署前测试告警
使用一致的配色方案
对仪表盘JSON进行版本控制

❌ DON'T

❌ 避免

Overload dashboards with too many panels
Mix different time ranges without justification
Create without runbooks
Ignore alert noise
Use inconsistent metric naming
Set refresh too frequently
Forget to configure datasources
Leave default passwords

在仪表盘中添加过多面板
无正当理由混合不同时间范围
未配置运行手册就创建告警
忽略告警噪音
使用不一致的指标命名
设置过于频繁的刷新间隔
忘记配置数据源
保留默认密码

Visualization Types

可视化类型

Graph: Time-series trends
Stat: Single value with thresholds
Gauge: Percentage or usage
Heatmap: Pattern detection
Bar Chart: Category comparison
Pie Chart: Composition

Graph（图表）：时间序列趋势
Stat（统计面板）：带阈值的单一数值
Gauge（仪表盘）：百分比或使用率
Heatmap（热力图）：模式检测
Bar Chart（柱状图）：类别对比
Pie Chart（饼图）：构成占比