prometheus

Metrics with Prometheus and Grafana

Value Proposition

Prometheus is an open-source monitoring and alerting toolkit built for cloud-native environments. Grafana Cloud Metrics (powered by Grafana Mimir) complements it with a fully managed, Prometheus-compatible service that adds long-term storage, global query performance, and enterprise scalability.
Key Differentiators: pull-based collection model, dimensional data model with labels, PromQL, automatic service discovery, and a backend that scales to billions of active series.

PromQL Quick Reference

Instant Vector Selectors

```promql
# By metric name
http_requests_total

# Label filter
http_requests_total{job="api-server"}

# Multiple labels (AND)
http_requests_total{job="api-server", method="GET"}

# Regex match (uses the =~ operator)
http_requests_total{job=~"api.*", status=~"5.."}

# Negative match
http_requests_total{status!="200"}
```

Range Vectors & Rates

```promql
# Per-second rate over 5 minutes
rate(http_requests_total[5m])

# Increase over interval
increase(http_requests_total[1h])

# Instant rate (last two samples)
irate(http_requests_total[5m])

# Offset (5 minutes ago)
rate(http_requests_total[5m] offset 5m)
```

Aggregations

```promql
# Sum by label
sum by (job) (rate(http_requests_total[5m]))

# Average (rate first, because node_cpu_seconds_total is a counter)
avg by (instance) (rate(node_cpu_seconds_total[5m]))

# Top-K
topk(5, rate(http_requests_total[5m]))

# Histogram quantiles
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

# Count series (e.g. number of targets for a job)
count(up{job="api"})
```

Common Patterns

```promql
# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

# Saturation (CPU usage %)
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

# Predict disk full (linear extrapolation)
predict_linear(node_filesystem_free_bytes[6h], 24*3600) < 0
```
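
The error-rate and disk-prediction patterns above could be turned into Prometheus alerting rules roughly as follows. This is a minimal sketch, not a recommended policy: the group name, alert names, the 5% threshold, the `for` durations, and the `severity` label values are all assumptions chosen for illustration.

```yaml
groups:
  - name: api_alerts                 # hypothetical group name
    rules:
      - alert: HighErrorRate         # hypothetical alert name
        # Fires when more than 5% of requests return 5xx (threshold is an assumption)
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) * 100 > 5
        for: 10m                     # condition must hold for 10 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "More than 5% of HTTP requests are failing with 5xx"
      - alert: DiskPredictedFull     # hypothetical alert name
        # Linear extrapolation of the last 6h predicts no free bytes within 24h
        expr: predict_linear(node_filesystem_free_bytes[6h], 24 * 3600) < 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Filesystem on {{ $labels.instance }} is predicted to fill within 24 hours"
```

A rule file like this can be validated with `promtool check rules <file>` before it is loaded.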

Metrics Drilldown

Queryless Prometheus metrics exploration (preinstalled in Grafana 12+):
  • Browse metrics without writing PromQL
  • Smart segmentation and anomaly detection
  • Auto-visualization with optimal chart types
  • Metric relationship discovery
  • Telemetry pivoting (metrics to logs)

Alerting

Prometheus Alertmanager

Route, group, silence, and deduplicate alerts. Multi-destination routing (PagerDuty, Slack, Email, webhooks).
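
As an illustration of routing and grouping, an Alertmanager configuration might look like the sketch below. It assumes alerts carry a `severity` label. The receiver names, channel, timing values, and credentials are placeholders, not values from this document.

```yaml
route:
  receiver: default-slack            # fallback receiver (hypothetical name)
  group_by: ['alertname', 'job']     # alerts sharing these labels are batched together
  group_wait: 30s                    # wait before sending the first notification for a group
  group_interval: 5m                 # wait before notifying about new alerts in a group
  repeat_interval: 4h                # re-send interval for alerts that are still firing
  routes:
    - matchers:
        - severity="critical"        # assumes alerts are labeled with severity
      receiver: pagerduty-oncall

receivers:
  - name: default-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook URL
        channel: '#alerts'
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: REPLACE_ME      # placeholder integration key
```

Critical alerts go to PagerDuty, and everything else falls through to the default Slack receiver.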

Grafana Alerting

Unified alerting across all data sources. Supports multi-dimensional alerts, notification policies, and contact points.

Recording Rules

Pre-compute expensive PromQL queries for dashboard performance:
```yaml
groups:
  - name: api_rules
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```
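
Recording rules can also build on one another. The sketch below extends the group above with an error-ratio series. The two extra rule names and the `interval` value are assumptions for illustration. Because rules within a group are evaluated sequentially, the last rule can reference the series recorded by the first two.

```yaml
groups:
  - name: api_rules
    interval: 1m                     # assumed evaluation interval for this group
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Hypothetical rule: per-job rate of 5xx responses
      - record: job:http_requests_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
      # Hypothetical rule: error ratio derived from the two recorded series above
      - record: job:http_requests_errors:ratio_rate5m
        expr: job:http_requests_errors:rate5m / job:http_requests:rate5m
```

Dashboards can then query `job:http_requests_errors:ratio_rate5m` directly instead of recomputing the ratio on every refresh.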

Architecture

  • Pull-based scraping: Prometheus scrapes HTTP endpoints at configured intervals
  • Service discovery: Automatic target discovery for K8s, EC2, Consul
  • Pushgateway: For short-lived jobs that exit before they can be scraped
  • Remote write/read: Send metrics to Grafana Cloud, Thanos, Mimir
  • Local storage: Efficient on-disk time-series database
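
The pieces listed above might come together in a `prometheus.yml` along these lines. This is a sketch under stated assumptions: the hostnames, job names, scrape interval, pod annotation convention, and remote-write URL and credentials are placeholders and not taken from this document.

```yaml
global:
  scrape_interval: 15s               # assumed default pull interval

scrape_configs:
  # Pull-based scraping of a statically configured HTTP endpoint
  - job_name: api-server
    static_configs:
      - targets: ['api-1.example.internal:8080']   # placeholder target

  # Service discovery: find pods through the Kubernetes API
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated prometheus.io/scrape: "true" (a common convention, assumed here)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"

  # Short-lived jobs push to a Pushgateway, which Prometheus then scrapes
  - job_name: pushgateway
    honor_labels: true               # preserve job/instance labels set by the pushing jobs
    static_configs:
      - targets: ['pushgateway.example.internal:9091']   # placeholder target

# Remote write: forward samples to a long-term store such as Grafana Cloud or Mimir
remote_write:
  - url: https://prometheus.example.com/api/prom/push    # placeholder endpoint
    basic_auth:
      username: REPLACE_ME           # placeholder instance ID
      password: REPLACE_ME           # placeholder API token
```

Local storage needs no extra configuration here, because Prometheus writes its on-disk time-series database by default.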

Resources
