prometheus
Metrics with Prometheus and Grafana
Value Proposition
Prometheus is an open-source monitoring and alerting toolkit for cloud-native environments. Grafana Cloud
Metrics (powered by Grafana Mimir) complements it with a fully managed, Prometheus-compatible service
that adds long-term storage, global query performance, and enterprise scalability.
Key Differentiators: pull-based model, dimensional data model with labels, PromQL, automatic service
discovery, and scaling to billions of active series.
PromQL Quick Reference
Instant Vector Selectors
```promql
# By metric name
http_requests_total

# Label filter
http_requests_total{job="api-server"}

# Multiple labels (AND)
http_requests_total{job="api-server", method="GET"}

# Regex match (=~)
http_requests_total{job=~"api.*", status=~"5.."}

# Negative match (!=)
http_requests_total{status!="200"}
```
Range Vectors & Rates
```promql
# Per-second rate over 5 minutes
rate(http_requests_total[5m])

# Increase over interval
increase(http_requests_total[1h])

# Instant rate (last two samples)
irate(http_requests_total[5m])

# Offset (5 minutes ago)
rate(http_requests_total[5m] offset 5m)
```
Aggregations
```promql
# Sum by label
sum by (job) (rate(http_requests_total[5m]))

# Average (idle CPU fraction per instance; rate the counter before averaging)
avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Top-K
topk(5, rate(http_requests_total[5m]))

# Histogram quantile (p99)
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

# Count series (number of targets in the job)
count(up{job="api"})
```
Common Patterns
```promql
# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) * 100

# Saturation (CPU usage %)
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

# Predict disk full (linear extrapolation)
predict_linear(node_filesystem_free_bytes[6h], 24*3600) < 0
```
Metrics Drilldown
Queryless Prometheus metrics exploration (preinstalled in Grafana 12+):
- Browse metrics without writing PromQL
- Smart segmentation and anomaly detection
- Auto-visualization with optimal chart types
- Metric relationship discovery
- Telemetry pivoting (metrics to logs)
Alerting
Prometheus Alertmanager
Route, group, silence, and deduplicate alerts. Multi-destination routing (PagerDuty, Slack, Email, webhooks).
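As a rough illustration of the routing and grouping described above, a minimal `alertmanager.yml` sketch might look like the following. The receiver names, the `severity` label, the Slack channel, and both credentials are placeholders assumed for this example, and the `matchers` syntax assumes a reasonably recent Alertmanager release.

```yaml
# Sketch only: receiver names, label values, and credentials are placeholders.
route:
  receiver: default-slack            # fallback when no child route matches
  group_by: ['alertname', 'job']     # alerts sharing these labels are batched together
  group_wait: 30s                    # wait before sending the first notification for a group
  group_interval: 5m                 # wait before notifying about new alerts in a group
  repeat_interval: 4h                # re-send interval while alerts keep firing
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall

receivers:
  - name: default-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder webhook
        channel: '#alerts'
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: REPLACE_ME                                # placeholder key
```

Silences are not part of this file; they are created at runtime through the Alertmanager UI or API.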
Grafana Alerting
Unified alerting across all data sources. Supports multi-dimensional alerts, notification policies,
and contact points.
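"Multi-dimensional" here means a single rule can produce one alert instance per label set. The sketch below shows the idea in Prometheus rule-file format, which data source-managed rules for Mimir-backed data sources also use. The group name, alert name, 5% threshold, `for` duration, and `severity` label are assumptions chosen for illustration.

```yaml
# Sketch only: names, threshold, and labels are illustrative assumptions.
groups:
  - name: api_alerts
    rules:
      - alert: HighErrorRate
        # One alert instance per job whose 5xx share exceeds 5%.
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m])) * 100 > 5
        for: 10m                     # condition must hold this long before firing
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }}: 5xx error rate above 5% for 10 minutes"
```

Because the expression keeps the `job` label, each affected job fires and resolves independently, and notification policies can route on that label.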
Recording Rules
Pre-compute expensive PromQL queries for dashboard performance:
```yaml
groups:
  - name: api_rules
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```
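Recorded series can feed further rules, which keeps derived ratios cheap to query. The following sketch extends the group above. The two extra rule names follow the `level:metric:operations` naming convention but are otherwise hypothetical, as is the explicit `interval`. It relies on rules within a group being evaluated in order, so later rules can read the series recorded earlier in the group.

```yaml
# Sketch only: the error and ratio rule names are hypothetical.
groups:
  - name: api_rules
    interval: 1m                     # evaluation interval for this group
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      - record: job:http_requests_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
      # Built from the two recorded series above instead of raw counters.
      - record: job:http_requests_errors:ratio_rate5m
        expr: job:http_requests_errors:rate5m / job:http_requests:rate5m
```

A dashboard panel can then query `job:http_requests_errors:ratio_rate5m` directly rather than recomputing both rates on every refresh.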
Architecture
- Pull-based scraping: Prometheus scrapes HTTP endpoints at configured intervals
- Service discovery: Automatic target discovery for K8s, EC2, Consul
- Pushgateway: For short-lived jobs that can't be scraped
- Remote write/read: Send metrics to Grafana Cloud, Thanos, Mimir
- Local storage: Efficient on-disk time-series database
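The pieces listed above typically come together in `prometheus.yml`. The following is a minimal sketch, not a production configuration: the hostnames, job names, annotation-based keep rule, rule file path, and the remote write URL and credentials are all placeholders assumed for illustration.

```yaml
# Sketch only: hostnames, job names, URL, and credentials are placeholders.
global:
  scrape_interval: 15s               # pull-based scraping at a configured interval

rule_files:
  - rules/*.yml                      # recording and alerting rules

scrape_configs:
  # Static target
  - job_name: api-server
    static_configs:
      - targets: ['api-1.example.internal:8080']

  # Service discovery: find pods through the Kubernetes API
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"

  # Short-lived jobs push to the Pushgateway, which Prometheus then scrapes
  - job_name: pushgateway
    honor_labels: true               # preserve the job/instance labels set by pushers
    static_configs:
      - targets: ['pushgateway.example.internal:9091']

# Remote write: forward samples to a Prometheus-compatible backend
remote_write:
  - url: https://REPLACE_WITH_YOUR_ENDPOINT/api/prom/push
    basic_auth:
      username: REPLACE_WITH_INSTANCE_ID
      password: REPLACE_WITH_API_TOKEN
```

Samples are still written to the local on-disk database first; remote write ships a copy to the remote backend.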