postgresql-monitoring

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

PostgreSQL Monitoring Skill

PostgreSQL监控Skill

Atomic skill for performance monitoring
用于性能监控的原子化Skill

Overview

概述

Production-ready patterns for metrics collection, alerting, and observability.
适用于生产环境的指标收集、告警与可观测性方案。

Prerequisites

前置条件

  • PostgreSQL 16+
  • pg_stat_statements extension
  • Optional: Prometheus, Grafana
  • PostgreSQL 16+
  • pg_stat_statements 扩展
  • 可选:Prometheus、Grafana

Parameters

参数

yaml
parameters:
  operation:
    type: string
    required: true
    enum: [collect_metrics, setup_alerting, diagnose]
  metric_type:
    type: string
    enum: [connections, queries, replication, storage]
yaml
parameters:
  operation:
    type: string
    required: true
    enum: [collect_metrics, setup_alerting, diagnose]
  metric_type:
    type: string
    enum: [connections, queries, replication, storage]

Quick Reference

快速参考

Key Queries

关键查询

sql
-- Connection stats
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

-- Slow queries
SELECT query, mean_exec_time FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;

-- Cache hit ratio
SELECT round(100.0 * sum(heap_blks_hit) /
    nullif(sum(heap_blks_hit + heap_blks_read), 0), 2) as ratio
FROM pg_statio_user_tables;

-- Table sizes
SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::regclass))
FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(tablename::regclass) DESC;

-- Replication lag
SELECT client_addr, pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) as lag
FROM pg_stat_replication;
sql
-- Connection stats
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

-- Slow queries
SELECT query, mean_exec_time FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;

-- Cache hit ratio
SELECT round(100.0 * sum(heap_blks_hit) /
    nullif(sum(heap_blks_hit + heap_blks_read), 0), 2) as ratio
FROM pg_statio_user_tables;

-- Table sizes
SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::regclass))
FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(tablename::regclass) DESC;

-- Replication lag
SELECT client_addr, pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) as lag
FROM pg_stat_replication;

Critical Thresholds

关键阈值

MetricWarningCritical
Connections80% max95% max
Cache hit< 99%< 95%
Replication lag> 1MB> 100MB
Dead tuples> 10K> 100K
指标警告阈值严重阈值
连接数超过最大连接数的80%超过最大连接数的95%
缓存命中率< 99%< 95%
复制延迟> 1MB> 100MB
死元组> 10K> 100K

Prometheus Exporter

Prometheus 导出器

yaml
postgres-exporter:
  image: prometheuscommunity/postgres-exporter
  environment:
    DATA_SOURCE_NAME: "postgresql://monitor:pass@postgres:5432/postgres"
yaml
postgres-exporter:
  image: prometheuscommunity/postgres-exporter
  environment:
    DATA_SOURCE_NAME: "postgresql://monitor:pass@postgres:5432/postgres"

Troubleshooting

故障排查

ProblemCauseSolution
No statsExtension missingCREATE EXTENSION pg_stat_statements
Old metricsStats not resetpg_stat_statements_reset()
High connectionsLeak or surgeCheck application
问题原因解决方案
无统计数据缺少扩展执行 CREATE EXTENSION pg_stat_statements
指标数据过时统计数据未重置执行 pg_stat_statements_reset()
连接数过高连接泄漏或突增检查应用程序

Usage

使用方法

Skill("postgresql-monitoring")
Skill("postgresql-monitoring")