postgresql-monitoring
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChinesePostgreSQL Monitoring Skill
PostgreSQL监控Skill
Atomic skill for performance monitoring
用于性能监控的原子化Skill
Overview
概述
Production-ready patterns for metrics collection, alerting, and observability.
适用于生产环境的指标收集、告警与可观测性方案。
Prerequisites
前置条件
- PostgreSQL 16+
- pg_stat_statements extension
- Optional: Prometheus, Grafana
- PostgreSQL 16+
- pg_stat_statements 扩展
- 可选:Prometheus、Grafana
Parameters
参数
yaml
parameters:
operation:
type: string
required: true
enum: [collect_metrics, setup_alerting, diagnose]
metric_type:
type: string
enum: [connections, queries, replication, storage]yaml
parameters:
operation:
type: string
required: true
enum: [collect_metrics, setup_alerting, diagnose]
metric_type:
type: string
enum: [connections, queries, replication, storage]Quick Reference
快速参考
Key Queries
关键查询
sql
-- Connection stats
SELECT count(*), state FROM pg_stat_activity GROUP BY state;
-- Slow queries
SELECT query, mean_exec_time FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;
-- Cache hit ratio
SELECT round(100.0 * sum(heap_blks_hit) /
nullif(sum(heap_blks_hit + heap_blks_read), 0), 2) as ratio
FROM pg_statio_user_tables;
-- Table sizes
SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::regclass))
FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(tablename::regclass) DESC;
-- Replication lag
SELECT client_addr, pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) as lag
FROM pg_stat_replication;sql
-- Connection stats
SELECT count(*), state FROM pg_stat_activity GROUP BY state;
-- Slow queries
SELECT query, mean_exec_time FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;
-- Cache hit ratio
SELECT round(100.0 * sum(heap_blks_hit) /
nullif(sum(heap_blks_hit + heap_blks_read), 0), 2) as ratio
FROM pg_statio_user_tables;
-- Table sizes
SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::regclass))
FROM pg_tables WHERE schemaname = 'public' ORDER BY pg_total_relation_size(tablename::regclass) DESC;
-- Replication lag
SELECT client_addr, pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) as lag
FROM pg_stat_replication;Critical Thresholds
关键阈值
| Metric | Warning | Critical |
|---|---|---|
| Connections | 80% max | 95% max |
| Cache hit | < 99% | < 95% |
| Replication lag | > 1MB | > 100MB |
| Dead tuples | > 10K | > 100K |
| 指标 | 警告阈值 | 严重阈值 |
|---|---|---|
| 连接数 | 超过最大连接数的80% | 超过最大连接数的95% |
| 缓存命中率 | < 99% | < 95% |
| 复制延迟 | > 1MB | > 100MB |
| 死元组 | > 10K | > 100K |
Prometheus Exporter
Prometheus 导出器
yaml
postgres-exporter:
image: prometheuscommunity/postgres-exporter
environment:
DATA_SOURCE_NAME: "postgresql://monitor:pass@postgres:5432/postgres"yaml
postgres-exporter:
image: prometheuscommunity/postgres-exporter
environment:
DATA_SOURCE_NAME: "postgresql://monitor:pass@postgres:5432/postgres"Troubleshooting
故障排查
| Problem | Cause | Solution |
|---|---|---|
| No stats | Extension missing | CREATE EXTENSION pg_stat_statements |
| Old metrics | Stats not reset | pg_stat_statements_reset() |
| High connections | Leak or surge | Check application |
| 问题 | 原因 | 解决方案 |
|---|---|---|
| 无统计数据 | 缺少扩展 | 执行 CREATE EXTENSION pg_stat_statements |
| 指标数据过时 | 统计数据未重置 | 执行 pg_stat_statements_reset() |
| 连接数过高 | 连接泄漏或突增 | 检查应用程序 |
Usage
使用方法
Skill("postgresql-monitoring")Skill("postgresql-monitoring")