
PostgreSQL Database Engineering


A comprehensive skill for professional PostgreSQL database engineering, covering everything from query optimization and indexing strategies to high availability, replication, and production database management. This skill enables you to design, optimize, and maintain high-performance PostgreSQL databases at scale.

When to Use This Skill


Use this skill when:
  • Designing database schemas for high-performance applications
  • Optimizing slow queries and improving database performance
  • Implementing indexing strategies for complex query patterns
  • Setting up partitioning for large tables (100M+ rows)
  • Configuring streaming replication and high availability
  • Tuning PostgreSQL configuration for production workloads
  • Implementing backup and recovery procedures
  • Debugging performance issues and query bottlenecks
  • Setting up connection pooling with pgBouncer or PgPool
  • Monitoring database health and performance metrics
  • Planning database migrations and schema changes
  • Implementing database security and access controls
  • Scaling PostgreSQL databases horizontally or vertically
  • Managing VACUUM operations and database maintenance
  • Setting up logical replication for data distribution

Core Concepts


PostgreSQL Architecture


PostgreSQL uses a process-based architecture with several key components:
  • Postmaster Process: Main server process that manages connections
  • Backend Processes: One per client connection, handles queries
  • Shared Memory: Shared buffers, WAL buffers, lock tables
  • Background Workers: Autovacuum, checkpointer, WAL writer, statistics collector
  • Write-Ahead Log (WAL): Transaction log for durability and replication
  • Storage Layer: TOAST for large values, FSM for free space, VM for visibility

MVCC (Multi-Version Concurrency Control)


PostgreSQL's foundational concurrency mechanism:
  • Snapshots: Each transaction sees a consistent snapshot of data
  • Tuple Versions: Multiple row versions coexist for concurrent access
  • Transaction IDs: xmin (creating transaction), xmax (deleting transaction)
  • Visibility Rules: Determines which row versions are visible to transactions
  • VACUUM: Reclaims space from dead tuples and prevents transaction wraparound
  • FREEZE: Marks old rows as visible to all transactions
Key Implications:
  • No read locks - readers never block writers
  • Writers never block readers
  • Updates create new row versions
  • Regular VACUUM is essential
  • Dead tuples accumulate until vacuumed
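The system columns xmin and xmax, together with the dead-tuple counters that VACUUM works from, make these mechanics visible; a minimal sketch (the accounts table is illustrative):
sql
-- Illustrative table
CREATE TABLE accounts (id INT PRIMARY KEY, balance NUMERIC);
INSERT INTO accounts VALUES (1, 100);

-- xmin = creating transaction; xmax = deleting/locking transaction (0 while the row is live)
SELECT xmin, xmax, id, balance FROM accounts;

-- UPDATE writes a new row version; the old version becomes a dead tuple
UPDATE accounts SET balance = 200 WHERE id = 1;

-- Dead tuples accumulate until VACUUM (or autovacuum) reclaims them
SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'accounts';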

Transaction Isolation Levels


PostgreSQL supports four isolation levels:
  1. Read Uncommitted: Treated as Read Committed in PostgreSQL
  2. Read Committed (default): Sees committed data at statement start
  3. Repeatable Read: Sees snapshot from transaction start
  4. Serializable: True serializable isolation with SSI
Choosing Isolation:
  • Read Committed: Most applications, best performance
  • Repeatable Read: Reports, analytics needing consistency
  • Serializable: Financial transactions, critical consistency needs
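Isolation is chosen per transaction; a minimal sketch (orders and accounts are illustrative tables):
sql
-- Two reads from the same snapshot stay mutually consistent
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM orders;    -- snapshot taken at the first query
SELECT sum(total) FROM orders;  -- consistent with the count above
COMMIT;

-- SERIALIZABLE work must be prepared to retry on serialization_failure (SQLSTATE 40001)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;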

Index Types


PostgreSQL offers multiple index types for different use cases:

1. B-Tree (Default)


  • Use for: Equality, range queries, sorting
  • Supports: <, <=, =, >=, >, BETWEEN, IN, IS NULL
  • Best for: Most general-purpose indexing
  • Example: Primary keys, foreign keys, timestamps

2. Hash


  • Use for: Equality comparisons only
  • Supports: = operator
  • Best for: Large tables with equality lookups
  • Limitation: Not WAL-logged before PG 10, no range queries

3. GiST (Generalized Search Tree)


  • Use for: Geometric data, full-text search, custom types
  • Supports: Overlaps, contains, nearest neighbor
  • Best for: Spatial data, ranges, full-text search
  • Example: PostGIS geometries, tsvector, ranges

4. GIN (Generalized Inverted Index)


  • Use for: Multi-valued columns (arrays, JSONB, full-text)
  • Supports: Contains, exists operators
  • Best for: JSONB queries, array operations, full-text search
  • Tradeoff: Slower updates, faster queries

5. BRIN (Block Range Index)


  • Use for: Very large tables with natural ordering
  • Supports: Range queries on sorted data
  • Best for: Time-series data, append-only tables
  • Advantage: Tiny index size, scales to billions of rows

6. SP-GiST (Space-Partitioned GiST)


  • Use for: Non-balanced data structures
  • Supports: Points, ranges, IP addresses
  • Best for: Quadtrees, k-d trees, radix trees
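Each non-default type is selected with USING at creation time; hedged sketches, with all table and column names hypothetical:
sql
CREATE INDEX idx_sessions_token_hash ON sessions USING HASH (token);               -- equality only
CREATE INDEX idx_docs_tags_gin       ON docs     USING GIN  (tags);                -- arrays / JSONB
CREATE INDEX idx_places_geom_gist    ON places   USING GIST (location);            -- spatial / ranges
CREATE INDEX idx_metrics_ts_brin     ON metrics  USING BRIN (recorded_at);         -- huge, ordered tables
CREATE INDEX idx_conns_ip_spgist     ON conns    USING SPGIST (client_ip inet_ops); -- radix tree for inet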

Query Planning and Optimization


PostgreSQL's query planner determines execution strategies:
Planner Components:
  • Statistics: Table and column statistics for cardinality estimation
  • Cost Model: CPU, I/O, and memory cost estimation
  • Plan Types: Sequential scan, index scan, bitmap scan, joins
  • Join Methods: Nested loop, hash join, merge join
  • Optimization: Query rewriting, predicate pushdown, join reordering
Key Statistics:
  • n_distinct: Number of distinct values (for selectivity)
  • correlation: Physical row ordering correlation
  • most_common_vals: MCV list for skewed distributions
  • histogram_bounds: Value distribution histogram
Understanding EXPLAIN:
  • Cost: Startup cost .. total cost (arbitrary units)
  • Rows: Estimated row count
  • Width: Average row size in bytes
  • Actual Time: Real execution time (with ANALYZE)
  • Loops: Number of times node executed
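These statistics are exposed in the pg_stats view and tunable per column; a sketch assuming an illustrative users.status column:
sql
-- Inspect what the planner knows about a column
SELECT attname, n_distinct, correlation, most_common_vals
FROM pg_stats
WHERE tablename = 'users' AND attname = 'status';

-- Collect a finer-grained histogram for a skewed column, then refresh statistics
ALTER TABLE users ALTER COLUMN status SET STATISTICS 500;
ANALYZE users;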

Partitioning Strategies


Table partitioning for managing large datasets:

Range Partitioning


  • Use for: Time-series data, sequential values
  • Example: Partition by date ranges (daily, monthly, yearly)
  • Benefit: Easy data lifecycle management, faster queries

List Partitioning


  • Use for: Discrete categorical values
  • Example: Partition by country, region, status
  • Benefit: Logical data separation, partition pruning

Hash Partitioning


  • Use for: Even data distribution
  • Example: Partition by hash(user_id)
  • Benefit: Balanced partition sizes, parallel queries
Partition Pruning:
  • Planner eliminates irrelevant partitions
  • Drastically reduces query scope
  • Essential for partition performance
Partition-Wise Operations:
  • Partition-wise joins: Join matching partitions directly
  • Partition-wise aggregation: Aggregate within partitions
  • Parallel partition processing
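A minimal sketch of list and hash partition definitions (schemas are illustrative; range partitioning is shown in full later in this document):
sql
-- List partitioning by discrete values
CREATE TABLE orders (
    id      BIGINT,
    country TEXT NOT NULL
) PARTITION BY LIST (country);

CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('DE', 'FR', 'NL');

-- Hash partitioning for even distribution
CREATE TABLE sessions (
    user_id INT NOT NULL,
    payload JSONB
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_p0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);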

Replication and High Availability


PostgreSQL replication options:

Streaming Replication (Physical)


  • Type: Binary WAL streaming to standby servers
  • Modes: Asynchronous, synchronous, quorum-based
  • Use for: High availability, read scalability
  • Failover: Automatic with tools like Patroni, repmgr
Synchronous vs Asynchronous:
  • Synchronous: Zero data loss, higher latency
  • Asynchronous: Low latency, potential data loss
  • Quorum: Balance between safety and performance

Logical Replication


  • Type: Row-level change stream
  • Use for: Selective replication, upgrades, multi-master
  • Benefit: Replicate specific tables, cross-version
  • Limitation: No DDL replication, overhead

Cascading Replication


  • Standbys replicate from other standbys
  • Reduces load on primary
  • Geographic distribution

Connection Pooling


Managing database connections efficiently:

pgBouncer


  • Type: Lightweight connection pooler
  • Modes: Session, transaction, statement pooling
  • Use for: High connection count applications
  • Benefit: Reduced connection overhead, resource limits
Pooling Modes:
  • Session: Client connects for entire session
  • Transaction: Connection per transaction
  • Statement: Connection per statement (rarely used)
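A minimal pgbouncer.ini sketch for transaction pooling; paths, names, and sizes here are assumptions to adapt, and scram-sha-256 auth requires a reasonably recent pgBouncer:
conf
; Minimal pgbouncer.ini sketch; values are illustrative
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
; session / transaction / statement
pool_mode = transaction
; server connections per user+database pair
default_pool_size = 20
; client connections pgBouncer will accept
max_client_conn = 1000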

PgPool-II


  • Type: Feature-rich middleware
  • Features: Connection pooling, load balancing, query caching
  • Use for: Read/write splitting, connection management
  • Benefit: Advanced routing, in-memory cache

VACUUM and Maintenance


Critical maintenance operations:

VACUUM


  • Purpose: Reclaim dead tuple space, update statistics
  • Types: Regular VACUUM, VACUUM FULL
  • When: After large updates/deletes, regularly via autovacuum
  • Impact: Regular VACUUM is non-blocking

ANALYZE


  • Purpose: Update planner statistics
  • When: After data changes, schema modifications
  • Impact: Minimal, fast on most tables

REINDEX


  • Purpose: Rebuild indexes, fix bloat
  • When: Index corruption, significant bloat
  • Impact: Locks table, use REINDEX CONCURRENTLY (PG 12+)

Autovacuum


  • Purpose: Automated VACUUM and ANALYZE
  • Configuration: Threshold-based triggering
  • Tuning: Balance resource usage vs. responsiveness
  • Monitoring: Track autovacuum runs, prevent wraparound
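Autovacuum can also be tuned per table, which is often safer than global changes; an illustrative sketch for a frequently updated orders table:
sql
-- Vacuum after ~2% of rows are dead instead of the default 20%
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold    = 1000
);

-- Manual one-off maintenance with progress output
VACUUM (VERBOSE, ANALYZE) orders;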

Performance Tuning


Key configuration parameters:

Memory Settings


shared_buffers: 25% of RAM (starting point)
effective_cache_size: 50-75% of RAM
work_mem: Per-operation memory (sorts, hashes)
maintenance_work_mem: Memory for VACUUM and CREATE INDEX
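As a hedged illustration, starting values for a dedicated 64 GB server might look like this; always validate against your own workload:
conf
shared_buffers = 16GB            # ~25% of RAM
effective_cache_size = 48GB      # ~75% of RAM; a planner hint, allocates nothing
work_mem = 32MB                  # per sort/hash node, per backend -- keep conservative
maintenance_work_mem = 1GB       # VACUUM, CREATE INDEX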

Checkpoint and WAL


checkpoint_timeout: How often to checkpoint
max_wal_size: WAL size before checkpoint
checkpoint_completion_target: Spread checkpoint I/O
wal_buffers: WAL write buffer size
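Illustrative starting values for a write-heavy server (assumptions to verify, not prescriptions):
conf
checkpoint_timeout = 15min
max_wal_size = 4GB
checkpoint_completion_target = 0.9   # spread checkpoint writes over 90% of the interval
wal_buffers = 16MB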

Query Planner


random_page_cost: Relative cost of random I/O
effective_io_concurrency: Concurrent I/O operations
default_statistics_target: Histogram detail level

Connection Settings


max_connections: Maximum client connections
connection limit: Per-database or per-role caps (ALTER DATABASE/ROLE ... CONNECTION LIMIT)

Index Strategies


Choosing the Right Index


Decision Matrix:

Query Pattern                              | Index Type           | Reason
WHERE id = 5                               | B-tree               | Equality lookup
WHERE created_at > '2024-01-01'            | B-tree               | Range query
ORDER BY name                              | B-tree               | Sorting support
WHERE tags @> ARRAY['sql']                 | GIN                  | Array containment
WHERE data->>'status' = 'active'           | GIN (jsonb_path_ops) | JSONB query
WHERE to_tsvector(content) @@ query        | GIN                  | Full-text search
WHERE location <-> point(0,0)              | GiST                 | Nearest neighbor
WHERE timestamp BETWEEN ... (large table)  | BRIN                 | Sequential time-series
WHERE ip_address << '192.168.0.0/16'       | GiST or SP-GiST      | IP range query

Composite Indexes


Multi-column indexes for complex queries:
Column Ordering Rules:
  1. Equality columns first
  2. Sort/range columns last
  3. High-selectivity columns first
  4. Match query patterns exactly
Example:
sql
-- Query: WHERE status = 'active' AND created_at > '2024-01-01' ORDER BY created_at
-- Optimal index: (status, created_at)
CREATE INDEX idx_users_status_created ON users(status, created_at);

Partial Indexes


Index subset of rows:
Benefits:
  • Smaller index size
  • Faster updates on non-indexed rows
  • Targeted query optimization
Use Cases:
  • Index only active records:
    WHERE deleted_at IS NULL
  • Index recent data (the predicate must be immutable, so use a fixed cutoff rather than NOW()):
    WHERE created_at > DATE '2024-01-01'
  • Index specific states:
    WHERE status IN ('pending', 'processing')
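Partial index sketches matching the use cases above (table names are illustrative; note the fixed cutoff date, since the predicate must be immutable):
sql
-- Only active records
CREATE INDEX idx_users_active ON users (id) WHERE deleted_at IS NULL;

-- Recent data, using a fixed cutoff (re-created periodically)
CREATE INDEX idx_events_recent ON events (created_at)
WHERE created_at > DATE '2024-01-01';

-- Specific states
CREATE INDEX idx_orders_open ON orders (created_at)
WHERE status IN ('pending', 'processing');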

Expression Indexes


Index computed values:
Examples:
sql
-- Case-insensitive search
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Date truncation
CREATE INDEX idx_events_date ON events(DATE(created_at));

-- JSONB field
CREATE INDEX idx_data_status ON documents((data->>'status'));

Covering Indexes (INCLUDE)


Include non-key columns for index-only scans:
sql
CREATE INDEX idx_users_email_include
ON users(email)
INCLUDE (first_name, last_name, created_at);
Benefit: Query satisfied entirely from index, no table lookup

Index Maintenance


Monitoring Index Usage:
sql
-- Unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
Detecting Bloat:
sql
-- Index bloat estimation
SELECT schemaname, tablename, indexname,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
       idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Query Optimization


Using EXPLAIN ANALYZE


Understanding query execution:
sql
-- Basic EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';

-- EXPLAIN ANALYZE (actually runs query)
EXPLAIN ANALYZE SELECT * FROM users WHERE created_at > '2024-01-01';

-- Detailed output
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT u.*, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01';
Key Metrics:
  • Planning Time: Time to generate plan
  • Execution Time: Actual query runtime
  • Shared Hit vs Read: Buffer cache hits vs disk reads
  • Rows: Estimated vs actual row counts
  • Filter vs Index Cond: Post-scan filtering vs index usage

Common Query Anti-Patterns


1. N+1 Queries


Problem: One query per row in a loop.
Solution: Use a JOIN or batch the queries.

2. SELECT *


Problem: Fetches unnecessary columns.
Solution: Select only the columns you need.

3. Implicit Type Conversions


Problem: An index is skipped because the query and column types don't match.
Solution: Ensure query parameter types match column types.

4. Function on Indexed Column


Problem:
WHERE UPPER(email) = 'USER@EXAMPLE.COM'
Solution: Use an expression index, or normalize the stored value and compare directly
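A sketch of both fixes, using the idx naming style from earlier sections (names are illustrative):
sql
-- The original predicate cannot use a plain index on email
SELECT * FROM users WHERE UPPER(email) = 'USER@EXAMPLE.COM';

-- Fix 1: expression index that matches the query exactly
CREATE INDEX idx_users_email_upper ON users (UPPER(email));

-- Fix 2: store emails normalized and compare directly
SELECT * FROM users WHERE email = 'user@example.com';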

5. OR Conditions


Problem:
WHERE status = 'A' OR status = 'B'
Solution: Use IN:
WHERE status IN ('A', 'B')

Join Optimization


Join Types:
  1. Nested Loop
    • Best for: Small outer table, indexed inner table
    • How: For each outer row, scan inner table
    • When: Small result sets, good indexes
  2. Hash Join
    • Best for: Large tables, no good indexes
    • How: Build hash table of smaller table
    • When: Equality joins, sufficient memory
  3. Merge Join
    • Best for: Pre-sorted data, equality joins
    • How: Sort both inputs, merge scan
    • When: Both inputs sorted or can be sorted cheaply
Join Order Matters:
  • Planner reorders joins for optimization
  • Statistics guide join order decisions
  • Can force the written join order with SET join_collapse_limit = 1

Aggregation Optimization


Techniques:
  • Partial Aggregates: Partition-wise aggregation
  • Hash Aggregates: In-memory grouping
  • Sorted Aggregates: Pre-sorted input
  • Parallel Aggregation: Multiple workers
Materialized Views:
  • Pre-compute expensive aggregations
  • Refresh on schedule or trigger
  • Trade freshness for query speed
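A materialized-view sketch of this trade-off (schema is illustrative); the unique index is what allows a non-blocking refresh:
sql
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT date_trunc('day', created_at) AS day,
       count(*)   AS orders,
       sum(total) AS revenue
FROM orders
GROUP BY 1;

-- A unique index enables REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX ON daily_order_totals (day);

-- Refresh without locking out readers
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;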

Query Caching


Levels:
  1. Shared Buffers: PostgreSQL page cache
  2. OS Page Cache: Operating system cache
  3. Application Cache: Redis, Memcached
  4. Prepared Statements: Reuse query plans
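Prepared statements are the simplest of these levels to use directly from SQL; a minimal sketch:
sql
-- Parse/plan once, execute many times with different parameters
PREPARE user_by_email (TEXT) AS
    SELECT id, first_name, last_name FROM users WHERE email = $1;

EXECUTE user_by_email('user@example.com');
DEALLOCATE user_by_email;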

Partitioning


Implementing Range Partitioning


Time-series example:
sql
-- Create partitioned table
CREATE TABLE events (
    id BIGSERIAL,
    event_type TEXT NOT NULL,
    user_id INTEGER NOT NULL,
    data JSONB,
    created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Default partition for data outside ranges
CREATE TABLE events_default PARTITION OF events DEFAULT;

-- Indexes on partitions
CREATE INDEX idx_events_2024_01_user ON events_2024_01(user_id);
CREATE INDEX idx_events_2024_02_user ON events_2024_02(user_id);
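Pruning is easy to verify with EXPLAIN; for the events table above, only the January partition should be scanned:
sql
EXPLAIN SELECT count(*)
FROM events
WHERE created_at >= '2024-01-10' AND created_at < '2024-01-20';
-- Expect a scan of events_2024_01 only; events_2024_02 and events_default are pruned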

Partition Automation


Automated partition management:
sql
-- Function to create monthly partitions
CREATE OR REPLACE FUNCTION create_monthly_partition(
    base_table TEXT,
    partition_date DATE
) RETURNS VOID AS $$
DECLARE
    partition_name TEXT;
    start_date DATE;
    end_date DATE;
BEGIN
    partition_name := base_table || '_' || TO_CHAR(partition_date, 'YYYY_MM');
    start_date := DATE_TRUNC('month', partition_date);
    end_date := start_date + INTERVAL '1 month';

    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I
         FOR VALUES FROM (%L) TO (%L)',
        partition_name, base_table, start_date, end_date
    );

    -- Create indexes
    EXECUTE format(
        'CREATE INDEX IF NOT EXISTS %I ON %I(user_id)',
        'idx_' || partition_name || '_user', partition_name
    );
END;
$$ LANGUAGE plpgsql;
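An example invocation, e.g. from a nightly cron job, pre-creating next month's partition for the events table above:
sql
SELECT create_monthly_partition('events', (CURRENT_DATE + INTERVAL '1 month')::date);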

Partition Maintenance


Dropping old partitions:
sql
-- Detach partition (fast, non-blocking)
ALTER TABLE events DETACH PARTITION events_2023_01;

-- Drop detached partition
DROP TABLE events_2023_01;

-- Or archive before dropping
CREATE TABLE archive.events_2023_01 AS SELECT * FROM events_2023_01;
DROP TABLE events_2023_01;

High Availability and Replication


Setting Up Streaming Replication


Primary server configuration (postgresql.conf):
conf
# Replication settings
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
synchronous_commit = on                          # or off for async
synchronous_standby_names = 'standby1,standby2'  # for sync replication

Create replication user:
sql
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';

pg_hba.conf on primary:
conf
# Allow replication connections
host replication replicator standby_ip/32 md5

Standby server setup:
bash
# Stop standby PostgreSQL
systemctl stop postgresql

# Remove old data directory
rm -rf /var/lib/postgresql/14/main

# Base backup from primary
pg_basebackup -h primary_host -D /var/lib/postgresql/14/main \
    -U replicator -P -v -R -X stream -C -S standby1

# Start standby
systemctl start postgresql

Standby configuration (created by the -R flag):
conf
# standby.signal file is created automatically
# postgresql.auto.conf contains:
primary_conninfo = 'host=primary_host port=5432 user=replicator password=secure_password'
primary_slot_name = 'standby1'

Monitoring Replication


On primary:
sql
-- Check replication status
SELECT client_addr, state, sync_state, replay_lag
FROM pg_stat_replication;

-- Check replication slots
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
On standby:
sql
-- Check replication lag
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;

-- Check recovery status
SELECT pg_is_in_recovery();
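Lag can also be measured in bytes on the primary, which avoids clock-skew issues; a small sketch:
sql
-- Bytes of WAL the primary has written that each standby has not yet replayed
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;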

Failover and Switchover

Promoting standby to primary:
bash
# Trigger failover
pg_ctl promote -D /var/lib/postgresql/14/main

# Or using SQL
SELECT pg_promote();

Controlled switchover:
  1. Stop writes on primary
  2. Wait for standby to catch up
  3. Promote standby
  4. Reconfigure old primary as new standby

Logical Replication Setup

On publisher (source):
sql
-- Create publication
CREATE PUBLICATION my_publication FOR TABLE users, orders;

-- Or all tables
CREATE PUBLICATION all_tables FOR ALL TABLES;

On subscriber (destination):
sql
-- Create subscription
CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=publisher_host dbname=mydb user=replicator password=pass'
    PUBLICATION my_publication;

-- Monitor subscription
SELECT * FROM pg_stat_subscription;

Backup and Recovery

Physical Backups

pg_basebackup:
bash
# Full physical backup
pg_basebackup -h localhost -U postgres -D /backup/base \
              -F tar -z -P -v

# With WAL files for point-in-time recovery
pg_basebackup -h localhost -U postgres -D /backup/base \
              -X stream -F tar -z -P

Continuous archiving (WAL archiving):
conf
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/wal/%f'

Logical Backups

pg_dump:
bash
# Single database
pg_dump -h localhost -U postgres -F c -b -v -f mydb.dump mydb

# All databases
pg_dumpall -h localhost -U postgres -f all_databases.sql

# Specific tables
pg_dump -h localhost -U postgres -t users -t orders -F c -f tables.dump mydb

# Schema only
pg_dump -h localhost -U postgres --schema-only -F c -f schema.dump mydb

pg_restore:
bash
# Restore database
pg_restore -h localhost -U postgres -d mydb -v mydb.dump

# Parallel restore
pg_restore -h localhost -U postgres -d mydb -j 4 -v mydb.dump

# Restore specific tables
pg_restore -h localhost -U postgres -d mydb -t users -v mydb.dump

Point-in-Time Recovery (PITR)

Setup:
  1. Take base backup
  2. Configure WAL archiving
  3. Store WAL files safely
Recovery:
bash
# 1. Restore base backup
tar -xzf base.tar.gz -C /var/lib/postgresql/14/main

# 2. Create recovery.signal file
touch /var/lib/postgresql/14/main/recovery.signal

# 3. Configure recovery target (postgresql.conf or postgresql.auto.conf)
#    restore_command = 'cp /archive/wal/%f %p'
#    recovery_target_time = '2024-01-15 14:30:00'
#    Or: recovery_target_name = 'before_disaster'
#    Or: recovery_target_lsn = '0/3000000'

# 4. Start PostgreSQL
systemctl start postgresql

Backup Strategies


3-2-1 Rule:
  • 3 copies of data
  • 2 different media types
  • 1 offsite backup
Backup Schedule:
  • Daily: Incremental WAL archiving
  • Weekly: Full pg_basebackup
  • Monthly: Long-term retention
Testing Backups:
  • Regularly restore to test environment
  • Verify data integrity
  • Measure restore time

Performance Monitoring


Key Metrics to Monitor

Database Health:
  • Active connections
  • Transaction rate
  • Cache hit ratio
  • Deadlocks
  • Checkpoint frequency
  • Autovacuum runs
Query Performance:
  • Slow query log
  • Query execution time
  • Lock waits
  • Sequential scans
System Resources:
  • CPU utilization
  • Memory usage
  • Disk I/O
  • Network bandwidth

Essential Monitoring Queries


Connection stats:
sql
SELECT count(*) as total_connections,
       count(*) FILTER (WHERE state = 'active') as active,
       count(*) FILTER (WHERE state = 'idle') as idle,
       count(*) FILTER (WHERE state = 'idle in transaction') as idle_in_transaction
FROM pg_stat_activity;
Cache hit ratio:
sql
SELECT sum(heap_blks_read) as heap_read,
       sum(heap_blks_hit) as heap_hit,
       sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
FROM pg_statio_user_tables;
Table bloat:
sql
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
       n_dead_tup,
       n_live_tup,
       round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
Long-running queries:
sql
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC;
Lock monitoring:
sql
SELECT blocked_locks.pid AS blocked_pid,
       blocked_activity.usename AS blocked_user,
       blocking_locks.pid AS blocking_pid,
       blocking_activity.usename AS blocking_user,
       blocked_activity.query AS blocked_statement,
       blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted
  AND blocking_locks.granted;

pg_stat_statements


Installation:
sql
CREATE EXTENSION pg_stat_statements;
Configuration (postgresql.conf):
conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000
Top queries by total time:
sql
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time,
       max_exec_time,
       rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Top queries by average time:
sql
SELECT query,
       calls,
       mean_exec_time,
       total_exec_time
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 20;

Best Practices

Schema Design

Normalization:
  • Normalize to 3NF for transactional systems
  • Denormalize selectively for read-heavy workloads
  • Use foreign keys for data integrity
  • Consider partitioning for very large tables
Data Types:
  • Use smallest appropriate data type
  • BIGINT for large IDs, INTEGER for smaller ranges
  • NUMERIC for exact decimal values
  • TIMESTAMP WITH TIME ZONE for timestamps
  • TEXT over VARCHAR unless length constraint needed
  • UUID for distributed ID generation
  • JSONB for semi-structured data
Constraints:
  • Primary keys on all tables
  • Foreign keys for referential integrity
  • CHECK constraints for business rules
  • NOT NULL where appropriate
  • UNIQUE constraints for uniqueness
  • Use constraint names for maintainability

Migration Strategies

Zero-Downtime Migrations:
  1. Add new column
    sql
    ALTER TABLE users ADD COLUMN email_verified BOOLEAN;
  2. Backfill data (in batches; UPDATE has no LIMIT clause, so select the batch by key)
    sql
    UPDATE users SET email_verified = false
    WHERE id IN (
        SELECT id FROM users WHERE email_verified IS NULL LIMIT 10000
    );
  3. Add NOT NULL constraint
    sql
    ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
Index Creation:
  • Use CREATE INDEX CONCURRENTLY in production
  • No table locks, allows reads/writes
  • Takes longer but doesn't block
  • Monitor progress with pg_stat_progress_create_index
Large Table Modifications:
  • Use pg_repack for table rewrites
  • Partition large tables before modifications
  • Schedule during maintenance windows
  • Test on production-like datasets

Security Best Practices

Authentication:
  • Use strong passwords or certificate authentication
  • SCRAM-SHA-256 for password encryption
  • Separate users for different applications
  • Avoid superuser for application connections
Authorization:
  • Grant minimal required privileges
  • Use role-based access control
  • Revoke PUBLIC access
  • Row-level security for multi-tenant
Network Security:
  • Configure pg_hba.conf restrictively
  • Use SSL/TLS for connections
  • Firewall database ports
  • VPN or private networks for replication
Audit Logging:
  • Enable connection logging
  • Log DDL statements
  • Use pgAudit extension for detailed auditing
  • Monitor for suspicious activity

Maintenance Schedule

Daily:
  • Monitor slow queries
  • Check replication lag
  • Review autovacuum activity
  • Monitor disk space
Weekly:
  • Analyze top queries
  • Review index usage
  • Check for bloat
  • Backup verification
Monthly:
  • Full VACUUM on critical tables
  • REINDEX bloated indexes
  • Review configuration parameters
  • Capacity planning
Quarterly:
  • Review and optimize indexes
  • Schema optimization opportunities
  • Upgrade planning
  • Performance baseline updates

Advanced Topics

Parallel Query Execution

Configuration:
conf
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
parallel_setup_cost = 1000
parallel_tuple_cost = 0.1
min_parallel_table_scan_size = 8MB
Forcing parallel execution:
sql
SET max_parallel_workers_per_gather = 4;
EXPLAIN ANALYZE SELECT COUNT(*) FROM large_table;
When parallelism helps:
  • Large sequential scans
  • Large aggregations
  • Hash joins on large tables
  • Bitmap heap scans

Custom Functions and Procedures

Stored procedures:
sql
CREATE OR REPLACE PROCEDURE update_user_statistics()
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE users SET
        order_count = (SELECT COUNT(*) FROM orders WHERE user_id = users.id),
        last_order_date = (SELECT MAX(created_at) FROM orders WHERE user_id = users.id);

    COMMIT;
END;
$$;
Functions with proper error handling:
sql
CREATE OR REPLACE FUNCTION create_user(
    p_email TEXT,
    p_name TEXT
) RETURNS INTEGER
LANGUAGE plpgsql
AS $$
DECLARE
    v_user_id INTEGER;
BEGIN
    INSERT INTO users (email, name)
    VALUES (p_email, p_name)
    RETURNING id INTO v_user_id;

    RETURN v_user_id;
EXCEPTION
    WHEN unique_violation THEN
        RAISE EXCEPTION 'Email already exists: %', p_email;
    WHEN OTHERS THEN
        RAISE EXCEPTION 'Error creating user: %', SQLERRM;
END;
$$;

Foreign Data Wrappers

Access external data sources:
sql
-- Install postgres_fdw
CREATE EXTENSION postgres_fdw;

-- Create server
CREATE SERVER remote_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'remote_host', dbname 'remote_database', port '5432');

-- Create user mapping
CREATE USER MAPPING FOR current_user
SERVER remote_db
OPTIONS (user 'remote_user', password 'remote_password');

-- Import foreign schema
IMPORT FOREIGN SCHEMA public
FROM SERVER remote_db
INTO local_schema;

-- Query foreign table
SELECT * FROM local_schema.remote_table;

JSON and JSONB Operations

Indexing JSONB:
sql
-- GIN index for containment queries
CREATE INDEX idx_data_gin ON documents USING GIN (data);

-- Expression index for specific field
CREATE INDEX idx_data_status ON documents ((data->>'status'));

-- GIN index with jsonb_path_ops (smaller, faster for @> queries)
CREATE INDEX idx_data_path_ops ON documents USING GIN (data jsonb_path_ops);
Efficient JSONB queries:
sql
-- Containment query (uses GIN index)
SELECT * FROM documents WHERE data @> '{"status": "active"}';

-- Existence query
SELECT * FROM documents WHERE data ? 'email';

-- Path query
SELECT * FROM documents WHERE data->'user'->>'email' = 'user@example.com';

-- Array operations
SELECT * FROM documents WHERE data->'tags' @> '["sql", "postgres"]';

Full-Text Search

Basic setup:
sql
-- Add tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;

-- Generate search vector
UPDATE articles SET search_vector =
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''));

-- Create GIN index
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

-- Trigger for automatic updates
CREATE TRIGGER articles_search_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', title, content);
Search queries:
sql
-- Basic search
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgresql & database') query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- Phrase search
SELECT title FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'database engineering');

-- Search with highlighting
SELECT title,
       ts_headline('english', content, query) AS snippet
FROM articles, to_tsquery('english', 'postgresql') query
WHERE search_vector @@ query;

Troubleshooting

Common Issues

Problem: Slow Queries
  • Check EXPLAIN ANALYZE output
  • Verify indexes exist and are used
  • Update table statistics:
    ANALYZE table_name
  • Check for missing indexes on foreign keys
  • Look for function calls on indexed columns
Problem: High CPU Usage
  • Identify expensive queries with pg_stat_statements
  • Check for missing indexes causing sequential scans
  • Review parallel query settings
  • Look for inefficient joins or aggregations
Problem: Connection Exhaustion
  • Increase max_connections (requires restart)
  • Implement connection pooling (pgBouncer)
  • Identify connection leaks in application
  • Monitor with pg_stat_activity
Problem: Autovacuum Not Keeping Up
  • Increase autovacuum_max_workers
  • Adjust autovacuum thresholds
  • Reduce autovacuum_naptime
  • Increase autovacuum_work_mem
  • Check for long-running transactions blocking VACUUM
Problem: Replication Lag
  • Check network bandwidth between primary and standby
  • Verify standby hardware resources
  • Check for long-running queries on standby
  • Monitor WAL generation rate
  • Consider increasing wal_sender_timeout
Problem: Transaction ID Wraparound
  • Monitor age of oldest transaction
  • Run VACUUM FREEZE on old tables
  • Check autovacuum_freeze_max_age
  • Increase autovacuum aggressiveness
  • Run manual VACUUM FREEZE if necessary

Diagnostic Queries

Find missing indexes on foreign keys:
sql
SELECT c.conrelid::regclass AS table,
       c.confrelid::regclass AS referenced_table,
       string_agg(a.attname, ', ') AS foreign_key_columns
FROM pg_constraint c
JOIN pg_attribute a ON a.attnum = ANY(c.conkey) AND a.attrelid = c.conrelid
WHERE c.contype = 'f'
  AND NOT EXISTS (
    SELECT 1 FROM pg_index i
    WHERE i.indrelid = c.conrelid
      AND c.conkey[1:array_length(c.conkey, 1)]
          OPERATOR(pg_catalog.@>) i.indkey[0:array_length(c.conkey, 1) - 1]
  )
GROUP BY c.conrelid, c.confrelid, c.conname;
Identify blocking queries:
sql
SELECT activity.pid,
       activity.usename,
       activity.query,
       blocking.pid AS blocking_id,
       blocking.query AS blocking_query
FROM pg_stat_activity AS activity
JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(activity.pid));

Skill Version: 1.0.0
Last Updated: October 2025
Skill Category: Database Engineering, Performance Optimization, Data Architecture
Compatible With: PostgreSQL 12, 13, 14, 15, 16
Prerequisites: SQL knowledge, basic database concepts, Linux command line