database-management-patterns

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Database Management Patterns

数据库管理模式

A comprehensive skill for mastering database management across SQL (PostgreSQL) and NoSQL (MongoDB) systems. This skill covers schema design, indexing strategies, transaction management, replication, sharding, and performance optimization for production-grade applications.
这是一项精通SQL(PostgreSQL)与NoSQL(MongoDB)系统数据库管理的综合技能。本技能涵盖生产级应用的架构设计、索引策略、事务管理、复制、分片及性能优化。

When to Use This Skill

何时使用本技能

Use this skill when:
  • Designing database schemas for new applications or refactoring existing ones
  • Choosing between SQL and NoSQL databases for your use case
  • Optimizing query performance with proper indexing strategies
  • Implementing data consistency with transactions and ACID guarantees
  • Scaling databases horizontally with sharding and replication
  • Managing high-traffic applications requiring distributed databases
  • Ensuring data integrity with constraints, triggers, and validation
  • Troubleshooting performance issues using explain plans and query analysis
  • Building fault-tolerant systems with replication and failover strategies
  • Working with complex data relationships (relational) or flexible schemas (document)
在以下场景使用本技能:
  • 设计数据库架构:为新应用设计架构或重构现有应用架构
  • 选择SQL与NoSQL数据库:根据使用场景挑选合适的数据库类型
  • 优化查询性能:通过合理的索引策略提升查询效率
  • 实现数据一致性:利用事务与ACID保障确保数据一致性
  • 扩容数据库:通过分片与复制实现水平扩容
  • 管理高流量应用:运维需要分布式数据库的高流量系统
  • 确保数据完整性:通过约束、触发器与验证机制保障数据完整性
  • 排查性能问题:使用执行计划与查询分析定位性能瓶颈
  • 构建容错系统:基于复制与故障转移策略搭建高可用系统
  • 处理复杂数据结构:应对关系型复杂数据关联或文档型灵活架构

Core Concepts

核心概念

Database Paradigms Comparison

数据库范式对比

Relational Databases (PostgreSQL)

关系型数据库(PostgreSQL)

Strengths:
  • ACID Transactions: Strong consistency guarantees
  • Complex Queries: JOIN operations, subqueries, CTEs
  • Data Integrity: Foreign keys, constraints, triggers
  • Normalized Data: Reduced redundancy, consistent updates
  • Mature Ecosystem: Rich tooling, extensions, community
Best For:
  • Financial systems requiring strict consistency
  • Complex relationships and data integrity requirements
  • Applications with structured, well-defined schemas
  • Systems requiring complex analytical queries
  • Multi-step transactions across multiple tables
优势:
  • ACID事务:提供强一致性保障
  • 复杂查询:支持JOIN操作、子查询、CTE(公共表表达式)
  • 数据完整性:外键、约束、触发器机制完善
  • 规范化数据:减少数据冗余,确保更新一致性
  • 成熟生态:工具、扩展丰富,社区支持完善
适用场景:
  • 要求严格一致性的金融系统
  • 存在复杂数据关系与完整性要求的场景
  • 数据结构明确且稳定的应用
  • 需要复杂分析查询的系统
  • 涉及跨多表的多步骤事务场景

Document Databases (MongoDB)

文档型数据库(MongoDB)

Strengths:
  • Flexible Schema: Easy schema evolution, polymorphic data
  • Horizontal Scalability: Built-in sharding support
  • JSON-Native: Natural fit for modern application development
  • Embedded Documents: Denormalized data for performance
  • Aggregation Framework: Powerful data processing pipeline
Best For:
  • Rapidly evolving applications with changing requirements
  • Content management systems with varied data structures
  • Real-time analytics and event logging
  • Mobile and web applications with JSON APIs
  • Hierarchical or nested data structures
优势:
  • 灵活架构:架构演进便捷,支持多态数据
  • 水平扩展性:内置分片支持,轻松扩容
  • 原生JSON兼容:与现代应用开发天然适配
  • 嵌入式文档:非规范化设计提升读取性能
  • 聚合框架:强大的数据处理流水线能力
适用场景:
  • 需求快速迭代、架构频繁变化的应用
  • 数据结构多样的内容管理系统
  • 实时分析与事件日志系统
  • 采用JSON API的移动与Web应用
  • 层级或嵌套结构的数据场景

ACID Properties

ACID属性

Atomicity: All operations in a transaction succeed or fail together Consistency: Transactions bring database from one valid state to another Isolation: Concurrent transactions don't interfere with each other Durability: Committed transactions survive system failures
原子性(Atomicity):事务内所有操作要么全部成功,要么全部回滚 一致性(Consistency):事务将数据库从一个有效状态转换为另一个有效状态 隔离性(Isolation):并发执行的事务互不干扰 持久性(Durability):已提交的事务结果在系统故障后仍能保留

CAP Theorem

CAP定理

In distributed systems, choose two of three:
  • Consistency: All nodes see the same data
  • Availability: System remains operational
  • Partition Tolerance: System continues despite network failures
PostgreSQL emphasizes CP (Consistency + Partition Tolerance) MongoDB can be configured for CP or AP depending on write/read concerns
分布式系统中,需在三者中权衡选择其二:
  • 一致性(Consistency):所有节点的数据保持一致
  • 可用性(Availability):系统始终对外提供服务
  • 分区容错性(Partition Tolerance):网络分区时系统仍能正常运行
PostgreSQL侧重CP(一致性+分区容错性) MongoDB可根据读写关注点配置为CP或AP模式

PostgreSQL Patterns

PostgreSQL模式

Schema Design Fundamentals

架构设计基础

Normalization Levels

规范化级别

First Normal Form (1NF)
  • Atomic values (no arrays or lists in columns)
  • Each row is unique (primary key exists)
  • No repeating groups
Second Normal Form (2NF)
  • Meets 1NF requirements
  • All non-key attributes depend on the entire primary key
Third Normal Form (3NF)
  • Meets 2NF requirements
  • No transitive dependencies (non-key attributes depend only on primary key)
When to Denormalize:
  • Read-heavy workloads where joins are expensive
  • Frequently accessed aggregate data
  • Historical snapshots that shouldn't change
  • Performance-critical queries
第一范式(1NF)
  • 列值具备原子性(列中不包含数组或列表)
  • 每行数据唯一(存在主键)
  • 无重复分组
第二范式(2NF)
  • 满足第一范式要求
  • 所有非键属性完全依赖于主键
第三范式(3NF)
  • 满足第二范式要求
  • 非键属性之间无传递依赖(仅依赖主键)
何时进行非规范化:
  • 读密集型场景,JOIN操作开销过大时
  • 频繁访问的聚合数据
  • 无需修改的历史快照数据
  • 性能敏感的查询场景

Table Design Patterns

表设计模式

Primary Keys:
sql
-- Serial auto-increment (traditional)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- UUID for distributed systems
CREATE TABLE accounts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Composite primary key
CREATE TABLE order_items (
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    price NUMERIC(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);
Foreign Key Constraints:
sql
-- Cascade delete: Remove child records when parent deleted
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

-- Set null: Preserve child records, nullify reference
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    post_id INTEGER,
    user_id INTEGER,
    content TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE SET NULL,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
);

-- Restrict: Prevent deletion if child records exist
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    category_id INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE RESTRICT
);
主键设计:
sql
-- 自增序列(传统方式)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- 分布式系统使用UUID
CREATE TABLE accounts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- 复合主键
CREATE TABLE order_items (
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    price NUMERIC(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);
外键约束:
sql
-- 级联删除:删除父记录时自动删除子记录
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

-- 设置为NULL:保留子记录,外键字段设为NULL
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    post_id INTEGER,
    user_id INTEGER,
    content TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE SET NULL,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
);

-- 限制删除:存在子记录时禁止删除父记录
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    category_id INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE RESTRICT
);

Advanced Constraints

高级约束

Check Constraints:
sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
    discount_percent INTEGER CHECK (discount_percent BETWEEN 0 AND 100),
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);

-- Table-level check constraint
CREATE TABLE date_ranges (
    id SERIAL PRIMARY KEY,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    CHECK (end_date > start_date)
);
Unique Constraints:
sql
-- Single column unique
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);

-- Composite unique constraint
CREATE TABLE user_permissions (
    user_id INTEGER NOT NULL,
    permission_id INTEGER NOT NULL,
    granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, permission_id),
    FOREIGN KEY (user_id) REFERENCES users(id),
    FOREIGN KEY (permission_id) REFERENCES permissions(id)
);

-- Partial unique index (unique where condition met)
CREATE UNIQUE INDEX unique_active_email
ON users (email)
WHERE active = true;
检查约束:
sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
    discount_percent INTEGER CHECK (discount_percent BETWEEN 0 AND 100),
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);

-- 表级检查约束
CREATE TABLE date_ranges (
    id SERIAL PRIMARY KEY,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    CHECK (end_date > start_date)
);
唯一约束:
sql
-- 单列唯一约束
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);

-- 复合唯一约束
CREATE TABLE user_permissions (
    user_id INTEGER NOT NULL,
    permission_id INTEGER NOT NULL,
    granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, permission_id),
    FOREIGN KEY (user_id) REFERENCES users(id),
    FOREIGN KEY (permission_id) REFERENCES permissions(id)
);

-- 部分唯一索引(仅对满足条件的记录建立唯一索引)
CREATE UNIQUE INDEX unique_active_email
ON users (email)
WHERE active = true;

Triggers and Functions

触发器与函数

Audit Trail Pattern:
sql
-- Audit table
CREATE TABLE audit_log (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(255) NOT NULL,
    record_id INTEGER NOT NULL,
    action VARCHAR(10) NOT NULL,
    old_data JSONB,
    new_data JSONB,
    changed_by VARCHAR(255),
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Trigger function
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, record_id, action, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'INSERT', row_to_json(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'UPDATE', row_to_json(OLD), row_to_json(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, 'DELETE', row_to_json(OLD), current_user);
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Attach trigger to table
CREATE TRIGGER users_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();
Timestamp Update Pattern:
sql
CREATE OR REPLACE FUNCTION update_modified_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER posts_update_timestamp
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION update_modified_timestamp();
审计日志模式:
sql
-- 审计日志表
CREATE TABLE audit_log (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(255) NOT NULL,
    record_id INTEGER NOT NULL,
    action VARCHAR(10) NOT NULL,
    old_data JSONB,
    new_data JSONB,
    changed_by VARCHAR(255),
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- 触发器函数
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, record_id, action, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'INSERT', row_to_json(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'UPDATE', row_to_json(OLD), row_to_json(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, 'DELETE', row_to_json(OLD), current_user);
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- 为表绑定触发器
CREATE TRIGGER users_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();
时间戳自动更新模式:
sql
CREATE OR REPLACE FUNCTION update_modified_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER posts_update_timestamp
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION update_modified_timestamp();

Views and Materialized Views

视图与物化视图

Standard Views:
sql
-- Virtual table - computed on each query
CREATE VIEW active_users_with_posts AS
SELECT
    u.id,
    u.username,
    u.email,
    COUNT(p.id) as post_count,
    MAX(p.created_at) as last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username, u.email;

-- Use view like a table
SELECT * FROM active_users_with_posts WHERE post_count > 10;
Materialized Views:
sql
-- Physical table - stores computed results
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
    u.id,
    u.username,
    COUNT(DISTINCT p.id) as total_posts,
    COUNT(DISTINCT c.id) as total_comments,
    AVG(p.views) as avg_post_views,
    MAX(p.created_at) as last_activity
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.username;

-- Create index on materialized view
CREATE INDEX idx_user_stats_posts ON user_statistics(total_posts);

-- Refresh materialized view (update data)
REFRESH MATERIALIZED VIEW user_statistics;

-- Concurrent refresh (allows reads during refresh)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;
标准视图:
sql
-- 虚拟表,每次查询时计算
CREATE VIEW active_users_with_posts AS
SELECT
    u.id,
    u.username,
    u.email,
    COUNT(p.id) as post_count,
    MAX(p.created_at) as last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username, u.email;

-- 像使用普通表一样使用视图
SELECT * FROM active_users_with_posts WHERE post_count > 10;
物化视图:
sql
-- 物理表,存储计算后的结果
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
    u.id,
    u.username,
    COUNT(DISTINCT p.id) as total_posts,
    COUNT(DISTINCT c.id) as total_comments,
    AVG(p.views) as avg_post_views,
    MAX(p.created_at) as last_activity
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.username;

-- 为物化视图创建索引
CREATE INDEX idx_user_stats_posts ON user_statistics(total_posts);

-- 刷新物化视图(更新数据)
REFRESH MATERIALIZED VIEW user_statistics;

-- 并发刷新(刷新时允许读取)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;

MongoDB Patterns

MongoDB模式

Document Modeling Strategies

文档建模策略

Embedding vs Referencing

嵌入与引用

Embedding Pattern (Denormalization):
javascript
// One-to-few: Embed when relationship is contained
// Example: Blog post with comments
{
    _id: ObjectId("..."),
    title: "Database Design Patterns",
    author: "John Doe",
    content: "...",
    published_at: ISODate("2025-01-15"),
    comments: [
        {
            _id: ObjectId("..."),
            author: "Jane Smith",
            text: "Great article!",
            created_at: ISODate("2025-01-16")
        },
        {
            _id: ObjectId("..."),
            author: "Bob Johnson",
            text: "Very helpful, thanks!",
            created_at: ISODate("2025-01-17")
        }
    ],
    tags: ["database", "design", "patterns"],
    stats: {
        views: 1523,
        likes: 89,
        shares: 23
    }
}

// Benefits:
// - Single query to retrieve post with comments
// - Better read performance
// - Atomic updates to entire document
//
// Drawbacks:
// - Document size limits (16MB in MongoDB)
// - Difficult to query comments independently
// - May duplicate data if comments need to appear elsewhere
Referencing Pattern (Normalization):
javascript
// One-to-many or many-to-many: Reference when relationship is unbounded
// Example: User with many posts

// Users collection
{
    _id: ObjectId("507f1f77bcf86cd799439011"),
    username: "john_doe",
    email: "john@example.com",
    profile: {
        bio: "Software engineer",
        avatar_url: "https://...",
        location: "San Francisco"
    },
    created_at: ISODate("2024-01-01")
}

// Posts collection (references user)
{
    _id: ObjectId("507f191e810c19729de860ea"),
    user_id: ObjectId("507f1f77bcf86cd799439011"),
    title: "My First Post",
    content: "...",
    published_at: ISODate("2025-01-15"),
    comment_ids: [
        ObjectId("..."),
        ObjectId("...")
    ]
}

// Benefits:
// - No duplication of user data
// - Flexible: users can have unlimited posts
// - Easy to update user information once
//
// Drawbacks:
// - Requires multiple queries or $lookup
// - Slower read performance for joined data
Hybrid Approach (Selective Denormalization):
javascript
// Store frequently accessed fields from referenced document
{
    _id: ObjectId("..."),
    title: "Database Patterns",
    content: "...",
    author: {
        // Embedded: frequently accessed, rarely changes
        id: ObjectId("507f1f77bcf86cd799439011"),
        username: "john_doe",
        avatar_url: "https://..."
    },
    // Reference: full user data available if needed
    author_id: ObjectId("507f1f77bcf86cd799439011"),
    published_at: ISODate("2025-01-15")
}

// Benefits:
// - Fast reads with embedded frequently-used data
// - Can still get full user data when needed
// - Balance between performance and flexibility
//
// Tradeoffs:
// - Need to update embedded data when user changes username/avatar
// - Slightly larger documents
嵌入模式(非规范化):
javascript
-- 一对少关系:包含在文档内
-- 示例:带评论的博客文章
{
    _id: ObjectId("..."),
    title: "数据库设计模式",
    author: "John Doe",
    content: "...",
    published_at: ISODate("2025-01-15"),
    comments: [
        {
            _id: ObjectId("..."),
            author: "Jane Smith",
            text: "很棒的文章!",
            created_at: ISODate("2025-01-16")
        },
        {
            _id: ObjectId("..."),
            author: "Bob Johnson",
            text: "非常有帮助,谢谢!",
            created_at: ISODate("2025-01-17")
        }
    ],
    tags: ["database", "design", "patterns"],
    stats: {
        views: 1523,
        likes: 89,
        shares: 23
    }
}

-- 优势:
-- - 单查询即可获取文章与评论
-- - 读取性能更优
-- - 可对整个文档执行原子更新
--
-- 劣势:
-- - 文档大小受限(MongoDB中为16MB)
-- - 难以独立查询评论
-- - 若评论需在其他地方展示,会存在数据重复
引用模式(规范化):
javascript
-- 一对多或多对多关系:使用引用,适用于无边界关系
-- 示例:拥有多篇文章的用户

-- 用户集合
{
    _id: ObjectId("507f1f77bcf86cd799439011"),
    username: "john_doe",
    email: "john@example.com",
    profile: {
        bio: "软件工程师",
        avatar_url: "https://...",
        location: "San Francisco"
    },
    created_at: ISODate("2024-01-01")
}

-- 文章集合(引用用户)
{
    _id: ObjectId("507f191e810c19729de860ea"),
    user_id: ObjectId("507f1f77bcf86cd799439011"),
    title: "我的第一篇文章",
    content: "...",
    published_at: ISODate("2025-01-15"),
    comment_ids: [
        ObjectId("..."),
        ObjectId("...")
    ]
}

-- 优势:
-- - 无用户数据重复
-- - 灵活性高:用户可拥有任意数量的文章
-- - 用户信息更新一次即可生效
--
-- 劣势:
-- - 需要多查询或$lookup操作
-- - 关联数据的读取性能较慢
混合方式(选择性非规范化):
javascript
-- 存储引用文档中频繁访问的字段
{
    _id: ObjectId("..."),
    title: "数据库模式",
    content: "...",
    author: {
        // 嵌入:频繁访问、极少变更的字段
        id: ObjectId("507f1f77bcf86cd799439011"),
        username: "john_doe",
        avatar_url: "https://..."
    },
    // 引用:需要时可获取完整用户数据
    author_id: ObjectId("507f1f77bcf86cd799439011"),
    published_at: ISODate("2025-01-15")
}

-- 优势:
-- - 嵌入频繁使用的数据,提升读取速度
-- - 仍可在需要时获取完整用户数据
-- - 在性能与灵活性间取得平衡
--
-- 权衡:
-- - 用户用户名/头像变更时,需更新嵌入的数据
-- - 文档体积略有增大

Schema Design Patterns

架构设计模式

Bucket Pattern (Time-Series Data):
javascript
// Instead of one document per measurement:
// BAD: Millions of tiny documents
{
    sensor_id: "sensor_001",
    timestamp: ISODate("2025-01-15T10:00:00Z"),
    temperature: 72.5,
    humidity: 45
}

// GOOD: Bucket documents with arrays of measurements
{
    sensor_id: "sensor_001",
    date: ISODate("2025-01-15"),
    hour: 10,
    measurements: [
        { minute: 0, temperature: 72.5, humidity: 45 },
        { minute: 1, temperature: 72.6, humidity: 45 },
        { minute: 2, temperature: 72.4, humidity: 46 },
        // ... up to 60 measurements per hour
    ],
    summary: {
        count: 60,
        avg_temperature: 72.5,
        min_temperature: 71.8,
        max_temperature: 73.2
    }
}

// Benefits:
// - Reduced document count (60x fewer documents)
// - Better index efficiency
// - Pre-computed summaries
// - Easier to query by time ranges
Computed Pattern (Pre-Aggregated Data):
javascript
// Store computed values to avoid expensive aggregations
{
    _id: ObjectId("..."),
    product_id: "PROD-123",
    month: "2025-01",
    total_sales: 15420.50,
    units_sold: 234,
    unique_customers: 187,
    avg_order_value: 65.90,
    top_customers: [
        { customer_id: "CUST-456", revenue: 890.50 },
        { customer_id: "CUST-789", revenue: 675.25 }
    ],
    computed_at: ISODate("2025-02-01T00:00:00Z")
}

// Update pattern: Scheduled job or trigger updates computed values
Polymorphic Pattern (Varied Schemas):
javascript
// Handle different product types in single collection
{
    _id: ObjectId("..."),
    type: "book",
    name: "Database Design",
    price: 49.99,
    // Book-specific fields
    isbn: "978-0-123456-78-9",
    author: "John Smith",
    pages: 456,
    publisher: "Tech Books Inc"
}

{
    _id: ObjectId("..."),
    type: "electronics",
    name: "Wireless Mouse",
    price: 29.99,
    // Electronics-specific fields
    brand: "TechBrand",
    warranty_months: 24,
    specifications: {
        battery_life: "6 months",
        connectivity: "Bluetooth 5.0"
    }
}

// Query by type
db.products.find({ type: "book", author: "John Smith" })
db.products.find({ type: "electronics", "specifications.connectivity": /Bluetooth/ })
桶模式(时间序列数据):
javascript
-- 避免单条记录对应一个测量值:
-- 糟糕方案:数百万条小文档
{
    sensor_id: "sensor_001",
    timestamp: ISODate("2025-01-15T10:00:00Z"),
    temperature: 72.5,
    humidity: 45
}

-- 优秀方案:用桶文档存储批量测量值数组
{
    sensor_id: "sensor_001",
    date: ISODate("2025-01-15"),
    hour: 10,
    measurements: [
        { minute: 0, temperature: 72.5, humidity: 45 },
        { minute: 1, temperature: 72.6, humidity: 45 },
        { minute: 2, temperature: 72.4, humidity: 46 },
        // ... 每小时最多60条测量值
    ],
    summary: {
        count: 60,
        avg_temperature: 72.5,
        min_temperature: 71.8,
        max_temperature: 73.2
    }
}

-- 优势:
-- - 文档数量大幅减少(减少60倍)
-- - 索引效率更高
-- - 预计算汇总数据
-- - 按时间范围查询更便捷
计算模式(预聚合数据):
javascript
-- 存储计算后的值,避免昂贵的聚合操作
{
    _id: ObjectId("..."),
    product_id: "PROD-123",
    month: "2025-01",
    total_sales: 15420.50,
    units_sold: 234,
    unique_customers: 187,
    avg_order_value: 65.90,
    top_customers: [
        { customer_id: "CUST-456", revenue: 890.50 },
        { customer_id: "CUST-789", revenue: 675.25 }
    ],
    computed_at: ISODate("2025-02-01T00:00:00Z")
}

-- 更新模式:通过定时任务或触发器更新计算值
多态模式(多样架构):
javascript
-- 在单个集合中处理不同类型的产品
{
    _id: ObjectId("..."),
    type: "book",
    name: "数据库设计",
    price: 49.99,
    // 书籍专属字段
    isbn: "978-0-123456-78-9",
    author: "John Smith",
    pages: 456,
    publisher: "Tech Books Inc"
}

{
    _id: ObjectId("..."),
    type: "electronics",
    name: "无线鼠标",
    price: 29.99,
    // 电子产品专属字段
    brand: "TechBrand",
    warranty_months: 24,
    specifications: {
        battery_life: "6个月",
        connectivity: "Bluetooth 5.0"
    }
}

-- 按类型查询
db.products.find({ type: "book", author: "John Smith" })
db.products.find({ type: "electronics", "specifications.connectivity": /Bluetooth/ })

Aggregation Framework

聚合框架

Basic Aggregation Pipeline:
javascript
// Group by author and count posts
db.posts.aggregate([
    {
        $match: { published: true } // Filter stage
    },
    {
        $group: {
            _id: "$author_id",
            total_posts: { $sum: 1 },
            total_views: { $sum: "$views" },
            avg_views: { $avg: "$views" },
            latest_post: { $max: "$published_at" }
        }
    },
    {
        $sort: { total_posts: -1 } // Sort by post count
    },
    {
        $limit: 10 // Top 10 authors
    }
])
Advanced Pipeline with Lookup (Join):
javascript
// Join posts with user data
db.posts.aggregate([
    {
        $match: {
            published_at: { $gte: ISODate("2025-01-01") }
        }
    },
    {
        $lookup: {
            from: "users",
            localField: "author_id",
            foreignField: "_id",
            as: "author"
        }
    },
    {
        $unwind: "$author" // Flatten author array
    },
    {
        $project: {
            title: 1,
            content: 1,
            views: 1,
            "author.username": 1,
            "author.email": 1,
            days_since_publish: {
                $divide: [
                    { $subtract: [new Date(), "$published_at"] },
                    1000 * 60 * 60 * 24
                ]
            }
        }
    },
    {
        $sort: { views: -1 }
    }
])
Aggregation with Grouping and Reshaping:
javascript
// Complex aggregation: Sales analysis
db.orders.aggregate([
    {
        $match: {
            status: "completed",
            created_at: {
                $gte: ISODate("2025-01-01"),
                $lt: ISODate("2025-02-01")
            }
        }
    },
    {
        $unwind: "$items" // Flatten order items
    },
    {
        $group: {
            _id: {
                product_id: "$items.product_id",
                customer_region: "$customer.region"
            },
            total_quantity: { $sum: "$items.quantity" },
            total_revenue: { $sum: "$items.total_price" },
            order_count: { $sum: 1 },
            avg_order_value: { $avg: "$items.total_price" }
        }
    },
    {
        $group: {
            _id: "$_id.product_id",
            regions: {
                $push: {
                    region: "$_id.customer_region",
                    quantity: "$total_quantity",
                    revenue: "$total_revenue"
                }
            },
            total_quantity: { $sum: "$total_quantity" },
            total_revenue: { $sum: "$total_revenue" }
        }
    },
    {
        $sort: { total_revenue: -1 }
    }
])
基础聚合流水线:
javascript
-- 按作者分组并统计文章数量
db.posts.aggregate([
    {
        $match: { published: true } // 过滤阶段
    },
    {
        $group: {
            _id: "$author_id",
            total_posts: { $sum: 1 },
            total_views: { $sum: "$views" },
            avg_views: { $avg: "$views" },
            latest_post: { $max: "$published_at" }
        }
    },
    {
        $sort: { total_posts: -1 } // 按文章数量排序
    },
    {
        $limit: 10 // 取前10位作者
    }
])
带Lookup(关联)的高级流水线:
javascript
-- 关联文章与用户数据
db.posts.aggregate([
    {
        $match: {
            published_at: { $gte: ISODate("2025-01-01") }
        }
    },
    {
        $lookup: {
            from: "users",
            localField: "author_id",
            foreignField: "_id",
            as: "author"
        }
    },
    {
        $unwind: "$author" // 展开作者数组
    },
    {
        $project: {
            title: 1,
            content: 1,
            views: 1,
            "author.username": 1,
            "author.email": 1,
            days_since_publish: {
                $divide: [
                    { $subtract: [new Date(), "$published_at"] },
                    1000 * 60 * 60 * 24
                ]
            }
        }
    },
    {
        $sort: { views: -1 }
    }
])
分组与重塑的聚合:
javascript
-- 复杂聚合:销售分析
db.orders.aggregate([
    {
        $match: {
            status: "completed",
            created_at: {
                $gte: ISODate("2025-01-01"),
                $lt: ISODate("2025-02-01")
            }
        }
    },
    {
        $unwind: "$items" // 展开订单项
    },
    {
        $group: {
            _id: {
                product_id: "$items.product_id",
                customer_region: "$customer.region"
            },
            total_quantity: { $sum: "$items.quantity" },
            total_revenue: { $sum: "$items.total_price" },
            order_count: { $sum: 1 },
            avg_order_value: { $avg: "$items.total_price" }
        }
    },
    {
        $group: {
            _id: "$_id.product_id",
            regions: {
                $push: {
                    region: "$_id.customer_region",
                    quantity: "$total_quantity",
                    revenue: "$total_revenue"
                }
            },
            total_quantity: { $sum: "$total_quantity" },
            total_revenue: { $sum: "$total_revenue" }
        }
    },
    {
        $sort: { total_revenue: -1 }
    }
])

Indexing Strategies

索引策略

PostgreSQL Indexes

PostgreSQL索引

B-tree Indexes (Default):
sql
-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
CREATE INDEX idx_posts_author_published
ON posts(author_id, published_at);

-- Query can use index:
-- SELECT * FROM posts WHERE author_id = 123 ORDER BY published_at;
-- SELECT * FROM posts WHERE author_id = 123 AND published_at > '2025-01-01';

-- Query CANNOT fully use index:
-- SELECT * FROM posts WHERE published_at > '2025-01-01'; (only uses first column)
Partial Indexes:
sql
-- Index only active users
CREATE INDEX idx_active_users
ON users(username)
WHERE active = true;

-- Index only recent orders
CREATE INDEX idx_recent_orders
ON orders(created_at, status)
WHERE created_at > '2024-01-01';

-- Benefits: Smaller index size, faster queries on filtered data
Expression Indexes:
sql
-- Index on lowercase email for case-insensitive search
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));

-- Query that uses this index:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Index on JSONB field extraction
CREATE INDEX idx_metadata_tags
ON products((metadata->>'category'));
Full-Text Search Indexes:
sql
-- Add tsvector column for full-text search
ALTER TABLE articles
ADD COLUMN tsv_content tsvector;

-- Populate tsvector column
UPDATE articles
SET tsv_content = to_tsvector('english', title || ' ' || content);

-- Create GIN index for full-text search
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv_content);

-- Full-text search query
SELECT title, ts_rank(tsv_content, query) as rank
FROM articles, to_tsquery('english', 'database & design') query
WHERE tsv_content @@ query
ORDER BY rank DESC;

-- Trigger to auto-update tsvector
CREATE TRIGGER articles_tsv_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(tsv_content, 'pg_catalog.english', title, content);
JSONB Indexes:
sql
-- GIN index for JSONB containment queries
CREATE INDEX idx_products_metadata
ON products USING GIN(metadata);

-- Queries that use this index:
SELECT * FROM products WHERE metadata @> '{"color": "blue"}';
SELECT * FROM products WHERE metadata ? 'size';

-- Index on specific JSONB path
CREATE INDEX idx_products_category
ON products((metadata->>'category'));
Index Monitoring:
sql
-- Find unused indexes
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Check index usage
SELECT
    relname as table_name,
    indexrelname as index_name,
    idx_scan as times_used,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
B-tree索引(默认):
sql
-- 单列索引
CREATE INDEX idx_users_email ON users(email);

-- 复合索引(列顺序很重要!)
CREATE INDEX idx_posts_author_published
ON posts(author_id, published_at);

-- 可使用索引的查询:
-- SELECT * FROM posts WHERE author_id = 123 ORDER BY published_at;
-- SELECT * FROM posts WHERE author_id = 123 AND published_at > '2025-01-01';

-- 无法充分使用索引的查询:
-- SELECT * FROM posts WHERE published_at > '2025-01-01';(仅使用第一列)
部分索引:
sql
-- 仅为活跃用户建立索引
CREATE INDEX idx_active_users
ON users(username)
WHERE active = true;

-- 仅为近期订单建立索引
CREATE INDEX idx_recent_orders
ON orders(created_at, status)
WHERE created_at > '2024-01-01';

-- 优势:索引体积更小,过滤后数据的查询速度更快
表达式索引:
sql
-- 为小写邮箱建立索引,支持大小写不敏感搜索
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));

-- 使用该索引的查询:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- 为JSONB字段提取值建立索引
CREATE INDEX idx_metadata_tags
ON products((metadata->>'category'));
全文搜索索引:
sql
-- 添加tsvector字段用于全文搜索
ALTER TABLE articles
ADD COLUMN tsv_content tsvector;

-- 填充tsvector字段
UPDATE articles
SET tsv_content = to_tsvector('english', title || ' ' || content);

-- 为全文搜索创建GIN索引
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv_content);

-- 全文搜索查询
SELECT title, ts_rank(tsv_content, query) as rank
FROM articles, to_tsquery('english', 'database & design') query
WHERE tsv_content @@ query
ORDER BY rank DESC;

-- 自动更新tsvector的触发器
CREATE TRIGGER articles_tsv_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(tsv_content, 'pg_catalog.english', title, content);
JSONB索引:
sql
-- 为JSONB包含查询创建GIN索引
CREATE INDEX idx_products_metadata
ON products USING GIN(metadata);

-- 使用该索引的查询:
SELECT * FROM products WHERE metadata @> '{"color": "blue"}';
SELECT * FROM products WHERE metadata ? 'size';

-- 为特定JSONB路径建立索引
CREATE INDEX idx_products_category
ON products((metadata->>'category'));
索引监控:
sql
-- 查找未使用的索引
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- 检查索引使用情况
SELECT
    relname as table_name,
    indexrelname as index_name,
    idx_scan as times_used,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

MongoDB Indexes

MongoDB索引

Single Field Indexes:
javascript
// Create index on single field
db.users.createIndex({ email: 1 }) // 1 = ascending, -1 = descending

// Unique index
db.users.createIndex({ username: 1 }, { unique: true })

// Sparse index (only index documents with the field)
db.users.createIndex({ phone_number: 1 }, { sparse: true })
Compound Indexes:
javascript
// Index on multiple fields (order matters!)
db.posts.createIndex({ author_id: 1, published_at: -1 })

// Efficient queries:
// - { author_id: "123" }
// - { author_id: "123", published_at: { $gte: ... } }
// - { author_id: "123" } with sort by published_at

// Inefficient:
// - { published_at: { $gte: ... } } alone (doesn't use index efficiently)

// ESR Rule: Equality, Sort, Range
// Best compound index order:
// 1. Equality filters first
// 2. Sort fields second
// 3. Range filters last
db.orders.createIndex({
    status: 1,           // Equality
    created_at: -1,      // Sort
    total_amount: 1      // Range
})
Multikey Indexes (Array Fields):
javascript
// Index on array field
db.posts.createIndex({ tags: 1 })

// Document with array
{
    _id: ObjectId("..."),
    title: "Database Design",
    tags: ["database", "mongodb", "schema"]
}

// Query that uses multikey index
db.posts.find({ tags: "mongodb" })
db.posts.find({ tags: { $in: ["database", "nosql"] } })

// Compound multikey index (max one array field)
db.posts.createIndex({ tags: 1, published_at: -1 }) // Valid
// db.posts.createIndex({ tags: 1, categories: 1 }) // Invalid if both are arrays
Text Indexes:
javascript
// Create text index for full-text search
db.articles.createIndex({
    title: "text",
    content: "text"
})

// Text search query
db.articles.find({
    $text: { $search: "database design patterns" }
})

// Search with relevance score
db.articles.find(
    { $text: { $search: "database design" } },
    { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

// Weighted text index (prioritize title over content)
db.articles.createIndex(
    { title: "text", content: "text" },
    { weights: { title: 10, content: 5 } }
)
Geospatial Indexes:
javascript
// 2dsphere index for geographic queries
db.locations.createIndex({ coordinates: "2dsphere" })

// Document format
{
    name: "Coffee Shop",
    coordinates: {
        type: "Point",
        coordinates: [-122.4194, 37.7749] // [longitude, latitude]
    }
}

// Find locations near a point
db.locations.find({
    coordinates: {
        $near: {
            $geometry: {
                type: "Point",
                coordinates: [-122.4194, 37.7749]
            },
            $maxDistance: 1000 // meters
        }
    }
})
Index Properties:
javascript
// TTL Index (auto-delete documents after time)
db.sessions.createIndex(
    { created_at: 1 },
    { expireAfterSeconds: 3600 } // 1 hour
)

// Partial Index (index subset of documents)
db.orders.createIndex(
    { status: 1, created_at: -1 },
    { partialFilterExpression: { status: { $eq: "pending" } } }
)

// Case-insensitive index
db.users.createIndex(
    { email: 1 },
    { collation: { locale: "en", strength: 2 } }
)

// Background index creation (doesn't block operations)
db.large_collection.createIndex(
    { field: 1 },
    { background: true }
)
Index Analysis:
javascript
// Explain query execution
db.posts.find({ author_id: "123" }).explain("executionStats")

// Check index usage
db.posts.aggregate([
    { $indexStats: {} }
])

// List all indexes on collection
db.posts.getIndexes()

// Drop unused index
db.posts.dropIndex("index_name")
单字段索引:
javascript
-- 为单个字段创建索引
db.users.createIndex({ email: 1 }) // 1 = 升序, -1 = 降序

-- 唯一索引
db.users.createIndex({ username: 1 }, { unique: true })

-- 稀疏索引(仅为包含该字段的文档建立索引)
db.users.createIndex({ phone_number: 1 }, { sparse: true })
复合索引:
javascript
-- 为多个字段创建索引(列顺序很重要!)
db.posts.createIndex({ author_id: 1, published_at: -1 })

-- 高效查询:
-- - { author_id: "123" }
-- - { author_id: "123", published_at: { $gte: ... } }
-- - { author_id: "123" } 并按published_at排序

-- 低效查询:
-- - 单独使用{ published_at: { $gte: ... } }(无法高效使用索引)

-- ESR规则:相等条件、排序、范围
-- 复合索引的最佳顺序:
-- 1. 相等过滤字段优先
-- 2. 排序字段次之
-- 3. 范围过滤字段最后
db.orders.createIndex({
    status: 1,           // 相等条件
    created_at: -1,      // 排序
    total_amount: 1      // 范围
})
多键索引(数组字段):
javascript
-- 为数组字段创建索引
db.posts.createIndex({ tags: 1 })

-- 带数组的文档
{
    _id: ObjectId("..."),
    title: "数据库设计",
    tags: ["database", "mongodb", "schema"]
}

-- 使用多键索引的查询
db.posts.find({ tags: "mongodb" })
db.posts.find({ tags: { $in: ["database", "nosql"] } })

-- 复合多键索引(最多包含一个数组字段)
db.posts.createIndex({ tags: 1, published_at: -1 }) // 有效
-- db.posts.createIndex({ tags: 1, categories: 1 }) // 无效,如果两者都是数组
文本索引:
javascript
-- 创建全文搜索的文本索引
db.articles.createIndex({
    title: "text",
    content: "text"
})

-- 文本搜索查询
db.articles.find({
    $text: { $search: "database design patterns" }
})

-- 带相关性得分的搜索
db.articles.find(
    { $text: { $search: "database design" } },
    { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

-- 加权文本索引(标题权重高于内容)
db.articles.createIndex(
    { title: "text", content: "text" },
    { weights: { title: 10, content: 5 } }
)
地理空间索引:
javascript
-- 用于地理查询的2dsphere索引
db.locations.createIndex({ coordinates: "2dsphere" })

-- 文档格式
{
    name: "咖啡店",
    coordinates: {
        type: "Point",
        coordinates: [-122.4194, 37.7749] // [经度, 纬度]
    }
}

-- 查找指定点附近的位置
db.locations.find({
    coordinates: {
        $near: {
            $geometry: {
                type: "Point",
                coordinates: [-122.4194, 37.7749]
            },
            $maxDistance: 1000 // 米
        }
    }
})
索引属性:
javascript
-- TTL索引(一段时间后自动删除文档)
db.sessions.createIndex(
    { created_at: 1 },
    { expireAfterSeconds: 3600 } // 1小时
)

-- 部分索引(仅为文档子集建立索引)
db.orders.createIndex(
    { status: 1, created_at: -1 },
    { partialFilterExpression: { status: { $eq: "pending" } } }
)

-- 大小写不敏感索引
db.users.createIndex(
    { email: 1 },
    { collation: { locale: "en", strength: 2 } }
)

-- 后台创建索引(不阻塞其他操作)
db.large_collection.createIndex(
    { field: 1 },
    { background: true }
)
索引分析:
javascript
-- 解释查询执行计划
db.posts.find({ author_id: "123" }).explain("executionStats")

-- 检查索引使用情况
db.posts.aggregate([
    { $indexStats: {} }
])

-- 列出集合的所有索引
db.posts.getIndexes()

-- 删除未使用的索引
db.posts.dropIndex("index_name")

Transactions

事务

PostgreSQL Transaction Management

PostgreSQL事务管理

Basic Transactions:
sql
-- Explicit transaction
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;
-- or ROLLBACK; to cancel changes
Savepoints (Partial Rollback):
sql
BEGIN;

UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 'PROD-123';

SAVEPOINT before_audit;

INSERT INTO audit_log (action, details) VALUES ('inventory_update', '...');
-- Oops, error in audit log

ROLLBACK TO SAVEPOINT before_audit;
-- Inventory update preserved, audit insert rolled back

-- Fix and retry
INSERT INTO audit_log (action, details) VALUES ('inventory_update', 'correct details');

COMMIT;
Isolation Levels:
sql
-- Read Uncommitted (not supported in PostgreSQL, defaults to Read Committed)
-- Read Committed (default) - sees only committed data
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Repeatable Read - sees snapshot at transaction start
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id = 1; -- Returns balance 1000
-- Another transaction updates balance to 1500 and commits
SELECT * FROM accounts WHERE id = 1; -- Still returns 1000 (repeatable read)
COMMIT;

-- Serializable - strictest isolation, prevents all anomalies
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- If concurrent transactions would violate serializability, one aborts
COMMIT;
Advisory Locks (Application-Level Locking):
sql
-- Exclusive lock on arbitrary number
SELECT pg_advisory_lock(12345);
-- ... perform critical operation ...
SELECT pg_advisory_unlock(12345);

-- Try lock (non-blocking)
SELECT pg_try_advisory_lock(12345); -- Returns true if acquired, false otherwise

-- Session-level advisory lock (auto-released on disconnect)
SELECT pg_advisory_lock(user_id);
Row-Level Locking:
sql
-- SELECT FOR UPDATE - lock rows for update
BEGIN;

SELECT * FROM products
WHERE id = 123
FOR UPDATE; -- Locks this row

UPDATE products SET quantity = quantity - 1 WHERE id = 123;

COMMIT;

-- SELECT FOR SHARE - shared lock (allows other reads, blocks writes)
SELECT * FROM products WHERE id = 123 FOR SHARE;

-- SKIP LOCKED - skip locked rows instead of waiting
SELECT * FROM queue
WHERE processed = false
ORDER BY priority
LIMIT 10
FOR UPDATE SKIP LOCKED;
基础事务:
sql
-- 显式事务
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;
-- 或使用ROLLBACK; 取消变更
保存点(部分回滚):
sql
BEGIN;

UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 'PROD-123';

SAVEPOINT before_audit;

INSERT INTO audit_log (action, details) VALUES ('inventory_update', '...');
-- 哦,审计日志插入出错了

ROLLBACK TO SAVEPOINT before_audit;
-- 库存更新保留,审计插入回滚

-- 修复后重试
INSERT INTO audit_log (action, details) VALUES ('inventory_update', 'correct details');

COMMIT;
隔离级别:
sql
-- 读未提交(PostgreSQL不支持,默认降级为读已提交)
-- 读已提交(默认)- 仅能看到已提交的数据
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- 可重复读 - 事务启动时看到的数据快照
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id = 1; -- 返回余额1000
-- 另一个事务将余额更新为1500并提交
SELECT * FROM accounts WHERE id = 1; -- 仍返回1000(可重复读)
COMMIT;

-- 可串行化 - 最严格的隔离级别,防止所有异常
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- 如果并发事务会破坏可串行性,其中一个会中止
COMMIT;
咨询锁(应用级锁):
sql
-- 对任意数字加排他锁
SELECT pg_advisory_lock(12345);
-- ... 执行关键操作 ...
SELECT pg_advisory_unlock(12345);

-- 尝试加锁(非阻塞)
SELECT pg_try_advisory_lock(12345); -- 获取成功返回true,否则返回false

-- 会话级咨询锁(断开连接时自动释放)
SELECT pg_advisory_lock(user_id);
行级锁:
sql
-- SELECT FOR UPDATE - 锁定行用于更新
BEGIN;

SELECT * FROM products
WHERE id = 123
FOR UPDATE; -- 锁定该行

UPDATE products SET quantity = quantity - 1 WHERE id = 123;

COMMIT;

-- SELECT FOR SHARE - 共享锁(允许其他读取,阻止写入)
SELECT * FROM products WHERE id = 123 FOR SHARE;

-- SKIP LOCKED - 跳过已锁定的行,而非等待
SELECT * FROM queue
WHERE processed = false
ORDER BY priority
LIMIT 10
FOR UPDATE SKIP LOCKED;

MongoDB Transactions

MongoDB事务

Multi-Document Transactions:
javascript
// Transactions require replica set or sharded cluster
const session = db.getMongo().startSession()
session.startTransaction()

try {
    const accountsCol = session.getDatabase("mydb").accounts

    // Debit account
    accountsCol.updateOne(
        { _id: "account1" },
        { $inc: { balance: -100 } },
        { session }
    )

    // Credit account
    accountsCol.updateOne(
        { _id: "account2" },
        { $inc: { balance: 100 } },
        { session }
    )

    // Commit transaction
    session.commitTransaction()
} catch (error) {
    // Abort on error
    session.abortTransaction()
    throw error
} finally {
    session.endSession()
}
Read and Write Concerns:
javascript
// Write Concern: Acknowledgment level
db.orders.insertOne(
    { customer_id: "123", items: [...] },
    {
        writeConcern: {
            w: "majority",        // Wait for majority of replica set
            j: true,              // Wait for journal write
            wtimeout: 5000        // Timeout after 5 seconds
        }
    }
)

// Read Concern: Data consistency level
db.orders.find(
    { status: "pending" }
).readConcern("majority") // Only return data acknowledged by majority

// Read Preference: Which replica to read from
db.orders.find({ ... }).readPref("secondary") // Read from secondary replica
Atomic Operations (Single Document):
javascript
// Single document updates are atomic by default
db.counters.updateOne(
    { _id: "page_views" },
    {
        $inc: { count: 1 },
        $set: { last_updated: new Date() }
    }
)

// Atomic array operations
db.posts.updateOne(
    { _id: ObjectId("...") },
    {
        $push: {
            comments: {
                $each: [{ author: "John", text: "Great!" }],
                $position: 0 // Insert at beginning
            }
        }
    }
)

// Find and modify (atomic read-modify-write)
db.queue.findOneAndUpdate(
    { status: "pending" },
    { $set: { status: "processing", processor_id: "worker-1" } },
    {
        sort: { priority: -1 },
        returnDocument: "after" // Return updated document
    }
)
多文档事务:
javascript
-- 事务需要副本集或分片集群
const session = db.getMongo().startSession()
session.startTransaction()

try {
    const accountsCol = session.getDatabase("mydb").accounts

    -- 扣款
    accountsCol.updateOne(
        { _id: "account1" },
        { $inc: { balance: -100 } },
        { session }
    )

    -- 存款
    accountsCol.updateOne(
        { _id: "account2" },
        { $inc: { balance: 100 } },
        { session }
    )

    -- 提交事务
    session.commitTransaction()
} catch (error) {
    -- 出错时中止
    session.abortTransaction()
    throw error
} finally {
    session.endSession()
}
读写关注点:
javascript
-- 写关注点:确认级别
db.orders.insertOne(
    { customer_id: "123", items: [...] },
    {
        writeConcern: {
            w: "majority",        // 等待副本集多数节点确认
            j: true,              // 等待日志写入
            wtimeout: 5000        // 5秒超时
        }
    }
)

-- 读关注点:数据一致性级别
db.orders.find(
    { status: "pending" }
).readConcern("majority") // 仅返回多数节点确认的数据

-- 读偏好:从哪个副本读取
db.orders.find({ ... }).readPref("secondary") // 从从节点读取
原子操作(单文档):
javascript
-- 单文档更新默认是原子的
db.counters.updateOne(
    { _id: "page_views" },
    {
        $inc: { count: 1 },
        $set: { last_updated: new Date() }
    }
)

-- 原子数组操作
db.posts.updateOne(
    { _id: ObjectId("...") },
    {
        $push: {
            comments: {
                $each: [{ author: "John", text: "Great!" }],
                $position: 0 // 插入到开头
            }
        }
    }
)

-- 查找并修改(原子读--写)
db.queue.findOneAndUpdate(
    { status: "pending" },
    { $set: { status: "processing", processor_id: "worker-1" } },
    {
        sort: { priority: -1 },
        returnDocument: "after" // 返回更新后的文档
    }
)

Replication

复制

PostgreSQL Replication

PostgreSQL复制

Streaming Replication (Primary-Standby):
sql
-- Primary server configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = '1GB'
hot_standby = on

-- Create replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- pg_hba.conf on primary
host replication replicator standby_ip/32 md5

-- Standby server (recovery.conf or postgresql.auto.conf)
primary_conninfo = 'host=primary_ip port=5432 user=replicator password=...'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'
Logical Replication (Selective Replication):
sql
-- On publisher (source)
CREATE PUBLICATION my_publication FOR TABLE users, posts;
-- or FOR ALL TABLES;

-- On subscriber (destination)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=replicator password=...'
PUBLICATION my_publication;

-- Monitor replication
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_replication_slots;
Failover and Promotion:
sql
-- Promote standby to primary
pg_ctl promote -D /var/lib/postgresql/data

-- Check replication lag
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    sync_state,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
流复制(主从架构):
sql
-- 主服务器配置(postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = '1GB'
hot_standby = on

-- 创建复制用户
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- 主服务器的pg_hba.conf
host replication replicator standby_ip/32 md5

-- 从服务器(recovery.conf或postgresql.auto.conf)
primary_conninfo = 'host=primary_ip port=5432 user=replicator password=...'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'
逻辑复制(选择性复制):
sql
-- 在发布端(源)
CREATE PUBLICATION my_publication FOR TABLE users, posts;
-- 或使用FOR ALL TABLES;

-- 在订阅端(目标)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=replicator password=...'
PUBLICATION my_publication;

-- 监控复制状态
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_replication_slots;
故障转移与提升:
sql
-- 将从服务器提升为主服务器
pg_ctl promote -D /var/lib/postgresql/data

-- 检查复制延迟
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    sync_state,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;

MongoDB Replication

MongoDB复制

Replica Set Configuration:
javascript
// Initialize replica set
rs.initiate({
    _id: "myReplicaSet",
    members: [
        { _id: 0, host: "mongodb1.example.com:27017", priority: 2 },
        { _id: 1, host: "mongodb2.example.com:27017", priority: 1 },
        { _id: 2, host: "mongodb3.example.com:27017", priority: 1 }
    ]
})

// Add member to existing replica set
rs.add("mongodb4.example.com:27017")

// Remove member
rs.remove("mongodb4.example.com:27017")

// Check replica set status
rs.status()

// Check replication lag
rs.printSecondaryReplicationInfo()
Replica Set Roles:
javascript
// Priority 0 member (cannot become primary)
rs.add({
    host: "analytics.example.com:27017",
    priority: 0,
    hidden: true // Hidden from application drivers
})

// Arbiter (voting only, no data)
rs.addArb("arbiter.example.com:27017")

// Delayed member (disaster recovery)
rs.add({
    host: "delayed.example.com:27017",
    priority: 0,
    hidden: true,
    slaveDelay: 3600 // 1 hour behind
})
Read Preference Configuration:
javascript
// Application connection with read preference
const client = new MongoClient(uri, {
    readPreference: "secondaryPreferred", // Try secondary, fallback to primary
    readConcernLevel: "majority"
})

// Read Preference Modes:
// - primary (default): Read from primary only
// - primaryPreferred: Primary if available, else secondary
// - secondary: Read from secondary only
// - secondaryPreferred: Secondary if available, else primary
// - nearest: Read from nearest member (lowest latency)
副本集配置:
javascript
-- 初始化副本集
rs.initiate({
    _id: "myReplicaSet",
    members: [
        { _id: 0, host: "mongodb1.example.com:27017", priority: 2 },
        { _id: 1, host: "mongodb2.example.com:27017", priority: 1 },
        { _id: 2, host: "mongodb3.example.com:27017", priority: 1 }
    ]
})

-- 为现有副本集添加节点
rs.add("mongodb4.example.com:27017")

-- 删除节点
rs.remove("mongodb4.example.com:27017")

-- 检查副本集状态
rs.status()

-- 检查复制延迟
rs.printSecondaryReplicationInfo()
副本集角色:
javascript
-- 优先级0节点(无法成为主节点)
rs.add({
    host: "analytics.example.com:27017",
    priority: 0,
    hidden: true // 对应用驱动隐藏
})

-- 仲裁节点(仅投票,不存储数据)
rs.addArb("arbiter.example.com:27017")

-- 延迟节点(灾难恢复)
rs.add({
    host: "delayed.example.com:27017",
    priority: 0,
    hidden: true,
    slaveDelay: 3600 // 延迟1小时
})
读偏好配置:
javascript
-- 应用连接时配置读偏好
const client = new MongoClient(uri, {
    readPreference: "secondaryPreferred", // 优先从从节点读取,主节点作为 fallback
    readConcernLevel: "majority"
})

-- 读偏好模式:
-- - primary(默认):仅从主节点读取
-- - primaryPreferred:主节点可用则读主节点,否则读从节点
-- - secondary:仅从从节点读取
-- - secondaryPreferred:从节点可用则读从节点,否则读主节点
-- - nearest:从延迟最低的节点读取

Sharding

分片

MongoDB Sharding Architecture

MongoDB分片架构

Shard Key Selection:
javascript
// Good shard key characteristics:
// 1. High cardinality (many distinct values)
// 2. Even distribution
// 3. Query isolation (queries target specific shards)

// Example: User-based application
sh.shardCollection("mydb.users", { user_id: "hashed" })

// Hashed shard key: Even distribution, random data location
sh.shardCollection("mydb.events", { event_id: "hashed" })

// Range-based shard key: Ordered data, good for range queries
sh.shardCollection("mydb.logs", { timestamp: 1, server_id: 1 })

// Compound shard key
sh.shardCollection("mydb.orders", {
    customer_region: 1,  // Coarse grouping
    order_date: 1        // Fine grouping
})
Sharding Setup:
javascript
// 1. Start config servers (replica set)
mongod --configsvr --replSet configRS --port 27019

// 2. Initialize config server replica set
rs.initiate({
    _id: "configRS",
    configsvr: true,
    members: [
        { _id: 0, host: "cfg1.example.com:27019" },
        { _id: 1, host: "cfg2.example.com:27019" },
        { _id: 2, host: "cfg3.example.com:27019" }
    ]
})

// 3. Start shard servers (each is a replica set)
mongod --shardsvr --replSet shard1RS --port 27018

// 4. Start mongos (query router)
mongos --configdb configRS/cfg1.example.com:27019,cfg2.example.com:27019

// 5. Add shards to cluster
sh.addShard("shard1RS/shard1-a.example.com:27018")
sh.addShard("shard2RS/shard2-a.example.com:27018")

// 6. Enable sharding on database
sh.enableSharding("mydb")

// 7. Shard collections
sh.shardCollection("mydb.users", { user_id: "hashed" })
Query Targeting:
javascript
// Targeted query (includes shard key)
db.users.find({ user_id: "12345" })
// Routes to single shard

// Scatter-gather query (no shard key)
db.users.find({ email: "user@example.com" })
// Queries all shards, merges results

// Check query targeting
db.users.find({ user_id: "12345" }).explain()
// Look for "SINGLE_SHARD" vs "ALL_SHARDS"
Zone Sharding (Geographic Distribution):
javascript
// Define zones for geographic sharding
sh.addShardToZone("shard1", "US")
sh.addShardToZone("shard2", "EU")

// Define zone ranges
sh.updateZoneKeyRange(
    "mydb.users",
    { region: "US", user_id: MinKey },
    { region: "US", user_id: MaxKey },
    "US"
)

sh.updateZoneKeyRange(
    "mydb.users",
    { region: "EU", user_id: MinKey },
    { region: "EU", user_id: MaxKey },
    "EU"
)

// Shard collection with zone-aware key
sh.shardCollection("mydb.users", { region: 1, user_id: 1 })
分片键选择:
javascript
-- 优秀分片键的特征:
-- 1. 高基数(大量不同值)
-- 2. 分布均匀
-- 3. 查询隔离(查询指向特定分片)

-- 示例:基于用户的应用
sh.shardCollection("mydb.users", { user_id: "hashed" })

-- 哈希分片键:分布均匀,数据位置随机
sh.shardCollection("mydb.events", { event_id: "hashed" })

-- 范围分片键:数据有序,适合范围查询
sh.shardCollection("mydb.logs", { timestamp: 1, server_id: 1 })

-- 复合分片键
sh.shardCollection("mydb.orders", {
    customer_region: 1,  // 粗粒度分组
    order_date: 1        // 细粒度分组
})
分片搭建:
javascript
-- 1. 启动配置服务器(副本集)
mongod --configsvr --replSet configRS --port 27019

-- 2. 初始化配置服务器副本集
rs.initiate({
    _id: "configRS",
    configsvr: true,
    members: [
        { _id: 0, host: "cfg1.example.com:27019" },
        { _id: 1, host: "cfg2.example.com:27019" },
        { _id: 2, host: "cfg3.example.com:27019" }
    ]
})

-- 3. 启动分片服务器(每个都是副本集)
mongod --shardsvr --replSet shard1RS --port 27018

-- 4. 启动mongos(查询路由器)
mongos --configdb configRS/cfg1.example.com:27019,cfg2.example.com:27019

-- 5. 向集群添加分片
sh.addShard("shard1RS/shard1-a.example.com:27018")
sh.addShard("shard2RS/shard2-a.example.com:27018")

-- 6. 为数据库启用分片
sh.enableSharding("mydb")

-- 7. 为集合分片
sh.shardCollection("mydb.users", { user_id: "hashed" })
查询定位:
javascript
-- 定向查询(包含分片键)
db.users.find({ user_id: "12345" })
-- 路由到单个分片

-- 散射-聚集查询(无分片键)
db.users.find({ email: "user@example.com" })
-- 查询所有分片,合并结果

-- 检查查询定位
db.users.find({ user_id: "12345" }).explain()
-- 查找"SINGLE_SHARD" vs "ALL_SHARDS"
区域分片(地理分布):
javascript
-- 为地理分片定义区域
sh.addShardToZone("shard1", "US")
sh.addShardToZone("shard2", "EU")

-- 定义区域范围
sh.updateZoneKeyRange(
    "mydb.users",
    { region: "US", user_id: MinKey },
    { region: "US", user_id: MaxKey },
    "US"
)

sh.updateZoneKeyRange(
    "mydb.users",
    { region: "EU", user_id: MinKey },
    { region: "EU", user_id: MaxKey },
    "EU"
)

-- 使用支持区域的键分片集合
sh.shardCollection("mydb.users", { region: 1, user_id: 1 })

PostgreSQL Horizontal Partitioning

PostgreSQL水平分区

Declarative Partitioning:
sql
-- Range partitioning
CREATE TABLE logs (
    id BIGSERIAL,
    log_time TIMESTAMP NOT NULL,
    message TEXT,
    level VARCHAR(10)
) PARTITION BY RANGE (log_time);

-- Create partitions
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- List partitioning
CREATE TABLE customers (
    id SERIAL,
    name VARCHAR(255),
    region VARCHAR(50)
) PARTITION BY LIST (region);

CREATE TABLE customers_us PARTITION OF customers
FOR VALUES IN ('US', 'CA', 'MX');

CREATE TABLE customers_eu PARTITION OF customers
FOR VALUES IN ('UK', 'DE', 'FR', 'IT');

-- Hash partitioning
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(50),
    data JSONB
) PARTITION BY HASH (id);

CREATE TABLE events_0 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE events_1 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... events_2 and events_3
Partition Pruning (Query Optimization):
sql
-- Query automatically uses only relevant partition
SELECT * FROM logs
WHERE log_time BETWEEN '2025-01-15' AND '2025-01-20';
-- Only scans logs_2025_01 partition

-- Check query plan
EXPLAIN SELECT * FROM logs WHERE log_time > '2025-01-01';
-- Shows which partitions are scanned
声明式分区:
sql
-- 范围分区
CREATE TABLE logs (
    id BIGSERIAL,
    log_time TIMESTAMP NOT NULL,
    message TEXT,
    level VARCHAR(10)
) PARTITION BY RANGE (log_time);

-- 创建分区
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- 列表分区
CREATE TABLE customers (
    id SERIAL,
    name VARCHAR(255),
    region VARCHAR(50)
) PARTITION BY LIST (region);

CREATE TABLE customers_us PARTITION OF customers
FOR VALUES IN ('US', 'CA', 'MX');

CREATE TABLE customers_eu PARTITION OF customers
FOR VALUES IN ('UK', 'DE', 'FR', 'IT');

-- 哈希分区
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(50),
    data JSONB
) PARTITION BY HASH (id);

CREATE TABLE events_0 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE events_1 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... events_2 和 events_3
分区裁剪(查询优化):
sql
-- 查询自动仅使用相关分区
SELECT * FROM logs
WHERE log_time BETWEEN '2025-01-15' AND '2025-01-20';
-- 仅扫描logs_2025_01分区

-- 检查查询计划
EXPLAIN SELECT * FROM logs WHERE log_time > '2025-01-01';
-- 显示扫描的分区

Performance Tuning

性能调优

Query Optimization Techniques

查询优化技巧

PostgreSQL Query Analysis:
sql
-- Basic explain
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- Analyze with actual execution statistics
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username
ORDER BY post_count DESC
LIMIT 10;

-- Identify slow queries
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Table statistics
ANALYZE users; -- Update query planner statistics

-- Vacuum and analyze
VACUUM ANALYZE posts; -- Reclaim space and update stats
Common Query Patterns:
sql
-- Avoid SELECT * (retrieve only needed columns)
-- BAD
SELECT * FROM users WHERE id = 123;

-- GOOD
SELECT id, username, email FROM users WHERE id = 123;

-- Use EXISTS instead of IN for large subqueries
-- BAD
SELECT * FROM posts WHERE author_id IN (
    SELECT id FROM users WHERE active = true
);

-- GOOD
SELECT * FROM posts p WHERE EXISTS (
    SELECT 1 FROM users u
    WHERE u.id = p.author_id AND u.active = true
);

-- Use JOINs instead of multiple queries
-- BAD (N+1 query problem)
-- SELECT * FROM posts;
-- Then for each post: SELECT * FROM users WHERE id = post.author_id;

-- GOOD
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;

-- Window functions instead of self-joins
-- Calculate running total
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;

-- Rank within groups
SELECT
    category,
    product_name,
    sales,
    RANK() OVER (PARTITION BY category ORDER BY sales DESC) as rank_in_category
FROM products;
MongoDB Query Optimization:
javascript
// Use projection to limit returned fields
// BAD
db.users.find({ active: true })

// GOOD
db.users.find(
    { active: true },
    { username: 1, email: 1, _id: 0 }
)

// Use covered queries (index covers all fields)
db.users.createIndex({ username: 1, email: 1 })
db.users.find(
    { username: "john_doe" },
    { username: 1, email: 1, _id: 0 }
) // Entire query served from index

// Avoid negation operators
// BAD (cannot use index efficiently)
db.products.find({ status: { $ne: "discontinued" } })

// GOOD
db.products.find({ status: { $in: ["active", "pending", "sold"] } })

// Use $lookup sparingly (expensive operation)
// Consider embedding data instead if appropriate

// Aggregation optimization: Filter early
// BAD
db.orders.aggregate([
    { $lookup: { ... } },        // Expensive join
    { $match: { status: "completed" } } // Filter after join
])

// GOOD
db.orders.aggregate([
    { $match: { status: "completed" } }, // Filter first
    { $lookup: { ... } }                  // Join fewer documents
])
PostgreSQL查询分析:
sql
-- 基础执行计划
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- 带实际执行统计的分析
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username
ORDER BY post_count DESC
LIMIT 10;

-- 识别慢查询
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- 表统计信息
ANALYZE users; -- 更新查询优化器统计信息

-- 清理与分析
VACUUM ANALYZE posts; -- 回收空间并更新统计信息
常见查询模式:
sql
-- 避免SELECT *(仅获取需要的列)
-- 糟糕写法
SELECT * FROM users WHERE id = 123;

-- 优秀写法
SELECT id, username, email FROM users WHERE id = 123;

-- 大子查询用EXISTS替代IN
-- 糟糕写法
SELECT * FROM posts WHERE author_id IN (
    SELECT id FROM users WHERE active = true
);

-- 优秀写法
SELECT * FROM posts p WHERE EXISTS (
    SELECT 1 FROM users u
    WHERE u.id = p.author_id AND u.active = true
);

-- 用JOIN替代多查询
-- 糟糕写法(N+1查询问题)
-- SELECT * FROM posts;
-- 然后对每篇文章执行:SELECT * FROM users WHERE id = post.author_id;

-- 优秀写法
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;

-- 用窗口函数替代自连接
-- 计算累计总和
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;

-- 分组内排名
SELECT
    category,
    product_name,
    sales,
    RANK() OVER (PARTITION BY category ORDER BY sales DESC) as rank_in_category
FROM products;
MongoDB查询优化:
javascript
-- 使用投影限制返回字段
-- 糟糕写法
db.users.find({ active: true })

-- 优秀写法
db.users.find(
    { active: true },
    { username: 1, email: 1, _id: 0 }
)

-- 使用覆盖查询(索引包含所有字段)
db.users.createIndex({ username: 1, email: 1 })
db.users.find(
    { username: "john_doe" },
    { username: 1, email: 1, _id: 0 }
) -- 整个查询从索引获取

-- 避免否定运算符
-- 糟糕写法(无法高效使用索引)
db.products.find({ status: { $ne: "discontinued" } })

-- 优秀写法
db.products.find({ status: { $in: ["active", "pending", "sold"] } })

-- 谨慎使用$lookup(开销大)
-- 合适的话考虑嵌入数据

-- 聚合优化:尽早过滤
-- 糟糕写法
db.orders.aggregate([
    { $lookup: { ... } },        // 开销大的关联
    { $match: { status: "completed" } } // 关联后过滤
])

-- 优秀写法
db.orders.aggregate([
    { $match: { status: "completed" } }, // 先过滤
    { $lookup: { ... } }                  // 关联更少的文档
])

Connection Pooling

连接池

PostgreSQL Connection Pooling:
javascript
// Using node-postgres (pg) with pool
const { Pool } = require('pg')

const pool = new Pool({
    host: 'localhost',
    port: 5432,
    database: 'mydb',
    user: 'dbuser',
    password: 'secret',
    max: 20,              // Maximum pool size
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 2000
})

// Execute query
const result = await pool.query('SELECT * FROM users WHERE id = $1', [123])

// Use PgBouncer for server-side pooling
// pgbouncer.ini
// [databases]
// mydb = host=localhost port=5432 dbname=mydb
//
// [pgbouncer]
// pool_mode = transaction
// max_client_conn = 1000
// default_pool_size = 25
MongoDB Connection Pooling:
javascript
// MongoClient automatically manages connection pool
const { MongoClient } = require('mongodb')

const client = new MongoClient(uri, {
    maxPoolSize: 50,           // Max connections
    minPoolSize: 10,           // Min connections
    maxIdleTimeMS: 30000,      // Close idle connections
    waitQueueTimeoutMS: 5000   // Wait for available connection
})

await client.connect()
const db = client.db('mydb')
// Connection automatically returned to pool after use
PostgreSQL连接池:
javascript
-- 使用node-postgres(pg)的连接池
const { Pool } = require('pg')

const pool = new Pool({
    host: 'localhost',
    port: 5432,
    database: 'mydb',
    user: 'dbuser',
    password: 'secret',
    max: 20,              // 最大连接池大小
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 2000
})

-- 执行查询
const result = await pool.query('SELECT * FROM users WHERE id = $1', [123])

-- 使用PgBouncer实现服务端连接池
-- pgbouncer.ini
-- [databases]
-- mydb = host=localhost port=5432 dbname=mydb
--
-- [pgbouncer]
-- pool_mode = transaction
-- max_client_conn = 1000
-- default_pool_size = 25
MongoDB连接池:
javascript
-- MongoClient自动管理连接池
const { MongoClient } = require('mongodb')

const client = new MongoClient(uri, {
    maxPoolSize: 50,           // 最大连接数
    minPoolSize: 10,           // 最小连接数
    maxIdleTimeMS: 30000,      // 关闭空闲连接的时间
    waitQueueTimeoutMS: 5000   // 等待可用连接的超时时间
})

await client.connect()
const db = client.db('mydb')
-- 使用后连接自动返回连接池

Best Practices

最佳实践

PostgreSQL Best Practices

PostgreSQL最佳实践

  1. Schema Design
    • Normalize for data integrity, denormalize for performance
    • Use appropriate data types (avoid TEXT for short strings)
    • Define NOT NULL constraints where appropriate
    • Use SERIAL or UUID for primary keys consistently
  2. Indexing
    • Index foreign keys for JOIN performance
    • Create indexes on frequently filtered/sorted columns
    • Use partial indexes for selective queries
    • Monitor and remove unused indexes
    • Keep composite index column count reasonable (typically ≤ 3-4)
  3. Query Performance
    • Use EXPLAIN ANALYZE to understand query plans
    • Avoid SELECT * in application code
    • Use prepared statements to prevent SQL injection
    • Limit result sets with LIMIT
    • Use connection pooling
  4. Maintenance
    • Run VACUUM regularly (or enable autovacuum)
    • Update statistics with ANALYZE
    • Monitor slow query log
    • Set appropriate autovacuum thresholds
    • Regular backup with pg_dump or WAL archiving
  5. Security
    • Use SSL/TLS for connections
    • Implement row-level security for multi-tenant apps
    • Grant minimum necessary privileges
    • Use parameterized queries
    • Regular security updates
  1. 架构设计
    • 为保证数据完整性进行规范化,为性能进行非规范化
    • 使用合适的数据类型(短字符串避免用TEXT)
    • 合理定义NOT NULL约束
    • 统一使用SERIAL或UUID作为主键
  2. 索引
    • 为外键创建索引提升JOIN性能
    • 为频繁过滤/排序的列创建索引
    • 对选择性查询使用部分索引
    • 监控并删除未使用的索引
    • 复合索引的列数保持合理(通常≤3-4列)
  3. 查询性能
    • 使用EXPLAIN ANALYZE理解查询计划
    • 应用代码中避免SELECT *
    • 使用预编译语句防止SQL注入
    • 用LIMIT限制结果集
    • 使用连接池
  4. 维护
    • 定期运行VACUUM(或启用自动清理)
    • 用ANALYZE更新统计信息
    • 监控慢查询日志
    • 配置合理的自动清理阈值
    • 定期用pg_dump或WAL归档备份
  5. 安全
    • 连接使用SSL/TLS
    • 多租户应用实现行级安全
    • 授予最小必要权限
    • 使用参数化查询
    • 定期进行安全更新

MongoDB Best Practices

MongoDB最佳实践

  1. Schema Design
    • Embed related data that is accessed together
    • Reference data that is large or rarely accessed
    • Use polymorphic pattern for varied schemas
    • Limit document size to reasonable bounds (< 1-2 MB typically)
    • Design for your query patterns
  2. Indexing
    • Index on fields used in queries and sorts
    • Use compound indexes with ESR rule (Equality, Sort, Range)
    • Create text indexes for full-text search
    • Monitor index usage with $indexStats
    • Avoid too many indexes (write performance impact)
  3. Query Performance
    • Use projection to limit returned fields
    • Create covered queries when possible
    • Filter early in aggregation pipelines
    • Avoid $lookup when embedding is appropriate
    • Use explain() to verify index usage
  4. Scalability
    • Choose appropriate shard key (high cardinality, even distribution)
    • Use replica sets for high availability
    • Configure appropriate read/write concerns
    • Monitor chunk distribution in sharded clusters
    • Use zones for geographic distribution
  5. Operations
    • Enable authentication and authorization
    • Use TLS for client connections
    • Regular backups (mongodump or filesystem snapshots)
    • Monitor with MongoDB Atlas, Ops Manager, or custom tools
    • Keep MongoDB version updated
  1. 架构设计
    • 嵌入一起访问的相关数据
    • 引用大型或极少访问的数据
    • 用多态模式处理多样架构
    • 文档大小控制在合理范围(通常<1-2MB)
    • 围绕查询模式设计架构
  2. 索引
    • 为查询与排序的字段创建索引
    • 遵循ESR规则(相等、排序、范围)创建复合索引
    • 为全文搜索创建文本索引
    • 用$indexStats监控索引使用情况
    • 避免过多索引(影响写入性能)
  3. 查询性能
    • 用投影限制返回字段
    • 尽可能使用覆盖查询
    • 聚合流水线中尽早过滤
    • 合适时用嵌入替代$lookup
    • 用explain()验证索引使用
  4. 扩展性
    • 选择合适的分片键(高基数、分布均匀)
    • 用副本集实现高可用
    • 配置合理的读写关注点
    • 监控分片集群的块分布
    • 用区域分片实现地理分布
  5. 运维
    • 启用认证与授权
    • 客户端连接使用TLS
    • 定期备份(mongodump或文件系统快照)
    • 用MongoDB Atlas、Ops Manager或自定义工具监控
    • 保持MongoDB版本更新

Data Modeling Decision Framework

数据建模决策框架

Choose PostgreSQL when:
  • Strong ACID guarantees required (financial transactions)
  • Complex relationships with many JOINs
  • Data structure is well-defined and stable
  • Need for advanced SQL features (window functions, CTEs, stored procedures)
  • Compliance requirements demand strict consistency
Choose MongoDB when:
  • Schema flexibility needed (rapid development, evolving requirements)
  • Horizontal scalability is priority (sharding required)
  • Document-oriented data (JSON/BSON native format)
  • Hierarchical or nested data structures
  • High write throughput with eventual consistency acceptable
Hybrid Approach:
  • Use both databases for different parts of application
  • PostgreSQL for transactional data (orders, payments)
  • MongoDB for catalog, logs, user sessions
  • Synchronize critical data between systems
选择PostgreSQL的场景:
  • 需要强ACID保障(如金融交易)
  • 存在复杂数据关系与多JOIN操作
  • 数据结构明确且稳定
  • 需要高级SQL特性(窗口函数、CTE、存储过程)
  • 合规要求严格一致性
选择MongoDB的场景:
  • 需要架构灵活性(快速迭代、需求变化)
  • 优先考虑水平扩展性(需要分片)
  • 数据为文档型(原生JSON/BSON)
  • 存在层级或嵌套数据结构
  • 高写入吞吐量,最终一致性可接受
混合方案:
  • 不同系统部分使用不同数据库
  • PostgreSQL处理事务数据(订单、支付)
  • MongoDB处理目录、日志、用户会话
  • 关键数据在系统间同步

Common Patterns and Anti-Patterns

常见模式与反模式

PostgreSQL Anti-Patterns

PostgreSQL反模式

Storing JSON when relational fits better
sql
-- BAD: Using JSONB for structured, queryable data
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    data JSONB -- { name, email, address: { street, city, state } }
);

-- GOOD: Proper normalization
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    street VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50)
);
Over-indexing
sql
-- BAD: Index on every column "just in case"
CREATE INDEX idx1 ON users(username);
CREATE INDEX idx2 ON users(email);
CREATE INDEX idx3 ON users(created_at);
CREATE INDEX idx4 ON users(updated_at);
CREATE INDEX idx5 ON users(active);
-- Result: Slow writes, large database size

-- GOOD: Index based on actual query patterns
CREATE INDEX idx_users_email ON users(email); -- Login queries
CREATE INDEX idx_active_users_created ON users(created_at) WHERE active = true; -- Partial
N+1 Query Problem
sql
-- BAD: Multiple queries in loop
SELECT * FROM posts; -- Returns 100 posts
-- Then for each post:
SELECT * FROM users WHERE id = ?; -- 100 additional queries!

-- GOOD: Single query with JOIN
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;
用JSON存储适合关系型的数据
sql
-- 糟糕写法:用JSONB存储结构化、可查询的数据
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    data JSONB -- { name, email, address: { street, city, state } }
);

-- 优秀写法:合理规范化
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    street VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50)
);
过度索引
sql
-- 糟糕写法:为每个列都建索引"以防万一"
CREATE INDEX idx1 ON users(username);
CREATE INDEX idx2 ON users(email);
CREATE INDEX idx3 ON users(created_at);
CREATE INDEX idx4 ON users(updated_at);
CREATE INDEX idx5 ON users(active);
-- 结果:写入缓慢,数据库体积过大

-- 优秀写法:根据实际查询模式建索引
CREATE INDEX idx_users_email ON users(email); -- 登录查询用
CREATE INDEX idx_active_users_created ON users(created_at) WHERE active = true; -- 部分索引
N+1查询问题
sql
-- 糟糕写法:循环执行多查询
SELECT * FROM posts; -- 返回100篇文章
-- 然后对每篇文章执行:SELECT * FROM users WHERE id = ?; -- 额外100次查询!

-- 优秀写法:单查询用JOIN
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;

MongoDB Anti-Patterns

MongoDB反模式

Massive arrays in documents
javascript
// BAD: Unbounded array growth
{
    _id: ObjectId("..."),
    username: "popular_user",
    followers: [
        ObjectId("follower1"),
        ObjectId("follower2"),
        // ... 100,000+ follower IDs
        // Document exceeds 16MB limit!
    ]
}

// GOOD: Separate collection with references
// users collection
{ _id: ObjectId("..."), username: "popular_user" }

// followers collection
{ _id: ObjectId("..."), user_id: ObjectId("..."), follower_id: ObjectId("...") }
db.followers.createIndex({ user_id: 1, follower_id: 1 })
Poor shard key selection
javascript
// BAD: Monotonically increasing shard key
sh.shardCollection("mydb.events", { _id: 1 })
// All writes go to same shard (highest _id range)

// BAD: Low cardinality shard key
sh.shardCollection("mydb.users", { country: 1 })
// Most users in few countries = uneven distribution

// GOOD: Hashed _id or compound key
sh.shardCollection("mydb.events", { _id: "hashed" }) // Even distribution
sh.shardCollection("mydb.users", { country: 1, user_id: 1 }) // Compound
Ignoring indexes on embedded documents
javascript
// Document structure
{
    username: "john_doe",
    profile: {
        email: "john@example.com",
        age: 30,
        city: "San Francisco"
    }
}

// Query on embedded field
db.users.find({ "profile.email": "john@example.com" })

// MISSING: Index on embedded field
db.users.createIndex({ "profile.email": 1 })
文档中包含超大数组
javascript
-- 糟糕写法:数组无限制增长
{
    _id: ObjectId("..."),
    username: "popular_user",
    followers: [
        ObjectId("follower1"),
        ObjectId("follower2"),
        // ... 10万+关注者ID
        // 文档超过16MB限制!
    ]
}

-- 优秀写法:用独立集合存储引用
-- 用户集合
{ _id: ObjectId("..."), username: "popular_user" }

-- 关注者集合
{ _id: ObjectId("..."), user_id: ObjectId("..."), follower_id: ObjectId("...") }
db.followers.createIndex({ user_id: 1, follower_id: 1 })
分片键选择不当
javascript
-- 糟糕写法:单调递增的分片键
sh.shardCollection("mydb.events", { _id: 1 })
-- 所有写入都到同一个分片(最高_id范围)

-- 糟糕写法:低基数分片键
sh.shardCollection("mydb.users", { country: 1 })
-- 多数用户集中在少数国家,分布不均

-- 优秀写法:哈希_id或复合键
sh.shardCollection("mydb.events", { _id: "hashed" }) // 分布均匀
sh.shardCollection("mydb.users", { country: 1, user_id: 1 }) // 复合键
忽略嵌入式文档的索引
javascript
-- 文档结构
{
    username: "john_doe",
    profile: {
        email: "john@example.com",
        age: 30,
        city: "San Francisco"
    }
}

-- 查询嵌入式字段
db.users.find({ "profile.email": "john@example.com" })

-- 缺失:嵌入式字段的索引
db.users.createIndex({ "profile.email": 1 })

Troubleshooting Guide

故障排查指南

PostgreSQL Issues

PostgreSQL问题

Slow Queries:
sql
-- Enable slow query logging (postgresql.conf)
-- log_min_duration_statement = 1000  # Log queries > 1 second

-- Find slow queries
SELECT
    query,
    calls,
    total_exec_time / calls as avg_time_ms,
    rows / calls as avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;

-- Analyze specific slow query
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ... FROM ... WHERE ...;
High CPU Usage:
sql
-- Check running queries
SELECT
    pid,
    now() - query_start as duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Terminate long-running query
SELECT pg_terminate_backend(pid);
Lock Contention:
sql
-- View locks
SELECT
    locktype,
    relation::regclass,
    mode,
    granted,
    pid
FROM pg_locks
WHERE NOT granted;

-- Find blocking queries
SELECT
    blocked_locks.pid AS blocked_pid,
    blocking_locks.pid AS blocking_pid,
    blocked_activity.query AS blocked_query,
    blocking_activity.query AS blocking_query
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted AND blocking_locks.granted;
慢查询:
sql
-- 启用慢查询日志(postgresql.conf)
-- log_min_duration_statement = 1000  # 记录执行时间>1秒的查询

-- 查找慢查询
SELECT
    query,
    calls,
    total_exec_time / calls as avg_time_ms,
    rows / calls as avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;

-- 分析特定慢查询
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ... FROM ... WHERE ...;
CPU使用率高:
sql
-- 检查运行中的查询
SELECT
    pid,
    now() - query_start as duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- 终止长时运行的查询
SELECT pg_terminate_backend(pid);
锁竞争:
sql
-- 查看锁信息
SELECT
    locktype,
    relation::regclass,
    mode,
    granted,
    pid
FROM pg_locks
WHERE NOT granted;

-- 查找阻塞查询
SELECT
    blocked_locks.pid AS blocked_pid,
    blocking_locks.pid AS blocking_pid,
    blocked_activity.query AS blocked_query,
    blocking_activity.query AS blocking_query
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted AND blocking_locks.granted;

MongoDB Issues

MongoDB问题

Slow Queries:
javascript
// Enable profiling
db.setProfilingLevel(1, { slowms: 100 }) // Log queries > 100ms

// View slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10)

// Analyze query performance
db.collection.find({ ... }).explain("executionStats")
// Check: totalDocsExamined vs nReturned (should be close)
// Check: executionTimeMillis
// Check: indexName (should show index usage)
Replication Lag:
javascript
// Check lag on secondary
rs.printSecondaryReplicationInfo()

// Check oplog size
db.getReplicationInfo()

// Increase oplog size if needed
db.adminCommand({ replSetResizeOplog: 1, size: 16384 }) // 16GB
Sharding Issues:
javascript
// Check chunk distribution
sh.status()

// Check balancer status
sh.getBalancerState()
sh.isBalancerRunning()

// Balance specific collection
sh.enableBalancing("mydb.mycollection")

// Check for jumbo chunks
db.chunks.find({ jumbo: true })
慢查询:
javascript
-- 启用性能分析
db.setProfilingLevel(1, { slowms: 100 }) // 记录执行时间>100ms的查询

-- 查看慢查询
db.system.profile.find().sort({ ts: -1 }).limit(10)

-- 分析查询性能
db.collection.find({ ... }).explain("executionStats")
-- 检查:totalDocsExamined vs nReturned(应接近)
-- 检查:executionTimeMillis
-- 检查:indexName(应显示使用的索引)
复制延迟:
javascript
-- 检查从节点延迟
rs.printSecondaryReplicationInfo()

-- 检查oplog大小
db.getReplicationInfo()

-- 必要时增大oplog大小
db.adminCommand({ replSetResizeOplog: 1, size: 16384 }) // 16GB
分片问题:
javascript
-- 检查块分布
sh.status()

-- 检查均衡器状态
sh.getBalancerState()
sh.isBalancerRunning()

-- 为特定集合启用均衡
sh.enableBalancing("mydb.mycollection")

-- 查找超大块
db.chunks.find({ jumbo: true })

Resources

资源

PostgreSQL Resources

PostgreSQL资源

MongoDB Resources

MongoDB资源

Books

书籍

  • PostgreSQL: "PostgreSQL: Up and Running" by Regina Obe & Leo Hsu
  • MongoDB: "MongoDB: The Definitive Guide" by Shannon Bradshaw, Eoin Brazil, Kristina Chodorow

Skill Version: 1.0.0 Last Updated: January 2025 Skill Category: Database Management, Data Architecture, Performance Optimization Technologies: PostgreSQL 16+, MongoDB 7+
  • PostgreSQL:《PostgreSQL实战》Regina Obe & Leo Hsu 著
  • MongoDB:《MongoDB权威指南》Shannon Bradshaw, Eoin Brazil, Kristina Chodorow 著

技能版本:1.0.0 最后更新:2025年1月 技能分类:数据库管理、数据架构、性能优化 技术栈:PostgreSQL 16+, MongoDB 7+",