database-management-patterns

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Database Management Patterns

数据库管理模式

A comprehensive skill for mastering database management across SQL (PostgreSQL) and NoSQL (MongoDB) systems. This skill covers schema design, indexing strategies, transaction management, replication, sharding, and performance optimization for production-grade applications.

这是一项精通SQL（PostgreSQL）与NoSQL（MongoDB）系统数据库管理的综合技能。本技能涵盖生产级应用的架构设计、索引策略、事务管理、复制、分片及性能优化。

When to Use This Skill

何时使用本技能

Use this skill when:

Designing database schemas for new applications or refactoring existing ones
Choosing between SQL and NoSQL databases for your use case
Optimizing query performance with proper indexing strategies
Implementing data consistency with transactions and ACID guarantees
Scaling databases horizontally with sharding and replication
Managing high-traffic applications requiring distributed databases
Ensuring data integrity with constraints, triggers, and validation
Troubleshooting performance issues using explain plans and query analysis
Building fault-tolerant systems with replication and failover strategies
Working with complex data relationships (relational) or flexible schemas (document)

在以下场景使用本技能：

设计数据库架构：为新应用设计架构或重构现有应用架构
选择SQL与NoSQL数据库：根据使用场景挑选合适的数据库类型
优化查询性能：通过合理的索引策略提升查询效率
实现数据一致性：利用事务与ACID保障确保数据一致性
扩容数据库：通过分片与复制实现水平扩容
管理高流量应用：运维需要分布式数据库的高流量系统
确保数据完整性：通过约束、触发器与验证机制保障数据完整性
排查性能问题：使用执行计划与查询分析定位性能瓶颈
构建容错系统：基于复制与故障转移策略搭建高可用系统
处理复杂数据结构：应对关系型复杂数据关联或文档型灵活架构

Core Concepts

核心概念

Database Paradigms Comparison

数据库范式对比

Relational Databases (PostgreSQL)

关系型数据库（PostgreSQL）

Strengths:

ACID Transactions: Strong consistency guarantees
Complex Queries: JOIN operations, subqueries, CTEs
Data Integrity: Foreign keys, constraints, triggers
Normalized Data: Reduced redundancy, consistent updates
Mature Ecosystem: Rich tooling, extensions, community

Best For:

Financial systems requiring strict consistency
Complex relationships and data integrity requirements
Applications with structured, well-defined schemas
Systems requiring complex analytical queries
Multi-step transactions across multiple tables

优势：

ACID事务：提供强一致性保障
复杂查询：支持JOIN操作、子查询、CTE（公共表表达式）
数据完整性：外键、约束、触发器机制完善
规范化数据：减少数据冗余，确保更新一致性
成熟生态：工具、扩展丰富，社区支持完善

适用场景：

要求严格一致性的金融系统
存在复杂数据关系与完整性要求的场景
数据结构明确且稳定的应用
需要复杂分析查询的系统
涉及跨多表的多步骤事务场景

Document Databases (MongoDB)

文档型数据库（MongoDB）

Strengths:

Flexible Schema: Easy schema evolution, polymorphic data
Horizontal Scalability: Built-in sharding support
JSON-Native: Natural fit for modern application development
Embedded Documents: Denormalized data for performance
Aggregation Framework: Powerful data processing pipeline

Best For:

Rapidly evolving applications with changing requirements
Content management systems with varied data structures
Real-time analytics and event logging
Mobile and web applications with JSON APIs
Hierarchical or nested data structures

优势：

灵活架构：架构演进便捷，支持多态数据
水平扩展性：内置分片支持，轻松扩容
原生JSON兼容：与现代应用开发天然适配
嵌入式文档：非规范化设计提升读取性能
聚合框架：强大的数据处理流水线能力

适用场景：

需求快速迭代、架构频繁变化的应用
数据结构多样的内容管理系统
实时分析与事件日志系统
采用JSON API的移动与Web应用
层级或嵌套结构的数据场景

ACID Properties

ACID属性

Atomicity: All operations in a transaction succeed or fail together Consistency: Transactions bring database from one valid state to another Isolation: Concurrent transactions don't interfere with each other Durability: Committed transactions survive system failures

原子性（Atomicity）：事务内所有操作要么全部成功，要么全部回滚 一致性（Consistency）：事务将数据库从一个有效状态转换为另一个有效状态 隔离性（Isolation）：并发执行的事务互不干扰 持久性（Durability）：已提交的事务结果在系统故障后仍能保留

CAP Theorem

CAP定理

In distributed systems, choose two of three:

Consistency: All nodes see the same data
Availability: System remains operational
Partition Tolerance: System continues despite network failures

PostgreSQL emphasizes CP (Consistency + Partition Tolerance) MongoDB can be configured for CP or AP depending on write/read concerns

分布式系统中，需在三者中权衡选择其二：

一致性（Consistency）：所有节点的数据保持一致
可用性（Availability）：系统始终对外提供服务
分区容错性（Partition Tolerance）：网络分区时系统仍能正常运行

PostgreSQL侧重CP（一致性+分区容错性） MongoDB可根据读写关注点配置为CP或AP模式

PostgreSQL Patterns

PostgreSQL模式

Schema Design Fundamentals

架构设计基础

Normalization Levels

规范化级别

First Normal Form (1NF)

Atomic values (no arrays or lists in columns)
Each row is unique (primary key exists)
No repeating groups

Second Normal Form (2NF)

Meets 1NF requirements
All non-key attributes depend on the entire primary key

Third Normal Form (3NF)

Meets 2NF requirements
No transitive dependencies (non-key attributes depend only on primary key)

When to Denormalize:

Read-heavy workloads where joins are expensive
Frequently accessed aggregate data
Historical snapshots that shouldn't change
Performance-critical queries

第一范式（1NF）

列值具备原子性（列中不包含数组或列表）
每行数据唯一（存在主键）
无重复分组

第二范式（2NF）

满足第一范式要求
所有非键属性完全依赖于主键

第三范式（3NF）

满足第二范式要求
非键属性之间无传递依赖（仅依赖主键）

何时进行非规范化：

读密集型场景，JOIN操作开销过大时
频繁访问的聚合数据
无需修改的历史快照数据
性能敏感的查询场景

Table Design Patterns

表设计模式

Primary Keys:

sql

-- Serial auto-increment (traditional)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- UUID for distributed systems
CREATE TABLE accounts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Composite primary key
CREATE TABLE order_items (
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    price NUMERIC(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);

Foreign Key Constraints:

sql

-- Cascade delete: Remove child records when parent deleted
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

-- Set null: Preserve child records, nullify reference
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    post_id INTEGER,
    user_id INTEGER,
    content TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE SET NULL,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
);

-- Restrict: Prevent deletion if child records exist
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    category_id INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE RESTRICT
);

主键设计：

sql

-- 自增序列（传统方式）
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- 分布式系统使用UUID
CREATE TABLE accounts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- 复合主键
CREATE TABLE order_items (
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    price NUMERIC(10, 2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);

外键约束：

sql

-- 级联删除：删除父记录时自动删除子记录
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

-- 设置为NULL：保留子记录，外键字段设为NULL
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    post_id INTEGER,
    user_id INTEGER,
    content TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE SET NULL,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
);

-- 限制删除：存在子记录时禁止删除父记录
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    category_id INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE RESTRICT
);

Advanced Constraints

高级约束

Check Constraints:

sql

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
    discount_percent INTEGER CHECK (discount_percent BETWEEN 0 AND 100),
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);

-- Table-level check constraint
CREATE TABLE date_ranges (
    id SERIAL PRIMARY KEY,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    CHECK (end_date > start_date)
);

Unique Constraints:

sql

-- Single column unique
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);

-- Composite unique constraint
CREATE TABLE user_permissions (
    user_id INTEGER NOT NULL,
    permission_id INTEGER NOT NULL,
    granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, permission_id),
    FOREIGN KEY (user_id) REFERENCES users(id),
    FOREIGN KEY (permission_id) REFERENCES permissions(id)
);

-- Partial unique index (unique where condition met)
CREATE UNIQUE INDEX unique_active_email
ON users (email)
WHERE active = true;

检查约束：

sql

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
    discount_percent INTEGER CHECK (discount_percent BETWEEN 0 AND 100),
    stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);

-- 表级检查约束
CREATE TABLE date_ranges (
    id SERIAL PRIMARY KEY,
    start_date DATE NOT NULL,
    end_date DATE NOT NULL,
    CHECK (end_date > start_date)
);

唯一约束：

sql

-- 单列唯一约束
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL
);

-- 复合唯一约束
CREATE TABLE user_permissions (
    user_id INTEGER NOT NULL,
    permission_id INTEGER NOT NULL,
    granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, permission_id),
    FOREIGN KEY (user_id) REFERENCES users(id),
    FOREIGN KEY (permission_id) REFERENCES permissions(id)
);

-- 部分唯一索引（仅对满足条件的记录建立唯一索引）
CREATE UNIQUE INDEX unique_active_email
ON users (email)
WHERE active = true;

Triggers and Functions

触发器与函数

Audit Trail Pattern:

sql

-- Audit table
CREATE TABLE audit_log (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(255) NOT NULL,
    record_id INTEGER NOT NULL,
    action VARCHAR(10) NOT NULL,
    old_data JSONB,
    new_data JSONB,
    changed_by VARCHAR(255),
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Trigger function
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, record_id, action, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'INSERT', row_to_json(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'UPDATE', row_to_json(OLD), row_to_json(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, 'DELETE', row_to_json(OLD), current_user);
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Attach trigger to table
CREATE TRIGGER users_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();

Timestamp Update Pattern:

sql

CREATE OR REPLACE FUNCTION update_modified_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER posts_update_timestamp
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION update_modified_timestamp();

审计日志模式：

sql

-- 审计日志表
CREATE TABLE audit_log (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(255) NOT NULL,
    record_id INTEGER NOT NULL,
    action VARCHAR(10) NOT NULL,
    old_data JSONB,
    new_data JSONB,
    changed_by VARCHAR(255),
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- 触发器函数
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, record_id, action, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'INSERT', row_to_json(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, 'UPDATE', row_to_json(OLD), row_to_json(NEW), current_user);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, record_id, action, old_data, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, 'DELETE', row_to_json(OLD), current_user);
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- 为表绑定触发器
CREATE TRIGGER users_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();

时间戳自动更新模式：

sql

CREATE OR REPLACE FUNCTION update_modified_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER posts_update_timestamp
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION update_modified_timestamp();

Views and Materialized Views

视图与物化视图

Standard Views:

sql

-- Virtual table - computed on each query
CREATE VIEW active_users_with_posts AS
SELECT
    u.id,
    u.username,
    u.email,
    COUNT(p.id) as post_count,
    MAX(p.created_at) as last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username, u.email;

-- Use view like a table
SELECT * FROM active_users_with_posts WHERE post_count > 10;

Materialized Views:

sql

-- Physical table - stores computed results
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
    u.id,
    u.username,
    COUNT(DISTINCT p.id) as total_posts,
    COUNT(DISTINCT c.id) as total_comments,
    AVG(p.views) as avg_post_views,
    MAX(p.created_at) as last_activity
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.username;

-- Create index on materialized view
CREATE INDEX idx_user_stats_posts ON user_statistics(total_posts);

-- Refresh materialized view (update data)
REFRESH MATERIALIZED VIEW user_statistics;

-- Concurrent refresh (allows reads during refresh)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;

标准视图：

sql

-- 虚拟表，每次查询时计算
CREATE VIEW active_users_with_posts AS
SELECT
    u.id,
    u.username,
    u.email,
    COUNT(p.id) as post_count,
    MAX(p.created_at) as last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username, u.email;

-- 像使用普通表一样使用视图
SELECT * FROM active_users_with_posts WHERE post_count > 10;

物化视图：

sql

-- 物理表，存储计算后的结果
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
    u.id,
    u.username,
    COUNT(DISTINCT p.id) as total_posts,
    COUNT(DISTINCT c.id) as total_comments,
    AVG(p.views) as avg_post_views,
    MAX(p.created_at) as last_activity
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.username;

-- 为物化视图创建索引
CREATE INDEX idx_user_stats_posts ON user_statistics(total_posts);

-- 刷新物化视图（更新数据）
REFRESH MATERIALIZED VIEW user_statistics;

-- 并发刷新（刷新时允许读取）
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;

MongoDB Patterns

MongoDB模式

Document Modeling Strategies

文档建模策略

Embedding vs Referencing

嵌入与引用

Embedding Pattern (Denormalization):

javascript

// One-to-few: Embed when relationship is contained
// Example: Blog post with comments
{
    _id: ObjectId("..."),
    title: "Database Design Patterns",
    author: "John Doe",
    content: "...",
    published_at: ISODate("2025-01-15"),
    comments: [
        {
            _id: ObjectId("..."),
            author: "Jane Smith",
            text: "Great article!",
            created_at: ISODate("2025-01-16")
        },
        {
            _id: ObjectId("..."),
            author: "Bob Johnson",
            text: "Very helpful, thanks!",
            created_at: ISODate("2025-01-17")
        }
    ],
    tags: ["database", "design", "patterns"],
    stats: {
        views: 1523,
        likes: 89,
        shares: 23
    }
}

// Benefits:
// - Single query to retrieve post with comments
// - Better read performance
// - Atomic updates to entire document
//
// Drawbacks:
// - Document size limits (16MB in MongoDB)
// - Difficult to query comments independently
// - May duplicate data if comments need to appear elsewhere

Referencing Pattern (Normalization):

javascript

// One-to-many or many-to-many: Reference when relationship is unbounded
// Example: User with many posts

// Users collection
{
    _id: ObjectId("507f1f77bcf86cd799439011"),
    username: "john_doe",
    email: "john@example.com",
    profile: {
        bio: "Software engineer",
        avatar_url: "https://...",
        location: "San Francisco"
    },
    created_at: ISODate("2024-01-01")
}

// Posts collection (references user)
{
    _id: ObjectId("507f191e810c19729de860ea"),
    user_id: ObjectId("507f1f77bcf86cd799439011"),
    title: "My First Post",
    content: "...",
    published_at: ISODate("2025-01-15"),
    comment_ids: [
        ObjectId("..."),
        ObjectId("...")
    ]
}

// Benefits:
// - No duplication of user data
// - Flexible: users can have unlimited posts
// - Easy to update user information once
//
// Drawbacks:
// - Requires multiple queries or $lookup
// - Slower read performance for joined data

Hybrid Approach (Selective Denormalization):

javascript

// Store frequently accessed fields from referenced document
{
    _id: ObjectId("..."),
    title: "Database Patterns",
    content: "...",
    author: {
        // Embedded: frequently accessed, rarely changes
        id: ObjectId("507f1f77bcf86cd799439011"),
        username: "john_doe",
        avatar_url: "https://..."
    },
    // Reference: full user data available if needed
    author_id: ObjectId("507f1f77bcf86cd799439011"),
    published_at: ISODate("2025-01-15")
}

// Benefits:
// - Fast reads with embedded frequently-used data
// - Can still get full user data when needed
// - Balance between performance and flexibility
//
// Tradeoffs:
// - Need to update embedded data when user changes username/avatar
// - Slightly larger documents

嵌入模式（非规范化）：

javascript

-- 一对少关系：包含在文档内
-- 示例：带评论的博客文章
{
    _id: ObjectId("..."),
    title: "数据库设计模式",
    author: "John Doe",
    content: "...",
    published_at: ISODate("2025-01-15"),
    comments: [
        {
            _id: ObjectId("..."),
            author: "Jane Smith",
            text: "很棒的文章！",
            created_at: ISODate("2025-01-16")
        },
        {
            _id: ObjectId("..."),
            author: "Bob Johnson",
            text: "非常有帮助，谢谢！",
            created_at: ISODate("2025-01-17")
        }
    ],
    tags: ["database", "design", "patterns"],
    stats: {
        views: 1523,
        likes: 89,
        shares: 23
    }
}

-- 优势：
-- - 单查询即可获取文章与评论
-- - 读取性能更优
-- - 可对整个文档执行原子更新
--
-- 劣势：
-- - 文档大小受限（MongoDB中为16MB）
-- - 难以独立查询评论
-- - 若评论需在其他地方展示，会存在数据重复

引用模式（规范化）：

javascript

-- 一对多或多对多关系：使用引用，适用于无边界关系
-- 示例：拥有多篇文章的用户

-- 用户集合
{
    _id: ObjectId("507f1f77bcf86cd799439011"),
    username: "john_doe",
    email: "john@example.com",
    profile: {
        bio: "软件工程师",
        avatar_url: "https://...",
        location: "San Francisco"
    },
    created_at: ISODate("2024-01-01")
}

-- 文章集合（引用用户）
{
    _id: ObjectId("507f191e810c19729de860ea"),
    user_id: ObjectId("507f1f77bcf86cd799439011"),
    title: "我的第一篇文章",
    content: "...",
    published_at: ISODate("2025-01-15"),
    comment_ids: [
        ObjectId("..."),
        ObjectId("...")
    ]
}

-- 优势：
-- - 无用户数据重复
-- - 灵活性高：用户可拥有任意数量的文章
-- - 用户信息更新一次即可生效
--
-- 劣势：
-- - 需要多查询或$lookup操作
-- - 关联数据的读取性能较慢

混合方式（选择性非规范化）：

javascript

-- 存储引用文档中频繁访问的字段
{
    _id: ObjectId("..."),
    title: "数据库模式",
    content: "...",
    author: {
        // 嵌入：频繁访问、极少变更的字段
        id: ObjectId("507f1f77bcf86cd799439011"),
        username: "john_doe",
        avatar_url: "https://..."
    },
    // 引用：需要时可获取完整用户数据
    author_id: ObjectId("507f1f77bcf86cd799439011"),
    published_at: ISODate("2025-01-15")
}

-- 优势：
-- - 嵌入频繁使用的数据，提升读取速度
-- - 仍可在需要时获取完整用户数据
-- - 在性能与灵活性间取得平衡
--
-- 权衡：
-- - 用户用户名/头像变更时，需更新嵌入的数据
-- - 文档体积略有增大

Schema Design Patterns

架构设计模式

Bucket Pattern (Time-Series Data):

javascript

// Instead of one document per measurement:
// BAD: Millions of tiny documents
{
    sensor_id: "sensor_001",
    timestamp: ISODate("2025-01-15T10:00:00Z"),
    temperature: 72.5,
    humidity: 45
}

// GOOD: Bucket documents with arrays of measurements
{
    sensor_id: "sensor_001",
    date: ISODate("2025-01-15"),
    hour: 10,
    measurements: [
        { minute: 0, temperature: 72.5, humidity: 45 },
        { minute: 1, temperature: 72.6, humidity: 45 },
        { minute: 2, temperature: 72.4, humidity: 46 },
        // ... up to 60 measurements per hour
    ],
    summary: {
        count: 60,
        avg_temperature: 72.5,
        min_temperature: 71.8,
        max_temperature: 73.2
    }
}

// Benefits:
// - Reduced document count (60x fewer documents)
// - Better index efficiency
// - Pre-computed summaries
// - Easier to query by time ranges

Computed Pattern (Pre-Aggregated Data):

javascript

// Store computed values to avoid expensive aggregations
{
    _id: ObjectId("..."),
    product_id: "PROD-123",
    month: "2025-01",
    total_sales: 15420.50,
    units_sold: 234,
    unique_customers: 187,
    avg_order_value: 65.90,
    top_customers: [
        { customer_id: "CUST-456", revenue: 890.50 },
        { customer_id: "CUST-789", revenue: 675.25 }
    ],
    computed_at: ISODate("2025-02-01T00:00:00Z")
}

// Update pattern: Scheduled job or trigger updates computed values

Polymorphic Pattern (Varied Schemas):

javascript

// Handle different product types in single collection
{
    _id: ObjectId("..."),
    type: "book",
    name: "Database Design",
    price: 49.99,
    // Book-specific fields
    isbn: "978-0-123456-78-9",
    author: "John Smith",
    pages: 456,
    publisher: "Tech Books Inc"
}

{
    _id: ObjectId("..."),
    type: "electronics",
    name: "Wireless Mouse",
    price: 29.99,
    // Electronics-specific fields
    brand: "TechBrand",
    warranty_months: 24,
    specifications: {
        battery_life: "6 months",
        connectivity: "Bluetooth 5.0"
    }
}

// Query by type
db.products.find({ type: "book", author: "John Smith" })
db.products.find({ type: "electronics", "specifications.connectivity": /Bluetooth/ })

桶模式（时间序列数据）：

javascript

-- 避免单条记录对应一个测量值：
-- 糟糕方案：数百万条小文档
{
    sensor_id: "sensor_001",
    timestamp: ISODate("2025-01-15T10:00:00Z"),
    temperature: 72.5,
    humidity: 45
}

-- 优秀方案：用桶文档存储批量测量值数组
{
    sensor_id: "sensor_001",
    date: ISODate("2025-01-15"),
    hour: 10,
    measurements: [
        { minute: 0, temperature: 72.5, humidity: 45 },
        { minute: 1, temperature: 72.6, humidity: 45 },
        { minute: 2, temperature: 72.4, humidity: 46 },
        // ... 每小时最多60条测量值
    ],
    summary: {
        count: 60,
        avg_temperature: 72.5,
        min_temperature: 71.8,
        max_temperature: 73.2
    }
}

-- 优势：
-- - 文档数量大幅减少（减少60倍）
-- - 索引效率更高
-- - 预计算汇总数据
-- - 按时间范围查询更便捷

计算模式（预聚合数据）：

javascript

-- 存储计算后的值，避免昂贵的聚合操作
{
    _id: ObjectId("..."),
    product_id: "PROD-123",
    month: "2025-01",
    total_sales: 15420.50,
    units_sold: 234,
    unique_customers: 187,
    avg_order_value: 65.90,
    top_customers: [
        { customer_id: "CUST-456", revenue: 890.50 },
        { customer_id: "CUST-789", revenue: 675.25 }
    ],
    computed_at: ISODate("2025-02-01T00:00:00Z")
}

-- 更新模式：通过定时任务或触发器更新计算值

多态模式（多样架构）：

javascript

-- 在单个集合中处理不同类型的产品
{
    _id: ObjectId("..."),
    type: "book",
    name: "数据库设计",
    price: 49.99,
    // 书籍专属字段
    isbn: "978-0-123456-78-9",
    author: "John Smith",
    pages: 456,
    publisher: "Tech Books Inc"
}

{
    _id: ObjectId("..."),
    type: "electronics",
    name: "无线鼠标",
    price: 29.99,
    // 电子产品专属字段
    brand: "TechBrand",
    warranty_months: 24,
    specifications: {
        battery_life: "6个月",
        connectivity: "Bluetooth 5.0"
    }
}

-- 按类型查询
db.products.find({ type: "book", author: "John Smith" })
db.products.find({ type: "electronics", "specifications.connectivity": /Bluetooth/ })

Aggregation Framework

聚合框架

Basic Aggregation Pipeline:

javascript

// Group by author and count posts
db.posts.aggregate([
    {
        $match: { published: true } // Filter stage
    },
    {
        $group: {
            _id: "$author_id",
            total_posts: { $sum: 1 },
            total_views: { $sum: "$views" },
            avg_views: { $avg: "$views" },
            latest_post: { $max: "$published_at" }
        }
    },
    {
        $sort: { total_posts: -1 } // Sort by post count
    },
    {
        $limit: 10 // Top 10 authors
    }
])

Advanced Pipeline with Lookup (Join):

javascript

// Join posts with user data
db.posts.aggregate([
    {
        $match: {
            published_at: { $gte: ISODate("2025-01-01") }
        }
    },
    {
        $lookup: {
            from: "users",
            localField: "author_id",
            foreignField: "_id",
            as: "author"
        }
    },
    {
        $unwind: "$author" // Flatten author array
    },
    {
        $project: {
            title: 1,
            content: 1,
            views: 1,
            "author.username": 1,
            "author.email": 1,
            days_since_publish: {
                $divide: [
                    { $subtract: [new Date(), "$published_at"] },
                    1000 * 60 * 60 * 24
                ]
            }
        }
    },
    {
        $sort: { views: -1 }
    }
])

Aggregation with Grouping and Reshaping:

javascript

// Complex aggregation: Sales analysis
db.orders.aggregate([
    {
        $match: {
            status: "completed",
            created_at: {
                $gte: ISODate("2025-01-01"),
                $lt: ISODate("2025-02-01")
            }
        }
    },
    {
        $unwind: "$items" // Flatten order items
    },
    {
        $group: {
            _id: {
                product_id: "$items.product_id",
                customer_region: "$customer.region"
            },
            total_quantity: { $sum: "$items.quantity" },
            total_revenue: { $sum: "$items.total_price" },
            order_count: { $sum: 1 },
            avg_order_value: { $avg: "$items.total_price" }
        }
    },
    {
        $group: {
            _id: "$_id.product_id",
            regions: {
                $push: {
                    region: "$_id.customer_region",
                    quantity: "$total_quantity",
                    revenue: "$total_revenue"
                }
            },
            total_quantity: { $sum: "$total_quantity" },
            total_revenue: { $sum: "$total_revenue" }
        }
    },
    {
        $sort: { total_revenue: -1 }
    }
])

基础聚合流水线：

javascript

-- 按作者分组并统计文章数量
db.posts.aggregate([
    {
        $match: { published: true } // 过滤阶段
    },
    {
        $group: {
            _id: "$author_id",
            total_posts: { $sum: 1 },
            total_views: { $sum: "$views" },
            avg_views: { $avg: "$views" },
            latest_post: { $max: "$published_at" }
        }
    },
    {
        $sort: { total_posts: -1 } // 按文章数量排序
    },
    {
        $limit: 10 // 取前10位作者
    }
])

带Lookup（关联）的高级流水线：

javascript

-- 关联文章与用户数据
db.posts.aggregate([
    {
        $match: {
            published_at: { $gte: ISODate("2025-01-01") }
        }
    },
    {
        $lookup: {
            from: "users",
            localField: "author_id",
            foreignField: "_id",
            as: "author"
        }
    },
    {
        $unwind: "$author" // 展开作者数组
    },
    {
        $project: {
            title: 1,
            content: 1,
            views: 1,
            "author.username": 1,
            "author.email": 1,
            days_since_publish: {
                $divide: [
                    { $subtract: [new Date(), "$published_at"] },
                    1000 * 60 * 60 * 24
                ]
            }
        }
    },
    {
        $sort: { views: -1 }
    }
])

分组与重塑的聚合：

javascript

-- 复杂聚合：销售分析
db.orders.aggregate([
    {
        $match: {
            status: "completed",
            created_at: {
                $gte: ISODate("2025-01-01"),
                $lt: ISODate("2025-02-01")
            }
        }
    },
    {
        $unwind: "$items" // 展开订单项
    },
    {
        $group: {
            _id: {
                product_id: "$items.product_id",
                customer_region: "$customer.region"
            },
            total_quantity: { $sum: "$items.quantity" },
            total_revenue: { $sum: "$items.total_price" },
            order_count: { $sum: 1 },
            avg_order_value: { $avg: "$items.total_price" }
        }
    },
    {
        $group: {
            _id: "$_id.product_id",
            regions: {
                $push: {
                    region: "$_id.customer_region",
                    quantity: "$total_quantity",
                    revenue: "$total_revenue"
                }
            },
            total_quantity: { $sum: "$total_quantity" },
            total_revenue: { $sum: "$total_revenue" }
        }
    },
    {
        $sort: { total_revenue: -1 }
    }
])

Indexing Strategies

索引策略

PostgreSQL Indexes

PostgreSQL索引

B-tree Indexes (Default):

sql

-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
CREATE INDEX idx_posts_author_published
ON posts(author_id, published_at);

-- Query can use index:
-- SELECT * FROM posts WHERE author_id = 123 ORDER BY published_at;
-- SELECT * FROM posts WHERE author_id = 123 AND published_at > '2025-01-01';

-- Query CANNOT fully use index:
-- SELECT * FROM posts WHERE published_at > '2025-01-01'; (only uses first column)

Partial Indexes:

sql

-- Index only active users
CREATE INDEX idx_active_users
ON users(username)
WHERE active = true;

-- Index only recent orders
CREATE INDEX idx_recent_orders
ON orders(created_at, status)
WHERE created_at > '2024-01-01';

-- Benefits: Smaller index size, faster queries on filtered data

Expression Indexes:

sql

-- Index on lowercase email for case-insensitive search
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));

-- Query that uses this index:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Index on JSONB field extraction
CREATE INDEX idx_metadata_tags
ON products((metadata->>'category'));

Full-Text Search Indexes:

sql

-- Add tsvector column for full-text search
ALTER TABLE articles
ADD COLUMN tsv_content tsvector;

-- Populate tsvector column
UPDATE articles
SET tsv_content = to_tsvector('english', title || ' ' || content);

-- Create GIN index for full-text search
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv_content);

-- Full-text search query
SELECT title, ts_rank(tsv_content, query) as rank
FROM articles, to_tsquery('english', 'database & design') query
WHERE tsv_content @@ query
ORDER BY rank DESC;

-- Trigger to auto-update tsvector
CREATE TRIGGER articles_tsv_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(tsv_content, 'pg_catalog.english', title, content);

JSONB Indexes:

sql

-- GIN index for JSONB containment queries
CREATE INDEX idx_products_metadata
ON products USING GIN(metadata);

-- Queries that use this index:
SELECT * FROM products WHERE metadata @> '{"color": "blue"}';
SELECT * FROM products WHERE metadata ? 'size';

-- Index on specific JSONB path
CREATE INDEX idx_products_category
ON products((metadata->>'category'));

Index Monitoring:

sql

-- Find unused indexes
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Check index usage
SELECT
    relname as table_name,
    indexrelname as index_name,
    idx_scan as times_used,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

B-tree索引（默认）：

sql

-- 单列索引
CREATE INDEX idx_users_email ON users(email);

-- 复合索引（列顺序很重要！）
CREATE INDEX idx_posts_author_published
ON posts(author_id, published_at);

-- 可使用索引的查询：
-- SELECT * FROM posts WHERE author_id = 123 ORDER BY published_at;
-- SELECT * FROM posts WHERE author_id = 123 AND published_at > '2025-01-01';

-- 无法充分使用索引的查询：
-- SELECT * FROM posts WHERE published_at > '2025-01-01';（仅使用第一列）

部分索引：

sql

-- 仅为活跃用户建立索引
CREATE INDEX idx_active_users
ON users(username)
WHERE active = true;

-- 仅为近期订单建立索引
CREATE INDEX idx_recent_orders
ON orders(created_at, status)
WHERE created_at > '2024-01-01';

-- 优势：索引体积更小，过滤后数据的查询速度更快

表达式索引：

sql

-- 为小写邮箱建立索引，支持大小写不敏感搜索
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));

-- 使用该索引的查询：
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- 为JSONB字段提取值建立索引
CREATE INDEX idx_metadata_tags
ON products((metadata->>'category'));

全文搜索索引：

sql

-- 添加tsvector字段用于全文搜索
ALTER TABLE articles
ADD COLUMN tsv_content tsvector;

-- 填充tsvector字段
UPDATE articles
SET tsv_content = to_tsvector('english', title || ' ' || content);

-- 为全文搜索创建GIN索引
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv_content);

-- 全文搜索查询
SELECT title, ts_rank(tsv_content, query) as rank
FROM articles, to_tsquery('english', 'database & design') query
WHERE tsv_content @@ query
ORDER BY rank DESC;

-- 自动更新tsvector的触发器
CREATE TRIGGER articles_tsv_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(tsv_content, 'pg_catalog.english', title, content);

JSONB索引：

sql

-- 为JSONB包含查询创建GIN索引
CREATE INDEX idx_products_metadata
ON products USING GIN(metadata);

-- 使用该索引的查询：
SELECT * FROM products WHERE metadata @> '{"color": "blue"}';
SELECT * FROM products WHERE metadata ? 'size';

-- 为特定JSONB路径建立索引
CREATE INDEX idx_products_category
ON products((metadata->>'category'));

索引监控：

sql

-- 查找未使用的索引
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- 检查索引使用情况
SELECT
    relname as table_name,
    indexrelname as index_name,
    idx_scan as times_used,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

MongoDB Indexes

MongoDB索引

Single Field Indexes:

javascript

// Create index on single field
db.users.createIndex({ email: 1 }) // 1 = ascending, -1 = descending

// Unique index
db.users.createIndex({ username: 1 }, { unique: true })

// Sparse index (only index documents with the field)
db.users.createIndex({ phone_number: 1 }, { sparse: true })

Compound Indexes:

javascript

// Index on multiple fields (order matters!)
db.posts.createIndex({ author_id: 1, published_at: -1 })

// Efficient queries:
// - { author_id: "123" }
// - { author_id: "123", published_at: { $gte: ... } }
// - { author_id: "123" } with sort by published_at

// Inefficient:
// - { published_at: { $gte: ... } } alone (doesn't use index efficiently)

// ESR Rule: Equality, Sort, Range
// Best compound index order:
// 1. Equality filters first
// 2. Sort fields second
// 3. Range filters last
db.orders.createIndex({
    status: 1,           // Equality
    created_at: -1,      // Sort
    total_amount: 1      // Range
})

Multikey Indexes (Array Fields):

javascript

// Index on array field
db.posts.createIndex({ tags: 1 })

// Document with array
{
    _id: ObjectId("..."),
    title: "Database Design",
    tags: ["database", "mongodb", "schema"]
}

// Query that uses multikey index
db.posts.find({ tags: "mongodb" })
db.posts.find({ tags: { $in: ["database", "nosql"] } })

// Compound multikey index (max one array field)
db.posts.createIndex({ tags: 1, published_at: -1 }) // Valid
// db.posts.createIndex({ tags: 1, categories: 1 }) // Invalid if both are arrays

Text Indexes:

javascript

// Create text index for full-text search
db.articles.createIndex({
    title: "text",
    content: "text"
})

// Text search query
db.articles.find({
    $text: { $search: "database design patterns" }
})

// Search with relevance score
db.articles.find(
    { $text: { $search: "database design" } },
    { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

// Weighted text index (prioritize title over content)
db.articles.createIndex(
    { title: "text", content: "text" },
    { weights: { title: 10, content: 5 } }
)

Geospatial Indexes:

javascript

// 2dsphere index for geographic queries
db.locations.createIndex({ coordinates: "2dsphere" })

// Document format
{
    name: "Coffee Shop",
    coordinates: {
        type: "Point",
        coordinates: [-122.4194, 37.7749] // [longitude, latitude]
    }
}

// Find locations near a point
db.locations.find({
    coordinates: {
        $near: {
            $geometry: {
                type: "Point",
                coordinates: [-122.4194, 37.7749]
            },
            $maxDistance: 1000 // meters
        }
    }
})

Index Properties:

javascript

// TTL Index (auto-delete documents after time)
db.sessions.createIndex(
    { created_at: 1 },
    { expireAfterSeconds: 3600 } // 1 hour
)

// Partial Index (index subset of documents)
db.orders.createIndex(
    { status: 1, created_at: -1 },
    { partialFilterExpression: { status: { $eq: "pending" } } }
)

// Case-insensitive index
db.users.createIndex(
    { email: 1 },
    { collation: { locale: "en", strength: 2 } }
)

// Background index creation (doesn't block operations)
db.large_collection.createIndex(
    { field: 1 },
    { background: true }
)

Index Analysis:

javascript

// Explain query execution
db.posts.find({ author_id: "123" }).explain("executionStats")

// Check index usage
db.posts.aggregate([
    { $indexStats: {} }
])

// List all indexes on collection
db.posts.getIndexes()

// Drop unused index
db.posts.dropIndex("index_name")

单字段索引：

javascript

-- 为单个字段创建索引
db.users.createIndex({ email: 1 }) // 1 = 升序, -1 = 降序

-- 唯一索引
db.users.createIndex({ username: 1 }, { unique: true })

-- 稀疏索引（仅为包含该字段的文档建立索引）
db.users.createIndex({ phone_number: 1 }, { sparse: true })

复合索引：

javascript

-- 为多个字段创建索引（列顺序很重要！）
db.posts.createIndex({ author_id: 1, published_at: -1 })

-- 高效查询：
-- - { author_id: "123" }
-- - { author_id: "123", published_at: { $gte: ... } }
-- - { author_id: "123" } 并按published_at排序

-- 低效查询：
-- - 单独使用{ published_at: { $gte: ... } }（无法高效使用索引）

-- ESR规则：相等条件、排序、范围
-- 复合索引的最佳顺序：
-- 1. 相等过滤字段优先
-- 2. 排序字段次之
-- 3. 范围过滤字段最后
db.orders.createIndex({
    status: 1,           // 相等条件
    created_at: -1,      // 排序
    total_amount: 1      // 范围
})

多键索引（数组字段）：

javascript

-- 为数组字段创建索引
db.posts.createIndex({ tags: 1 })

-- 带数组的文档
{
    _id: ObjectId("..."),
    title: "数据库设计",
    tags: ["database", "mongodb", "schema"]
}

-- 使用多键索引的查询
db.posts.find({ tags: "mongodb" })
db.posts.find({ tags: { $in: ["database", "nosql"] } })

-- 复合多键索引（最多包含一个数组字段）
db.posts.createIndex({ tags: 1, published_at: -1 }) // 有效
-- db.posts.createIndex({ tags: 1, categories: 1 }) // 无效，如果两者都是数组

文本索引：

javascript

-- 创建全文搜索的文本索引
db.articles.createIndex({
    title: "text",
    content: "text"
})

-- 文本搜索查询
db.articles.find({
    $text: { $search: "database design patterns" }
})

-- 带相关性得分的搜索
db.articles.find(
    { $text: { $search: "database design" } },
    { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

-- 加权文本索引（标题权重高于内容）
db.articles.createIndex(
    { title: "text", content: "text" },
    { weights: { title: 10, content: 5 } }
)

地理空间索引：

javascript

-- 用于地理查询的2dsphere索引
db.locations.createIndex({ coordinates: "2dsphere" })

-- 文档格式
{
    name: "咖啡店",
    coordinates: {
        type: "Point",
        coordinates: [-122.4194, 37.7749] // [经度, 纬度]
    }
}

-- 查找指定点附近的位置
db.locations.find({
    coordinates: {
        $near: {
            $geometry: {
                type: "Point",
                coordinates: [-122.4194, 37.7749]
            },
            $maxDistance: 1000 // 米
        }
    }
})

索引属性：

javascript

-- TTL索引（一段时间后自动删除文档）
db.sessions.createIndex(
    { created_at: 1 },
    { expireAfterSeconds: 3600 } // 1小时
)

-- 部分索引（仅为文档子集建立索引）
db.orders.createIndex(
    { status: 1, created_at: -1 },
    { partialFilterExpression: { status: { $eq: "pending" } } }
)

-- 大小写不敏感索引
db.users.createIndex(
    { email: 1 },
    { collation: { locale: "en", strength: 2 } }
)

-- 后台创建索引（不阻塞其他操作）
db.large_collection.createIndex(
    { field: 1 },
    { background: true }
)

索引分析：

javascript

-- 解释查询执行计划
db.posts.find({ author_id: "123" }).explain("executionStats")

-- 检查索引使用情况
db.posts.aggregate([
    { $indexStats: {} }
])

-- 列出集合的所有索引
db.posts.getIndexes()

-- 删除未使用的索引
db.posts.dropIndex("index_name")

Transactions

事务

PostgreSQL Transaction Management

PostgreSQL事务管理

Basic Transactions:

sql

-- Explicit transaction
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;
-- or ROLLBACK; to cancel changes

Savepoints (Partial Rollback):

sql

BEGIN;

UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 'PROD-123';

SAVEPOINT before_audit;

INSERT INTO audit_log (action, details) VALUES ('inventory_update', '...');
-- Oops, error in audit log

ROLLBACK TO SAVEPOINT before_audit;
-- Inventory update preserved, audit insert rolled back

-- Fix and retry
INSERT INTO audit_log (action, details) VALUES ('inventory_update', 'correct details');

COMMIT;

Isolation Levels:

sql

-- Read Uncommitted (not supported in PostgreSQL, defaults to Read Committed)
-- Read Committed (default) - sees only committed data
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Repeatable Read - sees snapshot at transaction start
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id = 1; -- Returns balance 1000
-- Another transaction updates balance to 1500 and commits
SELECT * FROM accounts WHERE id = 1; -- Still returns 1000 (repeatable read)
COMMIT;

-- Serializable - strictest isolation, prevents all anomalies
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- If concurrent transactions would violate serializability, one aborts
COMMIT;

Advisory Locks (Application-Level Locking):

sql

-- Exclusive lock on arbitrary number
SELECT pg_advisory_lock(12345);
-- ... perform critical operation ...
SELECT pg_advisory_unlock(12345);

-- Try lock (non-blocking)
SELECT pg_try_advisory_lock(12345); -- Returns true if acquired, false otherwise

-- Session-level advisory lock (auto-released on disconnect)
SELECT pg_advisory_lock(user_id);

Row-Level Locking:

sql

-- SELECT FOR UPDATE - lock rows for update
BEGIN;

SELECT * FROM products
WHERE id = 123
FOR UPDATE; -- Locks this row

UPDATE products SET quantity = quantity - 1 WHERE id = 123;

COMMIT;

-- SELECT FOR SHARE - shared lock (allows other reads, blocks writes)
SELECT * FROM products WHERE id = 123 FOR SHARE;

-- SKIP LOCKED - skip locked rows instead of waiting
SELECT * FROM queue
WHERE processed = false
ORDER BY priority
LIMIT 10
FOR UPDATE SKIP LOCKED;

基础事务：

sql

-- 显式事务
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;
-- 或使用ROLLBACK; 取消变更

保存点（部分回滚）：

sql

BEGIN;

UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 'PROD-123';

SAVEPOINT before_audit;

INSERT INTO audit_log (action, details) VALUES ('inventory_update', '...');
-- 哦，审计日志插入出错了

ROLLBACK TO SAVEPOINT before_audit;
-- 库存更新保留，审计插入回滚

-- 修复后重试
INSERT INTO audit_log (action, details) VALUES ('inventory_update', 'correct details');

COMMIT;

隔离级别：

sql

-- 读未提交（PostgreSQL不支持，默认降级为读已提交）
-- 读已提交（默认）- 仅能看到已提交的数据
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- 可重复读 - 事务启动时看到的数据快照
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id = 1; -- 返回余额1000
-- 另一个事务将余额更新为1500并提交
SELECT * FROM accounts WHERE id = 1; -- 仍返回1000（可重复读）
COMMIT;

-- 可串行化 - 最严格的隔离级别，防止所有异常
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- 如果并发事务会破坏可串行性，其中一个会中止
COMMIT;

咨询锁（应用级锁）：

sql

-- 对任意数字加排他锁
SELECT pg_advisory_lock(12345);
-- ... 执行关键操作 ...
SELECT pg_advisory_unlock(12345);

-- 尝试加锁（非阻塞）
SELECT pg_try_advisory_lock(12345); -- 获取成功返回true，否则返回false

-- 会话级咨询锁（断开连接时自动释放）
SELECT pg_advisory_lock(user_id);

行级锁：

sql

-- SELECT FOR UPDATE - 锁定行用于更新
BEGIN;

SELECT * FROM products
WHERE id = 123
FOR UPDATE; -- 锁定该行

UPDATE products SET quantity = quantity - 1 WHERE id = 123;

COMMIT;

-- SELECT FOR SHARE - 共享锁（允许其他读取，阻止写入）
SELECT * FROM products WHERE id = 123 FOR SHARE;

-- SKIP LOCKED - 跳过已锁定的行，而非等待
SELECT * FROM queue
WHERE processed = false
ORDER BY priority
LIMIT 10
FOR UPDATE SKIP LOCKED;

MongoDB Transactions

MongoDB事务

Multi-Document Transactions:

javascript

// Transactions require replica set or sharded cluster
const session = db.getMongo().startSession()
session.startTransaction()

try {
    const accountsCol = session.getDatabase("mydb").accounts

    // Debit account
    accountsCol.updateOne(
        { _id: "account1" },
        { $inc: { balance: -100 } },
        { session }
    )

    // Credit account
    accountsCol.updateOne(
        { _id: "account2" },
        { $inc: { balance: 100 } },
        { session }
    )

    // Commit transaction
    session.commitTransaction()
} catch (error) {
    // Abort on error
    session.abortTransaction()
    throw error
} finally {
    session.endSession()
}

Read and Write Concerns:

javascript

// Write Concern: Acknowledgment level
db.orders.insertOne(
    { customer_id: "123", items: [...] },
    {
        writeConcern: {
            w: "majority",        // Wait for majority of replica set
            j: true,              // Wait for journal write
            wtimeout: 5000        // Timeout after 5 seconds
        }
    }
)

// Read Concern: Data consistency level
db.orders.find(
    { status: "pending" }
).readConcern("majority") // Only return data acknowledged by majority

// Read Preference: Which replica to read from
db.orders.find({ ... }).readPref("secondary") // Read from secondary replica

Atomic Operations (Single Document):

javascript

// Single document updates are atomic by default
db.counters.updateOne(
    { _id: "page_views" },
    {
        $inc: { count: 1 },
        $set: { last_updated: new Date() }
    }
)

// Atomic array operations
db.posts.updateOne(
    { _id: ObjectId("...") },
    {
        $push: {
            comments: {
                $each: [{ author: "John", text: "Great!" }],
                $position: 0 // Insert at beginning
            }
        }
    }
)

// Find and modify (atomic read-modify-write)
db.queue.findOneAndUpdate(
    { status: "pending" },
    { $set: { status: "processing", processor_id: "worker-1" } },
    {
        sort: { priority: -1 },
        returnDocument: "after" // Return updated document
    }
)

多文档事务：

javascript

-- 事务需要副本集或分片集群
const session = db.getMongo().startSession()
session.startTransaction()

try {
    const accountsCol = session.getDatabase("mydb").accounts

    -- 扣款
    accountsCol.updateOne(
        { _id: "account1" },
        { $inc: { balance: -100 } },
        { session }
    )

    -- 存款
    accountsCol.updateOne(
        { _id: "account2" },
        { $inc: { balance: 100 } },
        { session }
    )

    -- 提交事务
    session.commitTransaction()
} catch (error) {
    -- 出错时中止
    session.abortTransaction()
    throw error
} finally {
    session.endSession()
}

读写关注点：

javascript

-- 写关注点：确认级别
db.orders.insertOne(
    { customer_id: "123", items: [...] },
    {
        writeConcern: {
            w: "majority",        // 等待副本集多数节点确认
            j: true,              // 等待日志写入
            wtimeout: 5000        // 5秒超时
        }
    }
)

-- 读关注点：数据一致性级别
db.orders.find(
    { status: "pending" }
).readConcern("majority") // 仅返回多数节点确认的数据

-- 读偏好：从哪个副本读取
db.orders.find({ ... }).readPref("secondary") // 从从节点读取

原子操作（单文档）：

javascript

-- 单文档更新默认是原子的
db.counters.updateOne(
    { _id: "page_views" },
    {
        $inc: { count: 1 },
        $set: { last_updated: new Date() }
    }
)

-- 原子数组操作
db.posts.updateOne(
    { _id: ObjectId("...") },
    {
        $push: {
            comments: {
                $each: [{ author: "John", text: "Great!" }],
                $position: 0 // 插入到开头
            }
        }
    }
)

-- 查找并修改（原子读-改-写）
db.queue.findOneAndUpdate(
    { status: "pending" },
    { $set: { status: "processing", processor_id: "worker-1" } },
    {
        sort: { priority: -1 },
        returnDocument: "after" // 返回更新后的文档
    }
)

Replication

复制

PostgreSQL Replication

PostgreSQL复制

Streaming Replication (Primary-Standby):

sql

-- Primary server configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = '1GB'
hot_standby = on

-- Create replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- pg_hba.conf on primary
host replication replicator standby_ip/32 md5

-- Standby server (recovery.conf or postgresql.auto.conf)
primary_conninfo = 'host=primary_ip port=5432 user=replicator password=...'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'

Logical Replication (Selective Replication):

sql

-- On publisher (source)
CREATE PUBLICATION my_publication FOR TABLE users, posts;
-- or FOR ALL TABLES;

-- On subscriber (destination)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=replicator password=...'
PUBLICATION my_publication;

-- Monitor replication
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_replication_slots;

Failover and Promotion:

sql

-- Promote standby to primary
pg_ctl promote -D /var/lib/postgresql/data

-- Check replication lag
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    sync_state,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;

流复制（主从架构）：

sql

-- 主服务器配置（postgresql.conf）
wal_level = replica
max_wal_senders = 10
wal_keep_size = '1GB'
hot_standby = on

-- 创建复制用户
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- 主服务器的pg_hba.conf
host replication replicator standby_ip/32 md5

-- 从服务器（recovery.conf或postgresql.auto.conf）
primary_conninfo = 'host=primary_ip port=5432 user=replicator password=...'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'

逻辑复制（选择性复制）：

sql

-- 在发布端（源）
CREATE PUBLICATION my_publication FOR TABLE users, posts;
-- 或使用FOR ALL TABLES;

-- 在订阅端（目标）
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=replicator password=...'
PUBLICATION my_publication;

-- 监控复制状态
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_replication_slots;

故障转移与提升：

sql

-- 将从服务器提升为主服务器
pg_ctl promote -D /var/lib/postgresql/data

-- 检查复制延迟
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    sync_state,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;

MongoDB Replication

MongoDB复制

Replica Set Configuration:

javascript

// Initialize replica set
rs.initiate({
    _id: "myReplicaSet",
    members: [
        { _id: 0, host: "mongodb1.example.com:27017", priority: 2 },
        { _id: 1, host: "mongodb2.example.com:27017", priority: 1 },
        { _id: 2, host: "mongodb3.example.com:27017", priority: 1 }
    ]
})

// Add member to existing replica set
rs.add("mongodb4.example.com:27017")

// Remove member
rs.remove("mongodb4.example.com:27017")

// Check replica set status
rs.status()

// Check replication lag
rs.printSecondaryReplicationInfo()

Replica Set Roles:

javascript

// Priority 0 member (cannot become primary)
rs.add({
    host: "analytics.example.com:27017",
    priority: 0,
    hidden: true // Hidden from application drivers
})

// Arbiter (voting only, no data)
rs.addArb("arbiter.example.com:27017")

// Delayed member (disaster recovery)
rs.add({
    host: "delayed.example.com:27017",
    priority: 0,
    hidden: true,
    slaveDelay: 3600 // 1 hour behind
})

Read Preference Configuration:

javascript

// Application connection with read preference
const client = new MongoClient(uri, {
    readPreference: "secondaryPreferred", // Try secondary, fallback to primary
    readConcernLevel: "majority"
})

// Read Preference Modes:
// - primary (default): Read from primary only
// - primaryPreferred: Primary if available, else secondary
// - secondary: Read from secondary only
// - secondaryPreferred: Secondary if available, else primary
// - nearest: Read from nearest member (lowest latency)

副本集配置：

javascript

-- 初始化副本集
rs.initiate({
    _id: "myReplicaSet",
    members: [
        { _id: 0, host: "mongodb1.example.com:27017", priority: 2 },
        { _id: 1, host: "mongodb2.example.com:27017", priority: 1 },
        { _id: 2, host: "mongodb3.example.com:27017", priority: 1 }
    ]
})

-- 为现有副本集添加节点
rs.add("mongodb4.example.com:27017")

-- 删除节点
rs.remove("mongodb4.example.com:27017")

-- 检查副本集状态
rs.status()

-- 检查复制延迟
rs.printSecondaryReplicationInfo()

副本集角色：

javascript

-- 优先级0节点（无法成为主节点）
rs.add({
    host: "analytics.example.com:27017",
    priority: 0,
    hidden: true // 对应用驱动隐藏
})

-- 仲裁节点（仅投票，不存储数据）
rs.addArb("arbiter.example.com:27017")

-- 延迟节点（灾难恢复）
rs.add({
    host: "delayed.example.com:27017",
    priority: 0,
    hidden: true,
    slaveDelay: 3600 // 延迟1小时
})

读偏好配置：

javascript

-- 应用连接时配置读偏好
const client = new MongoClient(uri, {
    readPreference: "secondaryPreferred", // 优先从从节点读取，主节点作为 fallback
    readConcernLevel: "majority"
})

-- 读偏好模式：
-- - primary（默认）：仅从主节点读取
-- - primaryPreferred：主节点可用则读主节点，否则读从节点
-- - secondary：仅从从节点读取
-- - secondaryPreferred：从节点可用则读从节点，否则读主节点
-- - nearest：从延迟最低的节点读取

Sharding

分片

MongoDB Sharding Architecture

MongoDB分片架构

Shard Key Selection:

javascript

// Good shard key characteristics:
// 1. High cardinality (many distinct values)
// 2. Even distribution
// 3. Query isolation (queries target specific shards)

// Example: User-based application
sh.shardCollection("mydb.users", { user_id: "hashed" })

// Hashed shard key: Even distribution, random data location
sh.shardCollection("mydb.events", { event_id: "hashed" })

// Range-based shard key: Ordered data, good for range queries
sh.shardCollection("mydb.logs", { timestamp: 1, server_id: 1 })

// Compound shard key
sh.shardCollection("mydb.orders", {
    customer_region: 1,  // Coarse grouping
    order_date: 1        // Fine grouping
})

Sharding Setup:

javascript

// 1. Start config servers (replica set)
mongod --configsvr --replSet configRS --port 27019

// 2. Initialize config server replica set
rs.initiate({
    _id: "configRS",
    configsvr: true,
    members: [
        { _id: 0, host: "cfg1.example.com:27019" },
        { _id: 1, host: "cfg2.example.com:27019" },
        { _id: 2, host: "cfg3.example.com:27019" }
    ]
})

// 3. Start shard servers (each is a replica set)
mongod --shardsvr --replSet shard1RS --port 27018

// 4. Start mongos (query router)
mongos --configdb configRS/cfg1.example.com:27019,cfg2.example.com:27019

// 5. Add shards to cluster
sh.addShard("shard1RS/shard1-a.example.com:27018")
sh.addShard("shard2RS/shard2-a.example.com:27018")

// 6. Enable sharding on database
sh.enableSharding("mydb")

// 7. Shard collections
sh.shardCollection("mydb.users", { user_id: "hashed" })

Query Targeting:

javascript

// Targeted query (includes shard key)
db.users.find({ user_id: "12345" })
// Routes to single shard

// Scatter-gather query (no shard key)
db.users.find({ email: "user@example.com" })
// Queries all shards, merges results

// Check query targeting
db.users.find({ user_id: "12345" }).explain()
// Look for "SINGLE_SHARD" vs "ALL_SHARDS"

Zone Sharding (Geographic Distribution):

javascript

// Define zones for geographic sharding
sh.addShardToZone("shard1", "US")
sh.addShardToZone("shard2", "EU")

// Define zone ranges
sh.updateZoneKeyRange(
    "mydb.users",
    { region: "US", user_id: MinKey },
    { region: "US", user_id: MaxKey },
    "US"
)

sh.updateZoneKeyRange(
    "mydb.users",
    { region: "EU", user_id: MinKey },
    { region: "EU", user_id: MaxKey },
    "EU"
)

// Shard collection with zone-aware key
sh.shardCollection("mydb.users", { region: 1, user_id: 1 })

分片键选择：

javascript

-- 优秀分片键的特征：
-- 1. 高基数（大量不同值）
-- 2. 分布均匀
-- 3. 查询隔离（查询指向特定分片）

-- 示例：基于用户的应用
sh.shardCollection("mydb.users", { user_id: "hashed" })

-- 哈希分片键：分布均匀，数据位置随机
sh.shardCollection("mydb.events", { event_id: "hashed" })

-- 范围分片键：数据有序，适合范围查询
sh.shardCollection("mydb.logs", { timestamp: 1, server_id: 1 })

-- 复合分片键
sh.shardCollection("mydb.orders", {
    customer_region: 1,  // 粗粒度分组
    order_date: 1        // 细粒度分组
})

分片搭建：

javascript

-- 1. 启动配置服务器（副本集）
mongod --configsvr --replSet configRS --port 27019

-- 2. 初始化配置服务器副本集
rs.initiate({
    _id: "configRS",
    configsvr: true,
    members: [
        { _id: 0, host: "cfg1.example.com:27019" },
        { _id: 1, host: "cfg2.example.com:27019" },
        { _id: 2, host: "cfg3.example.com:27019" }
    ]
})

-- 3. 启动分片服务器（每个都是副本集）
mongod --shardsvr --replSet shard1RS --port 27018

-- 4. 启动mongos（查询路由器）
mongos --configdb configRS/cfg1.example.com:27019,cfg2.example.com:27019

-- 5. 向集群添加分片
sh.addShard("shard1RS/shard1-a.example.com:27018")
sh.addShard("shard2RS/shard2-a.example.com:27018")

-- 6. 为数据库启用分片
sh.enableSharding("mydb")

-- 7. 为集合分片
sh.shardCollection("mydb.users", { user_id: "hashed" })

查询定位：

javascript

-- 定向查询（包含分片键）
db.users.find({ user_id: "12345" })
-- 路由到单个分片

-- 散射-聚集查询（无分片键）
db.users.find({ email: "user@example.com" })
-- 查询所有分片，合并结果

-- 检查查询定位
db.users.find({ user_id: "12345" }).explain()
-- 查找"SINGLE_SHARD" vs "ALL_SHARDS"

区域分片（地理分布）：

javascript

-- 为地理分片定义区域
sh.addShardToZone("shard1", "US")
sh.addShardToZone("shard2", "EU")

-- 定义区域范围
sh.updateZoneKeyRange(
    "mydb.users",
    { region: "US", user_id: MinKey },
    { region: "US", user_id: MaxKey },
    "US"
)

sh.updateZoneKeyRange(
    "mydb.users",
    { region: "EU", user_id: MinKey },
    { region: "EU", user_id: MaxKey },
    "EU"
)

-- 使用支持区域的键分片集合
sh.shardCollection("mydb.users", { region: 1, user_id: 1 })

PostgreSQL Horizontal Partitioning

PostgreSQL水平分区

Declarative Partitioning:

sql

-- Range partitioning
CREATE TABLE logs (
    id BIGSERIAL,
    log_time TIMESTAMP NOT NULL,
    message TEXT,
    level VARCHAR(10)
) PARTITION BY RANGE (log_time);

-- Create partitions
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- List partitioning
CREATE TABLE customers (
    id SERIAL,
    name VARCHAR(255),
    region VARCHAR(50)
) PARTITION BY LIST (region);

CREATE TABLE customers_us PARTITION OF customers
FOR VALUES IN ('US', 'CA', 'MX');

CREATE TABLE customers_eu PARTITION OF customers
FOR VALUES IN ('UK', 'DE', 'FR', 'IT');

-- Hash partitioning
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(50),
    data JSONB
) PARTITION BY HASH (id);

CREATE TABLE events_0 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE events_1 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... events_2 and events_3

Partition Pruning (Query Optimization):

sql

-- Query automatically uses only relevant partition
SELECT * FROM logs
WHERE log_time BETWEEN '2025-01-15' AND '2025-01-20';
-- Only scans logs_2025_01 partition

-- Check query plan
EXPLAIN SELECT * FROM logs WHERE log_time > '2025-01-01';
-- Shows which partitions are scanned

声明式分区：

sql

-- 范围分区
CREATE TABLE logs (
    id BIGSERIAL,
    log_time TIMESTAMP NOT NULL,
    message TEXT,
    level VARCHAR(10)
) PARTITION BY RANGE (log_time);

-- 创建分区
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- 列表分区
CREATE TABLE customers (
    id SERIAL,
    name VARCHAR(255),
    region VARCHAR(50)
) PARTITION BY LIST (region);

CREATE TABLE customers_us PARTITION OF customers
FOR VALUES IN ('US', 'CA', 'MX');

CREATE TABLE customers_eu PARTITION OF customers
FOR VALUES IN ('UK', 'DE', 'FR', 'IT');

-- 哈希分区
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(50),
    data JSONB
) PARTITION BY HASH (id);

CREATE TABLE events_0 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE events_1 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... events_2 和 events_3

分区裁剪（查询优化）：

sql

-- 查询自动仅使用相关分区
SELECT * FROM logs
WHERE log_time BETWEEN '2025-01-15' AND '2025-01-20';
-- 仅扫描logs_2025_01分区

-- 检查查询计划
EXPLAIN SELECT * FROM logs WHERE log_time > '2025-01-01';
-- 显示扫描的分区

Performance Tuning

性能调优

Query Optimization Techniques

查询优化技巧

PostgreSQL Query Analysis:

sql

-- Basic explain
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- Analyze with actual execution statistics
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username
ORDER BY post_count DESC
LIMIT 10;

-- Identify slow queries
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Table statistics
ANALYZE users; -- Update query planner statistics

-- Vacuum and analyze
VACUUM ANALYZE posts; -- Reclaim space and update stats

Common Query Patterns:

sql

-- Avoid SELECT * (retrieve only needed columns)
-- BAD
SELECT * FROM users WHERE id = 123;

-- GOOD
SELECT id, username, email FROM users WHERE id = 123;

-- Use EXISTS instead of IN for large subqueries
-- BAD
SELECT * FROM posts WHERE author_id IN (
    SELECT id FROM users WHERE active = true
);

-- GOOD
SELECT * FROM posts p WHERE EXISTS (
    SELECT 1 FROM users u
    WHERE u.id = p.author_id AND u.active = true
);

-- Use JOINs instead of multiple queries
-- BAD (N+1 query problem)
-- SELECT * FROM posts;
-- Then for each post: SELECT * FROM users WHERE id = post.author_id;

-- GOOD
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;

-- Window functions instead of self-joins
-- Calculate running total
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;

-- Rank within groups
SELECT
    category,
    product_name,
    sales,
    RANK() OVER (PARTITION BY category ORDER BY sales DESC) as rank_in_category
FROM products;

MongoDB Query Optimization:

javascript

// Use projection to limit returned fields
// BAD
db.users.find({ active: true })

// GOOD
db.users.find(
    { active: true },
    { username: 1, email: 1, _id: 0 }
)

// Use covered queries (index covers all fields)
db.users.createIndex({ username: 1, email: 1 })
db.users.find(
    { username: "john_doe" },
    { username: 1, email: 1, _id: 0 }
) // Entire query served from index

// Avoid negation operators
// BAD (cannot use index efficiently)
db.products.find({ status: { $ne: "discontinued" } })

// GOOD
db.products.find({ status: { $in: ["active", "pending", "sold"] } })

// Use $lookup sparingly (expensive operation)
// Consider embedding data instead if appropriate

// Aggregation optimization: Filter early
// BAD
db.orders.aggregate([
    { $lookup: { ... } },        // Expensive join
    { $match: { status: "completed" } } // Filter after join
])

// GOOD
db.orders.aggregate([
    { $match: { status: "completed" } }, // Filter first
    { $lookup: { ... } }                  // Join fewer documents
])

PostgreSQL查询分析：

sql

-- 基础执行计划
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- 带实际执行统计的分析
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username
ORDER BY post_count DESC
LIMIT 10;

-- 识别慢查询
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- 表统计信息
ANALYZE users; -- 更新查询优化器统计信息

-- 清理与分析
VACUUM ANALYZE posts; -- 回收空间并更新统计信息

常见查询模式：

sql

-- 避免SELECT *（仅获取需要的列）
-- 糟糕写法
SELECT * FROM users WHERE id = 123;

-- 优秀写法
SELECT id, username, email FROM users WHERE id = 123;

-- 大子查询用EXISTS替代IN
-- 糟糕写法
SELECT * FROM posts WHERE author_id IN (
    SELECT id FROM users WHERE active = true
);

-- 优秀写法
SELECT * FROM posts p WHERE EXISTS (
    SELECT 1 FROM users u
    WHERE u.id = p.author_id AND u.active = true
);

-- 用JOIN替代多查询
-- 糟糕写法（N+1查询问题）
-- SELECT * FROM posts;
-- 然后对每篇文章执行：SELECT * FROM users WHERE id = post.author_id;

-- 优秀写法
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;

-- 用窗口函数替代自连接
-- 计算累计总和
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;

-- 分组内排名
SELECT
    category,
    product_name,
    sales,
    RANK() OVER (PARTITION BY category ORDER BY sales DESC) as rank_in_category
FROM products;

MongoDB查询优化：

javascript

-- 使用投影限制返回字段
-- 糟糕写法
db.users.find({ active: true })

-- 优秀写法
db.users.find(
    { active: true },
    { username: 1, email: 1, _id: 0 }
)

-- 使用覆盖查询（索引包含所有字段）
db.users.createIndex({ username: 1, email: 1 })
db.users.find(
    { username: "john_doe" },
    { username: 1, email: 1, _id: 0 }
) -- 整个查询从索引获取

-- 避免否定运算符
-- 糟糕写法（无法高效使用索引）
db.products.find({ status: { $ne: "discontinued" } })

-- 优秀写法
db.products.find({ status: { $in: ["active", "pending", "sold"] } })

-- 谨慎使用$lookup（开销大）
-- 合适的话考虑嵌入数据

-- 聚合优化：尽早过滤
-- 糟糕写法
db.orders.aggregate([
    { $lookup: { ... } },        // 开销大的关联
    { $match: { status: "completed" } } // 关联后过滤
])

-- 优秀写法
db.orders.aggregate([
    { $match: { status: "completed" } }, // 先过滤
    { $lookup: { ... } }                  // 关联更少的文档
])

Connection Pooling

连接池

PostgreSQL Connection Pooling:

javascript

// Using node-postgres (pg) with pool
const { Pool } = require('pg')

const pool = new Pool({
    host: 'localhost',
    port: 5432,
    database: 'mydb',
    user: 'dbuser',
    password: 'secret',
    max: 20,              // Maximum pool size
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 2000
})

// Execute query
const result = await pool.query('SELECT * FROM users WHERE id = $1', [123])

// Use PgBouncer for server-side pooling
// pgbouncer.ini
// [databases]
// mydb = host=localhost port=5432 dbname=mydb
//
// [pgbouncer]
// pool_mode = transaction
// max_client_conn = 1000
// default_pool_size = 25

MongoDB Connection Pooling:

javascript

// MongoClient automatically manages connection pool
const { MongoClient } = require('mongodb')

const client = new MongoClient(uri, {
    maxPoolSize: 50,           // Max connections
    minPoolSize: 10,           // Min connections
    maxIdleTimeMS: 30000,      // Close idle connections
    waitQueueTimeoutMS: 5000   // Wait for available connection
})

await client.connect()
const db = client.db('mydb')
// Connection automatically returned to pool after use

PostgreSQL连接池：

javascript

-- 使用node-postgres（pg）的连接池
const { Pool } = require('pg')

const pool = new Pool({
    host: 'localhost',
    port: 5432,
    database: 'mydb',
    user: 'dbuser',
    password: 'secret',
    max: 20,              // 最大连接池大小
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 2000
})

-- 执行查询
const result = await pool.query('SELECT * FROM users WHERE id = $1', [123])

-- 使用PgBouncer实现服务端连接池
-- pgbouncer.ini
-- [databases]
-- mydb = host=localhost port=5432 dbname=mydb
--
-- [pgbouncer]
-- pool_mode = transaction
-- max_client_conn = 1000
-- default_pool_size = 25

MongoDB连接池：

javascript

-- MongoClient自动管理连接池
const { MongoClient } = require('mongodb')

const client = new MongoClient(uri, {
    maxPoolSize: 50,           // 最大连接数
    minPoolSize: 10,           // 最小连接数
    maxIdleTimeMS: 30000,      // 关闭空闲连接的时间
    waitQueueTimeoutMS: 5000   // 等待可用连接的超时时间
})

await client.connect()
const db = client.db('mydb')
-- 使用后连接自动返回连接池

Best Practices

最佳实践

PostgreSQL Best Practices

PostgreSQL最佳实践

Schema Design
- Normalize for data integrity, denormalize for performance
- Use appropriate data types (avoid TEXT for short strings)
- Define NOT NULL constraints where appropriate
- Use SERIAL or UUID for primary keys consistently
Indexing
- Index foreign keys for JOIN performance
- Create indexes on frequently filtered/sorted columns
- Use partial indexes for selective queries
- Monitor and remove unused indexes
- Keep composite index column count reasonable (typically ≤ 3-4)
Query Performance
- Use EXPLAIN ANALYZE to understand query plans
- Avoid SELECT * in application code
- Use prepared statements to prevent SQL injection
- Limit result sets with LIMIT
- Use connection pooling
Maintenance
- Run VACUUM regularly (or enable autovacuum)
- Update statistics with ANALYZE
- Monitor slow query log
- Set appropriate autovacuum thresholds
- Regular backup with pg_dump or WAL archiving
Security
- Use SSL/TLS for connections
- Implement row-level security for multi-tenant apps
- Grant minimum necessary privileges
- Use parameterized queries
- Regular security updates

架构设计
- 为保证数据完整性进行规范化，为性能进行非规范化
- 使用合适的数据类型（短字符串避免用TEXT）
- 合理定义NOT NULL约束
- 统一使用SERIAL或UUID作为主键
索引
- 为外键创建索引提升JOIN性能
- 为频繁过滤/排序的列创建索引
- 对选择性查询使用部分索引
- 监控并删除未使用的索引
- 复合索引的列数保持合理（通常≤3-4列）
查询性能
- 使用EXPLAIN ANALYZE理解查询计划
- 应用代码中避免SELECT *
- 使用预编译语句防止SQL注入
- 用LIMIT限制结果集
- 使用连接池
维护
- 定期运行VACUUM（或启用自动清理）
- 用ANALYZE更新统计信息
- 监控慢查询日志
- 配置合理的自动清理阈值
- 定期用pg_dump或WAL归档备份
安全
- 连接使用SSL/TLS
- 多租户应用实现行级安全
- 授予最小必要权限
- 使用参数化查询
- 定期进行安全更新

MongoDB Best Practices

MongoDB最佳实践

Schema Design
- Embed related data that is accessed together
- Reference data that is large or rarely accessed
- Use polymorphic pattern for varied schemas
- Limit document size to reasonable bounds (< 1-2 MB typically)
- Design for your query patterns
Indexing
- Index on fields used in queries and sorts
- Use compound indexes with ESR rule (Equality, Sort, Range)
- Create text indexes for full-text search
- Monitor index usage with $indexStats
- Avoid too many indexes (write performance impact)
Query Performance
- Use projection to limit returned fields
- Create covered queries when possible
- Filter early in aggregation pipelines
- Avoid $lookup when embedding is appropriate
- Use explain() to verify index usage
Scalability
- Choose appropriate shard key (high cardinality, even distribution)
- Use replica sets for high availability
- Configure appropriate read/write concerns
- Monitor chunk distribution in sharded clusters
- Use zones for geographic distribution
Operations
- Enable authentication and authorization
- Use TLS for client connections
- Regular backups (mongodump or filesystem snapshots)
- Monitor with MongoDB Atlas, Ops Manager, or custom tools
- Keep MongoDB version updated

架构设计
- 嵌入一起访问的相关数据
- 引用大型或极少访问的数据
- 用多态模式处理多样架构
- 文档大小控制在合理范围（通常<1-2MB）
- 围绕查询模式设计架构
索引
- 为查询与排序的字段创建索引
- 遵循ESR规则（相等、排序、范围）创建复合索引
- 为全文搜索创建文本索引
- 用$indexStats监控索引使用情况
- 避免过多索引（影响写入性能）
查询性能
- 用投影限制返回字段
- 尽可能使用覆盖查询
- 聚合流水线中尽早过滤
- 合适时用嵌入替代$lookup
- 用explain()验证索引使用
扩展性
- 选择合适的分片键（高基数、分布均匀）
- 用副本集实现高可用
- 配置合理的读写关注点
- 监控分片集群的块分布
- 用区域分片实现地理分布
运维
- 启用认证与授权
- 客户端连接使用TLS
- 定期备份（mongodump或文件系统快照）
- 用MongoDB Atlas、Ops Manager或自定义工具监控
- 保持MongoDB版本更新

Data Modeling Decision Framework

数据建模决策框架

Choose PostgreSQL when:

Strong ACID guarantees required (financial transactions)
Complex relationships with many JOINs
Data structure is well-defined and stable
Need for advanced SQL features (window functions, CTEs, stored procedures)
Compliance requirements demand strict consistency

Choose MongoDB when:

Schema flexibility needed (rapid development, evolving requirements)
Horizontal scalability is priority (sharding required)
Document-oriented data (JSON/BSON native format)
Hierarchical or nested data structures
High write throughput with eventual consistency acceptable

Hybrid Approach:

Use both databases for different parts of application
PostgreSQL for transactional data (orders, payments)
MongoDB for catalog, logs, user sessions
Synchronize critical data between systems

选择PostgreSQL的场景：

需要强ACID保障（如金融交易）
存在复杂数据关系与多JOIN操作
数据结构明确且稳定
需要高级SQL特性（窗口函数、CTE、存储过程）
合规要求严格一致性

选择MongoDB的场景：

需要架构灵活性（快速迭代、需求变化）
优先考虑水平扩展性（需要分片）
数据为文档型（原生JSON/BSON）
存在层级或嵌套数据结构
高写入吞吐量，最终一致性可接受

混合方案：

不同系统部分使用不同数据库
PostgreSQL处理事务数据（订单、支付）
MongoDB处理目录、日志、用户会话
关键数据在系统间同步

Common Patterns and Anti-Patterns

常见模式与反模式

PostgreSQL Anti-Patterns

PostgreSQL反模式

❌ Storing JSON when relational fits better

sql

-- BAD: Using JSONB for structured, queryable data
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    data JSONB -- { name, email, address: { street, city, state } }
);

-- GOOD: Proper normalization
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    street VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50)
);

❌ Over-indexing

sql

-- BAD: Index on every column "just in case"
CREATE INDEX idx1 ON users(username);
CREATE INDEX idx2 ON users(email);
CREATE INDEX idx3 ON users(created_at);
CREATE INDEX idx4 ON users(updated_at);
CREATE INDEX idx5 ON users(active);
-- Result: Slow writes, large database size

-- GOOD: Index based on actual query patterns
CREATE INDEX idx_users_email ON users(email); -- Login queries
CREATE INDEX idx_active_users_created ON users(created_at) WHERE active = true; -- Partial

❌ N+1 Query Problem

sql

-- BAD: Multiple queries in loop
SELECT * FROM posts; -- Returns 100 posts
-- Then for each post:
SELECT * FROM users WHERE id = ?; -- 100 additional queries!

-- GOOD: Single query with JOIN
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;

❌ 用JSON存储适合关系型的数据

sql

-- 糟糕写法：用JSONB存储结构化、可查询的数据
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    data JSONB -- { name, email, address: { street, city, state } }
);

-- 优秀写法：合理规范化
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255)
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    street VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50)
);

❌ 过度索引

sql

-- 糟糕写法：为每个列都建索引"以防万一"
CREATE INDEX idx1 ON users(username);
CREATE INDEX idx2 ON users(email);
CREATE INDEX idx3 ON users(created_at);
CREATE INDEX idx4 ON users(updated_at);
CREATE INDEX idx5 ON users(active);
-- 结果：写入缓慢，数据库体积过大

-- 优秀写法：根据实际查询模式建索引
CREATE INDEX idx_users_email ON users(email); -- 登录查询用
CREATE INDEX idx_active_users_created ON users(created_at) WHERE active = true; -- 部分索引

❌ N+1查询问题

sql

-- 糟糕写法：循环执行多查询
SELECT * FROM posts; -- 返回100篇文章
-- 然后对每篇文章执行：SELECT * FROM users WHERE id = ?; -- 额外100次查询！

-- 优秀写法：单查询用JOIN
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;

MongoDB Anti-Patterns

MongoDB反模式

❌ Massive arrays in documents

javascript

// BAD: Unbounded array growth
{
    _id: ObjectId("..."),
    username: "popular_user",
    followers: [
        ObjectId("follower1"),
        ObjectId("follower2"),
        // ... 100,000+ follower IDs
        // Document exceeds 16MB limit!
    ]
}

// GOOD: Separate collection with references
// users collection
{ _id: ObjectId("..."), username: "popular_user" }

// followers collection
{ _id: ObjectId("..."), user_id: ObjectId("..."), follower_id: ObjectId("...") }
db.followers.createIndex({ user_id: 1, follower_id: 1 })

❌ Poor shard key selection

javascript

// BAD: Monotonically increasing shard key
sh.shardCollection("mydb.events", { _id: 1 })
// All writes go to same shard (highest _id range)

// BAD: Low cardinality shard key
sh.shardCollection("mydb.users", { country: 1 })
// Most users in few countries = uneven distribution

// GOOD: Hashed _id or compound key
sh.shardCollection("mydb.events", { _id: "hashed" }) // Even distribution
sh.shardCollection("mydb.users", { country: 1, user_id: 1 }) // Compound

❌ Ignoring indexes on embedded documents

javascript

// Document structure
{
    username: "john_doe",
    profile: {
        email: "john@example.com",
        age: 30,
        city: "San Francisco"
    }
}

// Query on embedded field
db.users.find({ "profile.email": "john@example.com" })

// MISSING: Index on embedded field
db.users.createIndex({ "profile.email": 1 })

❌ 文档中包含超大数组

javascript

-- 糟糕写法：数组无限制增长
{
    _id: ObjectId("..."),
    username: "popular_user",
    followers: [
        ObjectId("follower1"),
        ObjectId("follower2"),
        // ... 10万+关注者ID
        // 文档超过16MB限制！
    ]
}

-- 优秀写法：用独立集合存储引用
-- 用户集合
{ _id: ObjectId("..."), username: "popular_user" }

-- 关注者集合
{ _id: ObjectId("..."), user_id: ObjectId("..."), follower_id: ObjectId("...") }
db.followers.createIndex({ user_id: 1, follower_id: 1 })

❌ 分片键选择不当

javascript

-- 糟糕写法：单调递增的分片键
sh.shardCollection("mydb.events", { _id: 1 })
-- 所有写入都到同一个分片（最高_id范围）

-- 糟糕写法：低基数分片键
sh.shardCollection("mydb.users", { country: 1 })
-- 多数用户集中在少数国家，分布不均

-- 优秀写法：哈希_id或复合键
sh.shardCollection("mydb.events", { _id: "hashed" }) // 分布均匀
sh.shardCollection("mydb.users", { country: 1, user_id: 1 }) // 复合键

❌ 忽略嵌入式文档的索引

javascript

-- 文档结构
{
    username: "john_doe",
    profile: {
        email: "john@example.com",
        age: 30,
        city: "San Francisco"
    }
}

-- 查询嵌入式字段
db.users.find({ "profile.email": "john@example.com" })

-- 缺失：嵌入式字段的索引
db.users.createIndex({ "profile.email": 1 })

Troubleshooting Guide

故障排查指南

PostgreSQL Issues

PostgreSQL问题

Slow Queries:

sql

-- Enable slow query logging (postgresql.conf)
-- log_min_duration_statement = 1000  # Log queries > 1 second

-- Find slow queries
SELECT
    query,
    calls,
    total_exec_time / calls as avg_time_ms,
    rows / calls as avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;

-- Analyze specific slow query
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ... FROM ... WHERE ...;

High CPU Usage:

sql

-- Check running queries
SELECT
    pid,
    now() - query_start as duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Terminate long-running query
SELECT pg_terminate_backend(pid);

Lock Contention:

sql

-- View locks
SELECT
    locktype,
    relation::regclass,
    mode,
    granted,
    pid
FROM pg_locks
WHERE NOT granted;

-- Find blocking queries
SELECT
    blocked_locks.pid AS blocked_pid,
    blocking_locks.pid AS blocking_pid,
    blocked_activity.query AS blocked_query,
    blocking_activity.query AS blocking_query
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted AND blocking_locks.granted;

慢查询：

sql

-- 启用慢查询日志（postgresql.conf）
-- log_min_duration_statement = 1000  # 记录执行时间>1秒的查询

-- 查找慢查询
SELECT
    query,
    calls,
    total_exec_time / calls as avg_time_ms,
    rows / calls as avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;

-- 分析特定慢查询
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ... FROM ... WHERE ...;

CPU使用率高：

sql

-- 检查运行中的查询
SELECT
    pid,
    now() - query_start as duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- 终止长时运行的查询
SELECT pg_terminate_backend(pid);

锁竞争：

sql

-- 查看锁信息
SELECT
    locktype,
    relation::regclass,
    mode,
    granted,
    pid
FROM pg_locks
WHERE NOT granted;

-- 查找阻塞查询
SELECT
    blocked_locks.pid AS blocked_pid,
    blocking_locks.pid AS blocking_pid,
    blocked_activity.query AS blocked_query,
    blocking_activity.query AS blocking_query
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted AND blocking_locks.granted;

MongoDB Issues

MongoDB问题

Slow Queries:

javascript

// Enable profiling
db.setProfilingLevel(1, { slowms: 100 }) // Log queries > 100ms

// View slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10)

// Analyze query performance
db.collection.find({ ... }).explain("executionStats")
// Check: totalDocsExamined vs nReturned (should be close)
// Check: executionTimeMillis
// Check: indexName (should show index usage)

Replication Lag:

javascript

// Check lag on secondary
rs.printSecondaryReplicationInfo()

// Check oplog size
db.getReplicationInfo()

// Increase oplog size if needed
db.adminCommand({ replSetResizeOplog: 1, size: 16384 }) // 16GB

Sharding Issues:

javascript

// Check chunk distribution
sh.status()

// Check balancer status
sh.getBalancerState()
sh.isBalancerRunning()

// Balance specific collection
sh.enableBalancing("mydb.mycollection")

// Check for jumbo chunks
db.chunks.find({ jumbo: true })

慢查询：

javascript

-- 启用性能分析
db.setProfilingLevel(1, { slowms: 100 }) // 记录执行时间>100ms的查询

-- 查看慢查询
db.system.profile.find().sort({ ts: -1 }).limit(10)

-- 分析查询性能
db.collection.find({ ... }).explain("executionStats")
-- 检查：totalDocsExamined vs nReturned（应接近）
-- 检查：executionTimeMillis
-- 检查：indexName（应显示使用的索引）

复制延迟：

javascript

-- 检查从节点延迟
rs.printSecondaryReplicationInfo()

-- 检查oplog大小
db.getReplicationInfo()

-- 必要时增大oplog大小
db.adminCommand({ replSetResizeOplog: 1, size: 16384 }) // 16GB

分片问题：

javascript

-- 检查块分布
sh.status()

-- 检查均衡器状态
sh.getBalancerState()
sh.isBalancerRunning()

-- 为特定集合启用均衡
sh.enableBalancing("mydb.mycollection")

-- 查找超大块
db.chunks.find({ jumbo: true })

Resources

资源

PostgreSQL Resources

PostgreSQL资源

Official Documentation: https://www.postgresql.org/docs/
PostgreSQL Wiki: https://wiki.postgresql.org/
Performance Tuning: https://wiki.postgresql.org/wiki/Performance_Optimization
Explain Visualizer: https://explain.dalibo.com/
pg_stat_statements Extension: Essential for query analysis

官方文档：https://www.postgresql.org/docs/
PostgreSQL Wiki：https://wiki.postgresql.org/
性能调优：https://wiki.postgresql.org/wiki/Performance_Optimization
执行计划可视化：https://explain.dalibo.com/
pg_stat_statements扩展：查询分析必备

MongoDB Resources

MongoDB资源

Official Documentation: https://docs.mongodb.com/
MongoDB University: Free courses and certification
Aggregation Framework: https://docs.mongodb.com/manual/aggregation/
Sharding Guide: https://docs.mongodb.com/manual/sharding/
Schema Design Patterns: https://www.mongodb.com/blog/post/building-with-patterns-a-summary

官方文档：https://docs.mongodb.com/
MongoDB大学：免费课程与认证
聚合框架：https://docs.mongodb.com/manual/aggregation/
分片指南：https://docs.mongodb.com/manual/sharding/
架构设计模式：https://www.mongodb.com/blog/post/building-with-patterns-a-summary

Books

书籍

PostgreSQL: "PostgreSQL: Up and Running" by Regina Obe & Leo Hsu
MongoDB: "MongoDB: The Definitive Guide" by Shannon Bradshaw, Eoin Brazil, Kristina Chodorow

Skill Version: 1.0.0 Last Updated: January 2025 Skill Category: Database Management, Data Architecture, Performance Optimization Technologies: PostgreSQL 16+, MongoDB 7+

PostgreSQL：《PostgreSQL实战》Regina Obe & Leo Hsu 著
MongoDB：《MongoDB权威指南》Shannon Bradshaw, Eoin Brazil, Kristina Chodorow 著

技能版本：1.0.0 最后更新：2025年1月 技能分类：数据库管理、数据架构、性能优化 技术栈：PostgreSQL 16+, MongoDB 7+",