database-design
# Database Design
## When to Load
- Trigger: Schema design, migrations, query optimization, indexing strategies, data modeling, N+1 fixes
- Skip: No database work involved in the current task
## Database Design Workflow
Copy this checklist and track progress:
Database Design Progress:
- [ ] Step 1: Identify entities and relationships
- [ ] Step 2: Normalize schema (3NF minimum)
- [ ] Step 3: Evaluate denormalization needs
- [ ] Step 4: Design indexes for query patterns
- [ ] Step 5: Write and optimize critical queries
- [ ] Step 6: Plan migration strategy
- [ ] Step 7: Configure connection pooling
- [ ] Step 8: Validate against anti-patterns checklist

## Schema Design Principles
### Normalization Forms
- 1NF: Atomic values, no repeating groups
- 2NF: 1NF + no partial dependencies (all non-key columns depend on the full PK)
- 3NF: 2NF + no transitive dependencies (non-key columns don't depend on other non-key columns)

```sql
-- WRONG: Unnormalized
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_name TEXT,
  customer_email TEXT,  -- duplicated across orders
  product1_name TEXT,   -- repeating groups
  product1_qty INT,
  product2_name TEXT,
  product2_qty INT
);

-- CORRECT: Normalized to 3NF
CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  email TEXT UNIQUE NOT NULL
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INT REFERENCES customers(id),
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT NOT NULL CHECK (quantity > 0)
);
```

### When to Denormalize
Denormalize only when you have measured proof of performance issues:

```sql
-- Acceptable denormalization: precomputed counter to avoid COUNT(*)
ALTER TABLE posts ADD COLUMN comment_count INT DEFAULT 0;

-- Update via trigger or application code
CREATE FUNCTION update_comment_count() RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE posts SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE posts SET comment_count = comment_count - 1 WHERE id = OLD.post_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Attach the trigger (assumes a comments table with a post_id column)
CREATE TRIGGER comments_count_trigger
AFTER INSERT OR DELETE ON comments
FOR EACH ROW EXECUTE FUNCTION update_comment_count();
```

## Indexing Strategy
### Index Types and When to Use
- B-tree (default): Equality, range, sorting, LIKE 'prefix%'
- Hash: Equality only (rarely better than B-tree)
- GIN: Full-text search, JSONB, arrays
- GiST: Geometry, range types, full-text
- BRIN: Large tables with naturally ordered data (timestamps)

### Composite Indexes
```sql
-- Column order matters: leftmost prefix rule
CREATE INDEX idx_users_status_created ON users (status, created_at);

-- This index supports:
-- WHERE status = 'active'                          -- YES
-- WHERE status = 'active' AND created_at > '2024'  -- YES
-- WHERE created_at > '2024'                        -- NO (skips first column)
```
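The leftmost-prefix behavior can be observed directly with a query planner. A minimal sketch using SQLite (whose planner follows the same rule); the table, index, and queries mirror the example above:

```python
import sqlite3

# In-memory database with the composite index from the example above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)")
conn.execute("CREATE INDEX idx_users_status_created ON users (status, created_at)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether the planner chose the index
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

p1 = plan("SELECT * FROM users WHERE status = 'active'")          # leading column -> index
p2 = plan("SELECT * FROM users WHERE created_at > '2024-01-01'")  # skips leading column -> table scan
```

Printing `p1` and `p2` shows the first query searching via `idx_users_status_created` and the second falling back to a scan.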
### Partial and Covering Indexes
```sql
-- Partial index: only index rows matching a condition
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';  -- smaller index, faster lookups

-- Covering index: INCLUDE columns to avoid the table lookup
CREATE INDEX idx_users_email_covering ON users (email)
INCLUDE (name, avatar_url);  -- index-only scan for profile lookups
```

### Index Anti-patterns
```sql
-- WRONG: Index on a low-cardinality column alone
CREATE INDEX idx_users_active ON users (is_active);  -- boolean = 2 values

-- WRONG: Too many indexes (slows writes)
-- Every INSERT/UPDATE must update ALL indexes

-- CORRECT: Composite index targeting actual queries
CREATE INDEX idx_users_active_created ON users (is_active, created_at DESC)
WHERE is_active = true;
```

## Query Optimization
### Reading EXPLAIN Plans
```sql
EXPLAIN ANALYZE SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.status = 'active'
GROUP BY u.name;

-- Key things to look for:
-- Seq Scan     -> missing index (on large tables)
-- Nested Loop  -> fine for small sets, bad for large joins
-- Hash Join    -> good for large equi-joins
-- Sort         -> consider an index to avoid the sort
-- actual time  -> real execution time
-- rows         -> if estimated vs actual differ wildly, run ANALYZE
```

### N+1 Query Detection and Prevention
```python
from collections import defaultdict
from sqlalchemy.orm import joinedload

# WRONG: N+1 queries (1 query for users + N queries for orders)
users = db.query(User).all()
for user in users:
    orders = db.query(Order).filter(Order.user_id == user.id).all()  # N queries!

# CORRECT: Eager loading with SQLAlchemy
users = db.query(User).options(joinedload(User.orders)).all()

# CORRECT: Batch query
user_ids = [u.id for u in users]
orders = db.query(Order).filter(Order.user_id.in_(user_ids)).all()
orders_by_user = defaultdict(list)
for order in orders:
    orders_by_user[order.user_id].append(order)
```

```javascript
// WRONG: N+1 with Prisma
const users = await prisma.user.findMany();
for (const user of users) {
  const orders = await prisma.order.findMany({ where: { userId: user.id } }); // N+1!
}

// CORRECT: Include relation
const users = await prisma.user.findMany({
  include: { orders: true },
});

// CORRECT: Batch with findMany + in
const userIds = users.map((u) => u.id);
const orders = await prisma.order.findMany({
  where: { userId: { in: userIds } },
});
```

### Pagination
```sql
-- WRONG: OFFSET pagination (rescans all skipped rows)
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;

-- CORRECT: Cursor-based pagination (keyset)
SELECT * FROM posts
WHERE created_at < '2024-01-15T10:30:00Z'
ORDER BY created_at DESC
LIMIT 20;
```
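The client side of keyset pagination can be sketched as a loop that feeds the last row's sort key back as the next cursor. A minimal illustration; in-memory rows stand in for the posts table:

```python
# Keyset pagination sketch: the cursor is the last-seen sort key, so each
# page maps to an indexed range query instead of an OFFSET re-scan.
posts = [{"id": i, "created_at": f"2024-01-{i:02d}"} for i in range(1, 11)]
posts.sort(key=lambda p: p["created_at"], reverse=True)

def fetch_page(cursor=None, limit=3):
    # SQL equivalent: WHERE created_at < :cursor ORDER BY created_at DESC LIMIT :limit
    rows = [p for p in posts if cursor is None or p["created_at"] < cursor]
    return rows[:limit]

page1 = fetch_page()                                   # newest three posts
page2 = fetch_page(cursor=page1[-1]["created_at"])     # next three, no overlap
```

Unlike OFFSET, a new insert between requests cannot shift rows into or out of the next page, because the cursor pins the boundary.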
## Migration Patterns
### Safe Migration Rules
1. Never rename a column in one step (add new, migrate data, drop old)
2. Never drop a column that's still read by running code
3. Add columns as nullable or with defaults
4. Create indexes CONCURRENTLY to avoid locking
5. Test rollback before deploying

### Zero-Downtime Migration Example
```sql
-- Step 1: Add new column (safe, no lock)
ALTER TABLE users ADD COLUMN display_name TEXT;

-- Step 2: Backfill data (do in batches)
UPDATE users SET display_name = name WHERE display_name IS NULL AND id BETWEEN 1 AND 10000;

-- Step 3: Deploy code that writes to BOTH columns
-- Step 4: Deploy code that reads from the new column
-- Step 5: Drop the old column (after confirming no reads)
ALTER TABLE users DROP COLUMN name;
```
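Step 2's batching can be driven from application code: loop until no rows remain, committing each batch in its own transaction so row locks are held only briefly. A sketch using SQLite in place of PostgreSQL; the batch size and row count are illustrative:

```python
import sqlite3

# Seed a table where display_name needs backfilling from name
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, display_name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [(f"user{i}",) for i in range(25)])

BATCH = 10
while True:
    with conn:  # one short transaction per batch
        cur = conn.execute(
            "UPDATE users SET display_name = name "
            "WHERE id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT ?)",
            (BATCH,),
        )
    if cur.rowcount == 0:  # nothing left to backfill
        break

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
).fetchone()[0]
```

Because each batch re-selects `WHERE display_name IS NULL`, the loop is resumable after a crash or deploy.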
### Index Creation
```sql
-- WRONG: Blocks writes on the table
CREATE INDEX idx_orders_user ON orders (user_id);

-- CORRECT: Non-blocking (PostgreSQL)
CREATE INDEX CONCURRENTLY idx_orders_user ON orders (user_id);
```

## Connection Pooling
Rule of thumb: connections = (CPU cores * 2) + disk spindles.
For most apps: 10-20 connections per application instance.
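The rule of thumb above is just arithmetic; a small helper makes the calculation explicit. Treat the result as a starting point to load-test, not a hard setting:

```python
import os

def recommended_pool_size(cpu_cores: int, disk_spindles: int = 1) -> int:
    # connections = (CPU cores * 2) + disk spindles
    return cpu_cores * 2 + disk_spindles

# e.g. size the pool for the current machine (one effective spindle assumed)
size = recommended_pool_size(os.cpu_count() or 1)
```

Note this bounds connections *per database*, so divide it across application instances sharing the same server.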
```python
from sqlalchemy import create_engine

# SQLAlchemy connection pool
engine = create_engine(
    DATABASE_URL,
    pool_size=10,        # maintained connections
    max_overflow=20,     # extra connections under load
    pool_timeout=30,     # seconds to wait for a connection
    pool_recycle=1800,   # recycle connections every 30 min
    pool_pre_ping=True,  # verify connection before use
)
```

```javascript
// Prisma datasource
// In schema.prisma:
// datasource db {
//   provider = "postgresql"
//   url      = env("DATABASE_URL")
// }
// Connection limit via URL: ?connection_limit=10&pool_timeout=30
```

## ORM Best Practices
### Select Only What You Need
```python
# WRONG: Fetches all columns
users = db.query(User).all()

# CORRECT: Select specific columns
users = db.query(User.id, User.name).all()
```

```javascript
// WRONG: Fetches everything
const users = await prisma.user.findMany();

// CORRECT: Select specific fields
const users = await prisma.user.findMany({
  select: { id: true, name: true, email: true },
});
```

### Bulk Operations
```python
# WRONG: Individual inserts in a loop
for item in items:
    db.add(Item(**item))
    db.commit()  # commit per item!

# CORRECT: Bulk insert
db.bulk_insert_mappings(Item, items)
db.commit()
```

```javascript
// WRONG: Sequential creates
for (const item of items) {
  await prisma.item.create({ data: item });
}

// CORRECT: Batch create
await prisma.item.createMany({ data: items });

// CORRECT: Transaction for dependent operations
await prisma.$transaction([
  prisma.user.create({ data: userData }),
  prisma.profile.create({ data: profileData }),
]);
```

## NoSQL Design Patterns
### Document Database (MongoDB)
```javascript
// Design for access patterns, not normalization
// Embed when: 1:1, 1:few, data read together
// Reference when: 1:many, many:many, data grows unbounded

// WRONG: Normalizing in MongoDB like SQL
// users collection:     { _id, name }
// addresses collection: { _id, userId, street }  // requires joins

// CORRECT: Embed bounded, co-accessed data
{
  _id: ObjectId("..."),
  name: "Alice",
  addresses: [
    { street: "123 Main St", city: "NYC", type: "home" },
    { street: "456 Work Ave", city: "NYC", type: "work" }
  ]
}

// CORRECT: Reference unbounded or independent data
// user:   { _id, name, orderIds: [ObjectId("...")] }
// orders: { _id, userId, items: [...], total: 99.99 }
```

### Key-Value / Redis Patterns
Cache-aside pattern:
- Check cache for key
- If miss, query database
- Store result in cache with TTL
- Return result

Cache invalidation:
- TTL-based: SET key value EX 3600 (1 hour)
- Event-based: Delete key on write
- Write-through: Update cache on every write
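The cache-aside steps above can be sketched in a few lines. A plain dict stands in for Redis here, and `get_user_from_db`, the key format, and the TTL are illustrative assumptions (the injectable `now` keeps expiry testable):

```python
import time

cache = {}   # key -> (value, expires_at); stand-in for Redis
TTL = 3600   # seconds, mirrors SET key value EX 3600

def get_user_from_db(user_id):
    return {"id": user_id, "name": f"user{user_id}"}  # stand-in for the real query

def get_user(user_id, now=None):
    now = time.time() if now is None else now
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > now:   # hit, not expired
        return entry[0]
    value = get_user_from_db(user_id)          # miss: fall through to the database
    cache[key] = (value, now + TTL)            # store with TTL
    return value

def invalidate_user(user_id):
    cache.pop(f"user:{user_id}", None)         # event-based: delete key on write
```

With a real client the dict operations become `GET`, `SET ... EX`, and `DEL`; the control flow is identical.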
## Common Anti-Patterns Summary
| AVOID | DO INSTEAD |
| --- | --- |
| SELECT * | SELECT specific columns |
| OFFSET pagination | Cursor-based pagination |
| N+1 queries | Eager load or batch queries |
| Indexing every column | Index based on query patterns |
| UUID v4 as primary key | UUID v7 or BIGSERIAL (better locality) |
| Storing money as FLOAT | Use DECIMAL / BIGINT (cents) |
| No foreign keys "for speed" | Use foreign keys (data integrity) |
| Giant migrations | Small, reversible steps |
| No connection pooling | Always pool connections |
| Premature denormalization | Normalize first, denormalize with data |
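The money row deserves a demonstration: binary floats cannot represent most decimal fractions exactly, so rounding error creeps into sums. A quick sketch of the failure and the two fixes from the table:

```python
from decimal import Decimal

# WRONG: binary float accumulates representation error
total_float = sum(0.10 for _ in range(3))               # not exactly 0.3

# CORRECT: exact decimal arithmetic (maps to a DECIMAL column)
total_decimal = sum(Decimal("0.10") for _ in range(3))

# CORRECT: integer cents (maps to a BIGINT column)
total_cents = sum(10 for _ in range(3))
```

Construct `Decimal` from strings, not floats, or the float's error is baked in before the conversion.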