Database Designer - POWERFUL Tier Skill
Overview
A comprehensive database design skill that provides expert-level analysis, optimization, and migration capabilities for modern database systems. This skill combines theoretical principles with practical tools to help architects and developers create scalable, performant, and maintainable database schemas.
Core Competencies
Schema Design & Analysis
- Normalization Analysis: Automated detection of normalization levels (1NF through BCNF)
- Denormalization Strategy: Smart recommendations for performance optimization
- Data Type Optimization: Identification of inappropriate types and size issues
- Constraint Analysis: Missing foreign keys, unique constraints, and null checks
- Naming Convention Validation: Consistent table and column naming patterns
- ERD Generation: Automatic Mermaid diagram creation from DDL
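Naming-convention validation can be sketched as a simple pattern check. A minimal, illustrative example assuming a snake_case convention (the identifier list is made up):

```python
import re

# Assumed convention: lower snake_case identifiers
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")

def naming_violations(identifiers):
    """Return identifiers that break the snake_case convention."""
    return [name for name in identifiers if not SNAKE_CASE.match(name)]

print(naming_violations(["orders", "OrderItems", "customer_id", "Total"]))
# → ['OrderItems', 'Total']
```

The same check generalizes to column names, index names, and constraint names pulled from the information schema.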
Index Optimization
- Index Gap Analysis: Identification of missing indexes on foreign keys and query patterns
- Composite Index Strategy: Optimal column ordering for multi-column indexes
- Index Redundancy Detection: Elimination of overlapping and unused indexes
- Performance Impact Modeling: Selectivity estimation and query cost analysis
- Index Type Selection: B-tree, hash, partial, covering, and specialized indexes
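Composite-index column ordering can be demonstrated with a SQLite sketch (table and index names are hypothetical): placing the equality column before the range column lets one index serve the combined predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, created_at TEXT)")
# Equality column (customer_id) first, range column (created_at) second
conn.execute("CREATE INDEX idx_orders_cust_date ON orders (customer_id, created_at)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE customer_id = 42 AND created_at >= '2026-01-01'"
).fetchall()
print(plan[0][-1])  # plan detail names idx_orders_cust_date
```

Reversing the column order would force the planner to scan all rows in the date range before filtering by customer.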
Migration Management
- Zero-Downtime Migrations: Expand-contract pattern implementation
- Schema Evolution: Safe column additions, deletions, and type changes
- Data Migration Scripts: Automated data transformation and validation
- Rollback Strategy: Complete reversal capabilities with validation
- Execution Planning: Ordered migration steps with dependency resolution
Database Design Principles
→ See references/database-design-reference.md for details
Best Practices
Schema Design
- Use meaningful names: Clear, consistent naming conventions
- Choose appropriate data types: Right-sized columns for storage efficiency
- Define proper constraints: Foreign keys, check constraints, unique indexes
- Consider future growth: Plan for scale from the beginning
- Document relationships: Clear foreign key relationships and business rules
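A minimal SQLite sketch of these practices (the schema is illustrative): explicit foreign keys, NOT NULL, UNIQUE, and CHECK constraints let the database reject bad data instead of relying on application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
)""")
conn.execute("""CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL CHECK (total >= 0)
)""")

err = None
try:
    # No customer 999 exists, so the FK constraint rejects the row
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (999, 10.0)")
except sqlite3.IntegrityError as exc:
    err = exc
print("rejected:", err)
```

Note the SQLite-specific pragma: in PostgreSQL and MySQL (InnoDB), foreign keys are enforced by default.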
Performance Optimization
- Index strategically: Cover common query patterns without over-indexing
- Monitor query performance: Regular analysis of slow queries
- Partition large tables: Improve query performance and maintenance
- Use appropriate isolation levels: Balance consistency with performance
- Implement connection pooling: Efficient resource utilization
Security Considerations
- Principle of least privilege: Grant minimal necessary permissions
- Encrypt sensitive data: At rest and in transit
- Audit access patterns: Monitor and log database access
- Validate inputs: Prevent SQL injection attacks
- Regular security updates: Keep database software current
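Input validation pairs with parameterized queries. A SQLite sketch (table contents are made up) showing why placeholder binding defeats a classic injection payload:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

malicious = "alice' OR '1'='1"
# The ? placeholder binds the input as data, never as SQL text
rows = conn.execute("SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()
print(rows)  # → []  (the payload matches no real name)
```

Had the query been built with string concatenation, the payload would have turned the WHERE clause into a tautology and returned every row.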
Query Generation Patterns
SELECT with JOINs
```sql
-- INNER JOIN: only matching rows
SELECT o.id, c.name, o.total
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id;

-- LEFT JOIN: all left rows, NULLs for non-matches
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;

-- Self-join: hierarchical data (employees/managers)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;
```
Common Table Expressions (CTEs)
```sql
-- Recursive CTE for org chart
WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.depth + 1
  FROM employees e INNER JOIN org o ON o.id = e.manager_id
)
SELECT * FROM org ORDER BY depth, name;
```
Window Functions
```sql
-- ROW_NUMBER for pagination / dedup
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders;

-- RANK with gaps, DENSE_RANK without gaps
SELECT name, score, RANK() OVER (ORDER BY score DESC) AS rank FROM leaderboard;

-- LAG/LEAD for comparing adjacent rows
SELECT date, revenue,
  revenue - LAG(revenue) OVER (ORDER BY date) AS daily_change
FROM daily_sales;
```
Aggregation Patterns
```sql
-- FILTER clause (PostgreSQL) for conditional aggregation
SELECT
  COUNT(*) AS total,
  COUNT(*) FILTER (WHERE status = 'active') AS active,
  AVG(amount) FILTER (WHERE amount > 0) AS avg_positive
FROM accounts;

-- GROUPING SETS for multi-level rollups
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS ((region, product), (region), ());
```
Migration Patterns
Up/Down Migration Scripts
Every migration must have a reversible counterpart. Name files with a timestamp prefix for ordering:
```
migrations/
├── 20260101_000001_create_users.up.sql
├── 20260101_000001_create_users.down.sql
├── 20260115_000002_add_users_email_index.up.sql
└── 20260115_000002_add_users_email_index.down.sql
```
Zero-Downtime Migrations (Expand/Contract)
Use the expand-contract pattern to avoid locking or breaking running code:
- Expand — add the new column/table (nullable, with default)
- Migrate data — backfill in batches; dual-write from application
- Transition — application reads from new column; stop writing to old
- Contract — drop old column in a follow-up migration
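The expand and migrate steps can be sketched in SQLite (column names are illustrative; production backfills should run in batches, as shown in the backfill section):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('A@Example.com')")

# Expand: add the new column as nullable; existing rows are untouched
conn.execute("ALTER TABLE users ADD COLUMN email_normalized TEXT")

# Migrate: backfill the new column (batch this on large tables)
conn.execute("UPDATE users SET email_normalized = LOWER(email) "
             "WHERE email_normalized IS NULL")
conn.commit()

print(conn.execute("SELECT email_normalized FROM users").fetchall())
# → [('a@example.com',)]
```

The contract step (dropping the old column) ships as a separate, later migration, after the application has stopped reading the old column.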
Data Backfill Strategies
```sql
-- Batch update to avoid long-running locks
UPDATE users SET email_normalized = LOWER(email)
WHERE id IN (SELECT id FROM users WHERE email_normalized IS NULL LIMIT 5000);
-- Repeat in a loop until 0 rows affected
```
Rollback Procedures
- Always test the down.sql in staging before deploying the up.sql to production
- Keep rollback window short: if the contract step has run, rollback requires a new forward migration
- For irreversible changes (dropping columns with data), take a logical backup first
Performance Optimization
Indexing Strategies
| Index Type | Use Case | Example (PostgreSQL, illustrative) |
|---|---|---|
| B-tree (default) | Equality, range, ORDER BY | CREATE INDEX ON orders (created_at); |
| GIN | Full-text search, JSONB, arrays | CREATE INDEX ON docs USING gin (to_tsvector('english', body)); |
| GiST | Geometry, range types, nearest-neighbor | CREATE INDEX ON places USING gist (geom); |
| Partial | Subset of rows (reduce size) | CREATE INDEX ON users (email) WHERE active; |
| Covering | Index-only scans | CREATE INDEX ON orders (customer_id) INCLUDE (total); |
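Partial indexes are also supported in SQLite; a quick sketch (the schema is hypothetical) that creates one and confirms it exists in the catalog:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INT)")
# Partial index: only active rows are indexed, keeping the index small
conn.execute("CREATE INDEX idx_users_active_email ON users (email) WHERE active = 1")

names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(names)  # → ['idx_users_active_email']
```

Queries whose WHERE clause implies the index predicate (here, active = 1) are eligible to use it; others fall back to a scan.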
EXPLAIN Plan Reading
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
```
Key signals to watch:
- Seq Scan on large tables — missing index
- Nested Loop with high row estimates — consider hash/merge join or add index
- Buffers shared read much higher than hit — working set exceeds memory
N+1 Query Detection
Symptoms: application issues one query per row (e.g., fetching related records in a loop).
Fixes:
- Use a JOIN or subquery to fetch in one round-trip
- ORM eager loading (select_related / with / includes)
- DataLoader pattern for GraphQL resolvers
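The N+1 shape and its JOIN fix can be shown side by side in SQLite (sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# N+1: one extra query per customer (N round-trips in general)
totals_n1 = {}
for cid, name in conn.execute("SELECT id, name FROM customers"):
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
        (cid,)).fetchone()
    totals_n1[name] = total

# Fix: a single JOIN + GROUP BY round-trip returns the same result
totals_join = dict(conn.execute("""
    SELECT c.name, COALESCE(SUM(o.total), 0)
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
"""))
assert totals_join == totals_n1
print(totals_join)  # → {'alice': 15.0, 'bob': 7.5}
```

In a real application the cost difference is dominated by network round-trips, which the loop version pays once per row.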
Connection Pooling
| Tool | Protocol | Best For |
|---|---|---|
| PgBouncer | PostgreSQL | Transaction/statement pooling, low overhead |
| ProxySQL | MySQL | Query routing, read/write splitting |
| Built-in pool (HikariCP, SQLAlchemy pool) | Any | Application-level pooling |
Rule of thumb: Set pool size to (2 * CPU cores) + disk spindles. For cloud SSDs, start with 2 * vCPUs and tune from there.
Read Replicas and Query Routing
- Route all SELECT queries to replicas; writes go to the primary
- Account for replication lag (typically <1s for async, 0 for sync)
- Use pg_last_wal_replay_lsn() to detect lag before reading critical data
Multi-Database Decision Matrix
| Criteria | PostgreSQL | MySQL | SQLite | SQL Server |
|---|---|---|---|---|
| Best for | Complex queries, JSONB, extensions | Web apps, read-heavy workloads | Embedded, dev/test, edge | Enterprise .NET stacks |
| JSON support | Excellent (JSONB + GIN) | Good (JSON type) | Minimal | Good (OPENJSON) |
| Replication | Streaming, logical | Group replication, InnoDB cluster | N/A | Always On AG |
| Licensing | Open source (PostgreSQL License) | Open source (GPL) / commercial | Public domain | Commercial |
| Max practical size | Multi-TB | Multi-TB | ~1 TB (single-writer) | Multi-TB |
When to choose:
- PostgreSQL — default choice for new projects; best extensibility and standards compliance
- MySQL — existing MySQL ecosystem; simple read-heavy web applications
- SQLite — mobile apps, CLI tools, unit test databases, IoT/edge
- SQL Server — mandated by enterprise policy; deep .NET/Azure integration
NoSQL Considerations
| Database | Model | Use When |
|---|---|---|
| MongoDB | Document | Schema flexibility, rapid prototyping, content management |
| Redis | Key-value / cache | Session store, rate limiting, leaderboards, pub/sub |
| DynamoDB | Key-value / document | Serverless AWS apps, single-digit-ms latency at any scale |
Use SQL as default. Reach for NoSQL only when the access pattern clearly benefits from it.
Sharding & Replication
Horizontal vs Vertical Partitioning
- Vertical partitioning: Split columns across tables (e.g., separate BLOB columns). Reduces I/O for narrow queries.
- Horizontal partitioning (sharding): Split rows across databases/servers. Required when a single node cannot hold the dataset or handle the throughput.
Sharding Strategies
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Hash | Distribute rows by hash(shard key) mod N | Even distribution | Resharding is expensive |
| Range | Shard by date or ID range | Simple, good for time-series | Hot spots on latest shard |
| Geographic | Shard by user region | Data locality, compliance | Cross-region queries are hard |
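Hash-based routing can be sketched in a few lines (shard count and key format are assumptions; a stable hash is used because Python's built-in hash() is salted per process):

```python
import hashlib

NUM_SHARDS = 4  # assumption: fixed shard count; resharding changes every mapping

def shard_for(key: str) -> int:
    """Route a key to a shard with a stable, process-independent hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user:42"))  # same key always routes to the same shard
```

The modulo step is why resharding is expensive: changing NUM_SHARDS remaps nearly every key, which is what consistent hashing schemes are designed to avoid.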
Replication Patterns
| Pattern | Consistency | Latency | Use Case |
|---|---|---|---|
| Synchronous | Strong | Higher write latency | Financial transactions |
| Asynchronous | Eventual | Low write latency | Read-heavy web apps |
| Semi-synchronous | At-least-one replica confirmed | Moderate | Balance of safety and speed |
Cross-References
- sql-database-assistant — query writing, optimization, and debugging for day-to-day SQL work
- database-schema-designer — ERD modeling, normalization analysis, and schema generation
- migration-architect — large-scale migration planning across database engines or major schema overhauls
- senior-backend — application-layer patterns (connection pooling, ORM best practices)
- senior-devops — infrastructure provisioning for database clusters and replicas
Conclusion
Effective database design requires balancing multiple competing concerns: performance, scalability, maintainability, and business requirements. This skill provides the tools and knowledge to make informed decisions throughout the database lifecycle, from initial schema design through production optimization and evolution.
The included tools automate common analysis and optimization tasks, while the comprehensive guides provide the theoretical foundation for making sound architectural decisions. Whether building a new system or optimizing an existing one, these resources provide expert-level guidance for creating robust, scalable database solutions.