sql-patterns

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

SQL Patterns

SQL 模式

When to Use

适用场景

When the user asks to:
  • Write or optimize SQL queries (joins, subqueries, CTEs, window functions)
  • Design or review a database schema
  • Analyze query performance using EXPLAIN plans
  • Choose indexing strategies for a workload
  • Fix slow queries or identify anti-patterns
  • Implement pagination, deduplication, or analytical queries
当用户要求执行以下操作时使用:
  • 编写或优化SQL查询(连接、子查询、CTE、窗口函数)
  • 设计或审核数据库 schema
  • 使用 EXPLAIN 执行计划分析查询性能
  • 为工作负载选择索引策略
  • 修复慢查询或识别反模式
  • 实现分页、去重或分析类查询

Instructions

使用指南

1. Query Construction

1. 查询构建

Build queries incrementally, starting with the core logic before adding complexity.
Joins - choose the right type for the data relationship:
  • INNER JOIN
    — only matching rows in both tables (default, most common)
  • LEFT JOIN
    — all rows from left table, NULLs where right has no match
  • RIGHT JOIN
    — all rows from right table (rare; rewrite as LEFT JOIN for clarity)
  • FULL OUTER JOIN
    — all rows from both, NULLs on the unmatched side
  • CROSS JOIN
    — cartesian product (use intentionally for generating combinations)
CTEs (WITH clauses) — prefer CTEs over nested subqueries for readability:
sql
WITH active_users AS (
    SELECT user_id, MAX(login_at) AS last_login
    FROM logins
    WHERE login_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
),
user_orders AS (
    SELECT user_id, COUNT(*) AS order_count, SUM(total) AS revenue
    FROM orders
    GROUP BY user_id
)
SELECT u.name, au.last_login, uo.order_count, uo.revenue
FROM users u
JOIN active_users au USING (user_id)
LEFT JOIN user_orders uo USING (user_id);
Window functions — use for rankings, running totals, and row comparisons:
sql
SELECT
    department,
    employee,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary) AS diff_from_prev,
    SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;
Aggregations — remember the evaluation order:
WHERE
filters rows,
GROUP BY
groups them,
HAVING
filters groups, window functions run after
HAVING
.
增量构建查询,先实现核心逻辑再增加复杂度。
连接(Joins) - 选择适合数据关系的连接类型:
  • INNER JOIN
    — 仅返回两张表中匹配的行(默认选项,最常用)
  • LEFT JOIN
    — 返回左表的所有行,右表无匹配的位置返回NULL
  • RIGHT JOIN
    — 返回右表的所有行(很少使用,为了可读性建议重写为LEFT JOIN)
  • FULL OUTER JOIN
    — 返回两张表的所有行,未匹配的一侧返回NULL
  • CROSS JOIN
    — 笛卡尔积(仅在有意生成组合场景下使用)
CTE(WITH 子句) — 相比嵌套子查询,优先使用CTE提升可读性:
sql
WITH active_users AS (
    SELECT user_id, MAX(login_at) AS last_login
    FROM logins
    WHERE login_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
),
user_orders AS (
    SELECT user_id, COUNT(*) AS order_count, SUM(total) AS revenue
    FROM orders
    GROUP BY user_id
)
SELECT u.name, au.last_login, uo.order_count, uo.revenue
FROM users u
JOIN active_users au USING (user_id)
LEFT JOIN user_orders uo USING (user_id);
窗口函数(Window functions) — 用于排名、滚动求和以及行对比:
sql
SELECT
    department,
    employee,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary) AS diff_from_prev,
    SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;
聚合操作 — 记住执行顺序:
WHERE
过滤行,
GROUP BY
分组,
HAVING
过滤分组,窗口函数在
HAVING
之后执行。

2. Query Optimization

2. 查询优化

When optimizing, follow this process:
  1. Read the EXPLAIN plan — look for sequential scans on large tables, nested loops on large result sets, sort operations without indexes
  2. Check indexes — ensure WHERE, JOIN, and ORDER BY columns are indexed
  3. Reduce data early — filter in WHERE before joining; join on indexed columns
  4. Avoid anti-patterns:
    • SELECT *
      in production queries (fetches unnecessary columns)
    • Functions on indexed columns in WHERE (
      WHERE YEAR(created_at) = 2025
      defeats the index; use range instead)
    • NOT IN
      with NULLable columns (use
      NOT EXISTS
      instead)
    • Correlated subqueries that execute per-row (rewrite as JOIN or CTE)
    • Missing LIMIT on exploratory queries
    • Implicit type conversions that prevent index use
优化时遵循以下流程:
  1. 阅读EXPLAIN执行计划 — 查找大表的全表扫描、大结果集上的嵌套循环、无索引的排序操作
  2. 检查索引 — 确保WHERE、JOIN和ORDER BY涉及的列都已建立索引
  3. 尽早缩减数据量 — 连接前先通过WHERE过滤;使用有索引的列做连接条件
  4. 避免反模式:
    • 生产环境查询使用
      SELECT *
      (会拉取不必要的列)
    • WHERE子句中对索引列使用函数(
      WHERE YEAR(created_at) = 2025
      会导致索引失效,改用范围查询替代)
    • 对可空列使用
      NOT IN
      (改用
      NOT EXISTS
      替代)
    • 每行都会执行的关联子查询(重写为JOIN或CTE)
    • 探索性查询缺少LIMIT限制
    • 导致索引失效的隐式类型转换

3. Indexing Strategy

3. 索引策略

Choose index types based on query patterns:
  • B-tree (default) — equality, range, ORDER BY, LIKE 'prefix%'
  • Hash — equality only, smaller than B-tree (Postgres-specific)
  • GIN — full-text search, JSONB containment, array operations
  • GiST — geometric data, range types, nearest-neighbor
  • Composite indexes — column order matters; leftmost prefix rule applies
  • Partial indexes
    WHERE is_active = true
    for queries that always filter on that condition
  • Covering indexes
    INCLUDE
    columns to enable index-only scans
根据查询模式选择索引类型:
  • B-tree(默认) — 等值、范围、ORDER BY、LIKE '前缀%'场景
  • Hash — 仅支持等值查询,比B-tree占用空间更小(Postgres特有)
  • GIN — 全文搜索、JSONB包含查询、数组操作
  • GiST — 几何数据、范围类型、最近邻查询
  • 复合索引 — 列顺序很重要,遵循最左前缀原则
  • 部分索引 — 针对总是筛选特定条件的查询,例如
    WHERE is_active = true
  • 覆盖索引 — 通过
    INCLUDE
    列实现仅索引扫描

4. Transactions and Concurrency

4. 事务和并发

  • Use explicit transactions for multi-statement operations
  • Choose the right isolation level: READ COMMITTED (default, good for most), REPEATABLE READ (consistent snapshots), SERIALIZABLE (full isolation, retry on conflict)
  • Keep transactions short — avoid user interaction inside a transaction
  • Use
    SELECT ... FOR UPDATE
    when reading rows you intend to modify
  • Handle deadlocks with retry logic, not just longer timeouts
  • 多语句操作使用显式事务
  • 选择合适的隔离级别:READ COMMITTED(默认,适合大多数场景)、REPEATABLE READ(一致性快照)、SERIALIZABLE(完全隔离,冲突时需重试)
  • 保持事务简短 —— 避免在事务内部进行用户交互
  • 当你读取的行后续需要修改时使用
    SELECT ... FOR UPDATE
  • 通过重试逻辑处理死锁,而不是仅延长超时时间

5. Schema Design Basics

5. Schema设计基础

Refer to
references/schema-design.md
for detailed patterns. Key principles:
  • Normalize to 3NF by default, denormalize intentionally with a documented reason
  • Use surrogate keys (auto-increment or UUID) as primary keys
  • Add
    created_at
    and
    updated_at
    timestamps to all tables
  • Use foreign keys to enforce referential integrity
  • Name constraints explicitly for readable error messages
See
references/query-patterns.md
for common query recipes.
详细模式参考
references/schema-design.md
,核心原则:
  • 默认规范化到3NF,有意进行反规范化时需要记录原因
  • 使用代理键(自增或UUID)作为主键
  • 为所有表添加
    created_at
    updated_at
    时间戳
  • 使用外键强制参照完整性
  • 显式命名约束以获得可读性更强的错误信息
常见查询示例参考
references/query-patterns.md

Examples

示例

User: "Write a query to find the top 3 products per category by revenue" Agent: Uses a CTE with
ROW_NUMBER()
window function partitioned by category, ordered by revenue descending, then filters to
row_num <= 3
. Includes the JOIN to get category and product names, and suggests an index on
(category_id, revenue DESC)
.
User: "This query is slow, can you optimize it?" Agent: Runs
EXPLAIN ANALYZE
on the query, identifies a sequential scan on a 2M-row table due to a function call on an indexed column in the WHERE clause. Rewrites the condition to use a range comparison, confirms the index is now used, and shows the before/after execution times.
User: "How should I index this table for our read patterns?" Agent: Analyzes the common query patterns (filter by status + date range, search by email, sort by created_at), recommends a composite index
(status, created_at)
, a unique index on
email
, and a partial index for
WHERE status = 'pending'
since 80% of queries filter on that. Explains the trade-off with write performance.
用户: "编写一个查询,找出每个类别中收入排名前三的产品" Agent: 使用包含按类别分区、按收入降序排序的
ROW_NUMBER()
窗口函数的CTE,然后筛选
row_num <= 3
的结果。包含获取类别和产品名称的JOIN语句,并建议建立
(category_id, revenue DESC)
索引。
用户: "这个查询很慢,你能优化它吗?" Agent: 对查询执行
EXPLAIN ANALYZE
,识别出因为WHERE子句中对索引列使用了函数调用,导致200万行的表发生全表扫描。重写条件使用范围比较,确认索引已正常使用,并展示优化前后的执行时间。
用户: "针对我们的读模式,我该如何为这张表建索引?" Agent: 分析常见查询模式(按状态+日期范围过滤、按邮箱搜索、按created_at排序),推荐建立复合索引
(status, created_at)
、email上的唯一索引,以及
WHERE status = 'pending'
的部分索引,因为80%的查询都会筛选这个条件。同时说明该方案对写入性能的权衡。