sql-patterns

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

SQL Patterns

SQL 模式

When to Use

适用场景

When the user asks to:

Write or optimize SQL queries (joins, subqueries, CTEs, window functions)
Design or review a database schema
Analyze query performance using EXPLAIN plans
Choose indexing strategies for a workload
Fix slow queries or identify anti-patterns
Implement pagination, deduplication, or analytical queries

当用户要求执行以下操作时使用：

编写或优化SQL查询（连接、子查询、CTE、窗口函数）
设计或审核数据库 schema
使用 EXPLAIN 执行计划分析查询性能
为工作负载选择索引策略
修复慢查询或识别反模式
实现分页、去重或分析类查询

Instructions

使用指南

1. Query Construction

1. 查询构建

Build queries incrementally, starting with the core logic before adding complexity.

Joins - choose the right type for the data relationship:

```
INNER JOIN
```
— only matching rows in both tables (default, most common)
```
LEFT JOIN
```
— all rows from left table, NULLs where right has no match
```
RIGHT JOIN
```
— all rows from right table (rare; rewrite as LEFT JOIN for clarity)
```
FULL OUTER JOIN
```
— all rows from both, NULLs on the unmatched side
```
CROSS JOIN
```
— cartesian product (use intentionally for generating combinations)

CTEs (WITH clauses) — prefer CTEs over nested subqueries for readability:

sql

WITH active_users AS (
    SELECT user_id, MAX(login_at) AS last_login
    FROM logins
    WHERE login_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
),
user_orders AS (
    SELECT user_id, COUNT(*) AS order_count, SUM(total) AS revenue
    FROM orders
    GROUP BY user_id
)
SELECT u.name, au.last_login, uo.order_count, uo.revenue
FROM users u
JOIN active_users au USING (user_id)
LEFT JOIN user_orders uo USING (user_id);

Window functions — use for rankings, running totals, and row comparisons:

sql

SELECT
    department,
    employee,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary) AS diff_from_prev,
    SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;

Aggregations — remember the evaluation order:

WHERE

filters rows,

GROUP BY

groups them,

HAVING

filters groups, window functions run after

HAVING

增量构建查询，先实现核心逻辑再增加复杂度。

连接（Joins） - 选择适合数据关系的连接类型：

```
INNER JOIN
```
— 仅返回两张表中匹配的行（默认选项，最常用）
```
LEFT JOIN
```
— 返回左表的所有行，右表无匹配的位置返回NULL
```
RIGHT JOIN
```
— 返回右表的所有行（很少使用，为了可读性建议重写为LEFT JOIN）
```
FULL OUTER JOIN
```
— 返回两张表的所有行，未匹配的一侧返回NULL
```
CROSS JOIN
```
— 笛卡尔积（仅在有意生成组合场景下使用）

CTE（WITH 子句） — 相比嵌套子查询，优先使用CTE提升可读性：

sql

WITH active_users AS (
    SELECT user_id, MAX(login_at) AS last_login
    FROM logins
    WHERE login_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
),
user_orders AS (
    SELECT user_id, COUNT(*) AS order_count, SUM(total) AS revenue
    FROM orders
    GROUP BY user_id
)
SELECT u.name, au.last_login, uo.order_count, uo.revenue
FROM users u
JOIN active_users au USING (user_id)
LEFT JOIN user_orders uo USING (user_id);

窗口函数（Window functions） — 用于排名、滚动求和以及行对比：

sql

SELECT
    department,
    employee,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary) AS diff_from_prev,
    SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;

聚合操作 — 记住执行顺序：

WHERE

过滤行，

GROUP BY

分组，

HAVING

过滤分组，窗口函数在

HAVING

之后执行。

2. Query Optimization

2. 查询优化

When optimizing, follow this process:

Read the EXPLAIN plan — look for sequential scans on large tables, nested loops on large result sets, sort operations without indexes
Check indexes — ensure WHERE, JOIN, and ORDER BY columns are indexed
Reduce data early — filter in WHERE before joining; join on indexed columns
Avoid anti-patterns:
- ```
SELECT *
```
  in production queries (fetches unnecessary columns)
- Functions on indexed columns in WHERE (
```
WHERE YEAR(created_at) = 2025
```
  defeats the index; use range instead)
- ```
NOT IN
```
  with NULLable columns (use
```
NOT EXISTS
```
  instead)
- Correlated subqueries that execute per-row (rewrite as JOIN or CTE)
- Missing LIMIT on exploratory queries
- Implicit type conversions that prevent index use

优化时遵循以下流程：

阅读EXPLAIN执行计划 — 查找大表的全表扫描、大结果集上的嵌套循环、无索引的排序操作
检查索引 — 确保WHERE、JOIN和ORDER BY涉及的列都已建立索引
尽早缩减数据量 — 连接前先通过WHERE过滤；使用有索引的列做连接条件
避免反模式：
- 生产环境查询使用
```
SELECT *
```
  （会拉取不必要的列）
- WHERE子句中对索引列使用函数（
```
WHERE YEAR(created_at) = 2025
```
  会导致索引失效，改用范围查询替代）
- 对可空列使用
```
NOT IN
```
  （改用
```
NOT EXISTS
```
  替代）
- 每行都会执行的关联子查询（重写为JOIN或CTE）
- 探索性查询缺少LIMIT限制
- 导致索引失效的隐式类型转换

3. Indexing Strategy

3. 索引策略

Choose index types based on query patterns:

B-tree (default) — equality, range, ORDER BY, LIKE 'prefix%'
Hash — equality only, smaller than B-tree (Postgres-specific)
GIN — full-text search, JSONB containment, array operations
GiST — geometric data, range types, nearest-neighbor
Composite indexes — column order matters; leftmost prefix rule applies
Partial indexes —
```
WHERE is_active = true
```
for queries that always filter on that condition
Covering indexes —
```
INCLUDE
```
columns to enable index-only scans

根据查询模式选择索引类型：

B-tree（默认） — 等值、范围、ORDER BY、LIKE '前缀%'场景
Hash — 仅支持等值查询，比B-tree占用空间更小（Postgres特有）
GIN — 全文搜索、JSONB包含查询、数组操作
GiST — 几何数据、范围类型、最近邻查询
复合索引 — 列顺序很重要，遵循最左前缀原则
部分索引 — 针对总是筛选特定条件的查询，例如
```
WHERE is_active = true
```
覆盖索引 — 通过
```
INCLUDE
```
列实现仅索引扫描

4. Transactions and Concurrency

4. 事务和并发

Use explicit transactions for multi-statement operations
Choose the right isolation level: READ COMMITTED (default, good for most), REPEATABLE READ (consistent snapshots), SERIALIZABLE (full isolation, retry on conflict)
Keep transactions short — avoid user interaction inside a transaction
Use
```
SELECT ... FOR UPDATE
```
when reading rows you intend to modify
Handle deadlocks with retry logic, not just longer timeouts

多语句操作使用显式事务
选择合适的隔离级别：READ COMMITTED（默认，适合大多数场景）、REPEATABLE READ（一致性快照）、SERIALIZABLE（完全隔离，冲突时需重试）
保持事务简短 —— 避免在事务内部进行用户交互
当你读取的行后续需要修改时使用
```
SELECT ... FOR UPDATE
```
通过重试逻辑处理死锁，而不是仅延长超时时间

5. Schema Design Basics

5. Schema设计基础

Refer to

references/schema-design.md

for detailed patterns. Key principles:

Normalize to 3NF by default, denormalize intentionally with a documented reason
Use surrogate keys (auto-increment or UUID) as primary keys
Add
```
created_at
```
and
```
updated_at
```
timestamps to all tables
Use foreign keys to enforce referential integrity
Name constraints explicitly for readable error messages

See

references/query-patterns.md

for common query recipes.

详细模式参考

references/schema-design.md

，核心原则：

默认规范化到3NF，有意进行反规范化时需要记录原因
使用代理键（自增或UUID）作为主键
为所有表添加
```
created_at
```
和
```
updated_at
```
时间戳
使用外键强制参照完整性
显式命名约束以获得可读性更强的错误信息

常见查询示例参考

references/query-patterns.md

。

Examples

示例

User: "Write a query to find the top 3 products per category by revenue" Agent: Uses a CTE with

ROW_NUMBER()

window function partitioned by category, ordered by revenue descending, then filters to

row_num <= 3

. Includes the JOIN to get category and product names, and suggests an index on

(category_id, revenue DESC)

User: "This query is slow, can you optimize it?" Agent: Runs

EXPLAIN ANALYZE

on the query, identifies a sequential scan on a 2M-row table due to a function call on an indexed column in the WHERE clause. Rewrites the condition to use a range comparison, confirms the index is now used, and shows the before/after execution times.

User: "How should I index this table for our read patterns?" Agent: Analyzes the common query patterns (filter by status + date range, search by email, sort by created_at), recommends a composite index

(status, created_at)

, a unique index on

email

, and a partial index for

WHERE status = 'pending'

since 80% of queries filter on that. Explains the trade-off with write performance.

用户： "编写一个查询，找出每个类别中收入排名前三的产品" Agent： 使用包含按类别分区、按收入降序排序的

ROW_NUMBER()

窗口函数的CTE，然后筛选

row_num <= 3

的结果。包含获取类别和产品名称的JOIN语句，并建议建立

(category_id, revenue DESC)

索引。

用户： "这个查询很慢，你能优化它吗？" Agent： 对查询执行

EXPLAIN ANALYZE

，识别出因为WHERE子句中对索引列使用了函数调用，导致200万行的表发生全表扫描。重写条件使用范围比较，确认索引已正常使用，并展示优化前后的执行时间。

用户： "针对我们的读模式，我该如何为这张表建索引？" Agent： 分析常见查询模式（按状态+日期范围过滤、按邮箱搜索、按created_at排序），推荐建立复合索引

(status, created_at)

、email上的唯一索引，以及

WHERE status = 'pending'

的部分索引，因为80%的查询都会筛选这个条件。同时说明该方案对写入性能的权衡。