optimizing-queries


Query Optimization

Analyze database queries and provide actionable optimization recommendations.

Workflow

Copy and track progress:
Query Optimization:
- [ ] Identify database type
- [ ] Analyze query structure
- [ ] Check for red flags
- [ ] Provide recommendations
- [ ] Suggest verification
  1. Identify database type from syntax or user context
  2. Analyze query structure - parse SELECT, FROM, WHERE, JOIN, ORDER BY, GROUP BY clauses
  3. Check for red flags (see Quick Reference)
  4. Provide concrete recommendations with copy-pasteable code
  5. Suggest verification - EXPLAIN command or metric to confirm
For complex queries (3+ JOINs or nested subqueries):
  • Decompose into logical units
  • Analyze each unit's access pattern
  • Synthesize recommendations in priority order
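Step 3's red-flag check can be sketched as a small pattern scan. This is a hypothetical standalone helper, not the bundled `scripts/analyze_query.py`; the regexes mirror the Quick Reference list:

```python
import re

# Hypothetical red-flag scanner: a rough sketch of workflow step 3.
# Each rule pairs a regex with the anti-pattern it signals.
RED_FLAGS = [
    (r"select\s+\*", "SELECT * - fetches all columns"),
    (r"like\s+'%", "Leading-wildcard LIKE - prevents index use"),
    (r"\bnot\s+in\s*\(\s*select\b", "NOT IN with subquery - NULL pitfall"),
    (r"\boffset\s+\d{4,}", "Large OFFSET - slow pagination"),
]

def scan(query: str) -> list[str]:
    """Return the messages for every red flag found in the query."""
    q = query.lower()
    return [msg for pattern, msg in RED_FLAGS if re.search(pattern, q)]

hits = scan("SELECT * FROM orders WHERE email LIKE '%@example.com' OFFSET 50000")
for msg in hits:
    print(msg)
```

A real scanner would parse the SQL rather than regex-match it, but the shape of the step is the same: query in, prioritized findings out.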

Analysis Checklist

For each query, evaluate:
  • Index usage: Missing or inefficient indexes
  • Scan type: Full scans on large tables
  • Join efficiency: Join order, type, missing conditions
  • Filtering: WHERE placement, function usage on columns
  • Projection: SELECT * vs specific columns
  • Pagination: OFFSET vs keyset/cursor
  • N+1 detection: Loop patterns in code context
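The pagination item in the checklist above can be demonstrated end to end. A minimal sketch using Python's built-in `sqlite3`; the `orders` table and its columns are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(i * 1.5,) for i in range(1000)])

# OFFSET pagination: the engine still walks and discards the first 500 rows.
offset_page = conn.execute(
    "SELECT id FROM orders ORDER BY id LIMIT 10 OFFSET 500").fetchall()

# Keyset pagination: resume from the last id seen; work stays O(page size).
last_seen_id = 500
keyset_page = conn.execute(
    "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT 10",
    (last_seen_id,)).fetchall()

print(offset_page == keyset_page)  # → True: same page, cheaper access path
```

Both queries return the identical page; the keyset form can seek directly into the index instead of counting past discarded rows, which is why it scales while `OFFSET` degrades.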

Quick Reference

SQL Red Flags

  • `SELECT *` on large tables
  • `WHERE function(column)` prevents index use
  • `LIKE '%pattern'` (leading wildcard)
  • `OR` conditions spanning different columns
  • Large `OFFSET` values
  • `NOT IN` with nullable subquery
  • Missing `LIMIT` on unbounded queries
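The `NOT IN` red flag is worth seeing concretely: a single NULL in the subquery silently empties the result, while a `NOT EXISTS` rewrite behaves as intended. A sketch with stdlib `sqlite3` and an invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")
conn.execute("CREATE TABLE banned (user_id INTEGER)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO banned VALUES (?)", [(2,), (None,)])

# NOT IN: any NULL in the subquery makes every comparison UNKNOWN -> 0 rows.
not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM banned)"
).fetchall()

# NOT EXISTS: NULL-safe rewrite that returns the intended rows.
not_exists = conn.execute(
    """SELECT id FROM users u
       WHERE NOT EXISTS (SELECT 1 FROM banned b WHERE b.user_id = u.id)"""
).fetchall()

print(not_in)      # → []
print(not_exists)  # → [(1,), (3,)]
```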

NoSQL Red Flags

  • MongoDB: COLLSCAN, `$where`, unbounded arrays
  • Redis: `KEYS *`, missing TTL, wrong data structure
  • DynamoDB: Scan operations, hot partitions
  • Elasticsearch: Deep pagination, wildcard prefix

Examples

**Example 1: Missing Index**

Input:
```sql
SELECT * FROM orders WHERE customer_email = 'user@example.com';
```
Output:

Issue [HIGH]

Full table scan on `orders` - no index on `customer_email`

Impact

Query time grows linearly with table size; causes lock contention

Solution

```sql
CREATE INDEX idx_orders_customer_email ON orders(customer_email);
```

Verification

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_email = 'user@example.com'; -- Should show "Index Scan" instead of "Seq Scan"
```
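The same before/after verification can be scripted with SQLite's stdlib driver. A sketch only; the exact plan wording varies by SQLite version, and the PostgreSQL `EXPLAIN` above is the production equivalent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_email TEXT)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the access path.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_email = 'user@example.com'"
before = plan(query)   # e.g. "SCAN orders" - full table scan

conn.execute("CREATE INDEX idx_orders_customer_email ON orders(customer_email)")
after = plan(query)    # e.g. "SEARCH orders USING INDEX idx_orders_customer_email ..."

print(before)
print(after)
```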

**Example 2: N+1 Query Pattern**

Input:
```python
users = db.query("SELECT * FROM users")
for user in users:
    orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
```
Output:

Issue [HIGH]

N+1 query pattern - executes 1 + N queries instead of 1

Impact

100 users = 101 queries; network latency multiplied

Solution

```sql
SELECT u.*, o.* FROM users u LEFT JOIN orders o ON u.id = o.user_id;
```

Verification

Check query count before/after; should reduce from N+1 to 1
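A minimal before/after of this fix, sketched with stdlib `sqlite3` (the `db.query` wrapper from the input is imagined here as raw cursor calls):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3);
    INSERT INTO orders (user_id) VALUES (1), (1), (3);
""")

# N+1: one query for users, then one per user (1 + N round trips).
n_plus_1 = {}
for (user_id,) in conn.execute("SELECT id FROM users ORDER BY id"):
    n_plus_1[user_id] = conn.execute(
        "SELECT id FROM orders WHERE user_id = ? ORDER BY id",
        (user_id,)).fetchall()

# Single JOIN: one round trip, grouped back per user in application code.
joined = {}
for user_id, order_id in conn.execute(
        "SELECT u.id, o.id FROM users u "
        "LEFT JOIN orders o ON u.id = o.user_id ORDER BY u.id, o.id"):
    joined.setdefault(user_id, [])
    if order_id is not None:
        joined[user_id].append((order_id,))

print(n_plus_1 == joined)  # → True: same result, 1 query instead of 4
```

The `LEFT JOIN` keeps users without orders (user 2 above), which a naive `INNER JOIN` rewrite would drop.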

**Example 3: When NOT to Optimize**

Input:
```sql
SELECT * FROM config WHERE key = 'app_version';
```
Output:

No Issues Detected

This query on a small lookup table (typically <100 rows) is acceptable. Adding an index would add write overhead without meaningful read benefit.

When to Reconsider

  • If table grows beyond 10,000 rows
  • If query is called >100 times/second

**Example 4: Incomplete Information**

Input:
```sql
SELECT * FROM large_table WHERE status = ?;
```
Output:

Issue [MEDIUM]

Preliminary analysis without schema information

Potential Issues

  1. `SELECT *` - may fetch unnecessary columns
  2. `status` column - if low cardinality (e.g., 5 values), index may not help

Information Needed

  • Table row count
  • Cardinality of `status` column
  • Existing indexes: `SHOW INDEX FROM large_table;`
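The cardinality question can be answered with a single query. A sketch with stdlib `sqlite3`; `large_table` and its contents are invented stand-ins for the user's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO large_table (status) VALUES (?)",
                 [("pending" if i % 5 else "shipped",) for i in range(1000)])

# Selectivity = distinct values / rows; a value near 0 means a single-column
# index on status will match huge slices of the table and may not pay off.
distinct, total = conn.execute(
    "SELECT COUNT(DISTINCT status), COUNT(*) FROM large_table").fetchone()
print(distinct, total, distinct / total)  # → 2 1000 0.002
```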

**Example 5: MongoDB COLLSCAN**

Input:
```javascript
db.orders.find({ customer_email: "user@example.com" })
```
Output:

Issue [HIGH]

COLLSCAN on `orders` - no index on `customer_email`

Impact

Query scans entire collection; O(n) performance

Solution

```javascript
db.orders.createIndex({ customer_email: 1 })
```

Verification

```javascript
db.orders.find({ customer_email: "..." }).explain("executionStats") // Should show "stage": "IXSCAN" instead of "COLLSCAN"
```

Response Format

Structure recommendations as:

Issue [HIGH/MEDIUM/LOW]

[Specific problem identified]

Impact

[Performance/resource cost]

Solution

[Concrete fix with code]

Verification

[EXPLAIN command or metric to confirm]

Confidence Levels

When uncertainty exists, indicate confidence:
  • HIGH: Pattern clearly matches known anti-pattern
  • MEDIUM: Likely issue, depends on data distribution/schema
  • LOW: Potential issue, needs EXPLAIN to confirm

Response Constraints

  1. Max 5 recommendations per query - prioritize by impact
  2. Order by severity: HIGH → MEDIUM → LOW
  3. One primary fix per issue - mention alternatives briefly
  4. Code must be copy-pasteable - no placeholders like `<table_name>`
  5. Verification commands must be complete - include all flags

Prioritization Matrix

| Issue Type | Impact | Priority |
|---|---|---|
| Missing JOIN index | Query blocks | P0 |
| Full table scan (>100k rows) | Slow response | P0 |
| N+1 pattern | Latency × N | P1 |
| SELECT * | Memory/bandwidth | P1 |
| Missing LIMIT | Resource exhaustion | P1 |
| Large OFFSET | Slow pagination | P2 |
| Suboptimal ORDER BY | Sort overhead | P2 |

Reference Files

Consult these for detailed patterns (use grep to find specific sections):
  • references/sql-patterns.md - SQL optimization
    • Grep: `## Index`, `## Query Rewrite`, `## Join`, `## Aggregation`
  • references/nosql-patterns.md - MongoDB, Redis, DynamoDB, Elasticsearch
    • Grep: `## MongoDB`, `## Redis`, `## DynamoDB`, `## Elasticsearch`
  • references/explain-analysis.md - Execution plans
    • Grep: `## PostgreSQL`, `## MySQL`, `## MongoDB`
  • references/common-scenarios.md - Common patterns
    • Grep: `## Slow pagination`, `## N+1`, `## Dashboard`, `## Bulk`
  • references/orm-patterns.md - ORM fixes
    • Grep: `### Django`, `### SQLAlchemy`, `### ActiveRecord`, `### Prisma`

Utility Scripts

All scripts use Python standard library only (no pip install required).
Run these for automated analysis:

Query Analysis

analyze_query.py - Detect anti-patterns and score complexity:
```bash
python scripts/analyze_query.py "SELECT * FROM orders WHERE ..."
python scripts/analyze_query.py --file query.sql --json
```
suggest_index.py - Generate index recommendations:
```bash
python scripts/suggest_index.py "SELECT ... FROM orders WHERE status = 'pending'"
```

EXPLAIN & Logs

parse_explain.py - Analyze EXPLAIN output:
```bash
python scripts/parse_explain.py --pg explain_output.txt
python scripts/parse_explain.py --mysql explain_output.txt
```
diff_explain.py - Compare before/after EXPLAIN outputs:
```bash
python scripts/diff_explain.py --pg before.txt after.txt
python scripts/diff_explain.py --mysql before.txt after.txt --json
```
parse_slow_log.py - Analyze slow query logs:
```bash
python scripts/parse_slow_log.py --pg /var/log/postgresql.log --top 20
python scripts/parse_slow_log.py --mysql slow-query.log --json
```

Schema & Code

analyze_schema.py - Find schema optimization opportunities:
```bash
python scripts/analyze_schema.py schema.sql
```
detect_orm_issues.py - Find ORM anti-patterns in code:
```bash
python scripts/detect_orm_issues.py app.py
python scripts/detect_orm_issues.py --dir ./models
```
check_migration.py - Check migration safety:
```bash
python scripts/check_migration.py --pg migration.sql
python scripts/check_migration.py --mysql migration.sql
```

Workflow with Scripts

For query optimization:
  1. analyze_query.py → detect issues
  2. suggest_index.py → generate indexes
  3. User runs EXPLAIN (before)
  4. User applies optimization
  5. User runs EXPLAIN (after)
  6. diff_explain.py → verify improvement
For verifying optimization impact:

```bash
# Save before state
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > before.txt

# Apply optimization (add index, rewrite query, etc.)

# Save after state
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > after.txt

# Compare
python scripts/diff_explain.py --pg before.txt after.txt
```

**For codebase audit:**
1. `detect_orm_issues.py --dir ./` → find N+1 patterns
2. `analyze_schema.py` → check schema
3. Prioritize fixes by severity

**For migration review:**
1. `check_migration.py` → safety check
2. Fix critical issues before deployment
3. Use safe alternatives (CONCURRENTLY, etc.)

EXPLAIN Commands

Provide appropriate commands:
```sql
-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- MySQL
EXPLAIN ANALYZE SELECT ...;

-- SQLite
EXPLAIN QUERY PLAN SELECT ...;
```
```javascript
// MongoDB
db.collection.find({...}).explain("executionStats")
```

Index Recommendation Format

When suggesting indexes:
```sql
-- PostgreSQL/MySQL
CREATE INDEX [CONCURRENTLY] idx_table_columns
ON table(col1, col2, col3);
-- Rationale: [why this column order, covering considerations]
```
```javascript
// MongoDB (ESR: Equality, Sort, Range)
db.collection.createIndex({ equality_field: 1, sort_field: -1, range_field: 1 })
```
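Column order can be checked empirically. A stdlib `sqlite3` sketch of the equality-then-sort idea behind ESR (table and index names invented): with the equality column first, the index satisfies both the filter and the `ORDER BY`, so no separate sort step appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders
                (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)""")
conn.execute("CREATE INDEX idx_orders_status_created "
             "ON orders(status, created_at)")

# Rows matching status = 'pending' are already ordered by created_at inside
# the index, so the plan needs no temp B-tree for the ORDER BY.
plan = " ".join(row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE status = 'pending' ORDER BY created_at"))
print(plan)  # e.g. "SEARCH orders USING INDEX idx_orders_status_created (status=?)"
```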

Handling Missing Information

When user provides incomplete context:
No schema provided:
"To give precise recommendations, I need the table schema. Could you share:
  1. Column data types
  2. Existing indexes
  3. Approximate row counts
Meanwhile, here's what I can infer from the query..."
No EXPLAIN output:
"Run this command and share the output:
```sql
EXPLAIN (ANALYZE, BUFFERS) <their query>;
```
This will show actual vs estimated rows and scan types."
Unknown database type:
"Which database are you using? (PostgreSQL, MySQL, SQLite, MongoDB, etc.) The optimization strategies differ significantly between them."

Conditional Workflows

Analyzing a query:
  1. Run analyze_query.py first
  2. Check for anti-patterns
  3. Suggest fixes with examples
Interpreting EXPLAIN output:
  1. Run parse_explain.py with appropriate flag
  2. Identify warnings
  3. Correlate with query structure
Full optimization request:
  1. Analyze query → identify issues
  2. Suggest indexes → generate CREATE statements
  3. Request EXPLAIN → interpret results
  4. Provide before/after comparison
Complex optimization (3+ issues or architectural decisions): Use extended reasoning to:
  1. Map full query execution path
  2. Identify all optimization opportunities
  3. Evaluate trade-offs between approaches
  4. Prioritize by impact and implementation cost

Script Error Handling

If scripts fail:
  1. Check Python version (requires 3.7+)
  2. Verify input format (SQL string, valid file path)
  3. Fall back to manual analysis using reference files
Common errors:
  • FileNotFoundError → Check file path, use absolute paths
  • json.JSONDecodeError → Input not valid JSON, check --pg/--mysql flags
  • Empty output → Query may be valid, no issues detected
When scripts are unavailable: Perform manual analysis using Quick Reference red flags and reference files.

Unsupported Databases

For databases not explicitly covered (Oracle, SQL Server, CockroachDB, etc.):
  1. Apply general SQL patterns from references/sql-patterns.md
  2. Note: "Optimization patterns based on PostgreSQL/MySQL; verify syntax for [database]"
  3. Recommend user consult database-specific documentation for EXPLAIN syntax and index creation