optimizing-queries


Query Optimization

Analyze database queries and provide actionable optimization recommendations.

Workflow

Copy and track progress:
Query Optimization:
- [ ] Identify database type
- [ ] Analyze query structure
- [ ] Check for red flags
- [ ] Provide recommendations
- [ ] Suggest verification
  1. Identify database type from syntax or user context
  2. Analyze query structure - parse SELECT, FROM, WHERE, JOIN, ORDER BY, GROUP BY clauses
  3. Check for red flags (see Quick Reference)
  4. Provide concrete recommendations with copy-pasteable code
  5. Suggest verification - EXPLAIN command or metric to confirm
For complex queries (3+ JOINs or nested subqueries):
  • Decompose into logical units
  • Analyze each unit's access pattern
  • Synthesize recommendations in priority order
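Step 3's red-flag check can be sketched as a small pattern scan. This is a hypothetical standalone helper, not the bundled `scripts/analyze_query.py`; the regexes mirror the Quick Reference list:

```python
import re

# Hypothetical red-flag scanner: a rough sketch of workflow step 3.
# Each rule pairs a regex with the anti-pattern it signals.
RED_FLAGS = [
    (r"select\s+\*", "SELECT * - fetches all columns"),
    (r"like\s+'%", "Leading-wildcard LIKE - prevents index use"),
    (r"\bnot\s+in\s*\(\s*select\b", "NOT IN with subquery - NULL pitfall"),
    (r"\boffset\s+\d{4,}", "Large OFFSET - slow pagination"),
]

def scan(query: str) -> list[str]:
    """Return the messages for every red flag found in the query."""
    q = query.lower()
    return [msg for pattern, msg in RED_FLAGS if re.search(pattern, q)]

hits = scan("SELECT * FROM orders WHERE email LIKE '%@example.com' OFFSET 50000")
for msg in hits:
    print(msg)
```

A real scanner would parse the SQL rather than regex-match it, but the shape of the step is the same: query in, prioritized findings out.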

Analysis Checklist

For each query, evaluate:
  • Index usage: Missing or inefficient indexes
  • Scan type: Full scans on large tables
  • Join efficiency: Join order, type, missing conditions
  • Filtering: WHERE placement, function usage on columns
  • Projection: SELECT * vs specific columns
  • Pagination: OFFSET vs keyset/cursor
  • N+1 detection: Loop patterns in code context
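The pagination item in the checklist above can be demonstrated end to end. A minimal sketch using Python's built-in `sqlite3`; the `orders` table and its columns are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(i * 1.5,) for i in range(1000)])

# OFFSET pagination: the engine still walks and discards the first 500 rows.
offset_page = conn.execute(
    "SELECT id FROM orders ORDER BY id LIMIT 10 OFFSET 500").fetchall()

# Keyset pagination: resume from the last id seen; work stays O(page size).
last_seen_id = 500
keyset_page = conn.execute(
    "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT 10",
    (last_seen_id,)).fetchall()

print(offset_page == keyset_page)  # → True: same page, cheaper access path
```

Both queries return the identical page; the keyset form can seek directly into the index instead of counting past discarded rows, which is why it scales while `OFFSET` degrades.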

Quick Reference

SQL Red Flags

  • `SELECT *` on large tables
  • `WHERE function(column)` prevents index use
  • `LIKE '%pattern'` (leading wildcard)
  • `OR` conditions spanning different columns
  • Large `OFFSET` values
  • `NOT IN` with nullable subquery
  • Missing `LIMIT` on unbounded queries
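The `NOT IN` red flag is worth seeing concretely: a single NULL in the subquery silently empties the result, while a `NOT EXISTS` rewrite behaves as intended. A sketch with stdlib `sqlite3` and an invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")
conn.execute("CREATE TABLE banned (user_id INTEGER)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO banned VALUES (?)", [(2,), (None,)])

# NOT IN: any NULL in the subquery makes every comparison UNKNOWN -> 0 rows.
not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM banned)"
).fetchall()

# NOT EXISTS: NULL-safe rewrite that returns the intended rows.
not_exists = conn.execute(
    """SELECT id FROM users u
       WHERE NOT EXISTS (SELECT 1 FROM banned b WHERE b.user_id = u.id)"""
).fetchall()

print(not_in)      # → []
print(not_exists)  # → [(1,), (3,)]
```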

NoSQL Red Flags

  • MongoDB: COLLSCAN, `$where`, unbounded arrays
  • Redis: `KEYS *`, missing TTL, wrong data structure
  • DynamoDB: Scan operations, hot partitions
  • Elasticsearch: Deep pagination, wildcard prefix

Examples

**Example 1: Missing Index**

Input:
```sql
SELECT * FROM orders WHERE customer_email = 'user@example.com';
```
Output:

Issue [HIGH]

Full table scan on `orders` - no index on `customer_email`

Impact

Query time grows linearly with table size; causes lock contention

Solution

```sql
CREATE INDEX idx_orders_customer_email ON orders(customer_email);
```

Verification

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_email = 'user@example.com'; -- Should show "Index Scan" instead of "Seq Scan"
```
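The same before/after verification can be scripted with SQLite's stdlib driver. A sketch only; the exact plan wording varies by SQLite version, and the PostgreSQL `EXPLAIN` above is the production equivalent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_email TEXT)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the access path.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_email = 'user@example.com'"
before = plan(query)   # e.g. "SCAN orders" - full table scan

conn.execute("CREATE INDEX idx_orders_customer_email ON orders(customer_email)")
after = plan(query)    # e.g. "SEARCH orders USING INDEX idx_orders_customer_email ..."

print(before)
print(after)
```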

**Example 2: N+1 Query Pattern**

Input:
```python
users = db.query("SELECT * FROM users")
for user in users:
    orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
```
Output:

Issue [HIGH]

N+1 query pattern - executes 1 + N queries instead of 1

Impact

100 users = 101 queries; network latency multiplied

Solution

```sql
SELECT u.*, o.* FROM users u LEFT JOIN orders o ON u.id = o.user_id;
```

Verification

Check query count before/after; should reduce from N+1 to 1
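A minimal before/after of this fix, sketched with stdlib `sqlite3` (the `db.query` wrapper from the input is imagined here as raw cursor calls):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3);
    INSERT INTO orders (user_id) VALUES (1), (1), (3);
""")

# N+1: one query for users, then one per user (1 + N round trips).
n_plus_1 = {}
for (user_id,) in conn.execute("SELECT id FROM users ORDER BY id"):
    n_plus_1[user_id] = conn.execute(
        "SELECT id FROM orders WHERE user_id = ? ORDER BY id",
        (user_id,)).fetchall()

# Single JOIN: one round trip, grouped back per user in application code.
joined = {}
for user_id, order_id in conn.execute(
        "SELECT u.id, o.id FROM users u "
        "LEFT JOIN orders o ON u.id = o.user_id ORDER BY u.id, o.id"):
    joined.setdefault(user_id, [])
    if order_id is not None:
        joined[user_id].append((order_id,))

print(n_plus_1 == joined)  # → True: same result, 1 query instead of 4
```

The `LEFT JOIN` keeps users without orders (user 2 above), which a naive `INNER JOIN` rewrite would drop.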

**Example 3: When NOT to Optimize**

Input:
```sql
SELECT * FROM config WHERE key = 'app_version';
```
Output:

No Issues Detected

This query on a small lookup table (typically <100 rows) is acceptable. Adding an index would add write overhead without meaningful read benefit.

When to Reconsider

  • If table grows beyond 10,000 rows
  • If query is called >100 times/second

**Example 4: Incomplete Information**

Input:
```sql
SELECT * FROM large_table WHERE status = ?;
```
Output:

Issue [MEDIUM]

Preliminary analysis without schema information

Potential Issues

  1. `SELECT *` - may fetch unnecessary columns
  2. `status` column - if low cardinality (e.g., 5 values), index may not help

Information Needed

  • Table row count
  • Cardinality of `status` column
  • Existing indexes: `SHOW INDEX FROM large_table;`
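The cardinality question can be answered with a single query. A sketch with stdlib `sqlite3`; `large_table` and its contents are invented stand-ins for the user's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO large_table (status) VALUES (?)",
                 [("pending" if i % 5 else "shipped",) for i in range(1000)])

# Selectivity = distinct values / rows; a value near 0 means a single-column
# index on status will match huge slices of the table and may not pay off.
distinct, total = conn.execute(
    "SELECT COUNT(DISTINCT status), COUNT(*) FROM large_table").fetchone()
print(distinct, total, distinct / total)  # → 2 1000 0.002
```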

**Example 5: MongoDB COLLSCAN**

Input:
```javascript
db.orders.find({ customer_email: "user@example.com" })
```
Output:

Issue [HIGH]

COLLSCAN on `orders` - no index on `customer_email`

Impact

Query scans entire collection; O(n) performance

Solution

```javascript
db.orders.createIndex({ customer_email: 1 })
```

Verification

```javascript
db.orders.find({ customer_email: "..." }).explain("executionStats") // Should show "stage": "IXSCAN" instead of "COLLSCAN"
```

Response Format

Structure recommendations as:

Issue [HIGH/MEDIUM/LOW]

[Specific problem identified]

Impact

[Performance/resource cost]

Solution

[Concrete fix with code]

Verification

[EXPLAIN command or metric to confirm]

Confidence Levels

When uncertainty exists, indicate confidence:
  • HIGH: Pattern clearly matches known anti-pattern
  • MEDIUM: Likely issue, depends on data distribution/schema
  • LOW: Potential issue, needs EXPLAIN to confirm

Response Constraints

  1. Max 5 recommendations per query - prioritize by impact
  2. Order by severity: HIGH → MEDIUM → LOW
  3. One primary fix per issue - mention alternatives briefly
  4. Code must be copy-pasteable - no placeholders like `<table_name>`
  5. Verification commands must be complete - include all flags

Prioritization Matrix

| Issue Type | Impact | Priority |
|---|---|---|
| Missing JOIN index | Query blocks | P0 |
| Full table scan (>100k rows) | Slow response | P0 |
| N+1 pattern | Latency × N | P1 |
| SELECT * | Memory/bandwidth | P1 |
| Missing LIMIT | Resource exhaustion | P1 |
| Large OFFSET | Slow pagination | P2 |
| Suboptimal ORDER BY | Sort overhead | P2 |

Reference Files

Consult these for detailed patterns (use grep to find specific sections):
  • references/sql-patterns.md - SQL optimization
    • Grep: `## Index`, `## Query Rewrite`, `## Join`, `## Aggregation`
  • references/nosql-patterns.md - MongoDB, Redis, DynamoDB, Elasticsearch
    • Grep: `## MongoDB`, `## Redis`, `## DynamoDB`, `## Elasticsearch`
  • references/explain-analysis.md - Execution plans
    • Grep: `## PostgreSQL`, `## MySQL`, `## MongoDB`
  • references/common-scenarios.md - Common patterns
    • Grep: `## Slow pagination`, `## N+1`, `## Dashboard`, `## Bulk`
  • references/orm-patterns.md - ORM fixes
    • Grep: `### Django`, `### SQLAlchemy`, `### ActiveRecord`, `### Prisma`

Utility Scripts

All scripts use Python standard library only (no pip install required).
Run these for automated analysis:

Query Analysis

analyze_query.py - Detect anti-patterns and score complexity:
```bash
python scripts/analyze_query.py "SELECT * FROM orders WHERE ..."
python scripts/analyze_query.py --file query.sql --json
```
suggest_index.py - Generate index recommendations:
```bash
python scripts/suggest_index.py "SELECT ... FROM orders WHERE status = 'pending'"
```

EXPLAIN & Logs

parse_explain.py - Analyze EXPLAIN output:
```bash
python scripts/parse_explain.py --pg explain_output.txt
python scripts/parse_explain.py --mysql explain_output.txt
```
diff_explain.py - Compare before/after EXPLAIN outputs:
```bash
python scripts/diff_explain.py --pg before.txt after.txt
python scripts/diff_explain.py --mysql before.txt after.txt --json
```
parse_slow_log.py - Analyze slow query logs:
```bash
python scripts/parse_slow_log.py --pg /var/log/postgresql.log --top 20
python scripts/parse_slow_log.py --mysql slow-query.log --json
```

Schema & Code

analyze_schema.py - Find schema optimization opportunities:
```bash
python scripts/analyze_schema.py schema.sql
```
detect_orm_issues.py - Find ORM anti-patterns in code:
```bash
python scripts/detect_orm_issues.py app.py
python scripts/detect_orm_issues.py --dir ./models
```
check_migration.py - Check migration safety:
```bash
python scripts/check_migration.py --pg migration.sql
python scripts/check_migration.py --mysql migration.sql
```

Workflow with Scripts

For query optimization:
  1. analyze_query.py → detect issues
  2. suggest_index.py → generate indexes
  3. User runs EXPLAIN (before)
  4. User applies optimization
  5. User runs EXPLAIN (after)
  6. diff_explain.py → verify improvement
For verifying optimization impact:

```bash
# Save before state
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > before.txt

# Apply optimization (add index, rewrite query, etc.)

# Save after state
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > after.txt

# Compare
python scripts/diff_explain.py --pg before.txt after.txt
```

**For codebase audit:**
1. `detect_orm_issues.py --dir ./` → find N+1 patterns
2. `analyze_schema.py` → check schema
3. Prioritize fixes by severity

**For migration review:**
1. `check_migration.py` → safety check
2. Fix critical issues before deployment
3. Use safe alternatives (CONCURRENTLY, etc.)

EXPLAIN Commands

Provide appropriate commands:
```sql
-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- MySQL
EXPLAIN ANALYZE SELECT ...;

-- SQLite
EXPLAIN QUERY PLAN SELECT ...;
```
```javascript
// MongoDB
db.collection.find({...}).explain("executionStats")
```

Index Recommendation Format

When suggesting indexes:
```sql
-- PostgreSQL/MySQL
CREATE INDEX [CONCURRENTLY] idx_table_columns
ON table(col1, col2, col3);
-- Rationale: [why this column order, covering considerations]
```
```javascript
// MongoDB (ESR: Equality, Sort, Range)
db.collection.createIndex({ equality_field: 1, sort_field: -1, range_field: 1 })
```
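Column order can be checked empirically. A stdlib `sqlite3` sketch of the equality-then-sort idea behind ESR (table and index names invented): with the equality column first, the index satisfies both the filter and the `ORDER BY`, so no separate sort step appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders
                (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)""")
conn.execute("CREATE INDEX idx_orders_status_created "
             "ON orders(status, created_at)")

# Rows matching status = 'pending' are already ordered by created_at inside
# the index, so the plan needs no temp B-tree for the ORDER BY.
plan = " ".join(row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE status = 'pending' ORDER BY created_at"))
print(plan)  # e.g. "SEARCH orders USING INDEX idx_orders_status_created (status=?)"
```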

Handling Missing Information

When user provides incomplete context:
No schema provided:
"To give precise recommendations, I need the table schema. Could you share:
  1. Column data types
  2. Existing indexes
  3. Approximate row counts
Meanwhile, here's what I can infer from the query..."
No EXPLAIN output:
"Run this command and share the output:
```sql
EXPLAIN (ANALYZE, BUFFERS) <their query>;
```
This will show actual vs estimated rows and scan types."
Unknown database type:
"Which database are you using? (PostgreSQL, MySQL, SQLite, MongoDB, etc.) The optimization strategies differ significantly between them."

Conditional Workflows

Analyzing a query:
  1. Run analyze_query.py first
  2. Check for anti-patterns
  3. Suggest fixes with examples
Interpreting EXPLAIN output:
  1. Run parse_explain.py with appropriate flag
  2. Identify warnings
  3. Correlate with query structure
Full optimization request:
  1. Analyze query → identify issues
  2. Suggest indexes → generate CREATE statements
  3. Request EXPLAIN → interpret results
  4. Provide before/after comparison
Complex optimization (3+ issues or architectural decisions): Use extended reasoning to:
  1. Map full query execution path
  2. Identify all optimization opportunities
  3. Evaluate trade-offs between approaches
  4. Prioritize by impact and implementation cost

Script Error Handling

If scripts fail:
  1. Check Python version (requires 3.7+)
  2. Verify input format (SQL string, valid file path)
  3. Fall back to manual analysis using reference files
Common errors:
  • FileNotFoundError → Check file path, use absolute paths
  • json.JSONDecodeError → Input not valid JSON, check --pg/--mysql flags
  • Empty output → Query may be valid, no issues detected
When scripts are unavailable: Perform manual analysis using Quick Reference red flags and reference files.

Unsupported Databases

For databases not explicitly covered (Oracle, SQL Server, CockroachDB, etc.):
  1. Apply general SQL patterns from references/sql-patterns.md
  2. Note: "Optimization patterns based on PostgreSQL/MySQL; verify syntax for [database]"
  3. Recommend user consult database-specific documentation for EXPLAIN syntax and index creation