database-query-optimization

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Database Query Optimization

数据库查询优化

Overview

概述

Slow database queries are a common performance bottleneck. Optimization through indexing, efficient queries, and caching dramatically improves application performance.

缓慢的数据库查询是常见的性能瓶颈。通过索引、高效查询和缓存进行优化可显著提升应用程序性能。

When to Use

适用场景

Slow response times
High database CPU usage
Performance regression
New feature deployment
Regular maintenance

响应时间缓慢
数据库CPU使用率过高
性能退化
新功能部署
常规维护

Instructions

操作指南

1. Query Analysis

1. 查询分析

sql

-- Analyze query performance

EXPLAIN ANALYZE
SELECT users.id, users.name, COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE users.created_at > '2024-01-01'
GROUP BY users.id, users.name
ORDER BY order_count DESC;

-- Results show:
-- - Seq Scan (slow) vs Index Scan (fast)
-- - Rows: actual vs planned (high variance = bad)
-- - Execution time (milliseconds)

-- Key metrics:
-- - Sequential Scan: Full table read (slow)
-- - Index Scan: Uses index (fast)
-- - Nested Loop: Joins with loops
-- - Sort: In-memory or disk sort

sql

-- 分析查询性能

EXPLAIN ANALYZE
SELECT users.id, users.name, COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE users.created_at > '2024-01-01'
GROUP BY users.id, users.name
ORDER BY order_count DESC;

-- 结果显示:
-- - 顺序扫描（慢）与索引扫描（快）
-- - 行数：实际行数与计划行数（差异大=性能差）
-- - 执行时间（毫秒）

-- 关键指标:
-- - 顺序扫描：全表读取（慢）
-- - 索引扫描：使用索引（快）
-- - 嵌套循环：通过循环进行连接
-- - 排序：内存或磁盘排序

2. Indexing Strategy

2. 索引策略

yaml

Index Types:

Single Column:
  CREATE INDEX idx_users_email ON users(email);
  Use: WHERE email = ?
  Size: Small, quick to create

Composite Index:
  CREATE INDEX idx_orders_user_date
    ON orders(user_id, created_at);
  Use: WHERE user_id = ? AND created_at > ?
  Order: Most selective first

Covering Index:
  CREATE INDEX idx_orders_covering
    ON orders(user_id) INCLUDE (total_amount);
  Benefit: No table lookup needed

Partial Index:
  CREATE INDEX idx_active_users
    ON users(id) WHERE status = 'active';
  Benefit: Smaller, faster

Full Text:
  CREATE FULLTEXT INDEX idx_search
    ON articles(title, content);
  Use: Text search queries

---

Index Rules:

- Create indexes for WHERE conditions
- Create indexes for JOIN columns
- Create indexes for ORDER BY
- Don't over-index (slows writes)
- Monitor index usage
- Remove unused indexes
- Update statistics regularly
- Partial indexes for filtered queries

Missing Index Query:
SELECT object_name, equality_columns
FROM sys.dm_db_missing_index_details
ORDER BY equality_columns;

yaml

Index Types:

Single Column:
  CREATE INDEX idx_users_email ON users(email);
  Use: WHERE email = ?
  Size: Small, quick to create

Composite Index:
  CREATE INDEX idx_orders_user_date
    ON orders(user_id, created_at);
  Use: WHERE user_id = ? AND created_at > ?
  Order: Most selective first

Covering Index:
  CREATE INDEX idx_orders_covering
    ON orders(user_id) INCLUDE (total_amount);
  Benefit: No table lookup needed

Partial Index:
  CREATE INDEX idx_active_users
    ON users(id) WHERE status = 'active';
  Benefit: Smaller, faster

Full Text:
  CREATE FULLTEXT INDEX idx_search
    ON articles(title, content);
  Use: Text search queries

---

Index Rules:

- Create indexes for WHERE conditions
- Create indexes for JOIN columns
- Create indexes for ORDER BY
- Don't over-index (slows writes)
- Monitor index usage
- Remove unused indexes
- Update statistics regularly
- Partial indexes for filtered queries

Missing Index Query:
SELECT object_name, equality_columns
FROM sys.dm_db_missing_index_details
ORDER BY equality_columns;

3. Query Optimization Techniques

3. 查询优化技巧

python

undefined

python

undefined

Common optimization patterns

BEFORE (N+1 queries)

for user in users: orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id) # 1 + N queries

AFTER (single query with JOIN)

orders = db.query(""" SELECT u., o. FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE u.created_at > ? """, date_threshold)

BEFORE (inefficient WHERE)

SELECT * FROM users WHERE LOWER(email) = LOWER('Test@Example.com')

Can't use index (function used)

AFTER (index-friendly)

SELECT * FROM users WHERE email = 'test@example.com'

Case-insensitive constraint + index

BEFORE (wildcard at start)

SELECT * FROM users WHERE email LIKE '%example.com'

Can't use index (wildcard at start)

AFTER (wildcard at end)

SELECT * FROM users WHERE email LIKE 'user%'

Can use index

BEFORE (slow aggregation)

SELECT user_id, COUNT(*) as cnt FROM orders GROUP BY user_id ORDER BY cnt DESC LIMIT 10

AFTER (pre-aggregated)

SELECT user_id, order_count FROM user_order_stats WHERE order_count IS NOT NULL ORDER BY order_count DESC LIMIT 10

undefined

SELECT user_id, order_count FROM user_order_stats WHERE order_count IS NOT NULL ORDER BY order_count DESC LIMIT 10

undefined

4. Optimization Checklist

4. 优化检查清单

yaml

Analysis:
  [ ] Run EXPLAIN ANALYZE on slow queries
  [ ] Check actual vs estimated rows
  [ ] Look for sequential scans
  [ ] Identify expensive operations
  [ ] Compare execution plans

Indexing:
  [ ] Index WHERE columns
  [ ] Index JOIN columns
  [ ] Index ORDER BY columns
  [ ] Check unused indexes
  [ ] Remove duplicate indexes
  [ ] Create composite indexes strategically
  [ ] Analyze index statistics

Query Optimization:
  [ ] Remove unnecessary columns (SELECT *)
  [ ] Use JOINs instead of subqueries
  [ ] Avoid functions in WHERE
  [ ] Use wildcards carefully (avoid %)
  [ ] Batch operations
  [ ] Use LIMIT for result sets
  [ ] Archive old data

Caching:
  [ ] Implement query caching
  [ ] Cache aggregations
  [ ] Use Redis for hot data
  [ ] Invalidate strategically

Monitoring:
  [ ] Track slow queries
  [ ] Monitor index usage
  [ ] Set up alerts
  [ ] Regular statistics update
  [ ] Measure improvements

---

Expected Improvements:

With Proper Indexing:
  - Sequential Scan → Index Scan
  - Response time: 5 seconds → 50ms (100x faster)
  - CPU usage: 80% → 20%
  - Concurrent users: 100 → 1000

Quick Wins:
  - Add index to frequently filtered column
  - Fix N+1 queries
  - Use LIMIT for large results
  - Archive old data
  - Expected: 20-50% improvement

yaml

Analysis:
  [ ] Run EXPLAIN ANALYZE on slow queries
  [ ] Check actual vs estimated rows
  [ ] Look for sequential scans
  [ ] Identify expensive operations
  [ ] Compare execution plans

Indexing:
  [ ] Index WHERE columns
  [ ] Index JOIN columns
  [ ] Index ORDER BY columns
  [ ] Check unused indexes
  [ ] Remove duplicate indexes
  [ ] Create composite indexes strategically
  [ ] Analyze index statistics

Query Optimization:
  [ ] Remove unnecessary columns (SELECT *)
  [ ] Use JOINs instead of subqueries
  [ ] Avoid functions in WHERE
  [ ] Use wildcards carefully (avoid %)
  [ ] Batch operations
  [ ] Use LIMIT for result sets
  [ ] Archive old data

Caching:
  [ ] Implement query caching
  [ ] Cache aggregations
  [ ] Use Redis for hot data
  [ ] Invalidate strategically

Monitoring:
  [ ] Track slow queries
  [ ] Monitor index usage
  [ ] Set up alerts
  [ ] Regular statistics update
  [ ] Measure improvements

---

Expected Improvements:

With Proper Indexing:
  - Sequential Scan → Index Scan
  - Response time: 5 seconds → 50ms (100x faster)
  - CPU usage: 80% → 20%
  - Concurrent users: 100 → 1000

Quick Wins:
  - Add index to frequently filtered column
  - Fix N+1 queries
  - Use LIMIT for large results
  - Archive old data
  - Expected: 20-50% improvement

Key Points

核心要点

EXPLAIN ANALYZE shows query execution
Indexes must match WHERE/JOIN/ORDER BY
Avoid functions in WHERE clauses
Fix N+1 queries (join instead of loop)
Monitor slow query log regularly
Stats updates needed for accuracy
Pre-calculate aggregations
Archive historical data
Use explain plans before/after
Measure and monitor continuously

EXPLAIN ANALYZE可展示查询执行过程
索引必须与WHERE/JOIN/ORDER BY匹配
避免在WHERE子句中使用函数
修复N+1查询问题（使用连接而非循环）
定期监控慢查询日志
需要更新统计信息以保证准确性
预先计算聚合结果
归档历史数据
优化前后对比执行计划
持续进行测量与监控