database-query-optimization

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Database Query Optimization

数据库查询优化

Overview

概述

Slow database queries are a common performance bottleneck. Optimization through indexing, efficient queries, and caching dramatically improves application performance.
缓慢的数据库查询是常见的性能瓶颈。通过索引、高效查询和缓存进行优化可显著提升应用程序性能。

When to Use

适用场景

  • Slow response times
  • High database CPU usage
  • Performance regression
  • New feature deployment
  • Regular maintenance
  • 响应时间缓慢
  • 数据库CPU使用率过高
  • 性能退化
  • 新功能部署
  • 常规维护

Instructions

操作指南

1. Query Analysis

1. 查询分析

sql
-- Analyze query performance

EXPLAIN ANALYZE
SELECT users.id, users.name, COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE users.created_at > '2024-01-01'
GROUP BY users.id, users.name
ORDER BY order_count DESC;

-- Results show:
-- - Seq Scan (slow) vs Index Scan (fast)
-- - Rows: actual vs planned (high variance = bad)
-- - Execution time (milliseconds)

-- Key metrics:
-- - Sequential Scan: Full table read (slow)
-- - Index Scan: Uses index (fast)
-- - Nested Loop: Joins with loops
-- - Sort: In-memory or disk sort
sql
-- 分析查询性能

EXPLAIN ANALYZE
SELECT users.id, users.name, COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE users.created_at > '2024-01-01'
GROUP BY users.id, users.name
ORDER BY order_count DESC;

-- 结果显示:
-- - 顺序扫描(慢)与索引扫描(快)
-- - 行数:实际行数与计划行数(差异大=性能差)
-- - 执行时间(毫秒)

-- 关键指标:
-- - 顺序扫描:全表读取(慢)
-- - 索引扫描:使用索引(快)
-- - 嵌套循环:通过循环进行连接
-- - 排序:内存或磁盘排序

2. Indexing Strategy

2. 索引策略

yaml
Index Types:

Single Column:
  CREATE INDEX idx_users_email ON users(email);
  Use: WHERE email = ?
  Size: Small, quick to create

Composite Index:
  CREATE INDEX idx_orders_user_date
    ON orders(user_id, created_at);
  Use: WHERE user_id = ? AND created_at > ?
  Order: Most selective first

Covering Index:
  CREATE INDEX idx_orders_covering
    ON orders(user_id) INCLUDE (total_amount);
  Benefit: No table lookup needed

Partial Index:
  CREATE INDEX idx_active_users
    ON users(id) WHERE status = 'active';
  Benefit: Smaller, faster

Full Text:
  CREATE FULLTEXT INDEX idx_search
    ON articles(title, content);
  Use: Text search queries

---

Index Rules:

- Create indexes for WHERE conditions
- Create indexes for JOIN columns
- Create indexes for ORDER BY
- Don't over-index (slows writes)
- Monitor index usage
- Remove unused indexes
- Update statistics regularly
- Partial indexes for filtered queries

Missing Index Query:
SELECT object_name, equality_columns
FROM sys.dm_db_missing_index_details
ORDER BY equality_columns;
yaml
Index Types:

Single Column:
  CREATE INDEX idx_users_email ON users(email);
  Use: WHERE email = ?
  Size: Small, quick to create

Composite Index:
  CREATE INDEX idx_orders_user_date
    ON orders(user_id, created_at);
  Use: WHERE user_id = ? AND created_at > ?
  Order: Most selective first

Covering Index:
  CREATE INDEX idx_orders_covering
    ON orders(user_id) INCLUDE (total_amount);
  Benefit: No table lookup needed

Partial Index:
  CREATE INDEX idx_active_users
    ON users(id) WHERE status = 'active';
  Benefit: Smaller, faster

Full Text:
  CREATE FULLTEXT INDEX idx_search
    ON articles(title, content);
  Use: Text search queries

---

Index Rules:

- Create indexes for WHERE conditions
- Create indexes for JOIN columns
- Create indexes for ORDER BY
- Don't over-index (slows writes)
- Monitor index usage
- Remove unused indexes
- Update statistics regularly
- Partial indexes for filtered queries

Missing Index Query:
SELECT object_name, equality_columns
FROM sys.dm_db_missing_index_details
ORDER BY equality_columns;

3. Query Optimization Techniques

3. 查询优化技巧

python
undefined
python
undefined

Common optimization patterns

Common optimization patterns

BEFORE (N+1 queries)

BEFORE (N+1 queries)

for user in users: orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id) # 1 + N queries
for user in users: orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id) # 1 + N queries

AFTER (single query with JOIN)

AFTER (single query with JOIN)

orders = db.query(""" SELECT u., o. FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE u.created_at > ? """, date_threshold)
orders = db.query(""" SELECT u., o. FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE u.created_at > ? """, date_threshold)

BEFORE (inefficient WHERE)

BEFORE (inefficient WHERE)

SELECT * FROM users WHERE LOWER(email) = LOWER('Test@Example.com')
SELECT * FROM users WHERE LOWER(email) = LOWER('Test@Example.com')

Can't use index (function used)

Can't use index (function used)

AFTER (index-friendly)

AFTER (index-friendly)

SELECT * FROM users WHERE email = 'test@example.com'
SELECT * FROM users WHERE email = 'test@example.com'

Case-insensitive constraint + index

Case-insensitive constraint + index

BEFORE (wildcard at start)

BEFORE (wildcard at start)

SELECT * FROM users WHERE email LIKE '%example.com'
SELECT * FROM users WHERE email LIKE '%example.com'

Can't use index (wildcard at start)

Can't use index (wildcard at start)

AFTER (wildcard at end)

AFTER (wildcard at end)

SELECT * FROM users WHERE email LIKE 'user%'
SELECT * FROM users WHERE email LIKE 'user%'

Can use index

Can use index

BEFORE (slow aggregation)

BEFORE (slow aggregation)

SELECT user_id, COUNT(*) as cnt FROM orders GROUP BY user_id ORDER BY cnt DESC LIMIT 10
SELECT user_id, COUNT(*) as cnt FROM orders GROUP BY user_id ORDER BY cnt DESC LIMIT 10

AFTER (pre-aggregated)

AFTER (pre-aggregated)

SELECT user_id, order_count FROM user_order_stats WHERE order_count IS NOT NULL ORDER BY order_count DESC LIMIT 10
undefined
SELECT user_id, order_count FROM user_order_stats WHERE order_count IS NOT NULL ORDER BY order_count DESC LIMIT 10
undefined

4. Optimization Checklist

4. 优化检查清单

yaml
Analysis:
  [ ] Run EXPLAIN ANALYZE on slow queries
  [ ] Check actual vs estimated rows
  [ ] Look for sequential scans
  [ ] Identify expensive operations
  [ ] Compare execution plans

Indexing:
  [ ] Index WHERE columns
  [ ] Index JOIN columns
  [ ] Index ORDER BY columns
  [ ] Check unused indexes
  [ ] Remove duplicate indexes
  [ ] Create composite indexes strategically
  [ ] Analyze index statistics

Query Optimization:
  [ ] Remove unnecessary columns (SELECT *)
  [ ] Use JOINs instead of subqueries
  [ ] Avoid functions in WHERE
  [ ] Use wildcards carefully (avoid %)
  [ ] Batch operations
  [ ] Use LIMIT for result sets
  [ ] Archive old data

Caching:
  [ ] Implement query caching
  [ ] Cache aggregations
  [ ] Use Redis for hot data
  [ ] Invalidate strategically

Monitoring:
  [ ] Track slow queries
  [ ] Monitor index usage
  [ ] Set up alerts
  [ ] Regular statistics update
  [ ] Measure improvements

---

Expected Improvements:

With Proper Indexing:
  - Sequential Scan → Index Scan
  - Response time: 5 seconds → 50ms (100x faster)
  - CPU usage: 80% → 20%
  - Concurrent users: 100 → 1000

Quick Wins:
  - Add index to frequently filtered column
  - Fix N+1 queries
  - Use LIMIT for large results
  - Archive old data
  - Expected: 20-50% improvement
yaml
Analysis:
  [ ] Run EXPLAIN ANALYZE on slow queries
  [ ] Check actual vs estimated rows
  [ ] Look for sequential scans
  [ ] Identify expensive operations
  [ ] Compare execution plans

Indexing:
  [ ] Index WHERE columns
  [ ] Index JOIN columns
  [ ] Index ORDER BY columns
  [ ] Check unused indexes
  [ ] Remove duplicate indexes
  [ ] Create composite indexes strategically
  [ ] Analyze index statistics

Query Optimization:
  [ ] Remove unnecessary columns (SELECT *)
  [ ] Use JOINs instead of subqueries
  [ ] Avoid functions in WHERE
  [ ] Use wildcards carefully (avoid %)
  [ ] Batch operations
  [ ] Use LIMIT for result sets
  [ ] Archive old data

Caching:
  [ ] Implement query caching
  [ ] Cache aggregations
  [ ] Use Redis for hot data
  [ ] Invalidate strategically

Monitoring:
  [ ] Track slow queries
  [ ] Monitor index usage
  [ ] Set up alerts
  [ ] Regular statistics update
  [ ] Measure improvements

---

Expected Improvements:

With Proper Indexing:
  - Sequential Scan → Index Scan
  - Response time: 5 seconds → 50ms (100x faster)
  - CPU usage: 80% → 20%
  - Concurrent users: 100 → 1000

Quick Wins:
  - Add index to frequently filtered column
  - Fix N+1 queries
  - Use LIMIT for large results
  - Archive old data
  - Expected: 20-50% improvement

Key Points

核心要点

  • EXPLAIN ANALYZE shows query execution
  • Indexes must match WHERE/JOIN/ORDER BY
  • Avoid functions in WHERE clauses
  • Fix N+1 queries (join instead of loop)
  • Monitor slow query log regularly
  • Stats updates needed for accuracy
  • Pre-calculate aggregations
  • Archive historical data
  • Use explain plans before/after
  • Measure and monitor continuously
  • EXPLAIN ANALYZE可展示查询执行过程
  • 索引必须与WHERE/JOIN/ORDER BY匹配
  • 避免在WHERE子句中使用函数
  • 修复N+1查询问题(使用连接而非循环)
  • 定期监控慢查询日志
  • 需要更新统计信息以保证准确性
  • 预先计算聚合结果
  • 归档历史数据
  • 优化前后对比执行计划
  • 持续进行测量与监控