Database Performance Debugging

Overview

Database performance issues directly impact application responsiveness. Debugging focuses on identifying slow queries and optimizing execution plans.

When to Use

  • Slow application response times
  • High database CPU usage
  • Slow queries showing up in logs
  • Performance regressions after a change
  • Degraded performance under load

Instructions

1. Identify Slow Queries

```sql
-- Enable slow query log (MySQL)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.5;

-- View slow queries (mysql.slow_log requires log_output = 'TABLE')
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SELECT * FROM mysql.slow_log;

-- PostgreSQL slow queries
-- (requires pg_stat_statements in shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT mean_exec_time, calls, query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;

-- SQL Server slow queries
-- (the query text lives in sys.dm_exec_sql_text, not in the stats DMV)
SELECT TOP 10
  qs.execution_count,
  qs.total_elapsed_time,
  st.text AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_elapsed_time DESC;

-- Query profiling
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123;
-- Slow: Seq Scan (full table scan)
-- Fast: Index Scan
```
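
The EXPLAIN checks above target MySQL, PostgreSQL, and SQL Server; the same before/after comparison can be sketched with Python's built-in `sqlite3` module, whose `EXPLAIN QUERY PLAN` plays the role of `EXPLAIN` (the table, column, and index names here are illustrative):

```python
import sqlite3

# In-memory database with a sample orders table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite's EXPLAIN equivalent:
    # "SCAN" means a full table scan, "SEARCH ... USING INDEX" uses an index.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE user_id = 42"
print(plan(query))   # full table scan: no index on user_id yet

conn.execute("CREATE INDEX idx_orders_user_id ON orders(user_id)")
print(plan(query))   # now an index search
```

The principle carries over directly: run the plan, look for a full scan on a filtered column, add the index, and confirm the plan changed before trusting timing numbers.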

2. Common Issues & Solutions

```yaml
Issue: N+1 Query Problem

Symptom: 1001 queries for 1000 records

Example (Python):
  for user in users:
    posts = db.query(Post).filter(Post.user_id == user.id)
    # 1 + 1000 queries

Solution:
  users = db.query(User).options(joinedload(User.posts))
  # Single query with JOIN

---

Issue: Missing Index

Symptom: Seq Scan instead of Index Scan

Solution:
  CREATE INDEX idx_orders_user_id ON orders(user_id);
  Verify: EXPLAIN ANALYZE now shows Index Scan

---

Issue: Inefficient JOIN

Before:
  SELECT * FROM orders o, users u
  WHERE o.user_id = u.id AND u.email LIKE '%@example.com'
  # Bad: the leading-wildcard LIKE cannot use a B-tree index,
  # so users is scanned for every order

After:
  SELECT o.* FROM orders o
  JOIN users u ON o.user_id = u.id
  WHERE u.email = 'exact@example.com'
  # Good: single indexed email lookup
  # (note the predicate is narrowed to an exact match)

---

Issue: Large Table Scan

Symptom: SELECT * FROM large_table (1M rows)

Solutions:
  1. Add a LIMIT clause
  2. Add a WHERE condition
  3. Select only the columns you need
  4. Use pagination
  5. Archive old data

---

Issue: Slow Aggregation

Before (1 minute):
  SELECT user_id, COUNT(*), SUM(amount)
  FROM transactions
  GROUP BY user_id

After (50ms):
  SELECT user_id, transaction_count, total_amount
  FROM user_transaction_stats
  WHERE updated_at > NOW() - INTERVAL 1 DAY  # MySQL syntax; PostgreSQL: INTERVAL '1 day'
  # Materialized view or aggregation table
```
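
The N+1 pattern above can be made concrete with a runnable sketch using Python's built-in `sqlite3` and its statement trace callback to count round trips (the schema and row counts are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users (id) VALUES (1), (2), (3);
INSERT INTO posts (user_id) SELECT id FROM users;
""")

queries = []
conn.set_trace_callback(queries.append)  # records every SQL statement executed

# N+1 pattern: one query for users, then one more per user for posts.
users = [row[0] for row in conn.execute("SELECT id FROM users")]
for uid in users:
    conn.execute("SELECT id FROM posts WHERE user_id = ?", (uid,)).fetchall()
print(len(queries))  # 4 statements (1 + 3)

# JOIN pattern: everything in a single round trip.
queries.clear()
rows = conn.execute(
    "SELECT u.id, p.id FROM users u LEFT JOIN posts p ON p.user_id = u.id"
).fetchall()
print(len(queries))  # 1 statement
```

With 1000 users the loop version issues 1001 statements while the JOIN still issues one, which is why the symptom usually shows up as query *count*, not individual query time.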

3. Execution Plan Analysis

```yaml
EXPLAIN Output Understanding:

Seq Scan (Full Table Scan):
  - Reads the entire table
  - Slowest access method
  - Fix: add an index

Index Scan:
  - Uses an index
  - Fast
  - Ideal

Bitmap Index Scan:
  - Builds a bitmap of matching pages from the index
  - Feeds a Bitmap Heap Scan that fetches the rows
  - Moderate speed; good when many rows match

Nested Loop:
  - For each row on the left side, scan the right side
  - O(n*m) complexity
  - Slow for large tables

Hash Join:
  - Builds a hash table from the smaller table
  - Probes it with the larger table
  - Faster than a nested loop

Merge Join:
  - Sorts both inputs, then merges them
  - Fastest for large, already-sorted data
  - Requires sorted inputs (may add a sort step)

---

Reading EXPLAIN ANALYZE:

Node: Seq Scan on orders (actual 8023.456 ms)
  - Seq Scan = full table scan
  - actual time = real execution time
  - 8023 ms = TOO SLOW

Rows: 1000000 (estimated) 1000000 (actual)
  - Match = planner estimates are accurate
  - Mismatch = update statistics (run ANALYZE)

Node: Index Scan (actual 15.234 ms)
  - Index Scan = fast
  - 15 ms = ACCEPTABLE
```
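
As a rough sketch of automating this reading, the snippet below walks a plan tree shaped like PostgreSQL's `EXPLAIN (FORMAT JSON)` output and flags slow sequential scans and off-by-an-order-of-magnitude row estimates; the sample plan, the 100 ms threshold, and the `find_problems` helper are all hypothetical:

```python
# A hand-written plan dict shaped like one entry of EXPLAIN (FORMAT JSON) output.
sample_plan = {
    "Node Type": "Nested Loop",
    "Actual Total Time": 8040.1,
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders",
         "Actual Total Time": 8023.456, "Plan Rows": 1000000, "Actual Rows": 1000000},
        {"Node Type": "Index Scan", "Relation Name": "users",
         "Actual Total Time": 15.234, "Plan Rows": 1, "Actual Rows": 1},
    ],
}

def find_problems(node, threshold_ms=100.0):
    """Recursively collect plan nodes worth a closer look."""
    problems = []
    # Rule 1: a sequential scan that is actually slow suggests a missing index.
    if node["Node Type"] == "Seq Scan" and node.get("Actual Total Time", 0) > threshold_ms:
        problems.append(f"slow Seq Scan on {node.get('Relation Name', '?')}")
    # Rule 2: estimated vs actual rows off by >10x suggests stale statistics.
    est, act = node.get("Plan Rows"), node.get("Actual Rows")
    if est and act and max(est, act) > 10 * max(min(est, act), 1):
        problems.append(f"row estimate off on {node['Node Type']}: {est} vs {act}")
    for child in node.get("Plans", []):
        problems.extend(find_problems(child, threshold_ms))
    return problems

print(find_problems(sample_plan))  # ['slow Seq Scan on orders']
```

The two rules mirror the manual checks above: Seq Scan plus high actual time points at indexing, while an estimate/actual mismatch points at updating statistics.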

4. Debugging Process

```yaml
Steps:

1. Identify Slow Query
  - Enable slow query logging
  - Run workload
  - Review slow log
  - Note execution time

2. Analyze with EXPLAIN
  - Run EXPLAIN ANALYZE
  - Look for Seq Scan
  - Check estimated vs actual rows
  - Review join methods

3. Find Root Cause
  - Missing index?
  - Inefficient join?
  - Missing WHERE clause?
  - Outdated statistics?

4. Try Fix
  - Add index
  - Rewrite query
  - Update statistics
  - Archive old data

5. Measure Improvement
  - Run query after fix
  - Compare execution time
  - Before: 5000ms
  - After: 100ms (50x faster!)

6. Monitor
  - Track slow queries
  - Set baseline
  - Alert on regression
  - Periodic review

---

Checklist:

[ ] Slow query identified and logged
[ ] EXPLAIN ANALYZE run
[ ] Estimated vs actual rows analyzed
[ ] Seq Scans identified
[ ] Indexes checked
[ ] Join strategy reviewed
[ ] Statistics updated
[ ] Query rewritten if needed
[ ] Index created if needed
[ ] Fix verified
[ ] Performance baseline established
[ ] Monitoring configured
[ ] Documented for team
```
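
Step 1 (identify) and step 6 (monitor) can also be approximated in application code when database-level logging is unavailable; this minimal sketch times queries and records those over a threshold (`timed_query`, `SLOW_QUERY_THRESHOLD`, and the in-memory `slow_log` list are hypothetical names):

```python
import time
from contextlib import contextmanager

SLOW_QUERY_THRESHOLD = 0.5  # seconds, mirroring long_query_time = 0.5 above

slow_log = []  # in a real system this would feed a logger or metrics backend

@contextmanager
def timed_query(sql):
    """Time a query and record it when it exceeds the threshold."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if elapsed >= SLOW_QUERY_THRESHOLD:
            slow_log.append((sql, elapsed))

# Usage: wrap a (simulated) slow database call.
with timed_query("SELECT * FROM orders WHERE user_id = 123"):
    time.sleep(0.6)  # stand-in for a slow query

print(slow_log)  # one entry: the statement plus its elapsed time
```

Recording the statement text alongside the elapsed time gives you exactly the inputs the debugging process above needs: which query to EXPLAIN, and a baseline to compare against after the fix.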

Key Points

  • Enable slow query logging in production
  • Investigate with EXPLAIN ANALYZE
  • A Seq Scan in the plan usually means a missing index
  • Index the columns used in WHERE and JOIN clauses
  • Monitor query statistics
  • Update table statistics regularly
  • Rewrite queries to avoid known inefficiencies
  • Paginate large result sets
  • Measure before and after every optimization
  • Track slow query trends over time