sql-query-optimizer


# SQL Query Optimizer


Optimize SQL queries for maximum performance.

## EXPLAIN Analysis


```sql
-- Original slow query
EXPLAIN ANALYZE
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 10;

-- Output analysis:
/*
Sort  (cost=15234.32..15234.34 rows=10 width=120) (actual time=245.123..245.125 rows=10 loops=1)
  Sort Key: (count(o.id)) DESC
  ->  HashAggregate  (cost=15000.00..15100.00 rows=1000 width=120) (actual time=244.891..245.023 rows=1000 loops=1)
        Group Key: u.id
        ->  Hash Left Join  (cost=1234.56..14500.00 rows=50000 width=112) (actual time=12.345..230.456 rows=50000 loops=1)
              Hash Cond: (o.user_id = u.id)
              ->  Seq Scan on orders o  (cost=0.00..10000.00 rows=100000 width=8) (actual time=0.012..180.234 rows=100000 loops=1)
              ->  Hash  (cost=1000.00..1000.00 rows=5000 width=112) (actual time=10.234..10.234 rows=5000 loops=1)
                    Buckets: 8192  Batches: 1  Memory Usage: 456kB
                    ->  Seq Scan on users u  (cost=0.00..1000.00 rows=5000 width=112) (actual time=0.008..5.123 rows=5000 loops=1)
                          Filter: (created_at > '2024-01-01'::date)
                          Rows Removed by Filter: 1000
Planning Time: 0.234 ms
Execution Time: 245.234 ms
*/

-- Issues identified:
-- 1. Seq Scan on orders (no index on user_id)
-- 2. Seq Scan on users (no index on created_at)
-- 3. Full table scans are expensive
```
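Spotting these issues can also be roughly automated by scanning the EXPLAIN text for sequential scans. A minimal sketch — the helper name is illustrative and this assumes PostgreSQL's default text-format plan output, not a real plan parser:

```typescript
// Toy helper: list the tables hit by a sequential scan in EXPLAIN text.
// Assumes PostgreSQL's default text-format plan output (an assumption,
// not a robust parser).
function findSeqScans(planText: string): string[] {
  const tables: string[] = [];
  for (const line of planText.split("\n")) {
    const match = line.match(/Seq Scan on (\w+)/);
    if (match) tables.push(match[1]);
  }
  return tables;
}

const plan = `
->  Seq Scan on orders o  (cost=0.00..10000.00 rows=100000 width=8)
->  Seq Scan on users u  (cost=0.00..1000.00 rows=5000 width=112)
`;
console.log(findSeqScans(plan)); // → ["orders", "users"]
```

Each table this flags is a candidate for the index work in the next section.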

## Index Recommendations


```sql
-- Problem: Sequential scans
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123;
/*
Seq Scan on orders  (cost=0.00..10000.00 rows=50 width=100) (actual time=0.012..89.456 rows=50 loops=1)
  Filter: (user_id = 123)
  Rows Removed by Filter: 99950
*/

-- Solution: Add index
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- After index:
/*
Index Scan using idx_orders_user_id on orders  (cost=0.29..45.32 rows=50 width=100) (actual time=0.023..0.089 rows=50 loops=1)
  Index Cond: (user_id = 123)
*/

-- Performance: 89ms → 0.09ms (~990x faster!)
```

## Query Rewrites


### 1. Avoid SELECT *


```sql
-- ❌ Bad: Fetches all columns
SELECT * FROM users WHERE id = 123;

-- ✅ Good: Fetch only needed columns
SELECT id, email, name FROM users WHERE id = 123;

-- Performance: can be ~50% faster on wide tables — less data read
-- from disk and less network transfer
```

### 2. Use EXISTS Instead of IN


```sql
-- ❌ Slow: the subquery result may be fully materialized
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 100);

-- ✅ Fast: can short-circuit on the first matching order per user
SELECT * FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id AND o.total > 100
);

-- Performance: up to ~3x faster on large datasets, though modern
-- planners often produce the same plan for both — verify with EXPLAIN
```

### 3. Avoid Functions on Indexed Columns


```sql
-- ❌ Bad: Index not used
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- ✅ Good: Index scan possible
SELECT * FROM users WHERE email = 'john@example.com';

-- Or create a functional index:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
```

### 4. Use Covering Indexes


```sql
-- Query needs: id, email, name
SELECT id, email, name FROM users WHERE email = 'john@example.com';

-- Create covering index; INCLUDE requires PostgreSQL 11+
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (id, name);

-- Result: Index-only scan (no table access needed)
```

### 5. Optimize JOIN Order


```sql
-- ❌ Bad: Large table first
SELECT * FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.email = 'john@example.com';

-- ✅ Good: Filter first, join second
SELECT * FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.email = 'john@example.com';

-- Note: PostgreSQL usually reorders inner joins itself (up to
-- join_collapse_limit tables), so confirm any difference with EXPLAIN

-- Or use a CTE for clarity:
WITH filtered_users AS (
  SELECT id FROM users WHERE email = 'john@example.com'
)
SELECT o.* FROM orders o
JOIN filtered_users u ON u.id = o.user_id;
```

## Composite Indexes


```sql
-- Query pattern: WHERE user_id = X AND status = 'active' ORDER BY created_at DESC
CREATE INDEX idx_orders_user_status_created
ON orders(user_id, status, created_at DESC);

-- Index column order matters!
-- Rule: Equality filters → Range filters → Sort columns

-- Example queries that use this index:
-- 1. SELECT * FROM orders WHERE user_id = 123;  ✅
-- 2. SELECT * FROM orders WHERE user_id = 123 AND status = 'active';  ✅
-- 3. SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at DESC;  ✅
-- 4. SELECT * FROM orders WHERE status = 'active';  ❌ (doesn't start with user_id)
```
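The leftmost-prefix rule above can be captured as a tiny check. This is a toy sketch, not a real planner — the function and its simplified inputs (index column list, equality-filtered columns) are illustrative assumptions:

```typescript
// Toy leftmost-prefix check: can a composite index serve a set of
// equality-filtered columns? True only if the filtered columns form
// a contiguous prefix of the index's column list.
function usesIndexPrefix(indexColumns: string[], equalityColumns: string[]): boolean {
  const filtered = new Set(equalityColumns);
  let prefixLen = 0;
  for (const col of indexColumns) {
    if (filtered.has(col)) prefixLen++;
    else break; // first gap ends the usable prefix
  }
  return prefixLen === filtered.size && prefixLen > 0;
}

const idx = ["user_id", "status", "created_at"];
console.log(usesIndexPrefix(idx, ["user_id"]));           // true  (query 1)
console.log(usesIndexPrefix(idx, ["user_id", "status"])); // true  (query 2)
console.log(usesIndexPrefix(idx, ["status"]));            // false (query 4)
```

This mirrors why query 4 above cannot use the index: `status` is not the leading column.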

## Query Performance Benchmarking


```typescript
// scripts/benchmark-queries.ts
import { PrismaClient } from "@prisma/client";
import { performance } from "perf_hooks";

const prisma = new PrismaClient();

async function benchmarkQuery(
  name: string,
  query: () => Promise<any>,
  iterations: number = 10
) {
  const times: number[] = [];

  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await query();
    const end = performance.now();
    times.push(end - start);
  }

  const avg = times.reduce((a, b) => a + b, 0) / times.length;
  const min = Math.min(...times);
  const max = Math.max(...times);

  console.log(`\n${name}:`);
  console.log(`  Avg: ${avg.toFixed(2)}ms`);
  console.log(`  Min: ${min.toFixed(2)}ms`);
  console.log(`  Max: ${max.toFixed(2)}ms`);

  return { avg, min, max };
}

// Compare queries
async function compareQueries() {
  console.log("🔍 Benchmarking queries...\n");

  // Query 1: Original
  const result1 = await benchmarkQuery("Original Query", async () => {
    return prisma.$queryRaw`
      SELECT u.*, COUNT(o.id) as order_count
      FROM users u
      LEFT JOIN orders o ON o.user_id = u.id
      GROUP BY u.id
      LIMIT 10
    `;
  });

  // Query 2: Optimized
  const result2 = await benchmarkQuery("Optimized Query", async () => {
    return prisma.$queryRaw`
      SELECT u.id, u.email, u.name,
             (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
      FROM users u
      LIMIT 10
    `;
  });

  // Comparison
  const improvement = (
    ((result1.avg - result2.avg) / result1.avg) *
    100
  ).toFixed(1);
  console.log(`\n📊 Improvement: ${improvement}% faster`);
}

compareQueries()
  .catch(console.error)
  .finally(() => prisma.$disconnect());
```
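Avg/min/max hide tail latency; a p95 figure often reveals outliers that the average smooths over. A small pure helper that could extend the stats above — the name `summarize` and the nearest-rank percentile method are assumptions, not part of the original script:

```typescript
// Summarize a list of timings (in ms) with avg, min, max, and p95.
// p95 uses the nearest-rank method on the sorted timings.
function summarize(times: number[]) {
  const sorted = [...times].sort((a, b) => a - b);
  const avg = sorted.reduce((a, b) => a + b, 0) / sorted.length;
  const rank = Math.ceil(0.95 * sorted.length) - 1; // nearest-rank index
  return {
    avg,
    min: sorted[0],
    max: sorted[sorted.length - 1],
    p95: sorted[rank],
  };
}

// One slow outlier barely moves the average but dominates the tail:
const stats = summarize([10, 12, 11, 13, 9, 250, 10, 11, 12, 10]);
console.log(stats.p95); // → 250
console.log(stats.avg); // → 34.8
```

Dropping this into `benchmarkQuery` in place of the inline avg/min/max computation is a straightforward refactor.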

## Query Optimization Checklist


```typescript
interface QueryOptimization {
  query: string;
  issues: string[];
  recommendations: string[];
  estimatedImprovement: string;
}

const optimizations: QueryOptimization[] = [
  {
    query: "SELECT * FROM orders WHERE user_id = $1",
    issues: [
      "Missing index on user_id",
      "SELECT * fetches unnecessary columns",
    ],
    recommendations: [
      "CREATE INDEX idx_orders_user_id ON orders(user_id)",
      "SELECT id, total, status instead of *",
    ],
    estimatedImprovement: "90% faster",
  },
  {
    query: "SELECT COUNT(*) FROM orders",
    issues: ["Full table scan", "No WHERE clause filtering"],
    recommendations: [
      "Add WHERE clause to filter rows",
      "Consider approximate count for large tables",
    ],
    estimatedImprovement: "70% faster",
  },
];
```

## Automated Slow Query Detection


```typescript
// scripts/detect-slow-queries.ts
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

async function detectSlowQueries() {
  // Enable slow query logging in PostgreSQL
  // (takes effect on new connections)
  await prisma.$executeRaw`
    ALTER DATABASE mydb SET log_min_duration_statement = 100;
  `;

  // Query pg_stat_statements for slow queries.
  // Note: *_exec_time columns are already in milliseconds.
  const slowQueries = await prisma.$queryRaw<any[]>`
    SELECT
      query,
      calls,
      total_exec_time / 1000 as total_time_seconds,
      mean_exec_time as mean_time_ms,
      max_exec_time as max_time_ms
    FROM pg_stat_statements
    WHERE mean_exec_time > 100  -- > 100ms
    ORDER BY mean_exec_time DESC
    LIMIT 20
  `;

  console.log("🐌 Slow Queries Detected:\n");
  slowQueries.forEach((q, i) => {
    console.log(`${i + 1}. ${q.query.substring(0, 80)}...`);
    console.log(`   Calls: ${q.calls}`);
    console.log(`   Avg: ${q.mean_time_ms.toFixed(2)}ms`);
    console.log(`   Max: ${q.max_time_ms.toFixed(2)}ms\n`);
  });
}
```
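The reporting step can be kept pure and unit-tested without a database. A sketch of the filter-and-sort logic, with a row shape mirroring the pg_stat_statements columns selected above (the function name is illustrative):

```typescript
// Row shape matching the columns selected from pg_stat_statements above.
interface SlowQueryRow {
  query: string;
  calls: number;
  mean_time_ms: number;
  max_time_ms: number;
}

// Keep rows above a mean-latency threshold, worst first.
function rankSlowQueries(rows: SlowQueryRow[], thresholdMs = 100): SlowQueryRow[] {
  return rows
    .filter((r) => r.mean_time_ms > thresholdMs)
    .sort((a, b) => b.mean_time_ms - a.mean_time_ms);
}

const ranked = rankSlowQueries([
  { query: "SELECT * FROM orders", calls: 500, mean_time_ms: 180, max_time_ms: 950 },
  { query: "SELECT id FROM users", calls: 9000, mean_time_ms: 2, max_time_ms: 15 },
]);
console.log(ranked.map((r) => r.query)); // → ["SELECT * FROM orders"]
```

Separating this from the raw query makes the threshold and ordering trivially testable.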

## Best Practices


1. **Always use EXPLAIN**: understand query plans
2. **Index foreign keys**: essential for joins
3. **Avoid `SELECT *`**: fetch only needed columns
4. **Use composite indexes**: for multi-column queries
5. **Consider covering indexes**: eliminate table access
6. **Batch operations**: reduce round trips
7. **Monitor regularly**: track slow queries

## Output Checklist


- EXPLAIN plan analyzed
- Missing indexes identified
- Query rewrite suggestions
- Performance benchmarks
- Before/after metrics
- Index creation scripts
- Slow query monitoring
- Optimization priority list