database-optimization

Database Optimization

Optimize database performance across SQL (PostgreSQL, MySQL, SQL Server) and NoSQL (MongoDB, Cassandra, Redis) systems, covering schema design, indexing, query tuning, and configuration. This skill provides comprehensive database optimization strategies and tools.

When to use me

Use this skill when:

Database performance is degrading and queries are slow
You need to optimize database schema and indexing strategies
You want to analyze and optimize query performance
You need to tune database configuration parameters
You're migrating databases and need optimization guidance
You want to implement database monitoring and alerting
You need to optimize the database for a specific workload (OLTP, OLAP, or hybrid)
You want to reduce database costs through optimization

What I do

Query performance analysis: Analyze and optimize SQL and NoSQL queries
Index optimization: Analyze and optimize database indexes
Schema optimization: Optimize database schema design and normalization
Configuration tuning: Tune database configuration parameters for optimal performance
Connection pooling optimization: Optimize database connection management
Locking and concurrency optimization: Optimize locking strategies and concurrency control
Storage optimization: Optimize database storage and partitioning strategies
Replication and sharding optimization: Optimize replication and sharding strategies
Backup and recovery optimization: Optimize backup and recovery strategies
Database monitoring: Implement database performance monitoring and alerting
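
As an illustration of index analysis: an index whose key columns form a leftmost prefix of another B-tree index on the same table is usually redundant, because the longer index can serve the same lookups (the idx_orders_customer / idx_orders_customer_status pair in the sample output below is such a case). The minimal sketch below flags these pairs from a plain mapping of index name to (table, ordered columns). The function name and input format are hypothetical; in PostgreSQL the input would have to be built from the pg_index catalog. It ignores uniqueness, partial and expression indexes, index type, and indexes that back constraints, so treat its output as candidates to review, not indexes to drop.

```python
from collections import defaultdict

# Hypothetical input format: index name -> (table name, ordered key columns).
IndexDefs = dict[str, tuple[str, tuple[str, ...]]]


def find_prefix_redundant(indexes: IndexDefs) -> list[tuple[str, str]]:
    """Return sorted (redundant_index, covering_index) pairs.

    An index is reported when its columns are a leftmost prefix of another
    index on the same table. For exact duplicates, the name that sorts later
    is reported as redundant so each pair appears only once.
    """
    by_table: dict[str, list[tuple[str, tuple[str, ...]]]] = defaultdict(list)
    for name, (table, cols) in indexes.items():
        by_table[table].append((name, cols))

    pairs = []
    for entries in by_table.values():
        for name_a, cols_a in entries:
            for name_b, cols_b in entries:
                # Skip the index itself, or cols_a not a leftmost prefix of cols_b.
                if name_a == name_b or cols_b[: len(cols_a)] != cols_a:
                    continue
                # Strict prefix, or identical columns (report the later name once).
                if len(cols_a) < len(cols_b) or name_a > name_b:
                    pairs.append((name_a, name_b))
    return sorted(pairs)


if __name__ == "__main__":
    sample = {
        "idx_orders_customer": ("orders", ("customer_id",)),
        "idx_orders_customer_status": ("orders", ("customer_id", "status")),
        "idx_products_supplier": ("products", ("supplier_id",)),
    }
    print(find_prefix_redundant(sample))
    # [('idx_orders_customer', 'idx_orders_customer_status')]
```

Column order matters: an index on (y, x) does not cover lookups on x alone, so only true leftmost prefixes are reported.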

Examples

```bash
# Analyze query performance
./scripts/analyze-database-optimization.sh --query-analysis --database postgresql

# Optimize database indexes
./scripts/analyze-database-optimization.sh --index-optimization --database mysql

# Tune database configuration
./scripts/analyze-database-optimization.sh --configuration-tuning --database mongodb

# Generate optimization report
./scripts/analyze-database-optimization.sh --report --output optimization-report.json

# Monitor database performance
./scripts/analyze-database-optimization.sh --performance-monitoring --interval 60
```

Output format

Database Optimization Analysis
─────────────────────────────────────
Analysis Date: 2025-01-15T10:30:00Z
Database System: PostgreSQL 14.8
Database Size: 245 GB
Analysis Duration: 15 minutes

PERFORMANCE METRICS:
────────────────────
Current Performance Score: 72/100
Query Response Time (85th percentile): 450ms (Target: < 200ms)
Transactions per Second: 125 (Target: > 200)
Connection Pool Utilization: 92% (Target: < 80%)
Cache Hit Ratio: 78% (Target: > 90%)

QUERY PERFORMANCE ANALYSIS:
───────────────────────────
Slow Queries Identified: 42
Total Query Execution Time: 85% spent on the top 5 queries

Top 5 Slowest Queries:
1. Query: SELECT * FROM orders WHERE customer_id = ? AND status = ? ORDER BY created_at DESC LIMIT 100
   Average Execution Time: 1,250ms
   Execution Count: 12,850/day
   Issue: Missing composite index on (customer_id, status, created_at)
   Optimization: Add index: CREATE INDEX idx_orders_customer_status_created ON orders(customer_id, status, created_at DESC)

2. Query: SELECT p.*, c.name FROM products p JOIN categories c ON p.category_id = c.id WHERE p.price > ? AND p.stock > 0
   Average Execution Time: 890ms
   Execution Count: 8,420/day
   Issue: Sequential scan on products table (245,000 rows)
   Optimization: Add partial index: CREATE INDEX idx_products_price_stock ON products(price) WHERE stock > 0

3. Query: UPDATE inventory SET quantity = quantity - ? WHERE product_id = ? AND warehouse_id = ?
   Average Execution Time: 650ms
   Execution Count: 15,230/day
   Issue: Row-level locking contention
   Optimization: Implement optimistic locking or batch updates

4. Query: SELECT user_id, COUNT(*) as order_count FROM orders WHERE created_at > NOW() - INTERVAL '30 days' GROUP BY user_id HAVING COUNT(*) > 5
   Average Execution Time: 1,850ms
   Execution Count: 1,250/day
   Issue: Full table scan with aggregation
   Optimization: Create summary table or materialized view

5. Query: DELETE FROM sessions WHERE expires_at < NOW()
   Average Execution Time: 2,150ms
   Execution Count: 850/day
   Issue: Table bloat and vacuum overhead
   Optimization: Implement batch deletion with index on expires_at

INDEX OPTIMIZATION ANALYSIS:
─────────────────────────────
Current Indexes: 48
Duplicate Indexes: 7
Unused Indexes: 12
Missing Indexes: 9

Index Issues:
• Duplicate (redundant prefix): idx_orders_customer (customer_id) is covered by idx_orders_customer_status (customer_id, status)
• Unused: idx_products_supplier (supplier_id) - 0 uses in 30 days
• Missing: idx_orders_created_status (created_at, status) - would benefit 3 frequent queries

Index Recommendations:
1. Drop 7 duplicate indexes: Free 2.8GB storage
2. Drop 12 unused indexes: Free 4.2GB storage
3. Add 9 missing indexes: Improve query performance 35-85%

SCHEMA OPTIMIZATION:
────────────────────
Normalization and Integrity Issues:
• products table has redundant category_name field (denormalized)
• orders table missing foreign key constraint on customer_id
• users table stores frequently queried data in a JSONB field (should be separate columns)

Schema Recommendations:
1. Remove redundant category_name from products table
2. Add foreign key constraint: ALTER TABLE orders ADD CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(id) NOT VALID, then ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_customer (validating separately avoids blocking writes on the 85GB table during the scan)
3. Extract frequently queried JSONB fields to separate columns
4. Consider partitioning the orders table by month on created_at

CONFIGURATION TUNING:
──────────────────────
Current Configuration Issues:
• shared_buffers: 128MB (Recommended: 4-8GB for 32GB RAM)
• work_mem: 4MB (Recommended: 64MB for complex queries; applies per sort/hash operation, so size it against the connection count)
• maintenance_work_mem: 64MB (Recommended: 1-2GB)
• effective_cache_size: 4GB (Recommended: 24GB)
• random_page_cost: 4.0 (Recommended: 1.1 for SSDs)

Configuration Recommendations:
shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
effective_cache_size = 24GB
random_page_cost = 1.1

CONNECTION POOLING ANALYSIS:
────────────────────────────
Current: 200 connections, 184 active (92% utilization)
Issues: Connection pool exhaustion during peak hours
Recommendations:
• Increase connection pool to 300
• Implement connection pooling with PgBouncer
• Set idle connection timeout to 300 seconds

LOCKING AND CONCURRENCY:
─────────────────────────
Lock Wait Events: 12,850/day
Deadlocks: 8/day
Issues: High row-level locking on inventory table
Recommendations:
• Implement optimistic locking for inventory updates
• Use SKIP LOCKED for batch processing
• Lower transaction isolation level where appropriate (only relevant for transactions running above PostgreSQL's default READ COMMITTED)

STORAGE OPTIMIZATION:
──────────────────────
Table Sizes:
• orders: 85GB (35% of database)
• products: 42GB (17% of database)
• users: 28GB (11% of database)

Storage Issues:
• orders table has high bloat (32% dead tuples)
• No partitioning on time-series data
• Uncompressed JSONB columns

Storage Recommendations:
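1. Vacuum the orders table aggressively (plain VACUUM keeps freed space for reuse within the table; VACUUM FULL, which takes an exclusive lock, or pg_repack is needed to return it to the OS)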
2. Implement partitioning on orders by created_at (monthly)
3. Enable compression for historical data
4. Archive old orders to cold storage

REPLICATION AND SHARDING:
─────────────────────────
Current: Single primary with 2 read replicas
Issues: Replication lag up to 45 seconds during peak
Recommendations:
• Add 1 more read replica
• Implement connection routing (primary for writes, replicas for reads)
• Consider sharding by customer_id for orders table

BACKUP AND RECOVERY:
─────────────────────
Current: Daily full backup, 7-day retention
Issues: Backup takes 4 hours, affects performance
Recommendations:
• Implement incremental backups with WAL archiving
• Increase retention to 30 days
• Test recovery procedure monthly

PERFORMANCE MONITORING:
────────────────────────
Current Monitoring: Basic (CPU, memory, disk)
Missing: Query performance, index usage, lock monitoring
Recommendations:
• Enable the pg_stat_statements extension for query monitoring
• Set up alerts for slow queries (> 500ms)
• Monitor index usage and bloat weekly

COST OPTIMIZATION:
──────────────────
Current Monthly Cost: $1,850 (AWS RDS)
Optimization Opportunities:
• Right-size instance: Save $450/month
• Reserved instance: Save $650/month (3-year)
• Storage optimization: Save $125/month
• Archive old data: Save $85/month

Total Potential Savings: $1,310/month (71%)

IMPLEMENTATION ROADMAP:
────────────────────────
Phase 1: Immediate (1-2 days):
• Add 5 missing indexes for slowest queries
• Drop 7 duplicate indexes
• Tune critical configuration parameters

Phase 2: Short-term (1-2 weeks):
• Implement connection pooling
• Add query performance monitoring
• Optimize backup strategy

Phase 3: Medium-term (3-4 weeks):
• Implement table partitioning
• Add read replica
• Optimize schema (remove redundancy, add constraints)

Phase 4: Long-term (2-3 months):
• Implement sharding strategy
• Archive historical data
• Comprehensive performance testing

EXPECTED RESULTS:
─────────────────
• Query performance improvement: 45-85%
• Storage reduction: 15-25%
• Cost reduction: 50-70%
• Availability improvement: 99.9% → 99.95%
• Maintenance overhead reduction: 40-60%
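
Optimistic locking, recommended for the inventory updates, avoids holding a row lock between reading a row and writing it back: each row carries a version counter, and the UPDATE succeeds only if the version is unchanged since the read. The minimal sketch below shows the pattern with Python's built-in sqlite3 module so it runs anywhere; the version column, function names, and retry limit are assumptions rather than part of the report, and the pattern only holds if every writer bumps the version. Against PostgreSQL the same UPDATE and row-count check apply through any driver.

```python
import sqlite3


def read_row(conn: sqlite3.Connection, product_id: int, warehouse_id: int) -> tuple[int, int]:
    """Return (quantity, version) for one inventory row."""
    return conn.execute(
        "SELECT quantity, version FROM inventory WHERE product_id = ? AND warehouse_id = ?",
        (product_id, warehouse_id),
    ).fetchone()


def try_decrement(conn, product_id, warehouse_id, qty, seen_version) -> bool:
    """Apply the decrement only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE inventory SET quantity = quantity - ?, version = version + 1 "
        "WHERE product_id = ? AND warehouse_id = ? AND version = ?",
        (qty, product_id, warehouse_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means the version moved on: a conflict


def decrement_with_retry(conn, product_id, warehouse_id, qty, max_attempts=3) -> bool:
    """Re-read and retry on conflict; give up on insufficient stock or repeated conflicts."""
    for _ in range(max_attempts):
        quantity, version = read_row(conn, product_id, warehouse_id)
        if quantity < qty:
            return False
        if try_decrement(conn, product_id, warehouse_id, qty, version):
            return True
    return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product_id INTEGER, warehouse_id INTEGER, "
    "quantity INTEGER NOT NULL, version INTEGER NOT NULL DEFAULT 0, "
    "PRIMARY KEY (product_id, warehouse_id))"
)
conn.execute("INSERT INTO inventory (product_id, warehouse_id, quantity) VALUES (1, 1, 10)")
conn.commit()

_, seen = read_row(conn, 1, 1)                  # our session reads version 0
other_ok = try_decrement(conn, 1, 1, 3, seen)   # simulated concurrent writer commits first
stale_ok = try_decrement(conn, 1, 1, 2, seen)   # our write still carries the stale version
retry_ok = decrement_with_retry(conn, 1, 1, 2)  # re-read, then succeed
print(other_ok, stale_ok, retry_ok, read_row(conn, 1, 1))  # True False True (5, 2)
```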
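
The batched deletion recommended for the sessions table can be sketched as follows, again with Python's sqlite3 module as a stand-in engine. The table layout and function name are assumptions. PostgreSQL's DELETE has no LIMIT clause, so each batch is selected by a subquery on the primary key, and the same statement text works there through a driver. Committing after every batch keeps each transaction short, so row locks are held only briefly instead of for one long delete.

```python
import sqlite3


def delete_expired_in_batches(
    conn: sqlite3.Connection, now: int, batch_size: int = 500
) -> int:
    """Delete rows with expires_at < now, at most batch_size rows per transaction.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            # DELETE ... LIMIT is not available in PostgreSQL and is only a
            # compile-time option in SQLite, so pick the batch in a subquery.
            "DELETE FROM sessions WHERE id IN ("
            " SELECT id FROM sessions WHERE expires_at < ?"
            " ORDER BY expires_at LIMIT ?)",
            (now, batch_size),
        )
        conn.commit()  # end the transaction after every batch
        total += cur.rowcount
        if cur.rowcount < batch_size:  # last (partial or empty) batch
            return total


# Demo data: 2,000 sessions with expiry timestamps 0..1999.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, expires_at INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_sessions_expires_at ON sessions(expires_at)")
conn.executemany("INSERT INTO sessions (expires_at) VALUES (?)", [(t,) for t in range(2000)])
conn.commit()

deleted = delete_expired_in_batches(conn, now=1200, batch_size=500)
remaining = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
print(deleted, remaining)  # 1200 800
```

The index on expires_at lets each subquery find the next batch without scanning the whole table.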
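
The work_mem recommendation should be checked against the connection count. PostgreSQL documents work_mem as the memory each sort or hash operation may use before spilling to temporary files (hash operations may use a multiple of it via hash_mem_multiplier in PostgreSQL 13 and later), and a complex query can run several such operations at once in every session. A rough worst-case bound is therefore connections × operations per query × work_mem. The sketch below computes that bound for the report's figures; the function name and the operations-per-query values are assumptions, and the result is an upper-bound estimate, not a measurement.

```python
def worst_case_work_mem_gb(connections: int, work_mem_mb: int, ops_per_query: int = 1) -> float:
    """Upper bound on work_mem usage if every connection runs ops_per_query
    sort/hash operations at the same time, each using its full work_mem."""
    return connections * ops_per_query * work_mem_mb / 1024


RAM_GB = 32            # from the report's "4-8GB for 32GB RAM" guidance
SHARED_BUFFERS_GB = 8  # recommended shared_buffers

for connections, ops in [(200, 1), (300, 1), (300, 2)]:
    bound = worst_case_work_mem_gb(connections, 64, ops)
    total = SHARED_BUFFERS_GB + bound
    status = "exceeds" if total > RAM_GB else "fits in"
    print(f"{connections} conns x {ops} op(s) x 64MB = {bound:.2f} GB; "
          f"with shared_buffers {total:.2f} GB {status} {RAM_GB} GB RAM")
```

With 300 connections and two concurrent operations per query, the bound (37.5 GB) alone exceeds the 32 GB host, which is one reason to pair a larger work_mem with PgBouncer rather than more server connections.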
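
The connection-routing recommendation (primary for writes, replicas for reads) can start as a small dispatch rule in the application layer. The sketch below is a hypothetical router: the class name, endpoint labels, and keyword-based read detection are assumptions, not the API of any particular pooler or driver. Because the report measures replication lag of up to 45 seconds, a read that must see the caller's own recent writes should go to the primary, which the pin_to_primary flag stands in for.

```python
from itertools import cycle


class ReadWriteRouter:
    """Pick a database endpoint for each SQL statement (hypothetical helper)."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = cycle(replicas) if replicas else None  # round-robin

    @staticmethod
    def is_read_only(sql: str) -> bool:
        """Crude keyword check: plain SELECTs are reads; locking SELECTs are not.

        Anything not starting with SELECT (including CTEs) goes to the primary.
        """
        text = sql.lstrip().upper()
        return (
            text.startswith("SELECT")
            and "FOR UPDATE" not in text
            and "FOR SHARE" not in text
        )

    def route(self, sql: str, pin_to_primary: bool = False) -> str:
        # pin_to_primary: the caller needs read-your-writes despite replica lag.
        if pin_to_primary or self._replicas is None or not self.is_read_only(sql):
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM products WHERE id = 1"))       # replica-1
print(router.route("SELECT * FROM products WHERE id = 2"))       # replica-2
print(router.route("UPDATE inventory SET quantity = 0"))          # primary
print(router.route("SELECT * FROM orders", pin_to_primary=True))  # primary
print(router.route("SELECT 1"))                                   # replica-1
```

Routing that errs toward the primary is the safe default: a misrouted write fails on a read-only replica, while an unnecessary read on the primary only costs capacity.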

Notes

Database optimization is an iterative process; measure before and after changes
Different database systems require different optimization approaches
Consider workload patterns (OLTP vs OLAP) when optimizing
Test optimization changes in staging before production
Monitor the impact of optimization changes on application performance
Regular maintenance (vacuum, analyze, reindex) is essential for sustained performance
Consider both read and write performance when optimizing
Balance normalization with performance requirements
Implement comprehensive monitoring to detect performance regressions
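
Measuring before and after a change does not always need a benchmark; comparing query plans is often the first check. The minimal sketch below uses Python's built-in sqlite3 module as a stand-in for a real server (SQLite's planner and plan text differ from PostgreSQL's EXPLAIN, so it only illustrates the workflow) and replays the first slow query from the sample report before and after adding the recommended composite index. The helper name plan_details and the table layout are hypothetical.

```python
import sqlite3

# First slow query from the sample report, with SQLite-style placeholders.
QUERY = (
    "SELECT * FROM orders WHERE customer_id = ? AND status = ? "
    "ORDER BY created_at DESC LIMIT 100"
)
PARAMS = (42, "shipped")


def plan_details(conn: sqlite3.Connection, sql: str, params: tuple) -> list[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, created_at TEXT)"
)

before = plan_details(conn, QUERY, PARAMS)  # full scan plus a separate sort step

conn.execute(
    "CREATE INDEX idx_orders_customer_status_created "
    "ON orders(customer_id, status, created_at DESC)"
)
after = plan_details(conn, QUERY, PARAMS)  # index search; rows already in ORDER BY order

print("before:", before)
print("after: ", after)
```

The disappearance of the temporary B-tree sort step shows why created_at belongs at the end of the composite index: the equality columns narrow the search, and the trailing column supplies the ORDER BY order for free.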
