Use this skill when you need to optimize database performance: schema design, indexing, configuration, and query tuning across different database systems
npx skill4agent add wojons/skills database-optimization

# Analyze query performance
./scripts/analyze-database-optimization.sh --query-analysis --database postgresql
# Optimize database indexes
./scripts/analyze-database-optimization.sh --index-optimization --database mysql
# Tune database configuration
./scripts/analyze-database-optimization.sh --configuration-tuning --database mongodb
# Generate optimization report
./scripts/analyze-database-optimization.sh --report --output optimization-report.json
# Monitor database performance
./scripts/analyze-database-optimization.sh --performance-monitoring --interval 60

Database Optimization Analysis
─────────────────────────────────────
Analysis Date: 2025-01-15T10:30:00Z
Database System: PostgreSQL 14.8
Database Size: 245 GB
Analysis Duration: 15 minutes
PERFORMANCE METRICS:
────────────────────
Current Performance Score: 72/100
Query Response Time (85th percentile): 450ms (Target: < 200ms)
Transactions per Second: 125 (Target: > 200)
Connection Pool Utilization: 92% (Target: < 80%)
Cache Hit Ratio: 78% (Target: > 90%)
QUERY PERFORMANCE ANALYSIS:
───────────────────────────
Slow Queries Identified: 42
Total Query Execution Time: 85% spent on the top 5 queries
Top 5 Slowest Queries:
1. Query: SELECT * FROM orders WHERE customer_id = ? AND status = ? ORDER BY created_at DESC LIMIT 100
Average Execution Time: 1,250ms
Execution Count: 12,850/day
Issue: Missing composite index on (customer_id, status, created_at)
Optimization: Add index: CREATE INDEX idx_orders_customer_status_created ON orders(customer_id, status, created_at DESC)
2. Query: SELECT p.*, c.name FROM products p JOIN categories c ON p.category_id = c.id WHERE p.price > ? AND p.stock > 0
Average Execution Time: 890ms
Execution Count: 8,420/day
Issue: Sequential scan on products table (245,000 rows)
Optimization: Add index: CREATE INDEX idx_products_price_stock ON products(price) WHERE stock > 0
3. Query: UPDATE inventory SET quantity = quantity - ? WHERE product_id = ? AND warehouse_id = ?
Average Execution Time: 650ms
Execution Count: 15,230/day
Issue: Row-level locking contention
Optimization: Implement optimistic locking or batch updates
4. Query: SELECT user_id, COUNT(*) as order_count FROM orders WHERE created_at > NOW() - INTERVAL '30 days' GROUP BY user_id HAVING COUNT(*) > 5
Average Execution Time: 1,850ms
Execution Count: 1,250/day
Issue: Full table scan with aggregation
Optimization: Create summary table or materialized view
5. Query: DELETE FROM sessions WHERE expires_at < NOW()
Average Execution Time: 2,150ms
Execution Count: 850/day
Issue: Table bloat and vacuum overhead
Optimization: Implement batch deletion with index on expires_at
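The batch-deletion approach recommended for query 5 can be sketched as follows. This is a minimal illustration using SQLite as a stand-in for PostgreSQL (in PostgreSQL the same chunking is typically done with `DELETE ... WHERE ctid IN (SELECT ctid ... LIMIT n)`); the `sessions`/`expires_at` names come from the report, the batch size is an assumed tuning knob. Deleting in fixed-size chunks keeps each transaction short, which limits lock hold time and lets autovacuum keep up:

```python
import sqlite3

def purge_expired_sessions(conn, now, batch_size=1000):
    """Delete expired sessions in small batches so each
    transaction stays short and lock contention stays low."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM sessions WHERE rowid IN ("
            "  SELECT rowid FROM sessions WHERE expires_at < ? LIMIT ?)",
            (now, batch_size),
        )
        conn.commit()
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER, expires_at INTEGER)")
# The index on expires_at (as the report recommends) lets each
# batch be located without a full table scan.
conn.execute("CREATE INDEX idx_sessions_expires ON sessions(expires_at)")
conn.executemany("INSERT INTO sessions VALUES (?, ?)",
                 [(i, i % 100) for i in range(5000)])
deleted = purge_expired_sessions(conn, now=50, batch_size=500)
```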
INDEX OPTIMIZATION ANALYSIS:
─────────────────────────────
Current Indexes: 48
Duplicate Indexes: 7
Unused Indexes: 12
Missing Indexes: 9
Index Issues:
• Duplicate: idx_orders_customer (customer_id) and idx_orders_customer_status (customer_id, status)
• Unused: idx_products_supplier (supplier_id) - 0 uses in 30 days
• Missing: idx_orders_created_status (created_at, status) - would benefit 3 frequent queries
Index Recommendations:
1. Drop 7 duplicate indexes: Free 2.8GB storage
2. Drop 12 unused indexes: Free 4.2GB storage
3. Add 9 missing indexes: Improve query performance 35-85%
SCHEMA OPTIMIZATION:
────────────────────
Normalization Issues:
• products table has redundant category_name field (denormalized)
• orders table missing foreign key constraint on customer_id
• users table has JSONB field with frequently queried data (should be separate columns)
Schema Recommendations:
1. Remove redundant category_name from products table
2. Add foreign key constraint: ALTER TABLE orders ADD CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(id)
3. Extract frequently queried JSONB fields to separate columns
4. Consider partitioning orders table by created_at month
CONFIGURATION TUNING:
──────────────────────
Current Configuration Issues:
• shared_buffers: 128MB (Recommended: 4-8GB for 32GB RAM)
• work_mem: 4MB (Recommended: 64MB for complex queries)
• maintenance_work_mem: 64MB (Recommended: 1-2GB)
• effective_cache_size: 4GB (Recommended: 24GB)
• random_page_cost: 4.0 (Recommended: 1.1 for SSDs)
Configuration Recommendations:
shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
effective_cache_size = 24GB
random_page_cost = 1.1
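The memory values above follow common PostgreSQL sizing heuristics for a dedicated 32 GB host: shared_buffers around 25% of RAM and effective_cache_size around 75%. These are widely used rules of thumb, not official formulas; a small helper makes the arithmetic explicit:

```python
def pg_memory_settings(ram_gb: int) -> dict:
    """Rule-of-thumb PostgreSQL memory sizing for a dedicated host:
    shared_buffers ~25% of RAM, effective_cache_size ~75%."""
    return {
        "shared_buffers_gb": ram_gb // 4,
        "effective_cache_size_gb": ram_gb * 3 // 4,
    }

settings = pg_memory_settings(32)  # the 32 GB host from the report
```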
CONNECTION POOLING ANALYSIS:
────────────────────────────
Current: 200 connections, 184 active (92% utilization)
Issues: Connection pool exhaustion during peak hours
Recommendations:
• Increase connection pool to 300
• Implement connection pool with pgbouncer
• Set idle connection timeout to 300 seconds
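The fixed-size pooling the report recommends (in production, via PgBouncer) can be sketched as a simple queue-based pool. The point of a bounded pool is that exhaustion surfaces as a timeout rather than as unbounded connection growth on the server; this is an illustrative sketch, not a replacement for PgBouncer:

```python
import queue

class ConnectionPool:
    """Minimal fixed-size pool: acquire blocks (up to `timeout`
    seconds) when every connection is checked out."""
    def __init__(self, factory, size, timeout=5.0):
        self._q = queue.Queue(maxsize=size)
        self.timeout = timeout
        for _ in range(size):
            self._q.put(factory())

    def acquire(self):
        # Raises queue.Empty on timeout => pool exhaustion is visible.
        return self._q.get(timeout=self.timeout)

    def release(self, conn):
        self._q.put(conn)

pool = ConnectionPool(factory=object, size=3, timeout=0.1)
a, b, c = pool.acquire(), pool.acquire(), pool.acquire()
pool.release(a)
d = pool.acquire()  # succeeds because a connection was returned
```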
LOCKING AND CONCURRENCY:
─────────────────────────
Lock Wait Events: 12,850/day
Deadlocks: 8/day
Issues: High row-level locking on inventory table
Recommendations:
• Implement optimistic locking for inventory updates
• Use SKIP LOCKED for batch processing
• Reduce transaction isolation level where appropriate
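The optimistic-locking recommendation for the inventory table works by adding a version column and updating only if the version is unchanged; a zero rowcount means another transaction won the race and the caller retries. A sketch using SQLite as a stand-in, with the `inventory` columns taken from the report and the `version` column an assumed addition:

```python
import sqlite3

def decrement_inventory(conn, product_id, warehouse_id, qty):
    """Optimistic concurrency: update only if the version read
    earlier is still current; False means retry."""
    quantity, version = conn.execute(
        "SELECT quantity, version FROM inventory "
        "WHERE product_id = ? AND warehouse_id = ?",
        (product_id, warehouse_id)).fetchone()
    cur = conn.execute(
        "UPDATE inventory SET quantity = ?, version = version + 1 "
        "WHERE product_id = ? AND warehouse_id = ? AND version = ?",
        (quantity - qty, product_id, warehouse_id, version))
    conn.commit()
    return cur.rowcount == 1  # 0 => concurrent writer won, retry

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE inventory (
    product_id INTEGER, warehouse_id INTEGER,
    quantity INTEGER, version INTEGER DEFAULT 0)""")
conn.execute("INSERT INTO inventory VALUES (1, 1, 100, 0)")
ok = decrement_inventory(conn, 1, 1, 5)
```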
STORAGE OPTIMIZATION:
──────────────────────
Table Sizes:
• orders: 85GB (35% of database)
• products: 42GB (17% of database)
• users: 28GB (11% of database)
Storage Issues:
• orders table has high bloat (32% dead tuples)
• No partitioning on time-series data
• Uncompressed JSONB columns
Storage Recommendations:
1. Run aggressive VACUUM (or VACUUM FULL during a maintenance window) on the orders table
2. Implement partitioning on orders by created_at (monthly)
3. Enable compression for historical data
4. Archive old orders to cold storage
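The archive step can be implemented as an insert-then-delete inside one transaction, so rows are never lost between the copy and the purge. A minimal sketch with SQLite as a stand-in; the `orders_archive` table name is an assumption, and in practice the archive target would be cold storage rather than a sibling table:

```python
import sqlite3

def archive_orders(conn, cutoff):
    """Move orders older than `cutoff` into an archive table,
    copying and deleting in the same transaction."""
    with conn:  # sqlite3 context manager: commit on success, rollback on error
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders "
            "WHERE created_at < ?", (cutoff,))
        cur = conn.execute(
            "DELETE FROM orders WHERE created_at < ?", (cutoff,))
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at INTEGER)")
conn.execute("CREATE TABLE orders_archive (id INTEGER, created_at INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i) for i in range(10)])
moved = archive_orders(conn, cutoff=5)
```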
REPLICATION AND SHARDING:
─────────────────────────
Current: Single primary with 2 read replicas
Issues: Replication lag up to 45 seconds during peak
Recommendations:
• Add 1 more read replica
• Implement connection routing (primary for writes, replicas for reads)
• Consider sharding by customer_id for orders table
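The connection-routing recommendation can be sketched as a router that sends writes to the primary and spreads reads across replicas round-robin. This is a deliberately naive illustration: detecting reads by SQL verb breaks down for cases like `SELECT ... FOR UPDATE`, and with the 45-second replication lag noted above, read-your-writes sessions should be pinned to the primary:

```python
import itertools

class ConnectionRouter:
    """Writes go to the primary; plain reads rotate across replicas."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str):
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._replicas)
        return self.primary  # INSERT/UPDATE/DELETE/DDL

router = ConnectionRouter("primary", ["replica1", "replica2", "replica3"])
targets = [router.route(q) for q in (
    "SELECT 1", "UPDATE t SET x = 1", "SELECT 2", "SELECT 3")]
```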
BACKUP AND RECOVERY:
─────────────────────
Current: Daily full backup, 7-day retention
Issues: Backup takes 4 hours, affects performance
Recommendations:
• Implement incremental backups with WAL archiving
• Increase retention to 30 days
• Test recovery procedure monthly
PERFORMANCE MONITORING:
────────────────────────
Current Monitoring: Basic (CPU, memory, disk)
Missing: Query performance, index usage, lock monitoring
Recommendations:
• Implement pg_stat_statements for query monitoring
• Set up alerts for slow queries (> 500ms)
• Monitor index usage and bloat weekly
COST OPTIMIZATION:
──────────────────
Current Monthly Cost: $1,850 (AWS RDS)
Optimization Opportunities:
• Right-size instance: Save $450/month
• Reserved instance: Save $650/month (3-year)
• Storage optimization: Save $125/month
• Archive old data: Save $85/month
Total Potential Savings: $1,310/month (71%)
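The savings figure is just the sum of the four line items above against the $1,850 monthly bill; a quick check confirms the arithmetic:

```python
# Line items from the cost-optimization section (USD/month).
savings = {
    "right_size_instance": 450,
    "reserved_instance": 650,
    "storage_optimization": 125,
    "archive_old_data": 85,
}
monthly_cost = 1850
total = sum(savings.values())
pct = round(100 * total / monthly_cost)
```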
IMPLEMENTATION ROADMAP:
────────────────────────
Phase 1: Immediate (1-2 days):
• Add 5 missing indexes for slowest queries
• Drop 7 duplicate indexes
• Tune critical configuration parameters
Phase 2: Short-term (1-2 weeks):
• Implement connection pooling
• Add query performance monitoring
• Optimize backup strategy
Phase 3: Medium-term (3-4 weeks):
• Implement table partitioning
• Add read replica
• Optimize schema (remove redundancy, add constraints)
Phase 4: Long-term (2-3 months):
• Implement sharding strategy
• Archive historical data
• Comprehensive performance testing
EXPECTED RESULTS:
─────────────────
• Query performance improvement: 45-85%
• Storage reduction: 15-25%
• Cost reduction: 50-70%
• Availability improvement: 99.9% → 99.95%
• Maintenance overhead reduction: 40-60%