database-sharding
Comprehensive database sharding patterns for horizontal scaling with hash, range, and directory-based strategies.
Quick Start (10 Minutes)
Step 1: Choose sharding strategy from templates:
bash
# Hash-based (even distribution)
cat templates/hash-router.ts
# Range-based (time-series data)
cat templates/range-router.ts
# Directory-based (multi-tenancy)
cat templates/directory-router.ts
Step 2: Select shard key criteria:
- ✅ High cardinality (millions of unique values)
- ✅ Even distribution (no single value > 5%)
- ✅ Immutable (never changes)
- ✅ Query alignment (in 80%+ of WHERE clauses)
Step 3: Implement router:
typescript
import { HashRouter } from './hash-router';
const router = new HashRouter([
{ id: 'shard_0', connection: { host: 'db0.example.com' } },
{ id: 'shard_1', connection: { host: 'db1.example.com' } },
{ id: 'shard_2', connection: { host: 'db2.example.com' } },
{ id: 'shard_3', connection: { host: 'db3.example.com' } },
]);
// Query single shard
const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
Critical Rules
✓ Always Do
| Rule | Reason |
|---|
| Include shard key in queries | Avoid scanning all shards (100x slower) |
| Monitor shard distribution | Detect hotspots before they cause outages |
| Plan for rebalancing upfront | Cannot easily add shards later |
| Choose immutable shard key | Changing key = data migration nightmare |
| Test distribution with production data | Synthetic data hides real hotspots |
| Denormalize for data locality | Keep related data on same shard |
✗ Never Do
| Anti-Pattern | Why It's Bad |
|---|
| Sequential ID with range sharding | Latest shard gets all writes (hotspot) |
| Timestamp as shard key | Recent shard overwhelmed |
| Cross-shard transactions without 2PC | Data corruption, inconsistency |
| Simple modulo without consistent hashing | Cannot add shards without full re-shard |
| Nullable shard key | Special NULL handling creates hotspots |
| No shard routing layer | Hardcoded shards = cannot rebalance |
Top 7 Critical Errors
Error 1: Wrong Shard Key Choice (Hotspots)
Symptom: One shard receives 80%+ of traffic
Fix:
typescript
// ❌ Bad: Low cardinality (status field)
shard_key = order.status; // 90% are 'pending' → shard_0 overloaded
// ✅ Good: High cardinality (user_id)
shard_key = order.user_id; // Millions of users, even distribution
Error 2: Missing Shard Key in Queries
Symptom: Queries scan ALL shards (extremely slow)
Fix:
typescript
// ❌ Bad: No shard key
SELECT * FROM orders WHERE status = 'shipped'; // Scans all 100 shards!
// ✅ Good: Include shard key
SELECT * FROM orders WHERE user_id = ? AND status = 'shipped'; // Targets 1 shard
Error 3: Sequential IDs with Range Sharding
Symptom: Latest shard gets all writes
Fix:
typescript
// ❌ Bad: Range sharding with auto-increment
// Shard 0: 1-1M, Shard 1: 1M-2M, Shard 2: 2M+ → All new writes to Shard 2!
// ✅ Good: Hash-based sharding
const shardId = hash(id) % shardCount; // Even distribution
Error 4: No Rebalancing Strategy
Symptom: Stuck with initial shard count, cannot scale
Fix:
typescript
// ❌ Bad: Simple modulo
const shardId = hash(key) % shardCount; // Adding 5th shard breaks ALL keys
// ✅ Good: Consistent hashing
const ring = new ConsistentHashRing(shards);
const shardId = ring.getNode(key); // Only ~25% of keys move when adding shard
Error 5: Cross-Shard Transactions
Symptom: Data inconsistency, partial writes
Fix:
typescript
// ❌ Bad: Cross-shard transaction (will corrupt)
BEGIN;
UPDATE shard_1.accounts SET balance = balance - 100 WHERE id = 'A';
UPDATE shard_2.accounts SET balance = balance + 100 WHERE id = 'B';
COMMIT; // If shard_2 fails, shard_1 already committed!
// ✅ Good: Two-Phase Commit or Saga pattern
const txn = new TwoPhaseCommitTransaction();
txn.addOperation(shard_1, 'UPDATE accounts SET balance = balance - 100 WHERE id = ?', ['A']);
txn.addOperation(shard_2, 'UPDATE accounts SET balance = balance + 100 WHERE id = ?', ['B']);
await txn.execute(); // Atomic across shards
Error 6: Mutable Shard Key
Symptom: Records move shards, causing duplicates
Fix:
typescript
// ❌ Bad: Shard by country (user relocates)
shard_key = user.country; // User moves US → CA, now in different shard!
// ✅ Good: Shard by immutable user_id
shard_key = user.id; // Never changes
Error 7: No Monitoring
Symptom: Silent hotspots, sudden performance degradation
Fix:
typescript
// ✅ Required metrics
- Per-shard record counts (should be within 20%)
- Query distribution (no shard > 40% of queries)
- Storage per shard (alert at 80%)
- Latency p99 per shard
Load references/error-catalog.md
for all 10 errors with detailed fixes.
Sharding Strategies
| Strategy | Best For | Pros | Cons |
|---|
| Hash | User data, even load critical | No hotspots, predictable | Range queries scatter |
| Range | Time-series, logs, append-only | Range queries efficient, archival | Recent shard hotspot |
| Directory | Multi-tenancy, complex routing | Flexible, easy rebalancing | Lookup overhead, SPOF |
Load references/sharding-strategies.md
for detailed comparisons with production examples (Instagram, Discord, Salesforce).
Shard Key Selection Criteria
| Criterion | Importance | Check Method |
|---|
| High cardinality | Critical | COUNT(DISTINCT shard_key)
> shard_count × 100 |
| Even distribution | Critical | No value > 5% of total |
| Immutable | Critical | Value never changes |
| Query alignment | High | 80%+ queries include it |
| Data locality | Medium | Related records together |
Decision Tree:
- User-focused app →
- Multi-tenant SaaS →
- Time-series/logs → (range sharding)
- Product catalog →
Load references/shard-key-selection.md
for comprehensive decision trees and testing strategies.
Configuration Summary
Hash-Based Router
typescript
import { HashRouter } from './templates/hash-router';
const router = new HashRouter([
{ id: 'shard_0', connection: { /* PostgreSQL config */ } },
{ id: 'shard_1', connection: { /* PostgreSQL config */ } },
]);
// Automatically routes to correct shard
const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
Range-Based Router
typescript
import { RangeRouter } from './templates/range-router';
const router = new RangeRouter(shardConfigs, [
{ start: Date.parse('2024-01-01'), end: Date.parse('2024-04-01'), shardId: 'shard_q1' },
{ start: Date.parse('2024-04-01'), end: Date.parse('2024-07-01'), shardId: 'shard_q2' },
{ start: Date.parse('2024-07-01'), end: Infinity, shardId: 'shard_q3' },
]);
// Range queries target specific shards
const janEvents = await router.queryRange(
Date.parse('2024-01-01'),
Date.parse('2024-02-01'),
'SELECT * FROM events WHERE created_at BETWEEN $1 AND $2'
);
Directory-Based Router
typescript
import { DirectoryRouter } from './templates/directory-router';
const router = new DirectoryRouter(directoryDBConfig, shardConfigs);
// Assign tenant to specific shard
await router.assignShard('tenant_acme', 'shard_enterprise');
// Route automatically
const users = await router.query('tenant_acme', 'SELECT * FROM users');
When to Load References
Choosing Strategy
Load references/sharding-strategies.md
when:
- Deciding between hash, range, directory
- Need production examples (Instagram, Discord)
- Planning hybrid approaches
Selecting Shard Key
Load references/shard-key-selection.md
when:
- Choosing shard key for new project
- Evaluating existing shard key
- Testing distribution with production data
Implementation
Load references/implementation-patterns.md
when:
- Building shard router from scratch
- Implementing consistent hashing
- Need transaction handling (2PC, Saga)
- Setting up monitoring/metrics
Cross-Shard Operations
Load references/cross-shard-queries.md
when:
- Need to aggregate across shards (COUNT, SUM, AVG)
- Implementing cross-shard joins
- Building pagination across shards
- Optimizing scatter-gather patterns
Rebalancing
Load references/rebalancing-guide.md
when:
- Adding new shards
- Migrating data between shards
- Planning zero-downtime migrations
- Balancing uneven load
Error Prevention
Load references/error-catalog.md
when:
- Troubleshooting performance issues
- Reviewing shard architecture
- All 10 documented errors with fixes
Complete Setup Checklist
Before Sharding:
Router Implementation:
Shard Configuration:
Application Changes:
Production Example
Before (Single database overwhelmed):
typescript
// Single PostgreSQL instance
const db = new Pool({ host: 'db.example.com' });
// All 10M users on one server
const users = await db.query('SELECT * FROM users WHERE status = $1', ['active']);
// Query time: 5000ms (slow!)
// DB CPU: 95%
// Disk: 500GB, growing
After (Sharded across 8 servers):
typescript
// Hash-based sharding with 8 shards
const router = new HashRouter([
{ id: 'shard_0', connection: { host: 'db0.example.com' } },
{ id: 'shard_1', connection: { host: 'db1.example.com' } },
// ... 6 more shards
]);
// Query single user (targets 1 shard)
const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
// Query time: 10ms (500x faster!)
// Query all shards (scatter-gather)
const allActive = await router.queryAll('SELECT * FROM users WHERE status = $1', ['active']);
// Query time: 800ms (parallelized across 8 shards, 6x faster than single)
// Result: Each shard handles ~1.25M users
// DB CPU per shard: 20%
// Disk per shard: 65GB
// Can scale to 16 shards easily (consistent hashing)
Known Issues Prevention
All 10 documented errors prevented:
- ✅ Wrong shard key (hotspots) → Test distribution first
- ✅ Missing shard key in queries → Code review, linting
- ✅ Cross-shard transactions → Use 2PC or Saga pattern
- ✅ Sequential ID hotspots → Use hash-based sharding
- ✅ No rebalancing strategy → Consistent hashing from day 1
- ✅ Timestamp sharding hotspots → Hybrid hash+range approach
- ✅ Mutable shard key → Choose immutable keys (user_id)
- ✅ No routing layer → Abstract with router from start
- ✅ No monitoring → Track per-shard metrics
- ✅ Weak hash function → Use MD5, MurmurHash3, xxHash
See:
references/error-catalog.md
for detailed fixes
Resources
Templates:
- - Hash-based sharding
templates/range-router.ts
- Range-based sharding
templates/directory-router.ts
- Directory-based sharding
templates/cross-shard-aggregation.ts
- Aggregation patterns
References:
references/sharding-strategies.md
- Strategy comparison
references/shard-key-selection.md
- Key selection guide
references/implementation-patterns.md
- Router implementations
references/cross-shard-queries.md
- Query patterns
references/rebalancing-guide.md
- Migration strategies
references/error-catalog.md
- All 10 errors documented
Production Examples:
- Instagram: Range sharding for media
- Discord: Hash sharding for messages
- Salesforce: Directory sharding for orgs
Production-tested | 10 errors prevented | MIT License