ClickHouse Best Practices
Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.
IMPORTANT: How to Apply This Skill
Before answering ClickHouse questions, follow this priority order:
- Check for applicable rules in the directory
- If rules exist: Apply them and cite them in your response using "Per ..."
- If no rule exists: Use the LLM's ClickHouse knowledge or search documentation
- If uncertain: Use web search for current best practices
- Always cite your source: rule name, "general ClickHouse guidance", or URL
Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.
For Formal Reviews
When performing a formal review of schemas, queries, or data ingestion:
Review Procedures
For Schema Reviews (CREATE TABLE, ALTER TABLE)
Read these rule files in order:
rules/schema-pk-plan-before-creation.md
- ORDER BY is immutable
rules/schema-pk-cardinality-order.md
- Column ordering in keys
rules/schema-pk-prioritize-filters.md
- Filter column inclusion
rules/schema-types-native-types.md
- Proper type selection
rules/schema-types-minimize-bitwidth.md
- Numeric type sizing
rules/schema-types-lowcardinality.md
- LowCardinality usage
rules/schema-types-avoid-nullable.md
- Nullable vs DEFAULT
rules/schema-partition-low-cardinality.md
- Partition count limits
rules/schema-partition-lifecycle.md
- Partitioning purpose
Check for:
For Query Reviews (SELECT, JOIN, aggregations)
Read these rule files:
rules/query-join-choose-algorithm.md
- Algorithm selection
rules/query-join-filter-before.md
- Pre-join filtering
rules/query-join-use-any.md
- ANY vs regular JOIN
rules/query-index-skipping-indices.md
- Secondary index usage
rules/schema-pk-filter-on-orderby.md
- Filter alignment with ORDER BY
Check for:
For Insert Strategy Reviews (data ingestion, updates, deletes)
Read these rule files:
rules/insert-batch-size.md
- Batch sizing requirements
rules/insert-mutation-avoid-update.md
- UPDATE alternatives
rules/insert-mutation-avoid-delete.md
- DELETE alternatives
rules/insert-async-small-batches.md
- Async insert usage
rules/insert-optimize-avoid-final.md
- OPTIMIZE TABLE risks
Check for:
Output Format
Structure your response as follows:
## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...
## Findings
### Violations
- **`rule-name`**: Description of the issue
- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
### Compliant
- `rule-name`: Brief note on why it's correct
## Recommendations
[Prioritized list of changes, citing rules]
Rule Categories by Priority
| Priority | Category | Impact | Prefix | Rule Count |
|---|
| 1 | Primary Key Selection | CRITICAL | | 4 |
| 2 | Data Type Selection | CRITICAL | | 5 |
| 3 | JOIN Optimization | CRITICAL | | 5 |
| 4 | Insert Batching | CRITICAL | | 1 |
| 5 | Mutation Avoidance | CRITICAL | | 2 |
| 6 | Partitioning Strategy | HIGH | | 4 |
| 7 | Skipping Indices | HIGH | | 1 |
| 8 | Materialized Views | HIGH | | 2 |
| 9 | Async Inserts | HIGH | | 2 |
| 10 | OPTIMIZE Avoidance | HIGH | | 1 |
| 11 | JSON Usage | MEDIUM | | 1 |
Quick Reference
Schema Design - Primary Key (CRITICAL)
schema-pk-plan-before-creation
- Plan ORDER BY before table creation (immutable)
schema-pk-cardinality-order
- Order columns low-to-high cardinality
schema-pk-prioritize-filters
- Include frequently filtered columns
schema-pk-filter-on-orderby
- Query filters must use ORDER BY prefix
Schema Design - Data Types (CRITICAL)
schema-types-native-types
- Use native types, not String for everything
schema-types-minimize-bitwidth
- Use smallest numeric type that fits
schema-types-lowcardinality
- LowCardinality for <10K unique strings
- - Enum for finite value sets with validation
schema-types-avoid-nullable
- Avoid Nullable; use DEFAULT instead
Schema Design - Partitioning (HIGH)
schema-partition-low-cardinality
- Keep partition count 100-1,000
schema-partition-lifecycle
- Use partitioning for data lifecycle, not queries
schema-partition-query-tradeoffs
- Understand partition pruning trade-offs
schema-partition-start-without
- Consider starting without partitioning
Schema Design - JSON (MEDIUM)
- - JSON for dynamic schemas; typed columns for known
Query Optimization - JOINs (CRITICAL)
query-join-choose-algorithm
- Select algorithm based on table sizes
- - ANY JOIN when only one match needed
- - Filter tables before joining
query-join-consider-alternatives
- Dictionaries/denormalization vs JOIN
- - join_use_nulls=0 for default values
Query Optimization - Indices (HIGH)
query-index-skipping-indices
- Skipping indices for non-ORDER BY filters
Query Optimization - Materialized Views (HIGH)
- - Incremental MVs for real-time aggregations
- - Refreshable MVs for complex joins
Insert Strategy - Batching (CRITICAL)
- - Batch 10K-100K rows per INSERT
Insert Strategy - Async (HIGH)
insert-async-small-batches
- Async inserts for high-frequency small batches
- - Native format for best performance
Insert Strategy - Mutations (CRITICAL)
insert-mutation-avoid-update
- ReplacingMergeTree instead of ALTER UPDATE
insert-mutation-avoid-delete
- Lightweight DELETE or DROP PARTITION
Insert Strategy - Optimization (HIGH)
insert-optimize-avoid-final
- Let background merges work
When to Apply
This skill activates when you encounter:
- statements
- modifications
- or discussions
- Data type selection questions
- Slow query troubleshooting
- JOIN optimization requests
- Data ingestion pipeline design
- Update/delete strategy questions
- ReplacingMergeTree or other specialized engine usage
- Partitioning strategy decisions
Rule File Structure
Each rule file in
contains:
- YAML frontmatter: title, impact level, tags
- Brief explanation: Why this rule matters
- Incorrect example: Anti-pattern with explanation
- Correct example: Best practice with explanation
- Additional context: Trade-offs, when to apply, references
Full Compiled Document
For the complete guide with all rules expanded inline:
Use
when you need to check multiple rules quickly without reading individual files.