clickhouse-best-practices
Original:🇺🇸 English
Translated
MUST USE when reviewing ClickHouse schemas, queries, or configurations. Contains 28 rules that MUST be checked before providing recommendations. Always read relevant rule files and cite specific rules in responses.
19installs
Sourceclickhouse/agent-skills
Added on
NPX Install
npx skill4agent add clickhouse/agent-skills clickhouse-best-practicesTags
Translated version includes tags in frontmatterSKILL.md Content
View Translation Comparison →ClickHouse Best Practices
Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.
Official docs: ClickHouse Best Practices
IMPORTANT: How to Apply This Skill
Before answering ClickHouse questions, follow this priority order:
- Check for applicable rules in the directory
rules/ - If rules exist: Apply them and cite them in your response using "Per ..."
rule-name - If no rule exists: Use the LLM's ClickHouse knowledge or search documentation
- If uncertain: Use web search for current best practices
- Always cite your source: rule name, "general ClickHouse guidance", or URL
Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.
For Formal Reviews
When performing a formal review of schemas, queries, or data ingestion:
Review Procedures
For Schema Reviews (CREATE TABLE, ALTER TABLE)
Read these rule files in order:
- - ORDER BY is immutable
rules/schema-pk-plan-before-creation.md - - Column ordering in keys
rules/schema-pk-cardinality-order.md - - Filter column inclusion
rules/schema-pk-prioritize-filters.md - - Proper type selection
rules/schema-types-native-types.md - - Numeric type sizing
rules/schema-types-minimize-bitwidth.md - - LowCardinality usage
rules/schema-types-lowcardinality.md - - Nullable vs DEFAULT
rules/schema-types-avoid-nullable.md - - Partition count limits
rules/schema-partition-low-cardinality.md - - Partitioning purpose
rules/schema-partition-lifecycle.md
Check for:
- PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
- Data types match actual data ranges
- LowCardinality applied to appropriate string columns
- Partition key cardinality bounded (100-1,000 values)
- ReplacingMergeTree has version column if used
For Query Reviews (SELECT, JOIN, aggregations)
Read these rule files:
- - Algorithm selection
rules/query-join-choose-algorithm.md - - Pre-join filtering
rules/query-join-filter-before.md - - ANY vs regular JOIN
rules/query-join-use-any.md - - Secondary index usage
rules/query-index-skipping-indices.md - - Filter alignment with ORDER BY
rules/schema-pk-filter-on-orderby.md
Check for:
- Filters use ORDER BY prefix columns
- JOINs filter tables before joining (not after)
- Correct JOIN algorithm for table sizes
- Skipping indices for non-ORDER BY filter columns
For Insert Strategy Reviews (data ingestion, updates, deletes)
Read these rule files:
- - Batch sizing requirements
rules/insert-batch-size.md - - UPDATE alternatives
rules/insert-mutation-avoid-update.md - - DELETE alternatives
rules/insert-mutation-avoid-delete.md - - Async insert usage
rules/insert-async-small-batches.md - - OPTIMIZE TABLE risks
rules/insert-optimize-avoid-final.md
Check for:
- Batch size 10K-100K rows per INSERT
- No ALTER TABLE UPDATE for frequent changes
- ReplacingMergeTree or CollapsingMergeTree for update patterns
- Async inserts enabled for high-frequency small batches
Output Format
Structure your response as follows:
## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...
## Findings
### Violations
- **`rule-name`**: Description of the issue
- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
### Compliant
- `rule-name`: Brief note on why it's correct
## Recommendations
[Prioritized list of changes, citing rules]Rule Categories by Priority
| Priority | Category | Impact | Prefix | Rule Count |
|---|---|---|---|---|
| 1 | Primary Key Selection | CRITICAL | | 4 |
| 2 | Data Type Selection | CRITICAL | | 5 |
| 3 | JOIN Optimization | CRITICAL | | 5 |
| 4 | Insert Batching | CRITICAL | | 1 |
| 5 | Mutation Avoidance | CRITICAL | | 2 |
| 6 | Partitioning Strategy | HIGH | | 4 |
| 7 | Skipping Indices | HIGH | | 1 |
| 8 | Materialized Views | HIGH | | 2 |
| 9 | Async Inserts | HIGH | | 2 |
| 10 | OPTIMIZE Avoidance | HIGH | | 1 |
| 11 | JSON Usage | MEDIUM | | 1 |
Quick Reference
Schema Design - Primary Key (CRITICAL)
- - Plan ORDER BY before table creation (immutable)
schema-pk-plan-before-creation - - Order columns low-to-high cardinality
schema-pk-cardinality-order - - Include frequently filtered columns
schema-pk-prioritize-filters - - Query filters must use ORDER BY prefix
schema-pk-filter-on-orderby
Schema Design - Data Types (CRITICAL)
- - Use native types, not String for everything
schema-types-native-types - - Use smallest numeric type that fits
schema-types-minimize-bitwidth - - LowCardinality for <10K unique strings
schema-types-lowcardinality - - Enum for finite value sets with validation
schema-types-enum - - Avoid Nullable; use DEFAULT instead
schema-types-avoid-nullable
Schema Design - Partitioning (HIGH)
- - Keep partition count 100-1,000
schema-partition-low-cardinality - - Use partitioning for data lifecycle, not queries
schema-partition-lifecycle - - Understand partition pruning trade-offs
schema-partition-query-tradeoffs - - Consider starting without partitioning
schema-partition-start-without
Schema Design - JSON (MEDIUM)
- - JSON for dynamic schemas; typed columns for known
schema-json-when-to-use
Query Optimization - JOINs (CRITICAL)
- - Select algorithm based on table sizes
query-join-choose-algorithm - - ANY JOIN when only one match needed
query-join-use-any - - Filter tables before joining
query-join-filter-before - - Dictionaries/denormalization vs JOIN
query-join-consider-alternatives - - join_use_nulls=0 for default values
query-join-null-handling
Query Optimization - Indices (HIGH)
- - Skipping indices for non-ORDER BY filters
query-index-skipping-indices
Query Optimization - Materialized Views (HIGH)
- - Incremental MVs for real-time aggregations
query-mv-incremental - - Refreshable MVs for complex joins
query-mv-refreshable
Insert Strategy - Batching (CRITICAL)
- - Batch 10K-100K rows per INSERT
insert-batch-size
Insert Strategy - Async (HIGH)
- - Async inserts for high-frequency small batches
insert-async-small-batches - - Native format for best performance
insert-format-native
Insert Strategy - Mutations (CRITICAL)
- - ReplacingMergeTree instead of ALTER UPDATE
insert-mutation-avoid-update - - Lightweight DELETE or DROP PARTITION
insert-mutation-avoid-delete
Insert Strategy - Optimization (HIGH)
- - Let background merges work
insert-optimize-avoid-final
When to Apply
This skill activates when you encounter:
- statements
CREATE TABLE - modifications
ALTER TABLE - or
ORDER BYdiscussionsPRIMARY KEY - Data type selection questions
- Slow query troubleshooting
- JOIN optimization requests
- Data ingestion pipeline design
- Update/delete strategy questions
- ReplacingMergeTree or other specialized engine usage
- Partitioning strategy decisions
Rule File Structure
Each rule file in contains:
rules/- YAML frontmatter: title, impact level, tags
- Brief explanation: Why this rule matters
- Incorrect example: Anti-pattern with explanation
- Correct example: Best practice with explanation
- Additional context: Trade-offs, when to apply, references
Full Compiled Document
For the complete guide with all rules expanded inline:
AGENTS.mdUse when you need to check multiple rules quickly without reading individual files.
AGENTS.md