clickhouse-best-practices

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

ClickHouse Best Practices

ClickHouse最佳实践

Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.
这是一份针对ClickHouse的全面指南,涵盖表结构设计、查询优化和数据导入。包含3个主要类别(表结构、查询、插入)的28条规则,按影响优先级排序。
官方文档: ClickHouse最佳实践

IMPORTANT: How to Apply This Skill

重要提示:如何使用此技能

Before answering ClickHouse questions, follow this priority order:
  1. Check for applicable rules in the
    rules/
    directory
  2. If rules exist: Apply them and cite them in your response using "Per
    rule-name
    ..."
  3. If no rule exists: Use the LLM's ClickHouse knowledge or search documentation
  4. If uncertain: Use web search for current best practices
  5. Always cite your source: rule name, "general ClickHouse guidance", or URL
Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.
在回答ClickHouse相关问题前,请遵循以下优先级顺序:
  1. 检查
    rules/
    目录中的适用规则
  2. 若存在对应规则: 应用规则并在回复中使用「根据
    规则名称
    ...」的格式引用
  3. 若不存在对应规则: 使用大语言模型的ClickHouse相关知识或搜索官方文档
  4. 若不确定: 搜索当前的最佳实践
  5. 始终注明来源: 规则名称、「通用ClickHouse指南」或具体URL
规则优先级更高的原因: ClickHouse具有特定的特性(列式存储、稀疏索引、合并树机制),通用数据库的直觉可能会产生误导。这些规则整合了经过验证的、针对ClickHouse的专属指导。

For Formal Reviews

正式审核流程

When performing a formal review of schemas, queries, or data ingestion:

当对表结构、查询语句或数据导入进行正式审核时:

Review Procedures

审核步骤

For Schema Reviews (CREATE TABLE, ALTER TABLE)

表结构审核(CREATE TABLE、ALTER TABLE)

Read these rule files in order:
  1. rules/schema-pk-plan-before-creation.md
    - ORDER BY is immutable
  2. rules/schema-pk-cardinality-order.md
    - Column ordering in keys
  3. rules/schema-pk-prioritize-filters.md
    - Filter column inclusion
  4. rules/schema-types-native-types.md
    - Proper type selection
  5. rules/schema-types-minimize-bitwidth.md
    - Numeric type sizing
  6. rules/schema-types-lowcardinality.md
    - LowCardinality usage
  7. rules/schema-types-avoid-nullable.md
    - Nullable vs DEFAULT
  8. rules/schema-partition-low-cardinality.md
    - Partition count limits
  9. rules/schema-partition-lifecycle.md
    - Partitioning purpose
Check for:
  • PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
  • Data types match actual data ranges
  • LowCardinality applied to appropriate string columns
  • Partition key cardinality bounded (100-1,000 values)
  • ReplacingMergeTree has version column if used
请按顺序阅读以下规则文件:
  1. rules/schema-pk-plan-before-creation.md
    - ORDER BY子句不可变更
  2. rules/schema-pk-cardinality-order.md
    - 键中的列顺序
  3. rules/schema-pk-prioritize-filters.md
    - 筛选列的纳入原则
  4. rules/schema-types-native-types.md
    - 正确选择数据类型
  5. rules/schema-types-minimize-bitwidth.md
    - 数值类型的大小优化
  6. rules/schema-types-lowcardinality.md
    - LowCardinality的使用
  7. rules/schema-types-avoid-nullable.md
    - Nullable与DEFAULT的选择
  8. rules/schema-partition-low-cardinality.md
    - 分区数量限制
  9. rules/schema-partition-lifecycle.md
    - 分区的用途
检查要点:
  • PRIMARY KEY / ORDER BY的列顺序(从低基数到高基数)
  • 数据类型与实际数据范围匹配
  • 对合适的字符串列应用LowCardinality类型
  • 分区键的基数控制在100-1000个值之间
  • 若使用ReplacingMergeTree引擎,需包含版本列

For Query Reviews (SELECT, JOIN, aggregations)

查询审核(SELECT、JOIN、聚合操作)

Read these rule files:
  1. rules/query-join-choose-algorithm.md
    - Algorithm selection
  2. rules/query-join-filter-before.md
    - Pre-join filtering
  3. rules/query-join-use-any.md
    - ANY vs regular JOIN
  4. rules/query-index-skipping-indices.md
    - Secondary index usage
  5. rules/schema-pk-filter-on-orderby.md
    - Filter alignment with ORDER BY
Check for:
  • Filters use ORDER BY prefix columns
  • JOINs filter tables before joining (not after)
  • Correct JOIN algorithm for table sizes
  • Skipping indices for non-ORDER BY filter columns
请阅读以下规则文件:
  1. rules/query-join-choose-algorithm.md
    - 算法选择
  2. rules/query-join-filter-before.md
    - JOIN前筛选
  3. rules/query-join-use-any.md
    - ANY JOIN与常规JOIN的选择
  4. rules/query-index-skipping-indices.md
    - 二级索引的使用
  5. rules/schema-pk-filter-on-orderby.md
    - 筛选条件与ORDER BY的对齐
检查要点:
  • 筛选条件使用ORDER BY的前缀列
  • 在JOIN前对表进行筛选(而非之后)
  • 根据表大小选择正确的JOIN算法
  • 为非ORDER BY的筛选列设置跳数索引

For Insert Strategy Reviews (data ingestion, updates, deletes)

插入策略审核(数据导入、更新、删除)

Read these rule files:
  1. rules/insert-batch-size.md
    - Batch sizing requirements
  2. rules/insert-mutation-avoid-update.md
    - UPDATE alternatives
  3. rules/insert-mutation-avoid-delete.md
    - DELETE alternatives
  4. rules/insert-async-small-batches.md
    - Async insert usage
  5. rules/insert-optimize-avoid-final.md
    - OPTIMIZE TABLE risks
Check for:
  • Batch size 10K-100K rows per INSERT
  • No ALTER TABLE UPDATE for frequent changes
  • ReplacingMergeTree or CollapsingMergeTree for update patterns
  • Async inserts enabled for high-frequency small batches

请阅读以下规则文件:
  1. rules/insert-batch-size.md
    - 批量大小要求
  2. rules/insert-mutation-avoid-update.md
    - UPDATE的替代方案
  3. rules/insert-mutation-avoid-delete.md
    - DELETE的替代方案
  4. rules/insert-async-small-batches.md
    - 异步插入的使用
  5. rules/insert-optimize-avoid-final.md
    - OPTIMIZE TABLE的风险
检查要点:
  • 每个INSERT语句的批量大小为10000-100000行
  • 避免对频繁变更的场景使用ALTER TABLE UPDATE
  • 对更新场景使用ReplacingMergeTree或CollapsingMergeTree
  • 对高频小批量场景启用异步插入

Output Format

回复格式

Structure your response as follows:
undefined
请按照以下结构组织回复:
undefined

Rules Checked

已检查规则

  • rule-name-1
    - Compliant / Violation found
  • rule-name-2
    - Compliant / Violation found ...
  • rule-name-1
    - 符合要求 / 发现违规
  • rule-name-2
    - 符合要求 / 发现违规 ...

Findings

审核结果

Violations

违规项

  • rule-name
    : Description of the issue
    • Current: [what the code does]
    • Required: [what it should do]
    • Fix: [specific correction]
  • rule-name
    : 问题描述
    • 当前情况:[代码的实际行为]
    • 要求:[应有的行为]
    • 修复方案:[具体修正措施]

Compliant

合规项

  • rule-name
    : Brief note on why it's correct
  • rule-name
    : 符合要求的简要说明

Recommendations

建议

[Prioritized list of changes, citing rules]

---
[按优先级排序的变更列表,需引用对应规则]

---

Rule Categories by Priority

规则类别优先级

PriorityCategoryImpactPrefixRule Count
1Primary Key SelectionCRITICAL
schema-pk-
4
2Data Type SelectionCRITICAL
schema-types-
5
3JOIN OptimizationCRITICAL
query-join-
5
4Insert BatchingCRITICAL
insert-batch-
1
5Mutation AvoidanceCRITICAL
insert-mutation-
2
6Partitioning StrategyHIGH
schema-partition-
4
7Skipping IndicesHIGH
query-index-
1
8Materialized ViewsHIGH
query-mv-
2
9Async InsertsHIGH
insert-async-
2
10OPTIMIZE AvoidanceHIGH
insert-optimize-
1
11JSON UsageMEDIUM
schema-json-
1

优先级类别影响级别前缀规则数量
1主键选择CRITICAL
schema-pk-
4
2数据类型选择CRITICAL
schema-types-
5
3JOIN优化CRITICAL
query-join-
5
4插入批量处理CRITICAL
insert-batch-
1
5避免变更操作CRITICAL
insert-mutation-
2
6分区策略HIGH
schema-partition-
4
7跳数索引HIGH
query-index-
1
8物化视图HIGH
query-mv-
2
9异步插入HIGH
insert-async-
2
10避免OPTIMIZE操作HIGH
insert-optimize-
1
11JSON使用MEDIUM
schema-json-
1

Quick Reference

快速参考

Schema Design - Primary Key (CRITICAL)

表结构设计 - 主键(CRITICAL)

  • schema-pk-plan-before-creation
    - Plan ORDER BY before table creation (immutable)
  • schema-pk-cardinality-order
    - Order columns low-to-high cardinality
  • schema-pk-prioritize-filters
    - Include frequently filtered columns
  • schema-pk-filter-on-orderby
    - Query filters must use ORDER BY prefix
  • schema-pk-plan-before-creation
    - 创建表前规划ORDER BY子句(不可变更)
  • schema-pk-cardinality-order
    - 列顺序按从低基数到高基数排列
  • schema-pk-prioritize-filters
    - 纳入频繁用于筛选的列
  • schema-pk-filter-on-orderby
    - 查询筛选条件必须使用ORDER BY的前缀列

Schema Design - Data Types (CRITICAL)

表结构设计 - 数据类型(CRITICAL)

  • schema-types-native-types
    - Use native types, not String for everything
  • schema-types-minimize-bitwidth
    - Use smallest numeric type that fits
  • schema-types-lowcardinality
    - LowCardinality for <10K unique strings
  • schema-types-enum
    - Enum for finite value sets with validation
  • schema-types-avoid-nullable
    - Avoid Nullable; use DEFAULT instead
  • schema-types-native-types
    - 使用原生类型,避免所有列都用String类型
  • schema-types-minimize-bitwidth
    - 使用能容纳数据的最小数值类型
  • schema-types-lowcardinality
    - 对唯一值少于10000的字符串列使用LowCardinality
  • schema-types-enum
    - 对有限值集合使用Enum类型并启用验证
  • schema-types-avoid-nullable
    - 避免使用Nullable;改用DEFAULT

Schema Design - Partitioning (HIGH)

表结构设计 - 分区(HIGH)

  • schema-partition-low-cardinality
    - Keep partition count 100-1,000
  • schema-partition-lifecycle
    - Use partitioning for data lifecycle, not queries
  • schema-partition-query-tradeoffs
    - Understand partition pruning trade-offs
  • schema-partition-start-without
    - Consider starting without partitioning
  • schema-partition-low-cardinality
    - 分区数量保持在100-1000之间
  • schema-partition-lifecycle
    - 分区用于数据生命周期管理,而非查询优化
  • schema-partition-query-tradeoffs
    - 了解分区修剪的权衡
  • schema-partition-start-without
    - 考虑初始不使用分区

Schema Design - JSON (MEDIUM)

表结构设计 - JSON(MEDIUM)

  • schema-json-when-to-use
    - JSON for dynamic schemas; typed columns for known
  • schema-json-when-to-use
    - 动态表结构使用JSON;已知结构使用类型化列

Query Optimization - JOINs (CRITICAL)

查询优化 - JOIN(CRITICAL)

  • query-join-choose-algorithm
    - Select algorithm based on table sizes
  • query-join-use-any
    - ANY JOIN when only one match needed
  • query-join-filter-before
    - Filter tables before joining
  • query-join-consider-alternatives
    - Dictionaries/denormalization vs JOIN
  • query-join-null-handling
    - join_use_nulls=0 for default values
  • query-join-choose-algorithm
    - 根据表大小选择JOIN算法
  • query-join-use-any
    - 仅需单个匹配时使用ANY JOIN
  • query-join-filter-before
    - JOIN前对表进行筛选
  • query-join-consider-alternatives
    - 考虑使用字典/反规范化替代JOIN
  • query-join-null-handling
    - 启用join_use_nulls=0以使用默认值

Query Optimization - Indices (HIGH)

查询优化 - 索引(HIGH)

  • query-index-skipping-indices
    - Skipping indices for non-ORDER BY filters
  • query-index-skipping-indices
    - 为非ORDER BY的筛选列设置跳数索引

Query Optimization - Materialized Views (HIGH)

查询优化 - 物化视图(HIGH)

  • query-mv-incremental
    - Incremental MVs for real-time aggregations
  • query-mv-refreshable
    - Refreshable MVs for complex joins
  • query-mv-incremental
    - 实时聚合使用增量物化视图
  • query-mv-refreshable
    - 复杂JOIN使用可刷新物化视图

Insert Strategy - Batching (CRITICAL)

插入策略 - 批量处理(CRITICAL)

  • insert-batch-size
    - Batch 10K-100K rows per INSERT
  • insert-batch-size
    - 每个INSERT语句批量处理10000-100000行

Insert Strategy - Async (HIGH)

插入策略 - 异步(HIGH)

  • insert-async-small-batches
    - Async inserts for high-frequency small batches
  • insert-format-native
    - Native format for best performance
  • insert-async-small-batches
    - 高频小批量场景使用异步插入
  • insert-format-native
    - 使用原生格式以获得最佳性能

Insert Strategy - Mutations (CRITICAL)

插入策略 - 变更操作(CRITICAL)

  • insert-mutation-avoid-update
    - ReplacingMergeTree instead of ALTER UPDATE
  • insert-mutation-avoid-delete
    - Lightweight DELETE or DROP PARTITION
  • insert-mutation-avoid-update
    - 使用ReplacingMergeTree替代ALTER UPDATE
  • insert-mutation-avoid-delete
    - 使用轻量DELETE或DROP PARTITION

Insert Strategy - Optimization (HIGH)

插入策略 - 优化(HIGH)

  • insert-optimize-avoid-final
    - Let background merges work

  • insert-optimize-avoid-final
    - 让后台合并进程自动处理

When to Apply

适用场景

This skill activates when you encounter:
  • CREATE TABLE
    statements
  • ALTER TABLE
    modifications
  • ORDER BY
    or
    PRIMARY KEY
    discussions
  • Data type selection questions
  • Slow query troubleshooting
  • JOIN optimization requests
  • Data ingestion pipeline design
  • Update/delete strategy questions
  • ReplacingMergeTree or other specialized engine usage
  • Partitioning strategy decisions

当遇到以下场景时,激活此技能:
  • CREATE TABLE
    语句
  • ALTER TABLE
    修改操作
  • ORDER BY或PRIMARY KEY相关讨论
  • 数据类型选择问题
  • 慢查询排查
  • JOIN优化请求
  • 数据导入管道设计
  • 更新/删除策略问题
  • ReplacingMergeTree或其他专用引擎的使用
  • 分区策略决策

Rule File Structure

规则文件结构

Each rule file in
rules/
contains:
  • YAML frontmatter: title, impact level, tags
  • Brief explanation: Why this rule matters
  • Incorrect example: Anti-pattern with explanation
  • Correct example: Best practice with explanation
  • Additional context: Trade-offs, when to apply, references

rules/
目录下的每个规则文件包含:
  • YAML前置元数据:标题、影响级别、标签
  • 简要说明:规则的重要性
  • 错误示例:反模式及说明
  • 正确示例:最佳实践及说明
  • 附加上下文:权衡、适用场景、参考资料

Full Compiled Document

完整编译文档

For the complete guide with all rules expanded inline:
AGENTS.md
Use
AGENTS.md
when you need to check multiple rules quickly without reading individual files.
如需查看包含所有展开规则的完整指南,请参考:
AGENTS.md
当需要快速检查多个规则而无需阅读单个文件时,使用
AGENTS.md