altinity-expert-clickhouse-schema

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Table Schema and Design Analysis

表Schema与设计分析

Analyze table structure, partitioning, ORDER BY, materialized views, and identify design anti-patterns.

分析表结构、分区、ORDER BY、Materialized Views,并识别设计反模式。

Diagnostics

诊断分析

Run all queries from the file checks.sql and analyze the results.

执行checks.sql文件中的所有查询并分析结果。

Deep Dive Queries (Placeholder-Based)

深度查询(基于占位符)

Partition Distribution for Specific Table

指定表的分区分布

sql
select
    database,
    table,
    count() as partitions,
    sum(rows) as total_rows,
    formatReadableSize(sum(bytes_on_disk)) as total_size,
    formatReadableSize(median(bytes_on_disk)) as median_partition_size,
    min(partition) as oldest_partition,
    max(partition) as newest_partition
from system.parts
where active and database = '{database}' and table = '{table}'
group by database, table, partition
order by partition desc
limit 100
sql
select
    database,
    table,
    count() as partitions,
    sum(rows) as total_rows,
    formatReadableSize(sum(bytes_on_disk)) as total_size,
    formatReadableSize(median(bytes_on_disk)) as median_partition_size,
    min(partition) as oldest_partition,
    max(partition) as newest_partition
from system.parts
where active and database = '{database}' and table = '{table}'
group by database, table, partition
order by partition desc
limit 100

Column Compression Analysis for Specific Table

指定表的列压缩分析

sql
select
    name,
    type,
    formatReadableSize(data_compressed_bytes) as compressed,
    formatReadableSize(data_uncompressed_bytes) as uncompressed,
    round(data_uncompressed_bytes / nullIf(data_compressed_bytes, 0), 2) as ratio,
    compression_codec
from system.columns
where database = '{database}' and table = '{table}'
order by data_compressed_bytes desc
limit 50
Look for:
  • Columns with ratio < 2 → consider better codec or data transformation
  • Large columns without codec → add CODEC(ZSTD) or LZ4HC
  • String columns with low cardinality → consider LowCardinality(String)
sql
select
    name,
    type,
    formatReadableSize(data_compressed_bytes) as compressed,
    formatReadableSize(data_uncompressed_bytes) as uncompressed,
    round(data_uncompressed_bytes / nullIf(data_compressed_bytes, 0), 2) as ratio,
    compression_codec
from system.columns
where database = '{database}' and table = '{table}'
order by data_compressed_bytes desc
limit 50
检查要点:
  • 压缩比<2的列 → 考虑使用更优的Codec或进行数据转换
  • 未使用Codec的大列 → 添加CODEC(ZSTD)或LZ4HC
  • 低基数的String列 → 考虑使用LowCardinality(String)

Index Usage Analysis for Specific Database

指定数据库的索引使用分析

sql
select
    database,
    table,
    name as index_name,
    type,
    expr,
    granularity
from system.data_skipping_indices
where database = '{database}'
order by database, table

sql
select
    database,
    table,
    name as index_name,
    type,
    expr,
    granularity
from system.data_skipping_indices
where database = '{database}'
order by database, table

Schema Design Recommendations

Schema设计建议

Partition Key Guidelines

分区键指南

Data VolumeRecommended GranularityExample
< 10GB/monthNo partitioning or yearly
toYear(ts)
10-100GB/monthMonthly
toYYYYMM(ts)
100GB-1TB/monthWeekly or daily
toMonday(ts)
> 1TB/monthDaily
toDate(ts)
数据量推荐粒度示例
< 10GB/月不分区或按年分区
toYear(ts)
10-100GB/月按月分区
toYYYYMM(ts)
100GB-1TB/月按周或按日分区
toMonday(ts)
> 1TB/月按日分区
toDate(ts)

ORDER BY Guidelines

ORDER BY 指南

  1. First column: Low cardinality, frequently filtered (e.g.,
    tenant_id
    ,
    region
    )
  2. Second column: Time-based if range queries common
  3. Subsequent: Other filter columns by selectivity (most selective last)
Anti-patterns:
  • UUID/hash as first column
  • High-cardinality ID without tenant prefix
  • DateTime64 with microseconds as first column
  1. 第一列:低基数、频繁用于过滤的列(如
    tenant_id
    region
  2. 第二列:如果常用范围查询则使用时间列
  3. 后续列:其他过滤列按选择性排序(选择性最高的放在最后)
反模式:
  • 将UUID/哈希值作为第一列
  • 高基数ID未添加租户前缀
  • 将带微秒的DateTime64作为第一列

Compression Codec Recommendations

压缩Codec建议

Data TypeRecommended Codec
Integers (sequential)
Delta, ZSTD
Integers (random)
ZSTD
or
LZ4HC
Floats
Gorilla, ZSTD
Timestamps
DoubleDelta, ZSTD
Strings (long)
ZSTD(3)
Strings (repetitive)
LowCardinality
+
ZSTD

数据类型推荐Codec
整数(连续)
Delta, ZSTD
整数(随机)
ZSTD
LZ4HC
浮点数
Gorilla, ZSTD
时间戳
DoubleDelta, ZSTD
长字符串
ZSTD(3)
重复字符串
LowCardinality
+
ZSTD

Cross-Module Triggers

跨模块触发规则

FindingLoad ModuleReason
Many small partitions
altinity-expert-clickhouse-ingestion
Check batch sizing
Oversized partitions
altinity-expert-clickhouse-merges
Merge can't complete
High PK memory
altinity-expert-clickhouse-memory
Memory pressure
MV performance issues
altinity-expert-clickhouse-reporting
Query analysis
Too many parts per partition
altinity-expert-clickhouse-merges
Merge backlog

发现问题加载模块原因
大量小分区
altinity-expert-clickhouse-ingestion
检查批量大小设置
分区过大
altinity-expert-clickhouse-merges
合并无法完成
主键占用内存过高
altinity-expert-clickhouse-memory
内存压力问题
Materialized View性能问题
altinity-expert-clickhouse-reporting
查询分析需求
每个分区包含过多parts
altinity-expert-clickhouse-merges
合并任务积压

Settings Reference

配置参考

SettingDefaultRecommendation
index_granularity
8192Lower for point lookups, higher for scans
ttl_only_drop_parts
0Set to 1 if TTL deletes entire partitions
min_bytes_for_wide_part
10MBIncrease if many small parts
配置项默认值推荐设置
index_granularity
8192点查询场景调低,扫描场景调高
ttl_only_drop_parts
0如果TTL仅删除整个分区则设为1
min_bytes_for_wide_part
10MB若存在大量小parts则调大