altinity-expert-clickhouse-schema

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Table Schema and Design Analysis

表Schema与设计分析

Analyze table structure, partitioning, ORDER BY, materialized views, and identify design anti-patterns.

分析表结构、分区、ORDER BY、Materialized Views，并识别设计反模式。

Diagnostics

诊断分析

Run all queries from the file checks.sql and analyze the results.

执行checks.sql文件中的所有查询并分析结果。

Deep Dive Queries (Placeholder-Based)

深度查询（基于占位符）

Partition Distribution for Specific Table

指定表的分区分布

sql

select
    database,
    table,
    count() as partitions,
    sum(rows) as total_rows,
    formatReadableSize(sum(bytes_on_disk)) as total_size,
    formatReadableSize(median(bytes_on_disk)) as median_partition_size,
    min(partition) as oldest_partition,
    max(partition) as newest_partition
from system.parts
where active and database = '{database}' and table = '{table}'
group by database, table, partition
order by partition desc
limit 100

sql

select
    database,
    table,
    count() as partitions,
    sum(rows) as total_rows,
    formatReadableSize(sum(bytes_on_disk)) as total_size,
    formatReadableSize(median(bytes_on_disk)) as median_partition_size,
    min(partition) as oldest_partition,
    max(partition) as newest_partition
from system.parts
where active and database = '{database}' and table = '{table}'
group by database, table, partition
order by partition desc
limit 100

Column Compression Analysis for Specific Table

指定表的列压缩分析

sql

select
    name,
    type,
    formatReadableSize(data_compressed_bytes) as compressed,
    formatReadableSize(data_uncompressed_bytes) as uncompressed,
    round(data_uncompressed_bytes / nullIf(data_compressed_bytes, 0), 2) as ratio,
    compression_codec
from system.columns
where database = '{database}' and table = '{table}'
order by data_compressed_bytes desc
limit 50

Look for:

Columns with ratio < 2 → consider better codec or data transformation
Large columns without codec → add CODEC(ZSTD) or LZ4HC
String columns with low cardinality → consider LowCardinality(String)

sql

select
    name,
    type,
    formatReadableSize(data_compressed_bytes) as compressed,
    formatReadableSize(data_uncompressed_bytes) as uncompressed,
    round(data_uncompressed_bytes / nullIf(data_compressed_bytes, 0), 2) as ratio,
    compression_codec
from system.columns
where database = '{database}' and table = '{table}'
order by data_compressed_bytes desc
limit 50

检查要点：

压缩比<2的列 → 考虑使用更优的Codec或进行数据转换
未使用Codec的大列 → 添加CODEC(ZSTD)或LZ4HC
低基数的String列 → 考虑使用LowCardinality(String)

Index Usage Analysis for Specific Database

指定数据库的索引使用分析

sql

select
    database,
    table,
    name as index_name,
    type,
    expr,
    granularity
from system.data_skipping_indices
where database = '{database}'
order by database, table

sql

select
    database,
    table,
    name as index_name,
    type,
    expr,
    granularity
from system.data_skipping_indices
where database = '{database}'
order by database, table

Schema Design Recommendations

Schema设计建议

Partition Key Guidelines

分区键指南

Data Volume	Recommended Granularity	Example
< 10GB/month	No partitioning or yearly	`toYear(ts)`
10-100GB/month	Monthly	`toYYYYMM(ts)`
100GB-1TB/month	Weekly or daily	`toMonday(ts)`
> 1TB/month	Daily	`toDate(ts)`

数据量	推荐粒度	示例
< 10GB/月	不分区或按年分区	`toYear(ts)`
10-100GB/月	按月分区	`toYYYYMM(ts)`
100GB-1TB/月	按周或按日分区	`toMonday(ts)`
> 1TB/月	按日分区	`toDate(ts)`

ORDER BY Guidelines

ORDER BY 指南

First column: Low cardinality, frequently filtered (e.g.,
```
tenant_id
```
,
```
region
```
)
Second column: Time-based if range queries common
Subsequent: Other filter columns by selectivity (most selective last)

Anti-patterns:

UUID/hash as first column
High-cardinality ID without tenant prefix
DateTime64 with microseconds as first column

第一列：低基数、频繁用于过滤的列（如
```
tenant_id
```
、
```
region
```
）
第二列：如果常用范围查询则使用时间列
后续列：其他过滤列按选择性排序（选择性最高的放在最后）

反模式：

将UUID/哈希值作为第一列
高基数ID未添加租户前缀
将带微秒的DateTime64作为第一列

Compression Codec Recommendations

压缩Codec建议

Data Type	Recommended Codec
Integers (sequential)	`Delta, ZSTD`
Integers (random)	`ZSTD` or `LZ4HC`
Floats	`Gorilla, ZSTD`
Timestamps	`DoubleDelta, ZSTD`
Strings (long)	`ZSTD(3)`
Strings (repetitive)	`LowCardinality` + `ZSTD`

数据类型	推荐Codec
整数（连续）	`Delta, ZSTD`
整数（随机）	`ZSTD` 或 `LZ4HC`
浮点数	`Gorilla, ZSTD`
时间戳	`DoubleDelta, ZSTD`
长字符串	`ZSTD(3)`
重复字符串	`LowCardinality` + `ZSTD`

Cross-Module Triggers

跨模块触发规则

Finding	Load Module	Reason
Many small partitions	`altinity-expert-clickhouse-ingestion`	Check batch sizing
Oversized partitions	`altinity-expert-clickhouse-merges`	Merge can't complete
High PK memory	`altinity-expert-clickhouse-memory`	Memory pressure
MV performance issues	`altinity-expert-clickhouse-reporting`	Query analysis
Too many parts per partition	`altinity-expert-clickhouse-merges`	Merge backlog

发现问题	加载模块	原因
大量小分区	`altinity-expert-clickhouse-ingestion`	检查批量大小设置
分区过大	`altinity-expert-clickhouse-merges`	合并无法完成
主键占用内存过高	`altinity-expert-clickhouse-memory`	内存压力问题
Materialized View性能问题	`altinity-expert-clickhouse-reporting`	查询分析需求
每个分区包含过多parts	`altinity-expert-clickhouse-merges`	合并任务积压

Settings Reference

配置参考

Setting	Default	Recommendation
`index_granularity`	8192	Lower for point lookups, higher for scans
`ttl_only_drop_parts`	0	Set to 1 if TTL deletes entire partitions
`min_bytes_for_wide_part`	10MB	Increase if many small parts

配置项	默认值	推荐设置
`index_granularity`	8192	点查询场景调低，扫描场景调高
`ttl_only_drop_parts`	0	如果TTL仅删除整个分区则设为1
`min_bytes_for_wide_part`	10MB	若存在大量小parts则调大