altinity-expert-clickhouse-schema
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseTable Schema and Design Analysis
表Schema与设计分析
Analyze table structure, partitioning, ORDER BY, materialized views, and identify design anti-patterns.
分析表结构、分区、ORDER BY、Materialized Views,并识别设计反模式。
Diagnostics
诊断分析
Run all queries from the file checks.sql and analyze the results.
执行checks.sql文件中的所有查询并分析结果。
Deep Dive Queries (Placeholder-Based)
深度查询(基于占位符)
Partition Distribution for Specific Table
指定表的分区分布
sql
select
database,
table,
count() as partitions,
sum(rows) as total_rows,
formatReadableSize(sum(bytes_on_disk)) as total_size,
formatReadableSize(median(bytes_on_disk)) as median_partition_size,
min(partition) as oldest_partition,
max(partition) as newest_partition
from system.parts
where active and database = '{database}' and table = '{table}'
group by database, table, partition
order by partition desc
limit 100sql
select
database,
table,
count() as partitions,
sum(rows) as total_rows,
formatReadableSize(sum(bytes_on_disk)) as total_size,
formatReadableSize(median(bytes_on_disk)) as median_partition_size,
min(partition) as oldest_partition,
max(partition) as newest_partition
from system.parts
where active and database = '{database}' and table = '{table}'
group by database, table, partition
order by partition desc
limit 100Column Compression Analysis for Specific Table
指定表的列压缩分析
sql
select
name,
type,
formatReadableSize(data_compressed_bytes) as compressed,
formatReadableSize(data_uncompressed_bytes) as uncompressed,
round(data_uncompressed_bytes / nullIf(data_compressed_bytes, 0), 2) as ratio,
compression_codec
from system.columns
where database = '{database}' and table = '{table}'
order by data_compressed_bytes desc
limit 50Look for:
- Columns with ratio < 2 → consider better codec or data transformation
- Large columns without codec → add CODEC(ZSTD) or LZ4HC
- String columns with low cardinality → consider LowCardinality(String)
sql
select
name,
type,
formatReadableSize(data_compressed_bytes) as compressed,
formatReadableSize(data_uncompressed_bytes) as uncompressed,
round(data_uncompressed_bytes / nullIf(data_compressed_bytes, 0), 2) as ratio,
compression_codec
from system.columns
where database = '{database}' and table = '{table}'
order by data_compressed_bytes desc
limit 50检查要点:
- 压缩比<2的列 → 考虑使用更优的Codec或进行数据转换
- 未使用Codec的大列 → 添加CODEC(ZSTD)或LZ4HC
- 低基数的String列 → 考虑使用LowCardinality(String)
Index Usage Analysis for Specific Database
指定数据库的索引使用分析
sql
select
database,
table,
name as index_name,
type,
expr,
granularity
from system.data_skipping_indices
where database = '{database}'
order by database, tablesql
select
database,
table,
name as index_name,
type,
expr,
granularity
from system.data_skipping_indices
where database = '{database}'
order by database, tableSchema Design Recommendations
Schema设计建议
Partition Key Guidelines
分区键指南
| Data Volume | Recommended Granularity | Example |
|---|---|---|
| < 10GB/month | No partitioning or yearly | |
| 10-100GB/month | Monthly | |
| 100GB-1TB/month | Weekly or daily | |
| > 1TB/month | Daily | |
| 数据量 | 推荐粒度 | 示例 |
|---|---|---|
| < 10GB/月 | 不分区或按年分区 | |
| 10-100GB/月 | 按月分区 | |
| 100GB-1TB/月 | 按周或按日分区 | |
| > 1TB/月 | 按日分区 | |
ORDER BY Guidelines
ORDER BY 指南
- First column: Low cardinality, frequently filtered (e.g., ,
tenant_id)region - Second column: Time-based if range queries common
- Subsequent: Other filter columns by selectivity (most selective last)
Anti-patterns:
- UUID/hash as first column
- High-cardinality ID without tenant prefix
- DateTime64 with microseconds as first column
- 第一列:低基数、频繁用于过滤的列(如、
tenant_id)region - 第二列:如果常用范围查询则使用时间列
- 后续列:其他过滤列按选择性排序(选择性最高的放在最后)
反模式:
- 将UUID/哈希值作为第一列
- 高基数ID未添加租户前缀
- 将带微秒的DateTime64作为第一列
Compression Codec Recommendations
压缩Codec建议
| Data Type | Recommended Codec |
|---|---|
| Integers (sequential) | |
| Integers (random) | |
| Floats | |
| Timestamps | |
| Strings (long) | |
| Strings (repetitive) | |
| 数据类型 | 推荐Codec |
|---|---|
| 整数(连续) | |
| 整数(随机) | |
| 浮点数 | |
| 时间戳 | |
| 长字符串 | |
| 重复字符串 | |
Cross-Module Triggers
跨模块触发规则
| Finding | Load Module | Reason |
|---|---|---|
| Many small partitions | | Check batch sizing |
| Oversized partitions | | Merge can't complete |
| High PK memory | | Memory pressure |
| MV performance issues | | Query analysis |
| Too many parts per partition | | Merge backlog |
| 发现问题 | 加载模块 | 原因 |
|---|---|---|
| 大量小分区 | | 检查批量大小设置 |
| 分区过大 | | 合并无法完成 |
| 主键占用内存过高 | | 内存压力问题 |
| Materialized View性能问题 | | 查询分析需求 |
| 每个分区包含过多parts | | 合并任务积压 |
Settings Reference
配置参考
| Setting | Default | Recommendation |
|---|---|---|
| 8192 | Lower for point lookups, higher for scans |
| 0 | Set to 1 if TTL deletes entire partitions |
| 10MB | Increase if many small parts |
| 配置项 | 默认值 | 推荐设置 |
|---|---|---|
| 8192 | 点查询场景调低,扫描场景调高 |
| 0 | 如果TTL仅删除整个分区则设为1 |
| 10MB | 若存在大量小parts则调大 |