altinity-expert-clickhouse-merges
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseMerge Performance and Part Management
合并性能与分区管理
Diagnose merge performance, backlog issues, and part management problems.
诊断合并性能、积压问题以及分区管理故障。
Diagnostics
诊断步骤
Run all queries from the file checks.sql and analyze the results.
运行checks.sql文件中的所有查询并分析结果。
Problem-Specific Investigation
特定问题排查
"Too Many Parts" Error Investigation
“分区过多”错误排查
For deep investigation of a specific table, use these ad-hoc queries:
sql
-- Check part creation rate (should be < 1/second)
select
toStartOfMinute(event_time) as minute,
count() as new_parts,
round(avg(rows)) as avg_rows_per_part
from system.part_log
where event_type = 'NewPart'
and database = '{database}'
and table = '{table}'
and event_time > now() - interval 1 hour
group by minute
order by minute desc
limit 30sql
-- Check if merges are keeping up
select
toStartOfMinute(event_time) as minute,
countIf(event_type = 'NewPart') as new_parts,
countIf(event_type = 'MergeParts') as merges,
countIf(event_type = 'MergeParts') - countIf(event_type = 'NewPart') as net_reduction
from system.part_log
where database = '{database}'
and table = '{table}'
and event_time > now() - interval 1 hour
group by minute
order by minute desc
limit 30If is negative consistently → Inserts outpace merges. Solutions:
net_reduction- Increase batch size
- Check setting
max_parts_to_merge_at_once - Verify sufficient CPU for background merges
如需对特定表进行深度排查,可使用以下临时查询:
sql
-- 检查分区创建速率(应小于1个/秒)
select
toStartOfMinute(event_time) as minute,
count() as new_parts,
round(avg(rows)) as avg_rows_per_part
from system.part_log
where event_type = 'NewPart'
and database = '{database}'
and table = '{table}'
and event_time > now() - interval 1 hour
group by minute
order by minute desc
limit 30sql
-- 检查合并操作是否跟上节奏
select
toStartOfMinute(event_time) as minute,
countIf(event_type = 'NewPart') as new_parts,
countIf(event_type = 'MergeParts') as merges,
countIf(event_type = 'MergeParts') - countIf(event_type = 'NewPart') as net_reduction
from system.part_log
where database = '{database}'
and table = '{table}'
and event_time > now() - interval 1 hour
group by minute
order by minute desc
limit 30如果持续为负值 → 插入速度超过合并速度。解决方案:
net_reduction- 增大批量插入的大小
- 检查配置
max_parts_to_merge_at_once - 确认有足够的CPU资源用于后台合并操作
Ad-Hoc Query Guidelines
临时查询指南
Required Safeguards
必要的安全措施
sql
-- Always include LIMIT
limit 100
-- Always time-bound historical queries
where event_date >= today() - 7
-- For part_log, always filter event_type
where event_type in ('NewPart', 'MergeParts', 'MutatePart')sql
-- 始终添加LIMIT限制
limit 100
-- 历史查询必须添加时间范围
where event_date >= today() - 7
-- 查询part_log时,始终过滤event_type
where event_type in ('NewPart', 'MergeParts', 'MutatePart')Safe Exploration Patterns
安全探索模式
sql
-- Discover available merge_reason values
select distinct merge_reason
from system.part_log
where event_type = 'MergeParts'
and event_date = today()
limit 100
-- Check table engine
select
database,
name,
engine,
partition_key,
sorting_key
from system.tables
where database = '{database}'
and name = '{table}'sql
-- 查看可用的merge_reason取值
select distinct merge_reason
from system.part_log
where event_type = 'MergeParts'
and event_date = today()
limit 100
-- 检查表引擎信息
select
database,
name,
engine,
partition_key,
sorting_key
from system.tables
where database = '{database}'
and name = '{table}'Avoid
注意避免
- → Huge, crashes context
select * from system.part_log - Queries without time bounds on tables
*_log - Joining large result sets in context (do aggregation in SQL)
- → 数据量极大,会导致上下文崩溃
select * from system.part_log - 对表执行无时间范围限制的查询
*_log - 在上下文中关联大型结果集(应在SQL中完成聚合操作)
Cross-Module Triggers
跨模块触发条件
| Finding | Load Module | Reason |
|---|---|---|
| Slow merges, normal disk | | Check ORDER BY, partitioning |
| Slow merges, high disk IO | | Storage bottleneck analysis |
| Merges blocked by mutations | | Stuck mutation investigation |
| High memory during merges | | Memory limits, settings |
| Replication lag + merge issues | | Replica queue analysis |
| 发现的问题 | 加载模块 | 原因 |
|---|---|---|
| 合并缓慢,磁盘状态正常 | | 检查ORDER BY、分区设置 |
| 合并缓慢,磁盘IO高 | | 存储瓶颈分析 |
| 合并被mutations阻塞 | | 停滞的mutation排查 |
| 合并期间内存占用高 | | 内存限制、配置检查 |
| 复制延迟 + 合并问题 | | 副本队列分析 |
Key Settings Reference
关键配置参考
| Setting | Default | Impact |
|---|---|---|
| 100 | Max parts in single merge |
| 8 | Throttles large merges when busy |
| 16 | Merge threads |
| 300 | Error threshold |
| 150 | Delay threshold |
| 150GB | Max merge size |
| 配置项 | 默认值 | 影响 |
|---|---|---|
| 100 | 单次合并的最大分区数 |
| 8 | 系统繁忙时限制大型合并操作 |
| 16 | 合并线程数 |
| 300 | 触发错误的阈值 |
| 150 | 触发延迟的阈值 |
| 150GB | 最大合并数据量 |