altinity-expert-clickhouse-mutations

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Mutation Tracking and Analysis

Mutation跟踪与分析

Track and diagnose ALTER UPDATE, ALTER DELETE, and other mutation operations.

跟踪并诊断ALTER UPDATE、ALTER DELETE及其他mutation操作。

Diagnostics

诊断

Run all queries from the file checks.sql and analyze the results.

运行checks.sql文件中的所有查询并分析结果。

Problem Investigation

问题排查

Why Is Mutation Stuck?

为什么Mutation会卡住?

Check for competing operations using these ad-hoc queries:
sql
-- Active merges on same table
select
    database,
    table,
    is_mutation,
    elapsed,
    progress,
    num_parts
from system.merges
where database = '{database}' and table = '{table}'
sql
-- Replication queue for same table
select
    type,
    create_time,
    is_currently_executing,
    num_tries,
    last_exception
from system.replication_queue
where database = '{database}' and table = '{table}'
order by create_time
limit 20
sql
-- Part mutations status
select
    name,
    active,
    mutation_version,
    modification_time
from system.parts
where database = '{database}' and table = '{table}'
order by mutation_version desc
limit 30

使用以下临时查询检查是否存在竞争操作:
sql
-- 同一张表上的活跃合并操作
select
    database,
    table,
    is_mutation,
    elapsed,
    progress,
    num_parts
from system.merges
where database = '{database}' and table = '{table}'
sql
-- 同一张表的复制队列
select
    type,
    create_time,
    is_currently_executing,
    num_tries,
    last_exception
from system.replication_queue
where database = '{database}' and table = '{table}'
order by create_time
limit 20
sql
-- 分区mutation状态
select
    name,
    active,
    mutation_version,
    modification_time
from system.parts
where database = '{database}' and table = '{table}'
order by mutation_version desc
limit 30

Canceling Mutations

取消Mutation操作

To kill a stuck mutation:
sql
-- Find mutation_id first
select mutation_id, command from system.mutations
where database = '{database}' and table = '{table}' and not is_done;

-- Then kill it
-- KILL MUTATION WHERE database = '{database}' AND table = '{table}' AND mutation_id = '{mutation_id}';
Warning: Killed mutations leave table in partially-mutated state.

要终止卡住的mutation:
sql
-- 先找到mutation_id
select mutation_id, command from system.mutations
where database = '{database}' and table = '{table}' and not is_done;

-- 然后终止它
-- KILL MUTATION WHERE database = '{database}' AND table = '{table}' AND mutation_id = '{mutation_id}';
警告: 终止的mutation会使表处于部分修改状态。

Best Practices

最佳实践

Mutation Anti-Patterns

Mutation反模式

Anti-PatternProblemSolution
Frequent small UPDATEsCreates many mutationsBatch updates together
DELETE without WHEREFull table rewriteUse TTL instead
UPDATE on high-cardinality columnSlow, lots of IORestructure data model
Many concurrent mutationsQueue builds upSerialize mutations
反模式问题解决方案
频繁的小批量UPDATE产生大量mutation批量执行更新
不带WHERE条件的DELETE全表重写使用TTL替代
在高基数列上执行UPDATE速度慢、IO量大重构数据模型
大量并发mutation队列堆积序列化mutation执行

Monitoring Mutations

监控Mutation

Set alerts for:
  • Mutations pending > 10
  • Mutation age > 1 hour
  • latest_fail_reason
    not empty

设置以下告警:
  • 待处理Mutation数量>10
  • Mutation存在时长>1小时
  • latest_fail_reason
    不为空

Cross-Module Triggers

跨模块触发

FindingLoad ModuleReason
Mutation blocked by merge
altinity-expert-clickhouse-merges
Merge backlog
Mutation OOM
altinity-expert-clickhouse-memory
Memory limits
Mutation slow due to disk
altinity-expert-clickhouse-storage
IO bottleneck
Replicated mutation stuck
altinity-expert-clickhouse-replication
Replication issues

发现问题加载模块原因
Mutation被合并操作阻塞
altinity-expert-clickhouse-merges
合并操作积压
Mutation出现OOM
altinity-expert-clickhouse-memory
内存限制
Mutation因磁盘问题变慢
altinity-expert-clickhouse-storage
IO瓶颈
复制型Mutation卡住
altinity-expert-clickhouse-replication
复制问题

Settings Reference

设置参考

SettingNotes
mutations_sync
0=async, 1=wait current replica, 2=wait all
max_mutations_in_flight
Max concurrent mutations
number_of_mutations_to_delay
Delay INSERTs threshold
number_of_mutations_to_throw
Throw error threshold
设置项说明
mutations_sync
0=异步,1=等待当前副本完成,2=等待所有副本完成
max_mutations_in_flight
最大并发mutation数
number_of_mutations_to_delay
延迟INSERT的阈值
number_of_mutations_to_throw
抛出错误的阈值