altinity-expert-clickhouse-metrics

Real-Time Metrics Monitoring

Real-time monitoring of ClickHouse metrics, events, and asynchronous metrics.

Diagnostics

Run all queries from the file checks.sql and analyze the results.

Ad-Hoc Query Guidelines

Key Tables

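  • system.metrics
    - Current gauge values (instantaneous state, such as running queries)
  • system.events
    - Cumulative counters since the last server restart
  • system.asynchronous_metrics
    - Metrics computed periodically in the background (memory, load average, replica delay)
  • system.metric_log
    - History of system.metrics and system.events values
  • system.asynchronous_metric_log
    - History of system.asynchronous_metrics values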
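
As a minimal sketch of how the two historical tables can be read, the queries below chart one gauge and one asynchronous metric over the last hour. They assume the default log tables are enabled on the server. The column name CurrentMetric_Query follows the CurrentMetric_<name> convention used by system.metric_log; the one-hour window and the choice of metrics are illustrative only.

```sql
-- Peak number of concurrently running queries, per minute, last hour
select
    toStartOfMinute(event_time) as minute,
    max(CurrentMetric_Query) as max_running_queries
from system.metric_log
where event_time > now() - interval 1 hour
group by minute
order by minute;

-- Peak 1-minute load average, per minute, last hour
select
    toStartOfMinute(event_time) as minute,
    max(value) as max_load_average
from system.asynchronous_metric_log
where metric = 'LoadAverage1'
  and event_time > now() - interval 1 hour
group by minute
order by minute;
```

Note the different layouts: system.metric_log is wide (one column per metric), while system.asynchronous_metric_log is narrow (one row per metric and timestamp).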

Useful Patterns

```sql
-- Find metrics by pattern (replace 'pattern' with the substring of interest)
select * from system.metrics where metric like '%pattern%';
select * from system.asynchronous_metrics where metric like '%pattern%';
select * from system.events where event like '%pattern%';
```
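
Because system.events holds cumulative counters since restart, a raw value is hard to interpret on its own. One possible pattern, sketched here under the assumption that an average over the whole uptime is good enough, divides each counter by the built-in uptime() function (server uptime in seconds). The '%Query%' filter and the limit are illustrative.

```sql
-- Average rate of each matching event since the server started
select
    event,
    value,
    round(value / uptime(), 2) as avg_per_second
from system.events
where event like '%Query%'
order by value desc
limit 10;
```

This smooths over bursts; for rates within a specific window, the historical values in system.metric_log are the better source.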

Cross-Module Triggers

| Finding | Load Module | Reason |
|---|---|---|
| High memory metrics | altinity-expert-clickhouse-memory | Memory analysis |
| High replica delay | altinity-expert-clickhouse-replication | Replication issues |
| High parts count | altinity-expert-clickhouse-merges | Merge backlog |
| High load average | altinity-expert-clickhouse-reporting | Query analysis |
| High connections | altinity-expert-clickhouse-reporting | Connection analysis |

Monitoring Recommendations

Key Metrics to Alert On

| Metric | Warning | Critical |
|---|---|---|
| ReadonlyReplica | - | > 0 |
| Query | > 75% max | > 90% max |
| MemoryResident | > 80% RAM | > 90% RAM |
| MaxPartCountForPartition | > parts_to_delay | > parts_to_throw |
| ReplicasMaxAbsoluteDelay | > 5 min | > 1 hour |
| LoadAverage1 | > CPU count | > 2x CPU count |
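
A few of these thresholds can be checked directly in SQL. The sketch below is illustrative rather than a complete alerting setup. It assumes ReplicasMaxAbsoluteDelay is reported in seconds and that the asynchronous metric OSMemoryTotal is available to stand in for total RAM (it may be missing on some platforms or older versions). The level labels are hypothetical names chosen for this example.

```sql
-- Read-only replicas: any non-zero value is critical
select metric, value, if(value > 0, 'critical', 'ok') as level
from system.metrics
where metric = 'ReadonlyReplica';

-- Replica delay: warning above 5 minutes, critical above 1 hour
select
    metric,
    value as delay_seconds,
    multiIf(value > 3600, 'critical', value > 300, 'warning', 'ok') as level
from system.asynchronous_metrics
where metric = 'ReplicasMaxAbsoluteDelay';

-- Resident memory as a share of total RAM: warning above 80%, critical above 90%
-- (assumes OSMemoryTotal exists; if it is absent the share is not meaningful)
select
    maxIf(value, metric = 'MemoryResident') as resident_bytes,
    maxIf(value, metric = 'OSMemoryTotal') as total_bytes,
    resident_bytes / total_bytes as resident_share,
    multiIf(resident_share > 0.9, 'critical', resident_share > 0.8, 'warning', 'ok') as level
from system.asynchronous_metrics
where metric in ('MemoryResident', 'OSMemoryTotal');
```

The remaining rows of the table depend on server settings (query limits, part-count limits) or the CPU count, so they are left out of this sketch.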

Prometheus/Grafana Export

ClickHouse exposes metrics at :9363/metrics in Prometheus format when enabled.