dashboard-builder
Dashboard Builder
Use this when the task is to build a dashboard people can operate from.
The goal is not "show every metric." The goal is to answer:
- is it healthy?
- where is the bottleneck?
- what changed?
- what action should someone take?
When to Use
- "Build a Kafka monitoring dashboard"
- "Create a Grafana dashboard for Elasticsearch"
- "Make a SigNoz dashboard for this service"
- "Turn this metrics list into a real operational dashboard"
Guardrails
- do not start from visual layout; start from operator questions
- do not include every available metric just because it exists
- do not mix health, throughput, and resource panels without structure
- do not ship panels without titles, units, and sane thresholds
Workflow
1. Define the operating questions
Organize around:
- health / availability
- latency / performance
- throughput / volume
- saturation / resources
- service-specific risk
2. Study the target platform schema
Inspect existing dashboards first:
- JSON structure
- query language
- variables
- threshold styling
- section layout
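One way to carry out this inspection is to load an existing dashboard export and summarize it before writing anything new. The sketch below assumes a Grafana-style JSON model (top-level `panels`, `templating.list`, `time`, `refresh`); those field names are assumptions about the target platform, and other tools such as SigNoz may lay their JSON out differently. `summarize_dashboard` and the inlined export are hypothetical.

```python
import json
from collections import Counter


def summarize_dashboard(dashboard):
    """Summarize sections, panel types, variables, and time settings.

    Assumes a Grafana-style JSON model; adjust the keys for other platforms.
    """
    panels = []
    for panel in dashboard.get("panels", []):
        panels.append(panel)
        # Collapsed rows keep their children in a nested "panels" list.
        panels.extend(panel.get("panels", []))

    content = [p for p in panels if p.get("type") != "row"]
    variables = dashboard.get("templating", {}).get("list", [])
    return {
        "sections": [p.get("title", "") for p in panels if p.get("type") == "row"],
        "panel_types": dict(Counter(p.get("type", "unknown") for p in content)),
        "variables": [v.get("name") for v in variables],
        "time": dashboard.get("time"),
        "refresh": dashboard.get("refresh"),
    }


# Hypothetical export, inlined so the sketch runs without a file.
EXPORT = json.loads("""
{
  "panels": [
    {"type": "row", "title": "Overview"},
    {"type": "stat", "title": "Cluster health"},
    {"type": "row", "title": "Performance", "panels": [
      {"type": "timeseries", "title": "Search latency"}
    ]}
  ],
  "templating": {"list": [{"name": "cluster"}]},
  "time": {"from": "now-6h", "to": "now"},
  "refresh": "30s"
}
""")

summary = summarize_dashboard(EXPORT)
print(json.dumps(summary, indent=2))
```

Running this against a few dashboards the team already trusts shows which sections, panel types, and variables are conventional on the platform before any new JSON is written.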
3. Build the minimum useful board
Recommended structure:
- overview
- performance
- resources
- service-specific section
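The recommended structure can be sketched as a skeleton with one empty row per section, to be filled in panel by panel. This is a minimal sketch that assumes Grafana-style row panels (`type: "row"` with a `gridPos`); the helper name `build_skeleton`, the dashboard title, and the default time range and refresh values are placeholders, not recommendations from the original text.

```python
import json

SECTIONS = ("Overview", "Performance", "Resources", "Service-specific")


def build_skeleton(title, sections=SECTIONS):
    """Return a dashboard dict with one empty row per section."""
    rows = [
        {
            "type": "row",
            "title": name,
            "collapsed": False,
            # Rows span the full 24-column grid and stack top to bottom.
            "gridPos": {"h": 1, "w": 24, "x": 0, "y": index},
            "panels": [],
        }
        for index, name in enumerate(sections)
    ]
    return {
        "title": title,
        "panels": rows,
        "templating": {"list": []},
        # Placeholder defaults; pick values that suit the service.
        "time": {"from": "now-6h", "to": "now"},
        "refresh": "30s",
    }


skeleton = build_skeleton("Kafka overview")
print(json.dumps(skeleton, indent=2))
```

Starting from named sections rather than a grid of charts keeps the layout tied to the operating questions.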
4. Cut vanity panels
Every panel should answer a real question. If it does not, remove it.
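One possible way to enforce this rule is to tag each candidate panel with the operator question it answers and drop anything left untagged. The `answers` field, the `cut_vanity_panels` helper, and the sample panels are hypothetical; the question list is taken from the goals at the top of this document.

```python
QUESTIONS = (
    "is it healthy?",
    "where is the bottleneck?",
    "what changed?",
    "what action should someone take?",
)


def cut_vanity_panels(panels):
    """Split panels into (kept, cut) by whether they name a known question."""
    kept, cut = [], []
    for panel in panels:
        target = kept if panel.get("answers") in QUESTIONS else cut
        target.append(panel)
    return kept, cut


# Hypothetical candidate list for a Kafka board.
candidates = [
    {"title": "Under-replicated partitions", "answers": "is it healthy?"},
    {"title": "Consumer lag", "answers": "where is the bottleneck?"},
    {"title": "Total messages since launch", "answers": None},
]

kept, cut = cut_vanity_panels(candidates)
print("keep:", [p["title"] for p in kept])
print("cut: ", [p["title"] for p in cut])
```

Reviewing the `cut` list by hand is still worthwhile, since a panel may answer a real question that simply has not been written down yet.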
Example Panel Sets
Elasticsearch
- cluster health
- shard allocation
- search latency
- indexing rate
- JVM heap / GC
Kafka
- broker count
- under-replicated partitions
- messages in / out
- consumer lag
- disk and network pressure
API gateway / ingress
- request rate
- p50 / p95 / p99 latency
- error rate
- upstream health
- active connections
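As an illustration of the p50 / p95 / p99 latency panel, the sketch below generates one query per quantile. It assumes a Prometheus data source and a histogram metric; the metric name `http_request_duration_seconds_bucket`, the 5m window, and the `latency_quantile_queries` helper are assumptions, so substitute whatever the gateway actually exports.

```python
def latency_quantile_queries(metric="http_request_duration_seconds_bucket",
                             window="5m", quantiles=(0.5, 0.95, 0.99)):
    """Build one PromQL expression per latency quantile.

    The metric name is a placeholder for a Prometheus histogram.
    """
    return {
        f"p{round(q * 100)}": (
            f"histogram_quantile({q}, sum by (le) (rate({metric}[{window}])))"
        )
        for q in quantiles
    }


queries = latency_quantile_queries()
for name, expr in queries.items():
    print(name, "=>", expr)
```

Keeping the three quantiles on one panel with a shared unit makes tail latency visible next to the median.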
Quality Checklist
- valid dashboard JSON
- clear section grouping
- titles and units are present
- thresholds/status colors are meaningful
- variables exist for common filters
- default time range and refresh are sensible
- no vanity panels (panels with no operator value)
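Parts of this checklist can be checked mechanically. The sketch below is one possible linter, assuming a Grafana-style JSON model (`fieldConfig.defaults.unit`, `fieldConfig.defaults.thresholds.steps`, `templating.list`, `time`, `refresh`); `lint_dashboard` and the sample dashboard are hypothetical. It only confirms that thresholds exist, not that they are meaningful, and it inspects top-level panels only.

```python
import json


def lint_dashboard(text):
    """Return a list of checklist problems for a dashboard JSON string."""
    try:
        dashboard = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    if not isinstance(dashboard, dict):
        return ["top level is not a JSON object"]

    problems = []
    top_level = dashboard.get("panels", [])
    if not any(p.get("type") == "row" for p in top_level):
        problems.append("no row panels, so sections are not grouped")

    # Only top-level panels are checked; nested row children are skipped.
    for panel in top_level:
        if panel.get("type") == "row":
            continue
        label = panel.get("title") or f"panel id={panel.get('id')}"
        defaults = panel.get("fieldConfig", {}).get("defaults", {})
        if not panel.get("title"):
            problems.append(f"{label}: missing title")
        if not defaults.get("unit"):
            problems.append(f"{label}: missing unit")
        if not defaults.get("thresholds", {}).get("steps"):
            problems.append(f"{label}: missing thresholds")

    if not dashboard.get("templating", {}).get("list"):
        problems.append("no variables for common filters")
    if not dashboard.get("time"):
        problems.append("no default time range")
    if not dashboard.get("refresh"):
        problems.append("no refresh interval")
    return problems


# Hypothetical dashboard with two deliberate gaps.
SAMPLE = json.dumps({
    "panels": [
        {"type": "row", "title": "Overview"},
        {"type": "stat", "title": "Error rate",
         "fieldConfig": {"defaults": {"unit": "percent"}}},
    ],
    "templating": {"list": [{"name": "service"}]},
    "time": {"from": "now-6h", "to": "now"},
})

for problem in lint_dashboard(SAMPLE):
    print(problem)
```

Whether thresholds and status colors are meaningful, and whether a panel has operator value, still needs a human review.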
Related Skills
- research-ops
- backend-patterns
- terminal-ops