signoz-explaining-dashboards

Dashboard Explain

Prerequisites

This skill calls SigNoz MCP server tools (signoz:signoz_get_dashboard, signoz:signoz_list_dashboards). Before running the workflow, confirm the signoz:signoz_* tools are available. If they are not, the SigNoz MCP server is not installed or configured; stop and direct the user to set it up: https://signoz.io/docs/ai/signoz-mcp-server/. Do not guess at a dashboard's contents from its title alone.

When to use

Use this skill when the user asks to:
  • Understand, explain, or interpret an existing dashboard
  • Get a walkthrough of what panels show and why they matter
  • Know what to watch for or what healthy/unhealthy looks like on a dashboard
  • Understand the variables, filters, or queries on a dashboard
Do NOT use when:
  • User wants to modify an existing dashboard → signoz-modifying-dashboards

Instructions

Step 1: Identify the target dashboard

Determine which dashboard the user wants explained. If the user provides a dashboard name or UUID, or the target is clear from context (e.g., an @mention or auto-context providing a dashboard resource), use that.
If the target dashboard is ambiguous:
  1. Call signoz:signoz_list_dashboards to list existing dashboards. Paginate through all pages: check pagination.hasMore in the response. If hasMore is true, call again with offset set to pagination.nextOffset and repeat until all pages are exhausted. Never stop at the first page.
  2. Present matching candidates to the user and ask which one to explain.
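
The pagination loop described above can be sketched as follows. This is a minimal illustration, not the MCP server's client API: call_tool is a hypothetical adapter that invokes a tool and returns its parsed JSON response, and the "data" and "name" field names are assumptions about the listing payload. Only pagination.hasMore, pagination.nextOffset, and the offset argument come from the instructions above.

```python
from typing import Any, Callable, Dict, List

# Hypothetical adapter: (tool_name, arguments) -> parsed JSON response.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def list_all_dashboards(call_tool: ToolCaller) -> List[Dict[str, Any]]:
    """Collect every dashboard by following pagination.nextOffset."""
    dashboards: List[Dict[str, Any]] = []
    offset = 0
    while True:
        response = call_tool("signoz:signoz_list_dashboards", {"offset": offset})
        # "data" is an assumed name for the list of dashboards in the response.
        dashboards.extend(response.get("data", []))
        pagination = response.get("pagination", {})
        if not pagination.get("hasMore"):
            break  # last page reached
        offset = pagination["nextOffset"]
    return dashboards


def find_candidates(dashboards: List[Dict[str, Any]], search_text: str) -> List[Dict[str, Any]]:
    """Case-insensitive substring match on an assumed "name" field."""
    needle = search_text.lower()
    return [d for d in dashboards if needle in str(d.get("name", "")).lower()]
```

Matching only after the loop finishes is deliberate: a dashboard on a later page would otherwise be reported as missing.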

Step 2: Fetch the full dashboard configuration

Call signoz:signoz_get_dashboard with the dashboard UUID. This is mandatory: you need the complete JSON to explain the dashboard accurately. Never guess based on the title alone.
Examine the response to understand:
  • title, description, tags: the dashboard identity and author-provided context
  • variables: dashboard-level filters (dropdowns the user can change)
  • widgets: the panels, their types, titles, and queries
  • layout: how panels are arranged in the 12-column grid
  • panelMap: which panels belong to which row sections
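
As a rough sketch of that first pass, the function below reduces a dashboard object to the fields listed above. It assumes the caller has already unwrapped the dashboard object from the tool response, and it makes no claim about how the response is nested. The summary key names (variable_count, has_rows, and so on) are illustrative.

```python
from typing import Any, Dict


def summarize_dashboard(dashboard: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the top-level fields needed before explaining a dashboard."""
    return {
        "title": dashboard.get("title", ""),
        "description": dashboard.get("description", ""),
        "tags": list(dashboard.get("tags") or []),
        # len() works whether variables is stored as a list or a mapping.
        "variable_count": len(dashboard.get("variables") or {}),
        "widget_count": len(dashboard.get("widgets") or []),
        "layout_count": len(dashboard.get("layout") or []),
        # An empty or missing panelMap means the dashboard has no row sections.
        "has_rows": bool(dashboard.get("panelMap")),
    }
```

The has_rows flag decides how the walkthrough in Step 3 is grouped: by row section when it is true, by layout order otherwise.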

Step 3: Build the explanation

Structure your explanation in this order:
1. Overview — One paragraph summarizing the dashboard's purpose, what it monitors, and what data sources it draws from (metrics, traces, logs). Mention the tags if they provide useful context.
2. Variables and filters — Explain each variable:
  • Name and what it filters (e.g., "The service_name variable filters all panels to a specific service")
  • Type: DYNAMIC (auto-populated from telemetry), QUERY (SQL-driven dropdown), or TEXTBOX (free-form input)
  • Whether it supports multi-select and has an "ALL" option
  • Note if any panels do NOT reference a variable in their filters — changing that variable dropdown would not affect those panels, which can be confusing
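
One way to spot panels that ignore a variable is to search each panel's serialized configuration for a reference to it. The sketch below assumes variables are referenced as $name inside queries; that syntax is an assumption, so adjust the pattern if the dashboard uses another form. It expects only data panels (no row widgets), and the "title" key is likewise assumed.

```python
import json
import re
from typing import Any, Dict, List


def panels_ignoring_variable(widgets: List[Dict[str, Any]], variable_name: str) -> List[str]:
    """Return titles of panels whose configuration never mentions $variable_name."""
    # Assumed reference syntax: "$name", not followed by another identifier character.
    pattern = re.compile(r"\$" + re.escape(variable_name) + r"(?![A-Za-z0-9_])")
    ignoring = []
    for widget in widgets:
        serialized = json.dumps(widget)  # search the whole panel config as text
        if not pattern.search(serialized):
            ignoring.append(widget.get("title", "<untitled>"))
    return ignoring
```

Any title returned here is worth calling out in the explanation, since changing that dropdown would leave the panel unchanged.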
3. Panel-by-panel walkthrough — Group panels by their row sections using the panelMap structure (row widget titles are the section headers). If the dashboard has no rows (empty panelMap), walk through panels in layout order (by y then x position) and organize by logical theme. For each panel:
  • Title and panel type (graph, value, table, bar, pie, histogram, list)
  • What it shows — interpret the query in plain language. For builder queries, explain the metric/data source, aggregation, filters, and groupBy. For formulas, explain each sub-query and how the formula combines them. For ClickHouse SQL or PromQL, translate the query intent into plain English.
  • What to watch for — describe what healthy looks like and what patterns indicate trouble. Be specific: "sustained usage above 80% means..." not just "watch if it's high". Anchor advice to the actual metric being queried, not generic domain knowledge.
  • Unit — mention the y-axis unit so the user knows how to read the values
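
For dashboards without rows, the layout-order fallback can be sketched as a sort on (y, x). The x and y keys come from the description above. The "i" key linking a layout entry to a widget "id" is an assumption borrowed from the common grid-layout convention, so verify it against the actual JSON.

```python
from typing import Any, Dict, List


def order_by_layout(widgets: List[Dict[str, Any]], layout: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort panels top-to-bottom, then left-to-right, using layout coordinates."""
    # Assumed link: layout item "i" holds the id of the widget it positions.
    position = {item["i"]: (item["y"], item["x"]) for item in layout}
    placed = [w for w in widgets if w.get("id") in position]
    return sorted(placed, key=lambda w: position[w["id"]])
```

Widgets with no layout entry are dropped from the ordering, which also surfaces panels that are defined but never placed on the grid.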
For panels with complex queries:
  • Formulas (queryFormulas): explain each sub-query (A, B, ...) separately, then explain what the formula computes and why
  • Multiple queries on one panel: explain each query and how they relate
  • Functions (rate, derivative, clampMin/Max, timeShift): explain the transform in plain terms (e.g., "rate converts the raw counter into a per-second value")
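
To make the rate explanation concrete, here is a simplified illustration of turning raw counter samples into per-second values. It is not SigNoz's implementation. It assumes samples are (timestamp_seconds, counter_value) pairs with strictly increasing timestamps, and it treats a drop in the counter as a restart from zero.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> List[float]:
    """Convert consecutive counter samples into per-second rates."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if delta < 0:
            # Counter went backwards: assume it reset to zero and counted up to v1.
            delta = v1
        rates.append(delta / (t1 - t0))
    return rates


# A request counter sampled once a minute: 100 -> 160 -> 280.
print(per_second_rate([(0, 100), (60, 160), (120, 280)]))  # [1.0, 2.0]
```

The raw values only ever climb, while the rates show that traffic doubled in the second minute. That is the difference worth explaining to the user.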
4. Dashboard health observations — After the walkthrough, note any structural issues you spotted:
  • Panels with no queries or empty/disabled queries
  • Variables defined but not referenced in any panel filter
  • Panels missing thresholds where they would be useful (e.g., utilization panels without a saturation warning line)
  • Counters displayed without a rate function (raw counters produce ever-increasing ramps, not operational rates)
  • Very wide step intervals that could hide spikes
  • Panels with high-cardinality groupBy that may produce unreadable charts
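
The step-interval concern is easiest to show with numbers. The sketch below uses made-up CPU samples and plain averaging per step. Averaging is an assumption for illustration; the actual effect depends on the aggregation configured on the panel.

```python
from typing import List


def average_by_step(values: List[float], step: int) -> List[float]:
    """Average consecutive samples in buckets of `step` points."""
    buckets = [values[i:i + step] for i in range(0, len(values), step)]
    return [sum(bucket) / len(bucket) for bucket in buckets]


# Sixty one-second CPU samples at 20%, with a five-second spike to 100%.
cpu = [20.0] * 60
cpu[30:35] = [100.0] * 5

fine = average_by_step(cpu, 5)     # 5-second step keeps the spike visible
coarse = average_by_step(cpu, 60)  # 60-second step flattens it

print(max(fine))            # 100.0
print(round(coarse[0], 2))  # 26.67
```

At the 60-second step the panel would show roughly 27% utilization and the saturation event would disappear, which is why wide steps are worth flagging.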
5. Coverage gaps — Based on what the dashboard actually monitors, note significant observability areas that are absent. Only mention gaps that are directly related to the technology or domain the dashboard covers — do not speculate about unrelated areas. Frame as suggestions: "You may want to consider adding panels for X to cover Y."

Step 4: Offer next steps

After the explanation, offer actionable follow-ups:
  • "Want me to run the queries from any specific panel to check if they're returning data?"
  • "Want me to add any missing panels or thresholds?" (→ signoz-modifying-dashboards)

Guardrails

  • Fetch before explaining: Always call signoz:signoz_get_dashboard to get the full configuration. Never explain based on the dashboard title or listing alone.
  • Interpret, don't dump: Translate queries into plain operational language. Never show raw query JSON to the user unless they specifically ask.
  • Anchor to actual content: Base "what to watch for" advice on the actual metrics and queries in the dashboard, not on generic domain knowledge unrelated to the panels present.
  • Group by sections: Use the panelMap row structure to group panels rather than raw layout coordinates; fall back to layout order only when panelMap is empty. The row titles are the section headers the dashboard author intended.
  • No data queries by default: Do not run live queries unless the user asks. The explain skill is about understanding the dashboard structure, not inspecting current data.
  • Paginate dashboard listing: When searching for a dashboard by name, always paginate through all pages of signoz:signoz_list_dashboards before concluding a dashboard does not exist.
  • All query types: Handle builder, ClickHouse SQL, and PromQL queries — each requires a different interpretation approach. For builder queries, read the aggregations, filter expression, and groupBy. For raw SQL/PromQL, parse the query string and explain the intent.
  • Scope boundary: This skill explains dashboards. If the user wants to change something after your explanation, redirect to signoz-modifying-dashboards.

Examples

User: "Explain my PostgreSQL dashboard"
Agent:
  1. Calls signoz:signoz_list_dashboards (paginates all pages) and finds the "PostgreSQL Overview" dashboard with UUID abc-123.
  2. Calls signoz:signoz_get_dashboard with UUID abc-123 to get the full configuration.
  3. Provides structured explanation:
    • Overview: "This dashboard monitors PostgreSQL database health across connections, query performance, buffer cache efficiency, and replication. It uses metrics from the OpenTelemetry PostgreSQL receiver."
    • Variables: "Two variables: host_name (DYNAMIC, filters by host.name) and database (DYNAMIC, filters by postgresql.database.name). Both support multi-select."
    • Panels by section: Walks through each row section, explaining every panel's metric, aggregation, and what to watch for.
    • Health observations: "The 'Connection Count' panel has no threshold configured — consider adding a line at 80% of your max_connections."
    • Coverage gaps: "No panels for vacuum/autovacuum health or lock contention. Consider adding these if you manage write-heavy workloads."
  4. Offers: "Want me to run any panel queries to check for data, or add the suggested panels?"

User: "What does each panel on this dashboard show?" (with dashboard context provided via @mention or auto-context)
Agent:
  1. Extracts dashboard UUID from the provided context.
  2. Calls signoz:signoz_get_dashboard to get the full configuration.
  3. Provides a focused panel-by-panel walkthrough grouped by row sections, explaining what each panel shows and what to watch for.
  4. Skips the health/gaps sections unless something notable stands out, since the user asked specifically about panels.

User: "Help me understand this LLM monitoring dashboard — I'm new to it"
Agent:
  1. Identifies the dashboard, fetches full config.
  2. Provides a beginner-friendly explanation with more context per panel:
    • Explains what the metrics mean (e.g., "Token usage measures how many tokens your LLM calls consume, which directly impacts cost")
    • Explains what the variables control and recommends starting with the "ALL" option to see the full picture before filtering
    • Highlights the most important panels to watch daily vs. those useful only during debugging
  3. Offers to set up alerts on critical panels.