signoz-generating-queries

Query Generation

Prerequisites

This skill calls SigNoz MCP server tools heavily (signoz:signoz_execute_builder_query, signoz:signoz_query_metrics, signoz:signoz_search_logs, signoz:signoz_search_traces, signoz:signoz_aggregate_logs, signoz:signoz_aggregate_traces, signoz:signoz_get_field_keys, signoz:signoz_get_field_values, signoz:signoz_list_metrics, signoz:signoz_list_services, signoz:signoz_get_service_top_operations, signoz:signoz_get_trace_details). Before running the workflow, confirm the signoz:signoz_* tools are available. If they are not, the SigNoz MCP server is not installed or configured; stop and direct the user to set it up: https://signoz.io/docs/ai/signoz-mcp-server/. Do not fall back to raw HTTP calls or fabricate query results without the MCP tools.
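The availability check above can be sketched as a simple set difference. This is a minimal illustration, not part of the skill: the helper name missing_signoz_tools is hypothetical, and how an agent obtains its list of available tool names depends on the runtime and is assumed here.

```python
# Tool names copied from the prerequisites paragraph above.
REQUIRED_SIGNOZ_TOOLS = frozenset({
    "signoz:signoz_execute_builder_query",
    "signoz:signoz_query_metrics",
    "signoz:signoz_search_logs",
    "signoz:signoz_search_traces",
    "signoz:signoz_aggregate_logs",
    "signoz:signoz_aggregate_traces",
    "signoz:signoz_get_field_keys",
    "signoz:signoz_get_field_values",
    "signoz:signoz_list_metrics",
    "signoz:signoz_list_services",
    "signoz:signoz_get_service_top_operations",
    "signoz:signoz_get_trace_details",
})


def missing_signoz_tools(available_tools):
    """Return the required SigNoz tool names absent from available_tools, sorted.

    Hypothetical helper: available_tools is whatever iterable of tool names
    the agent runtime exposes (an assumption, not a documented API).
    """
    return sorted(REQUIRED_SIGNOZ_TOOLS - set(available_tools))
```

An empty result means the workflow can proceed; any non-empty result means stopping and pointing the user to the setup guide linked above.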

When to use

Use this skill when the user asks to:
  • Query, search, or look up observability data (traces, logs, metrics)
  • Compute aggregations (error rate, p99 latency, request count, throughput)
  • Find specific log entries, traces, or metric values
  • Investigate patterns (spikes, drops, trends over time)
Do NOT use when:
  • User wants raw ClickHouse SQL for a dashboard panel (custom joins, window functions, regex over log bodies) — that's a separate dashboard-panel SQL workflow, not this skill.

Instructions

Step 1: Determine the signal type

Map the user's intent to the right signal:
| User intent | Signal | Why |
| --- | --- | --- |
| Error rate, latency, throughput, request count | metrics (preferred) or traces | Metrics are pre-aggregated and fastest. Use traces if the user needs per-request detail or no matching metric exists. |
| p50/p75/p90/p95/p99 latency | metrics (histogram) or traces (aggregate on durationNano) | Prefer metrics if a histogram metric exists (e.g., signoz_latency_bucket). Fall back to trace aggregation. |
| Find specific log entries, error messages, stack traces | logs | Text search, pattern matching, severity filtering. |
| Find specific traces, slow requests, error spans | traces | Per-request detail, span attributes, duration filtering. |
| Infrastructure metrics (CPU, memory, disk, network) | metrics | Always metrics for resource utilization. |
| "How many X per Y" (count/rate grouped by dimension) | traces or logs (aggregate) | Use signoz:signoz_aggregate_traces or signoz:signoz_aggregate_logs for grouped counts. |

If the signal is genuinely ambiguous, ask using <assistant_question>.
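One way to picture the mapping is as a lookup table plus a fallback rule. The sketch below is illustrative only: the intent category keys and the function candidate_signals are hypothetical names, and classifying a user's free-text request into one of these categories is left out.

```python
# Hypothetical intent categories mirroring the rows of the mapping table.
SIGNAL_PREFERENCES = {
    "rate_latency_throughput": ["metrics", "traces"],
    "latency_percentile": ["metrics", "traces"],
    "log_lookup": ["logs"],
    "trace_lookup": ["traces"],
    "infrastructure": ["metrics"],
    "grouped_count": ["traces", "logs"],
}


def candidate_signals(intent, metric_exists=True, needs_per_request_detail=False):
    """Return the signals worth considering for an intent category.

    Metrics are dropped only when there is an alternative and either no
    matching metric was discovered or the user needs per-request detail.
    """
    candidates = list(SIGNAL_PREFERENCES[intent])
    has_alternative = len(candidates) > 1
    if has_alternative and "metrics" in candidates:
        if not metric_exists or needs_per_request_detail:
            candidates.remove("metrics")
    return candidates
```

When more than one candidate remains and the request does not favor either, that is the ambiguous case where asking the user is appropriate.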

Step 2: Discover available data

Always discover before querying. Use only names returned by tools — never guess from training knowledge.
Run discovery calls in parallel where possible:
  • For metrics: Call signoz:signoz_list_metrics with a searchText substring matching the user's intent (e.g., searchText: "http", searchText: "latency"). The response includes metric type, temporality, and isMonotonic; pass these to signoz:signoz_query_metrics to avoid extra lookups.
  • For traces: Call signoz:signoz_list_services to confirm the service name exists. Optionally call signoz:signoz_get_service_top_operations for the service to find operation names. Call signoz:signoz_get_field_keys(signal: "traces") if you need to filter on a non-standard attribute.
  • For logs: Call signoz:signoz_get_field_keys(signal: "logs") if filtering on attributes beyond body, severity_text, and service.name. Call signoz:signoz_get_field_values to validate specific filter values.
If the user already provides exact field names, service names, or metric names from context (e.g., from a dashboard or @mention), skip redundant discovery.
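The per-signal discovery rules can be sketched as a function that returns the list of tool calls to issue. This is a rough illustration under assumptions: the function discovery_calls is hypothetical, the "service" argument name for signoz:signoz_get_service_top_operations is a guess, and only searchText and signal appear as parameter names in the text above.

```python
def discovery_calls(signal, search_text="", service="", custom_attributes=False):
    """Plan discovery calls as (tool_name, arguments) pairs for one signal.

    The returned calls are independent of each other, so a caller may
    issue them in parallel.
    """
    if signal == "metrics":
        return [("signoz:signoz_list_metrics", {"searchText": search_text})]
    if signal == "traces":
        calls = [("signoz:signoz_list_services", {})]
        if service:
            # "service" is an assumed argument name for this tool.
            calls.append(("signoz:signoz_get_service_top_operations",
                          {"service": service}))
        if custom_attributes:
            calls.append(("signoz:signoz_get_field_keys", {"signal": "traces"}))
        return calls
    if signal == "logs":
        # body, severity_text, and service.name need no key discovery.
        if custom_attributes:
            return [("signoz:signoz_get_field_keys", {"signal": "logs"})]
        return []
    raise ValueError(f"unknown signal: {signal!r}")
```

An empty plan corresponds to the case where discovery can be skipped, for example a log search that filters only on the standard fields.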

Step 3: Choose the right tool

Use the simplest tool that answers the question:
| Question type | Tool | When to use |
| --- | --- | --- |
| Metric time series or scalar | signoz:signoz_query_metrics | Any metrics query. Handles aggregation defaults automatically. Supports formulas via formula + formulaQueries params. |
| Log search (find matching entries) | signoz:signoz_search_logs | Finding specific log lines. Use searchText for body text, query for field filters, severity for level filtering. |
| Trace search (find matching spans) | signoz:signoz_search_traces | Finding specific traces/spans. Use service, operation, error, minDuration/maxDuration shortcuts plus query for field filters. |
| Log aggregation (count, avg, percentiles) | signoz:signoz_aggregate_logs | "How many errors?", "error count by service", "p99 response time from logs". Set requestType to scalar for totals or time_series for trends. |
| Trace aggregation (count, avg, percentiles) | signoz:signoz_aggregate_traces | "p99 latency for checkout", "error count per operation", "request rate by endpoint". Set requestType to scalar for totals or time_series for trends. |
| Complex multi-query or formula | signoz:signoz_execute_builder_query | Only when the simpler tools above cannot express the query, e.g., joining multiple data sources, complex filter expressions, or queries needing the full Query Builder v5 schema. Read signoz://traces/query-builder-guide before using. |

requestType decision for aggregations:
  • scalar (default): "How many?", "What is the p99?", "Which service has the most?"
  • time_series: "When did errors spike?", "How did latency change?", "Show trend"
  • If the question has ANY temporal component (spike, trend, change), use time_series.
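The requestType rule can be approximated with a keyword check. The sketch below is a heuristic under assumptions: the hint list is hypothetical and drawn only from the example questions above, and a real agent would judge temporal intent from the whole request rather than from keywords.

```python
import re

# Hypothetical hint words taken from the example questions above.
TEMPORAL_HINTS = ("when", "spike", "trend", "change", "over time")


def choose_request_type(question):
    """Return "time_series" if the question has a temporal hint, else "scalar"."""
    text = question.lower()
    for hint in TEMPORAL_HINTS:
        # \b keeps "change" from matching inside words like "exchange".
        if re.search(r"\b" + re.escape(hint), text):
            return "time_series"
    return "scalar"
```

Defaulting to scalar matches the stated default, while any single temporal hint is enough to switch to time_series.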

Step 4: Execute the query

  • Always include searchContext with the user's original question; it improves result relevance.
  • Default time range is last 1 hour. Respect the user's time range if specified. Convert relative times ("last 6 hours", "yesterday") to timeRange param format (e.g., 6h, 24h) or Unix millisecond start/end.
  • Use shortcut parameters (service, severity, operation, error) when they match the user's filters; they are simpler and less error-prone than building query expressions.
  • Combine shortcut params with query/filter for additional constraints; they are ANDed together.
  • For signoz:signoz_query_metrics, pass metricType, temporality, and isMonotonic from the signoz:signoz_list_metrics response to avoid an extra auto-fetch round trip.
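The time-range conversion can be sketched as follows. This is an illustration under stated assumptions: the function to_time_params is hypothetical, only hour-based timeRange strings are produced because those are the only formats shown above, and "yesterday" is interpreted as the previous UTC calendar day, which is one possible reading (the text also allows simply using 24h).

```python
import re
from datetime import datetime, timedelta, timezone


def to_time_params(phrase, now):
    """Convert a relative time phrase into timeRange or start/end parameters.

    now must be a timezone-aware datetime. An empty phrase yields the
    default of the last 1 hour.
    """
    if not phrase:
        return {"timeRange": "1h"}
    text = phrase.strip().lower()
    match = re.fullmatch(r"last (\d+) (hour|day)s?", text)
    if match:
        # Express days in hours so the result stays in the "<n>h" form.
        hours = int(match.group(1)) * (24 if match.group(2) == "day" else 1)
        return {"timeRange": f"{hours}h"}
    if text == "yesterday":
        # Assumption: "yesterday" means the previous UTC calendar day.
        midnight = now.astimezone(timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0)
        start = midnight - timedelta(days=1)
        return {"start": int(start.timestamp() * 1000),
                "end": int(midnight.timestamp() * 1000)}
    raise ValueError(f"unsupported time phrase: {phrase!r}")
```

Raising on unrecognized phrases, rather than silently falling back to one hour, keeps the user's stated range from being ignored.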

Step 5: Handle results

Data returned:
  • Present findings as neutral observations with timestamps and values.
  • Include the time range in your response.
  • For aggregations with groupBy, highlight the top entries and mention total group count if truncated by limit.
  • For search results, summarize patterns rather than listing every entry.
No data returned — apply three-way distinction:
  1. Healthy zero: The query ran successfully but the count is zero. Say so: "No errors found for checkout-service in the last hour — error count is zero."
  2. No data in range: The field/metric exists but no data points fall in the time window. Suggest expanding: "No data in the last hour. Try a wider range?"
  3. Missing instrumentation: The metric, field, or service doesn't exist in discovery results. Say what's missing and suggest how to instrument.
Drill-down:
  • If an aggregation reveals an interesting pattern (spike, outlier service), offer to drill into individual traces or logs for that scope.
  • If a trace search returns interesting spans, offer to fetch full trace details via signoz:signoz_get_trace_details.
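The three-way distinction for empty results can be sketched as a small classifier. The function name, its two boolean inputs, and the returned labels are all hypothetical; in practice, deciding whether a query "returned an explicit zero" depends on the tool's response shape, which is not specified here.

```python
def classify_empty_result(found_in_discovery, returned_zero_count):
    """Classify an empty result into one of the three cases described above.

    found_in_discovery: the metric, field, or service appeared in discovery.
    returned_zero_count: the query ran successfully and reported a count of 0.
    """
    if not found_in_discovery:
        return "missing_instrumentation"
    if returned_zero_count:
        return "healthy_zero"
    return "no_data_in_range"


# Suggested follow-ups keyed by case, paraphrasing the guidance above.
FOLLOW_UPS = {
    "healthy_zero": "State that the count is zero for the queried range.",
    "no_data_in_range": "Suggest trying a wider time range.",
    "missing_instrumentation": "Say what is missing and suggest how to instrument.",
}
```

Checking discovery first matters: a zero from a query against a name that does not exist should not be reported as a healthy zero.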

Guardrails

  • Discovery first: Never guess metric names, field names, or service names. Use discovery tools or context to confirm they exist before querying.
  • Never claim root cause: Present data patterns and correlations. Write "Error rate for checkout increased from 0.2% to 4.1% at 14:05" not "The deployment caused the errors."
  • One focused query per question: Do not scatter-shot multiple queries when one precise query answers the question. Use parallel discovery calls, but be precise for execution.
  • Respect MCP server rules: The MCP server enforces rules about resource attribute filters, filter operators, and redundant queries. Follow them — especially preferring resource attributes in filters for faster queries.
  • No raw ClickHouse SQL: Always use the Query Builder tools. Never construct raw SQL.
  • Scope boundary: This skill queries data. If the user wants to wrap the query into a recurring alert, redirect to signoz-creating-alerts.
  • Emit apply_filter on the final message: When the user asks you to write, build, generate, or show a query, include an apply_filter action on your final assistant message with the resolved compositeQuery from the tool result and the appropriate signal field (metrics, logs, or traces). This signals to the SigNoz UI that the user wants to apply the query to an explorer page. Only emit apply_filter when the user's primary intent is to obtain a runnable query, not when the user is asking a one-shot data question that the analysis text already answers.
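As a rough sketch of the apply_filter guardrail, the action can be modeled as a small dictionary. Only the signal and compositeQuery fields are named in the text above; the "type" envelope key, the function name build_apply_filter, and the overall payload shape are assumptions, since the actual format expected by the SigNoz UI is not given here.

```python
VALID_SIGNALS = ("metrics", "logs", "traces")


def build_apply_filter(signal, composite_query):
    """Build a hypothetical apply_filter action payload.

    composite_query is passed through unchanged; it should be the resolved
    compositeQuery taken from the tool result, never hand-written.
    """
    if signal not in VALID_SIGNALS:
        raise ValueError(f"signal must be one of {VALID_SIGNALS}, got {signal!r}")
    return {
        "type": "apply_filter",  # assumed envelope key, not from the source
        "signal": signal,
        "compositeQuery": composite_query,
    }
```

Validating the signal up front keeps a mistyped value from reaching the UI, where it would have no matching explorer page.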

Examples

User: "Show me the error rate for the checkout service in the last hour"
Agent:
  1. Calls signoz:signoz_list_metrics(searchText: "error") and finds the signoz_calls_total metric.
  2. Calls signoz:signoz_query_metrics with metricName: "signoz_calls_total", filter: "service.name = 'checkout'", formula: "A / B * 100" where A is error count and B is total count, and timeRange: "1h".
  3. Presents: "Error rate for checkout-service: 2.3% over the last hour (14:00–15:00 UTC). 47 errors out of 2,041 total requests."
  4. Offers drill-down: "Want me to check which operations have the highest error rate?"
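The arithmetic behind the formula A / B * 100 in this example can be checked directly. The helper below is a hypothetical illustration that uses the counts quoted in the presented result; the zero-total guard is an added assumption, since the example does not cover that case.

```python
def error_rate_percent(error_count, total_count):
    """Compute A / B * 100, returning None when there are no requests."""
    if total_count == 0:
        return None
    return error_count / total_count * 100


rate = error_rate_percent(47, 2041)
print(f"{rate:.1f}%")  # prints "2.3%"
```

Returning None for a zero total keeps an idle service from being reported as a 0% error rate, which ties in with the healthy-zero versus no-data distinction in Step 5.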

User: "Find timeout errors in logs"
Agent:
  1. Calls signoz:signoz_search_logs(searchText: "timeout", severity: "ERROR", timeRange: "1h").
  2. Presents: "Found 23 timeout errors in the last hour. Most are from payment-service (18) and gateway (5). Common pattern: 'context deadline exceeded' in gRPC calls to downstream-api."
  3. Offers: "Want me to check traces for these services to see where the timeouts occur?"

User: "What's the p99 latency for the cart service?"
Agent:
  1. Calls signoz:signoz_aggregate_traces(aggregation: "p99", aggregateOn: "durationNano", service: "cart-service", requestType: "scalar", timeRange: "1h").
  2. Presents: "p99 latency for cart-service: 1.2s over the last hour."
  3. Offers: "Want me to break this down by operation or show the trend over time?"

User: "When did errors spike for the frontend?"
Agent:
  1. Calls signoz:signoz_aggregate_traces(aggregation: "count", error: "true", service: "frontend", requestType: "time_series", timeRange: "6h").
  2. Presents: "Error count for frontend over the last 6 hours. Spike at 11:30 UTC — error count jumped from ~5/min to ~45/min, returning to baseline by 12:15."
  3. Offers: "Want me to check what error types appeared during the spike?"