signoz-generating-queries
Query Generate
Prerequisites
This skill calls SigNoz MCP server tools heavily
(`signoz:signoz_execute_builder_query`, `signoz:signoz_query_metrics`,
`signoz:signoz_search_logs`, `signoz:signoz_search_traces`,
`signoz:signoz_aggregate_logs`, `signoz:signoz_aggregate_traces`,
`signoz:signoz_get_field_keys`, `signoz:signoz_get_field_values`,
`signoz:signoz_list_metrics`, `signoz:signoz_list_services`,
`signoz:signoz_get_service_top_operations`, `signoz:signoz_get_trace_details`).
Before running the workflow, confirm the `signoz:signoz_*` tools are available.
If they are not, the SigNoz MCP server is not installed or configured; stop and
direct the user to set it up: https://signoz.io/docs/ai/signoz-mcp-server/.
Do not fall back to raw HTTP calls or fabricate query results without the
MCP tools.
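The availability check can be sketched as a small preflight step. This is a minimal illustration, not part of the SigNoz MCP server: the tool names come from the list above, while `missing_tools` and `preflight` are hypothetical helpers, and how the list of available tools is obtained depends on the MCP client.

```python
# Tool names copied from the prerequisites list above.
REQUIRED_TOOLS = [
    "signoz:signoz_execute_builder_query",
    "signoz:signoz_query_metrics",
    "signoz:signoz_search_logs",
    "signoz:signoz_search_traces",
    "signoz:signoz_aggregate_logs",
    "signoz:signoz_aggregate_traces",
    "signoz:signoz_get_field_keys",
    "signoz:signoz_get_field_values",
    "signoz:signoz_list_metrics",
    "signoz:signoz_list_services",
    "signoz:signoz_get_service_top_operations",
    "signoz:signoz_get_trace_details",
]

SETUP_URL = "https://signoz.io/docs/ai/signoz-mcp-server/"


def missing_tools(available):
    """Return the required tools absent from `available`, in list order (hypothetical helper)."""
    have = set(available)
    return [name for name in REQUIRED_TOOLS if name not in have]


def preflight(available):
    """Return "ok", or a stop message that points the user at the setup docs."""
    missing = missing_tools(available)
    if missing:
        return "stop: missing " + ", ".join(missing) + "; set up the server: " + SETUP_URL
    return "ok"
```

A stop result means the workflow ends there; no HTTP fallback is attempted.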
When to use
Use this skill when the user asks to:
- Query, search, or look up observability data (traces, logs, metrics)
- Compute aggregations (error rate, p99 latency, request count, throughput)
- Find specific log entries, traces, or metric values
- Investigate patterns (spikes, drops, trends over time)
Do NOT use when:
- User wants raw ClickHouse SQL for a dashboard panel (custom joins, window functions, regex over log bodies) — that's a separate dashboard-panel SQL workflow, not this skill.
Instructions
Step 1: Determine the signal type
Map the user's intent to the right signal:
| User intent | Signal | Why |
|---|---|---|
| Error rate, latency, throughput, request count | metrics (preferred) or traces | Metrics are pre-aggregated and fastest. Use traces if the user needs per-request detail or no matching metric exists. |
| p50/p75/p90/p95/p99 latency | metrics (histogram) or traces (aggregate on `durationNano`) | Prefer metrics if a histogram metric exists. |
| Find specific log entries, error messages, stack traces | logs | Text search, pattern matching, severity filtering. |
| Find specific traces, slow requests, error spans | traces | Per-request detail, span attributes, duration filtering. |
| Infrastructure metrics (CPU, memory, disk, network) | metrics | Always metrics for resource utilization. |
| "How many X per Y" (count/rate grouped by dimension) | traces or logs (aggregate) | Use `groupBy` on the requested dimension. |
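The mapping in the table can be sketched as an ordered rule list. This is a rough illustration only: `RULES` and `choose_signal` are hypothetical, the keyword lists are assumptions drawn from the table rows, and naive substring matching will misfire on words such as "catalog".

```python
from typing import Optional

# Ordered rules: the first matching keyword group decides the signal.
# Keywords are illustrative assumptions based on the table rows.
RULES = [
    (("cpu", "memory", "disk", "network"), "metrics"),
    (("log", "stack trace", "error message"), "logs"),
    (("slow request", "span", "trace"), "traces"),
    (("error rate", "latency", "throughput", "request count"), "metrics"),
]


def choose_signal(question: str) -> Optional[str]:
    """Return the preferred signal, or None when the intent is ambiguous."""
    text = question.lower()
    for keywords, signal in RULES:
        if any(keyword in text for keyword in keywords):
            return signal
    return None  # ambiguous: ask the user instead of guessing
```

For example, `choose_signal("Find timeout errors in logs")` returns `"logs"`, and a question that matches no rule returns `None`.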
If the signal is genuinely ambiguous, ask using `<assistant_question>`.
Step 2: Discover available data
Always discover before querying. Use only names returned by tools; never
guess from training knowledge.
Run discovery calls in parallel where possible:
- For metrics: Call `signoz:signoz_list_metrics` with a `searchText` substring matching the user's intent (e.g., `searchText: "http"`, `searchText: "latency"`). The response includes metric type, temporality, and isMonotonic; pass these to `signoz:signoz_query_metrics` to avoid extra lookups.
- For traces: Call `signoz:signoz_list_services` to confirm the service name exists. Optionally call `signoz:signoz_get_service_top_operations` for the service to find operation names. Call `signoz:signoz_get_field_keys(signal: "traces")` if you need to filter on a non-standard attribute.
- For logs: Call `signoz:signoz_get_field_keys(signal: "logs")` if filtering on attributes beyond `body`, `severity_text`, and `service.name`. Call `signoz:signoz_get_field_values` to validate specific filter values.
If the user already provides exact field names, service names, or metric names
from context (e.g., from a dashboard or @mention), skip redundant discovery.
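Running discovery calls in parallel might look like the following sketch. `call_tool` is a hypothetical stand-in for whatever MCP client is in use, and the response shapes it returns are invented for illustration; only the tool names and the `searchText` parameter come from this document.

```python
import asyncio


async def call_tool(name, **params):
    """Hypothetical stand-in for an MCP client call; returns canned data."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    canned = {
        "signoz:signoz_list_metrics": {"metrics": ["signoz_calls_total"]},
        "signoz:signoz_list_services": {"services": ["checkout"]},
    }
    return canned.get(name, {})


async def discover(search_text):
    """Issue both discovery calls concurrently and return their results."""
    return await asyncio.gather(
        call_tool("signoz:signoz_list_metrics", searchText=search_text),
        call_tool("signoz:signoz_list_services"),
    )


def service_exists(services, name):
    """Only names returned by discovery count as existing."""
    return name in services.get("services", [])


metrics, services = asyncio.run(discover("error"))
```

Checking `service_exists(services, "checkout")` before building the query keeps guessed names out of the request.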
Step 3: Choose the right tool
Use the simplest tool that answers the question:
| Question type | Tool | When to use |
|---|---|---|
| Metric time series or scalar | `signoz:signoz_query_metrics` | Any metrics query. Handles aggregation defaults automatically. Supports formulas via `formula`. |
| Log search (find matching entries) | `signoz:signoz_search_logs` | Finding specific log lines. Use `searchText` for text matching. |
| Trace search (find matching spans) | `signoz:signoz_search_traces` | Finding specific traces/spans. |
| Log aggregation (count, avg, percentiles) | `signoz:signoz_aggregate_logs` | "How many errors?", "error count by service", "p99 response time from logs". Set `requestType` as described below. |
| Trace aggregation (count, avg, percentiles) | `signoz:signoz_aggregate_traces` | "p99 latency for checkout", "error count per operation", "request rate by endpoint". Set `requestType` as described below. |
| Complex multi-query or formula | `signoz:signoz_execute_builder_query` | Only when the simpler tools above cannot express the query, e.g., joining multiple data sources, complex filter expressions, or queries needing the full Query Builder v5 schema. Read … before use. |
`requestType` decision for aggregations:
- `scalar` (default): "How many?", "What is the p99?", "Which service has the most?"
- `time_series`: "When did errors spike?", "How did latency change?", "Show trend"
- If the question has ANY temporal component (spike, trend, change), use `time_series`.
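The `requestType` decision can be sketched as a keyword check. `TEMPORAL_HINTS` and `choose_request_type` are hypothetical, and the hint list is an assumption based on the examples above; substring matching is deliberately naive (it would also match "exchange").

```python
# Words that suggest a temporal question; an illustrative, incomplete list.
TEMPORAL_HINTS = ("when", "spike", "trend", "change", "over time", "drop")


def choose_request_type(question: str) -> str:
    """Return "time_series" for temporal questions, else the default "scalar"."""
    text = question.lower()
    if any(hint in text for hint in TEMPORAL_HINTS):
        return "time_series"
    return "scalar"
```

With this sketch, "When did errors spike?" maps to `time_series` and "How many errors?" falls through to `scalar`.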
Step 4: Execute the query
- Always include `searchContext` with the user's original question; it improves result relevance.
- Default time range is last 1 hour. Respect the user's time range if specified. Convert relative times ("last 6 hours", "yesterday") to `timeRange` param format (e.g., `6h`, `24h`) or Unix millisecond `start`/`end`.
- Use shortcut parameters (`service`, `severity`, `operation`, `error`) when they match the user's filters; they are simpler and less error-prone than building `query` expressions.
- Combine shortcut params with `query`/`filter` for additional constraints; they are ANDed together.
- For `signoz:signoz_query_metrics`, pass `metricType`, `temporality`, and `isMonotonic` from the `signoz:signoz_list_metrics` response to avoid an extra auto-fetch round trip.
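The time conversion above can be sketched as follows. Both helpers are hypothetical. The sketch assumes that `timeRange` accepts hour values such as `6h` and `24h` (the only forms shown in this document), so days are expressed in hours, and it assumes "yesterday" means the previous UTC calendar day.

```python
import re
from datetime import datetime, timedelta, timezone


def to_time_range(phrase: str) -> str:
    """Convert "last 6 hours" to "6h"; fall back to the 1-hour default."""
    match = re.search(r"last\s+(?:(\d+)\s+)?(hour|day)s?", phrase.lower())
    if not match:
        return "1h"  # default time range
    count = int(match.group(1) or 1)
    hours = count * (24 if match.group(2) == "day" else 1)
    return f"{hours}h"


def yesterday_bounds_ms(now: datetime):
    """Return (start, end) in Unix milliseconds for the UTC day before `now`."""
    today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = today - timedelta(days=1)
    return int(start.timestamp() * 1000), int(today.timestamp() * 1000)
```

`to_time_range("last 6 hours")` yields `"6h"`, while a question with no time phrase keeps the 1-hour default.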
Step 5: Handle results
Data returned:
- Present findings as neutral observations with timestamps and values.
- Include the time range in your response.
- For aggregations with `groupBy`, highlight the top entries and mention total group count if truncated by `limit`.
- For search results, summarize patterns rather than listing every entry.
No data returned — apply three-way distinction:
- Healthy zero: The query ran successfully but the count is zero. Say so: "No errors found for checkout-service in the last hour — error count is zero."
- No data in range: The field/metric exists but no data points fall in the time window. Suggest expanding: "No data in the last hour. Try a wider range?"
- Missing instrumentation: The metric, field, or service doesn't exist in discovery results. Say what's missing and suggest how to instrument.
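The three-way distinction can be sketched as a small classifier. `classify_empty_result` is hypothetical, and its two inputs are assumptions about what the agent already knows: whether the name appeared in discovery results, and whether the scope emitted any data points in the time window.

```python
def classify_empty_result(found_in_discovery: bool, scope_has_data: bool) -> str:
    """Classify an empty or zero result into one of the three cases."""
    if not found_in_discovery:
        # The metric, field, or service never appeared in discovery.
        return "missing_instrumentation"
    if not scope_has_data:
        # The name exists, but nothing was recorded in the time window.
        return "no_data_in_range"
    # Data exists and the query ran; the counted value is simply zero.
    return "healthy_zero"
```

Each label maps to one of the responses above: state the zero plainly, suggest a wider range, or name what is missing.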
Drill-down:
- If an aggregation reveals an interesting pattern (spike, outlier service), offer to drill into individual traces or logs for that scope.
- If a trace search returns interesting spans, offer to fetch full trace details
via `signoz:signoz_get_trace_details`.
Guardrails
- Discovery first: Never guess metric names, field names, or service names. Use discovery tools or context to confirm they exist before querying.
- Never claim root cause: Present data patterns and correlations. Write "Error rate for checkout increased from 0.2% to 4.1% at 14:05" not "The deployment caused the errors."
- One focused query per question: Do not scatter-shot multiple queries when one precise query answers the question. Use parallel discovery calls, but be precise for execution.
- Respect MCP server rules: The MCP server enforces rules about resource attribute filters, filter operators, and redundant queries. Follow them — especially preferring resource attributes in filters for faster queries.
- No raw ClickHouse SQL: Always use the Query Builder tools. Never construct raw SQL.
- Scope boundary: This skill queries data. If the user wants to wrap the
query into a recurring alert, redirect to `signoz-creating-alerts`.
- Emit `apply_filter` on the final message. When the user asks you to write, build, generate, or show a query, include an `apply_filter` action on your final assistant message with the resolved `compositeQuery` from the tool result and the appropriate `signal` field (`metrics`, `logs`, or `traces`). This signals to the SigNoz UI that the user wants to apply the query to an explorer page. Only emit `apply_filter` when the user's primary intent is to obtain a runnable query, not when the user is asking a one-shot data question that the analysis text already answers.
Examples
User: "Show me the error rate for the checkout service in the last hour"
Agent:
- Calls `signoz:signoz_list_metrics(searchText: "error")` and finds the `signoz_calls_total` metric.
- Calls `signoz:signoz_query_metrics` with `metricName: "signoz_calls_total"`, `filter: "service.name = 'checkout'"`, `formula: "A / B * 100"` (where A is error count and B is total count), and `timeRange: "1h"`.
- Presents: "Error rate for checkout-service: 2.3% over the last hour (14:00–15:00 UTC). 47 errors out of 2,041 total requests."
- Offers drill-down: "Want me to check which operations have the highest error rate?"
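The `A / B * 100` formula in this example works out as below. This is plain arithmetic on the illustrative figures from the example, not a call to any SigNoz API; `error_rate_percent` is a hypothetical helper.

```python
def error_rate_percent(errors: int, total: int) -> float:
    """Compute A / B * 100, where A is the error count and B is the total count."""
    if total == 0:
        raise ValueError("total request count is zero; error rate is undefined")
    return errors / total * 100


# Figures from the example above: 47 errors out of 2,041 requests.
rate = error_rate_percent(47, 2041)
print(f"{rate:.1f}%")  # prints 2.3%
```

Guarding against a zero total avoids reporting a rate when no requests were recorded.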
User: "Find timeout errors in logs"
Agent:
- Calls `signoz:signoz_search_logs(searchText: "timeout", severity: "ERROR", timeRange: "1h")`.
- Presents: "Found 23 timeout errors in the last hour. Most are from payment-service (18) and gateway (5). Common pattern: 'context deadline exceeded' in gRPC calls to downstream-api."
- Offers: "Want me to check traces for these services to see where the timeouts occur?"
User: "What's the p99 latency for the cart service?"
Agent:
- Calls `signoz:signoz_aggregate_traces(aggregation: "p99", aggregateOn: "durationNano", service: "cart-service", requestType: "scalar", timeRange: "1h")`.
- Presents: "p99 latency for cart-service: 1.2s over the last hour."
- Offers: "Want me to break this down by operation or show the trend over time?"
User: "When did errors spike for the frontend?"
Agent:
- Calls `signoz:signoz_aggregate_traces(aggregation: "count", error: "true", service: "frontend", requestType: "time_series", timeRange: "6h")`.
- Presents: "Error count for frontend over the last 6 hours. Spike at 11:30 UTC: error count jumped from ~5/min to ~45/min, returning to baseline by 12:15."
- Offers: "Want me to check what error types appeared during the spike?"