dt-obs-tracing
Application Tracing Skill
Overview
Distributed traces in Dynatrace consist of spans - building blocks representing units of work. With Traces in Grail, every span is accessible via DQL with full-text searchability on all attributes. This skill covers trace fundamentals, common analysis patterns, and span-type specific queries.
Core Concepts
Understanding Traces and Spans
Spans represent logical units of work in distributed traces:
- HTTP requests, RPC calls, database operations
- Messaging system interactions
- Internal function invocations
- Custom instrumentation points
Span kinds:
- `span.kind: server` - Incoming call to a service
- `span.kind: client` - Outgoing call from a service
- `span.kind: consumer` - Incoming message consumption call to a service
- `span.kind: producer` - Outgoing message production call from a service
- `span.kind: internal` - Internal operation within a service

Root spans: A request root span (`request.is_root_span == true`) represents an incoming call to a service. Use this to analyze end-to-end request performance.
Key Trace Attributes
Essential attributes for trace analysis:
| Attribute | Description |
|---|---|
| `trace.id` | Unique trace identifier |
| `span.id` | Unique span identifier |
| `span.parent_id` | Parent span ID (null for root spans) |
| `request.is_root_span` | Boolean, true for request entry points |
| `request.is_failed` | Boolean, true if request failed |
| `duration` | Span duration in nanoseconds |
| | Overall CPU time of the span (stable) |
| | CPU time excluding child spans (stable) |
| `dt.smartscape.service` | Service Smartscape node ID |
| `dt.service.name` | Dynatrace service name derived from service detection rules. It is equal to the Smartscape service node name. |
| `endpoint.name` | Endpoint/route name |
Service Context
Spans reference services via Smartscape node IDs and the detected service name (`dt.service.name`), which is also present on every span.

```dql
fetch spans
| summarize spans=count(), by: { dt.smartscape.service, dt.service.name }
```
Sampling and Extrapolation
One span can represent multiple real operations due to:
- Aggregation: Multiple operations in one span (`aggregation.count`)
- ATM (Adaptive Traffic Management): Head-based sampling by agent
- ALR (Adaptive Load Reduction): Server-side sampling
- Read Sampling: Query-time sampling via the `samplingRatio` parameter

When to extrapolate: Always extrapolate when counting actual operations (not just spans). Use the multiplicity factor:

```dql
fetch spans
| fieldsAdd sampling.probability = (power(2, 56) - coalesce(sampling.threshold, 0)) * power(2, -56)
| fieldsAdd sampling.multiplicity = 1 / sampling.probability
| fieldsAdd multiplicity = coalesce(sampling.multiplicity, 1)
    * coalesce(aggregation.count, 1)
    * dt.system.sampling_ratio
| summarize operation_count = sum(multiplicity)
```

📖 Learn more: See Sampling and Extrapolation for detailed formulas and examples.
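The extrapolation arithmetic is plain math and can be sanity-checked outside DQL. A minimal Python sketch of the same formula (field names and the 56-bit threshold encoding mirror the query above; the input values are hypothetical):

```python
def span_multiplicity(sampling_threshold=None, aggregation_count=None,
                      system_sampling_ratio=1):
    """How many real operations one stored span represents."""
    if sampling_threshold is None:
        # No threshold recorded: span was kept with probability 1.
        sampling_multiplicity = 1
    else:
        # probability = (2^56 - threshold) / 2^56, as in the DQL above
        probability = (2 ** 56 - sampling_threshold) / 2 ** 56
        sampling_multiplicity = 1 / probability
    # coalesce(aggregation.count, 1) and the read-sampling ratio
    return (sampling_multiplicity
            * (aggregation_count or 1)
            * system_sampling_ratio)

# Unsampled, unaggregated span: exactly one operation.
print(span_multiplicity())  # 1
# Span kept at probability 0.5 (threshold = 2^55), aggregating 10 operations:
print(span_multiplicity(sampling_threshold=2 ** 55, aggregation_count=10))  # 20.0
```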
Common Query Patterns
Basic Span Access
Fetch spans and explore by type:
```dql
fetch spans | limit 1
```

Explore spans by function and type:

```dql
fetch spans
| summarize count(), by: { span.kind, code.namespace, code.function }
```
Request Root Filtering
List request root spans (incoming service calls):
```dql
fetch spans
| filter request.is_root_span == true
| fields trace.id, span.id, start_time, response_time = duration, endpoint.name
| limit 100
```
Service Performance Summary
Analyze service performance with error rates:
```dql
fetch spans
| filter request.is_root_span == true
| summarize
    total_requests = count(),
    failed_requests = countIf(request.is_failed == true),
    avg_duration = avg(duration),
    p95_duration = percentile(duration, 95),
    by: {dt.service.name}
| fieldsAdd error_rate = (failed_requests * 100.0) / total_requests
| sort error_rate desc
```
Trace ID Lookup
Find all spans in a specific trace:
```dql
fetch spans
| filter trace.id == toUid("abc123def456")
| fields span.name, duration, dt.service.name
```
Performance Analysis
Response Time Percentiles
Calculate percentiles by endpoint:
```dql
fetch spans
| filter request.is_root_span == true
| summarize {
    requests=count(),
    avg_duration=avg(duration),
    p95=percentile(duration, 95),
    p99=percentile(duration, 99)
}, by: { endpoint.name }
| sort p99 desc
```

💡 Best practice: Use percentiles (p95, p99) over averages for performance insights.
Slow Trace Detection
Find requests exceeding a threshold:
```dql
fetch spans, from:now() - 2h
| filter request.is_root_span == true
| filter duration > 5s
| fields trace.id, span.name, dt.service.name, duration
| sort duration desc
| limit 50
```
Duration Buckets with Exemplars
Group requests into duration buckets with example traces:
```dql
fetch spans, from:now() - 24h
| filter http.route == "/api/v1/storage/findByISBN"
| summarize {
    spans=count(),
    trace=takeAny(record(start_time, trace.id))
}, by: { bin(duration, 10ms) }
| fields `bin(duration, 10ms)`, spans, trace.id=trace[trace.id], start_time=trace[start_time]
```
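`bin(duration, 10ms)` floors each duration to the start of its 10 ms bucket. The equivalent arithmetic in Python (illustration only; durations in nanoseconds, as on spans):

```python
def duration_bin(duration_ns, width_ns=10_000_000):
    """Floor a duration to its bucket start, like bin(duration, 10ms)."""
    return (duration_ns // width_ns) * width_ns

# 37.5 ms falls into the bucket starting at 30 ms.
print(duration_bin(37_500_000))  # 30000000
```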
Performance Timeseries
Extract response time as timeseries:
```dql
fetch spans, from:now() - 24h
| filter request.is_root_span == true
| makeTimeseries {
    requests=count(),
    avg_duration=avg(duration),
    p95=percentile(duration, 95),
    p99=percentile(duration, 99)
}, by: { endpoint.name }
```

📖 Learn more: See Performance Analysis for advanced patterns and timeseries techniques.
Failure Investigation
Failed Request Summary
Summarize failures by service:
```dql
fetch spans
| filter request.is_root_span == true
| summarize
    total = count(),
    failed = countIf(request.is_failed == true),
    by: { dt.service.name }
| fieldsAdd failure_rate = (failed * 100.0) / total
| sort failure_rate desc
```
Failure Reason Analysis
Breakdown by failure detection reason:
```dql
fetch spans
| filter request.is_failed == true and isNotNull(dt.failure_detection.results)
| expand dt.failure_detection.results
| summarize count(), by: { dt.failure_detection.results[reason] }
```

Failure reasons:
- `http_code` - HTTP response code triggered failure
- `grpc_code` - gRPC status code triggered failure
- `exception` - Exception caused failure
- `span_status` - Span status indicated failure
- `custom_rule` - Custom failure detection rule matched
HTTP Code Failures
Find failures by HTTP status code:
```dql
fetch spans
| filter request.is_failed == true
| filter iAny(dt.failure_detection.results[][reason] == "http_code")
| summarize count(), by: { http.response.status_code, endpoint.name }
| sort `count()` desc
```
Recent Failed Requests
List recent failures with details:
```dql
fetch spans
| filter request.is_root_span == true and request.is_failed == true
| fields
    start_time,
    trace.id,
    endpoint.name,
    http.response.status_code,
    duration
| sort start_time desc
| limit 100
```

📖 Learn more: See Failure Detection for exception analysis and custom rule investigation.
Service Dependencies
Service Communication
Analyze incoming and outgoing service communication:
```dql
fetch spans, from:now() - 1h
| filter isNotNull(server.address)
| fieldsAdd
    remote_side = server.address
| summarize
    call_count = count(),
    avg_duration = avg(duration),
    by: {dt.service.name, remote_side}
| sort call_count desc
```
Outgoing HTTP Calls
Identify external API dependencies:
```dql
fetch spans
| filter span.kind == "client" and isNotNull(http.request.method)
| summarize
    calls = count(),
    avg_latency = avg(duration),
    p99_latency = percentile(duration, 99),
    by: { dt.service.name, server.address, server.port }
| sort calls desc
```
Trace Aggregation
Complete Trace Analysis
Aggregate all spans in a trace to understand full request flow:
```dql
fetch spans, from:now() - 30m
| summarize {
    spans = count(),
    client_spans = countIf(span.kind == "client"),
    // Endpoints involved in the trace
    endpoints = toString(arrayRemoveNulls(collectDistinct(endpoint.name))),
    // Extract the first request root in the trace
    trace_root = takeMin(record(
        root_detection_helper = coalesce(
            if(request.is_root_span, 1),
            if(isNull(span.parent_id), 2),
            3),
        start_time, endpoint.name, duration
    ))
}, by: { trace.id }
| fieldsFlatten trace_root
| fieldsRemove trace_root.root_detection_helper, trace_root
| fields
    start_time = trace_root.start_time,
    endpoint = trace_root.endpoint.name,
    response_time = trace_root.duration,
    spans,
    client_spans,
    endpoints,
    trace.id
| sort start_time
| limit 100
```

Root detection strategy: Use `takeMin(record(...))` with a detection helper to reliably find the root request:
- Priority 1: Spans with `request.is_root_span == true`
- Priority 2: Spans without parent (root spans)
- Priority 3: All other spans
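The priority scheme above amounts to taking the minimum of a (priority, start_time) tuple per trace. A small Python sketch of the same selection logic (hypothetical span dicts standing in for Grail records):

```python
def detection_priority(span):
    """Mirror the coalesce/if helper above: lower value = better root candidate."""
    if span.get("request.is_root_span"):
        return 1
    if span.get("span.parent_id") is None:
        return 2
    return 3

def pick_trace_root(spans):
    # takeMin over (priority, start_time): best priority wins, earliest start breaks ties.
    return min(spans, key=lambda s: (detection_priority(s), s["start_time"]))

# Hypothetical spans of one trace.
trace = [
    {"span.id": "c", "span.parent_id": "a", "start_time": 5, "request.is_root_span": False},
    {"span.id": "a", "span.parent_id": None, "start_time": 1, "request.is_root_span": False},
    {"span.id": "b", "span.parent_id": "a", "start_time": 2, "request.is_root_span": True},
]
print(pick_trace_root(trace)["span.id"])  # b -> is_root_span outranks a missing parent
```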
Multi-Service Traces
Find traces spanning multiple services:
```dql
fetch spans, from:now() - 1h
| summarize {
    services = collectDistinct(dt.service.name),
    trace_root = takeMin(record(
        root_detection_helper = coalesce(if(request.is_root_span, 1), 2),
        endpoint.name
    ))
}, by: { trace.id }
| fieldsAdd service_count = arraySize(services)
| filter service_count > 1
| fields
    endpoint = trace_root[endpoint.name],
    service_count,
    services = toString(services),
    trace.id
| sort service_count desc
| limit 50
```
Request-Level Analysis
Request Attributes
Access custom request attributes captured by OneAgent on request root spans:
```dql
fetch spans
| filter request.is_root_span == true
| filter isNotNull(request_attribute.PaidAmount)
| makeTimeseries sum(request_attribute.PaidAmount)
```

Field pattern: `request_attribute.<name>`

For attributes with special characters, use backticks:

```dql
fetch spans
| filter isNotNull(`request_attribute.My Customer ID`)
```
Captured Attributes
Access attributes captured from method parameters (always as arrays):
```dql
fetch spans
| filter isNotNull(captured_attribute.BookID_purchased)
| fields trace.id, span.id, code.namespace, code.function, captured_attribute.BookID_purchased
| limit 1
```

Field pattern: `captured_attribute.<name>`
Request ID Aggregation
Aggregate all spans belonging to a single request using `request.id` (OneAgent traces only):

```dql
fetch spans
| filter isNotNull(request.id)
| summarize {
    spans = count(),
    client_spans = countIf(span.kind == "client"),
    request_root = takeMin(record(
        root_detection_helper = coalesce(if(request.is_root_span, 1), 2),
        start_time, endpoint.name, duration
    ))
}, by: { trace.id, request.id }
| fieldsFlatten request_root
| fields
    start_time = request_root.start_time,
    endpoint = request_root.endpoint.name,
    response_time = request_root.duration,
    spans,
    client_spans
| limit 100
```

📖 Learn more: See Request Attributes for complete patterns on request attributes, captured attributes, and request-level aggregation.
Span Types
HTTP Spans
HTTP spans capture web requests and API calls:
Server-side (incoming requests):
```dql
fetch spans
| filter span.kind == "server" and isNotNull(http.request.method)
| summarize
    requests = count(),
    avg_duration = avg(duration),
    by: { http.request.method, http.route }
| sort requests desc
```

Client-side (outgoing calls):

```dql
fetch spans
| filter span.kind == "client" and isNotNull(http.request.method)
| summarize
    calls = count(),
    avg_duration = avg(duration),
    by: { server.address, http.request.method }
| sort calls desc
```

📖 Learn more: See HTTP Span Analysis for status codes, payload analysis, and client IP tracking.
Database Spans
Database operations appear as client spans with `db.*` attributes:

```dql
fetch spans
| filter span.kind == "client" and isNotNull(db.system) and isNotNull(db.namespace)
| summarize {
    spans=count(),
    avg_duration=avg(duration)
}, by: { dt.service.name, db.system, db.namespace }
| sort spans desc
```

⚠️ Important: Database spans can be aggregated (one span = multiple calls). Always use extrapolation for accurate counts.

📖 Learn more: See Database Span Analysis for extrapolated counts and slow query detection.
Messaging Spans
Messaging spans capture Kafka, RabbitMQ, SQS operations:
```dql
fetch spans
| filter isNotNull(messaging.system)
| summarize
    spans = count(),
    messages = sum(coalesce(messaging.batch.message_count, 1)),
    by: { messaging.system, messaging.destination.name, messaging.operation.type }
| sort messages desc
```

📖 Learn more: See Messaging Span Analysis for throughput, latency, and failure patterns.
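The `coalesce(messaging.batch.message_count, 1)` term counts a batch span as its whole batch and a single-message span as one. The same fold in Python (hypothetical span dicts, purely illustrative):

```python
# Hypothetical span records: batch spans carry messaging.batch.message_count.
spans = [
    {"messaging.batch.message_count": 50},   # one span, 50-message batch
    {"messaging.batch.message_count": None}, # single-message span
    {},                                      # attribute absent -> also counts as 1
]

# coalesce(count, 1): missing or null batch counts default to one message.
messages = sum(s.get("messaging.batch.message_count") or 1 for s in spans)
print(messages)  # 52
```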
RPC Spans
RPC spans cover gRPC, SOAP, and other RPC frameworks:
```dql
fetch spans
| filter isNotNull(rpc.system)
| summarize
    calls = count(),
    avg_duration = avg(duration),
    by: { rpc.system, rpc.service, rpc.method }
| sort calls desc
```

📖 Learn more: See RPC Span Analysis for gRPC status codes and service dependencies.
Serverless Spans
FaaS spans capture Lambda, Azure Functions, and GCP Cloud Functions:
```dql
fetch spans
| filter isNotNull(faas.name) and span.kind == "server"
| summarize
    invocations = count(),
    avg_duration = avg(duration),
    p99_duration = percentile(duration, 99),
    by: { faas.name, cloud.provider }
| sort invocations desc
```

📖 Learn more: See Serverless Span Analysis for cold start analysis and trigger types.
Advanced Topics
Exception Analysis
Exceptions are stored as `span.events` within spans:

```dql
fetch spans
| filter iAny(span.events[][span_event.name] == "exception")
| expand span.events
| fieldsFlatten span.events, fields: { exception.type }
| summarize {
    count(),
    trace=takeAny(record(start_time, trace.id))
}, by: { exception.type }
| fields exception.type, `count()`, trace.id=trace[trace.id], start_time=trace[start_time]
```

💡 Tip: Use `iAny()` to check conditions within span event arrays.
Logs and Traces Correlation
Join logs with traces using trace IDs:
```dql
fetch spans, from:now() - 30m
| join [ fetch logs | fieldsAdd trace.id = toUid(trace_id) ]
    , on: { trace.id }
    , fields: { content, loglevel }
| fields start_time, trace.id, span.id, loglevel, content
| limit 100
```

📖 Learn more: See Logs Correlation for filtering traces by log content and finding logs for failed requests.
Network Analysis
Analyze IP addresses, DNS resolution, and client geography:
```dql
fetch spans, from:now() - 24h
| filter isNotNull(client.ip)
| fieldsAdd client.ip = toIp(client.ip)
| fieldsAdd client.subnet = ipMask(client.ip, 24)
| summarize {
    requests=count(),
    unique_clients=countDistinct(client.ip)
}, by: { client.subnet, endpoint.name }
| sort requests desc
```

📖 Learn more: See Network Analysis for server address resolution and communication mapping.
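The `ipMask(client.ip, 24)` step collapses each client address to its /24 network. The same masking with Python's stdlib `ipaddress` module (illustrative documentation address):

```python
import ipaddress

def subnet_of(ip, prefix=24):
    """Mask an address to its network start, analogous to ipMask(client.ip, 24)."""
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)

print(subnet_of("203.0.113.77"))  # 203.0.113.0
```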
Best Practices
Query Optimization
- Filter early: Apply `request.is_root_span == true` and endpoint filters first
- Use `samplingRatio`: Reduce data volume for better performance (e.g., `samplingRatio:100` reads 1%)
- Limit results: Always use `limit` for exploratory queries
- Percentiles over averages: Use p95/p99 for performance insights
Node Lookups
- Use `getNodeName()`: Simplest way to add service names
- Prefer subqueries: Use Smartscape node filters and `traverse` for filtering
- Cache node info: Store node lookups in fields for reuse
Aggregation Patterns
- Request roots: Use `request.is_root_span == true` for end-to-end analysis
- Trace-level: Group by `trace.id` for complete trace metrics
- Request-level: Group by `request.id` for request metrics (OneAgent traces only)
- Always extrapolate: Use multiplicity for accurate operation counts
Trace Exemplars
Include example traces for drilldown:
```dql
| summarize {
    count(),
    trace=takeAny(record(start_time, trace.id))
}, by: { grouping_field }
| fields ..., trace.id=trace[trace.id], start_time=trace[start_time]
```

This enables "Open With" functionality in Dynatrace UI.
References
Detailed documentation for specific topics:
- Performance Analysis - Advanced timeseries, duration buckets, endpoint ranking
- Failure Detection - Failure reasons, exception investigation, custom rules
- Sampling and Extrapolation - Multiplicity calculation, database extrapolation
- Request Attributes - Request attributes, captured attributes, request ID aggregation
- Entity Lookups - Advanced node lookups, infrastructure correlation, hardware analysis
- HTTP Span Analysis - Status codes, payload analysis, client IPs
- Database Span Analysis - Extrapolated counts, slow queries, statement analysis
- Messaging Span Analysis - Kafka, RabbitMQ, SQS throughput and latency
- RPC Span Analysis - gRPC, SOAP, service dependencies
- Serverless Span Analysis - Lambda, Azure Functions, cold start analysis
- Logs Correlation - Joining logs and traces, correlation patterns
- Network Analysis - IP addresses, DNS resolution, communication mapping
Related Skills
- dt-dql-essentials - Core DQL syntax for querying trace data
- dt-app-dashboards - Embed trace queries in dashboards
- dt-migration - Smartscape entity model and relationship navigation