dt-obs-logs
Log Analysis Skill
Query, filter, and analyze Dynatrace log data using DQL for troubleshooting and monitoring.
What This Skill Covers
- Fetching and filtering logs by severity, content, and entity
- Searching log messages using pattern matching
- Calculating error rates and statistics
- Analyzing log patterns and trends
- Grouping and aggregating log data by dimensions
When to Use This Skill
Use this skill when users want to:
- Find specific log entries (e.g., "show me error logs from the last hour")
- Filter logs by severity, process group, or content
- Search logs for specific keywords or phrases
- Calculate error rates or log statistics
- Identify common error messages or patterns
- Analyze log trends over time
- Troubleshoot issues using log data
Key Concepts
Log Data Model
- timestamp: When the log entry was created
- content: The log message text
- status: Log level (ERROR, FATAL, WARN, INFO, etc.)
- dt.process_group.id: Associated process group entity
- dt.process_group.detected_name: Resolves process group IDs to human-readable names
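Putting the data model together, a minimal query that surfaces all of these fields at once (a sketch; the 30-minute window and 10-row limit are arbitrary):

```dql
fetch logs, from:now() - 30m
| fields timestamp, status, content,
    dt.process_group.id,
    process_group = dt.process_group.detected_name
| sort timestamp desc
| limit 10
```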
Query Patterns
- `fetch logs`: Primary command for log data access
- Time ranges: Use `from:now() - <duration>` for time windows
- Filtering: Apply severity, content, and entity filters
- Aggregation: Group and summarize log data
- Pattern Detection: Use `matchesPhrase()` and `contains()` for content search
Common Operations
- Severity filtering (single or multiple levels)
- Content search (simple and full-text)
- Entity-based filtering (process groups)
- Time-series analysis (bucketing, sorting)
- Error rate calculation
- Pattern analysis (exceptions, timeouts, etc.)
Core Workflows
1. Log Searching
Find specific log entries by time, severity, and content.
Typical steps:
- Define time range
- Filter by severity (optional)
- Search content for keywords
- Select relevant fields
- Sort and limit results
Example:
```dql
fetch logs, from:now() - 1h
| filter status == "ERROR"
| fields timestamp, content, process_group = dt.process_group.detected_name
| sort timestamp desc
| limit 100
```
2. Log Filtering
Narrow down logs using multiple criteria (severity, entity, content).
Typical steps:
- Fetch logs with time range
- Apply severity filters
- Filter by entity (process_group)
- Apply content filters
- Format and sort output
Example:
```dql
fetch logs, from:now() - 2h
| filter in(status, {"ERROR", "FATAL", "WARN"})
| summarize count(), by: {dt.process_group.id, dt.process_group.detected_name}
| fieldsAdd process_group = dt.process_group.detected_name
| sort `count()` desc
```
3. Pattern Analysis
Identify patterns, trends, and anomalies in log data.
Typical steps:
- Fetch logs with time range
- Add pattern detection fields
- Aggregate by entity or time
- Calculate statistics and ratios
- Sort by frequency or rate
Example:
```dql
fetch logs, from:now() - 2h
| filter status == "ERROR"
| fieldsAdd
    has_exception = if(matchesPhrase(content, "exception"), true, else: false),
    has_timeout = if(matchesPhrase(content, "timeout"), true, else: false)
| summarize
    count(),
    exception_count = countIf(has_exception == true),
    timeout_count = countIf(has_timeout == true),
    by: {process_group = dt.process_group.detected_name}
```
Key Functions
Filtering
- `filter status == "ERROR"` - Filter by status level
- `in(status, "ERROR", "FATAL", "WARN")` - Multi-status filter
- `contains(content, "keyword")` - Simple substring search
- `matchesPhrase(content, "exact phrase")` - Full-text phrase search
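These filters compose with boolean operators. A sketch combining a multi-status filter with a substring search (the keyword "database" is illustrative):

```dql
fetch logs, from:now() - 1h
| filter in(status, {"ERROR", "FATAL"}) and contains(content, "database")
| fields timestamp, status, content
| sort timestamp desc
| limit 50
```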
Entity Operations
- `dt.process_group.detected_name` - Get human-readable process group name
- `filter process_group == "service-name"` - Filter by specific entity
Aggregation
- `count()` - Count all log entries
- `countIf(condition)` - Conditional count
- `by: {dimension}` - Group by entity or time bucket
- `bin(timestamp, 5m)` - Time bucketing for trends
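These aggregation building blocks combine in a single `summarize`. A sketch counting warnings and errors per 5-minute bucket and process group (window and bucket size are arbitrary):

```dql
fetch logs, from:now() - 2h
| summarize
    total = count(),
    warn_count = countIf(status == "WARN"),
    error_count = countIf(status == "ERROR"),
    by: {time_bucket = bin(timestamp, 5m), process_group = dt.process_group.detected_name}
| sort time_bucket asc
```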
Field Operations
- `fields timestamp, content, status` - Select specific fields
- `fieldsAdd name = expression` - Add computed fields
- `if(condition, true_value, else: false_value)` - Conditional logic
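A sketch tying these field operations together: flag each error as likely transient with `if()`, then keep only the fields needed for review (matching on "timeout" is an illustrative heuristic, not a reliable classifier):

```dql
fetch logs, from:now() - 1h
| filter status == "ERROR"
| fieldsAdd likely_transient = if(contains(content, "timeout"), true, else: false)
| fields timestamp, content, likely_transient
| sort timestamp desc
| limit 100
```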
Common Patterns
Content Search
Simple substring search:
```dql
fetch logs, from:now() - 1h
| filter contains(content, "database")
| fields timestamp, content, status
```
Full-text phrase search:
```dql
fetch logs, from:now() - 1h
| filter matchesPhrase(content, "connection timeout")
| fields timestamp, content, process_group = dt.process_group.detected_name
```
Error Rate Calculation
Calculate error rates over time:
```dql
fetch logs, from:now() - 2h
| summarize
    total_logs = count(),
    error_logs = countIf(status == "ERROR"),
    by: {time_bucket = bin(timestamp, 5m)}
| fieldsAdd error_rate = (error_logs * 100.0) / total_logs
| sort time_bucket asc
```
Top Error Messages
Find most common errors:
```dql
fetch logs, from:now() - 24h
| filter status == "ERROR"
| summarize error_count = count(), by: {content}
| sort error_count desc
| limit 20
```
Process Group-Specific Logs
Filter logs by process group:
```dql
fetch logs, from:now() - 1h
| fieldsAdd process_group = dt.process_group.detected_name
| filter process_group == "payment-service"
| filter status == "ERROR"
| fields timestamp, content, status
| sort timestamp desc
```
Structured / JSON Log Parsing
Many applications emit JSON-formatted log lines. Use `parse` to extract fields instead of dumping raw content:
```dql
fetch logs, from:now() - 1h
| filter status == "ERROR"
| parse content, "JSON:log"
| fieldsAdd level = log[level], message = log[msg], error = log[error]
| fields timestamp, level, message, error
| sort timestamp desc
| limit 50
```
Aggregate by a parsed field:
```dql
fetch logs, from:now() - 4h
| filter status == "ERROR"
| parse content, "JSON:log"
| fieldsAdd message = log[msg]
| summarize error_count = count(), by: {message}
| sort error_count desc
| limit 20
```
Notes:
- `parse content, "JSON:log"` creates a record field `log`; access nested values with `log[key]`
- Filter logs with `contains()` before `parse` to reduce parsing overhead
- Works with any JSON-structured field, not just `content`
Best Practices
- Always specify time ranges - Use `from:now() - <duration>` to limit data
- Apply filters early - Filter by severity and entity before aggregation
- Use appropriate search methods - `contains()` for simple searches, `matchesPhrase()` for exact phrases
- Limit results - Add `| limit 100` to prevent overwhelming output
- Sort meaningfully - Sort by timestamp for recent logs, by count for top errors
- Name entities - Use `dt.process_group.detected_name` or `getNodeName()` for human-readable output
- Use time buckets for trends - `bin(timestamp, 5m)` for time-series analysis
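A query embodying these practices, with a bounded time range, an early severity filter, readable entity names, a meaningful sort, and a result limit (a sketch; the 4-hour window and limit of 25 are arbitrary):

```dql
fetch logs, from:now() - 4h
| filter in(status, {"ERROR", "FATAL"})
| fieldsAdd process_group = dt.process_group.detected_name
| summarize error_count = count(), by: {process_group}
| sort error_count desc
| limit 25
```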
Integration Points
- Entity model: Uses `dt.process_group.id` for service correlation
- Time series: Supports temporal analysis with `bin()` and time ranges
- Content search: Full-text search capabilities via `matchesPhrase()`
- Aggregation: Statistical analysis using `summarize` and conditional functions
Limitations & Notes
- Log availability depends on OneAgent configuration and log ingestion
- Full-text search (`matchesPhrase`) may have performance implications on large datasets
- Entity names require proper OneAgent monitoring for resolution
- Time ranges should be reasonable (avoid unbounded queries)
Related Skills
- dt-dql-essentials - Core DQL syntax and query structure for log queries
- dt-obs-tracing - Correlate logs with distributed traces using trace IDs
- dt-obs-problems - Correlate logs with DAVIS-detected problems