observability-analyze-logs

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Analyze Logs

日志分析

Table of Contents

目录

Core Sections

核心章节

Advanced Topics

进阶主题

Supporting Resources

支持资源

  • Technical Reference - OpenTelemetry format, data models, parsing logic, AI configuration
  • Response Templates - 8 response templates for different contexts
  • Related Documentation - Tool implementation, usage examples, LangChain client
  • Quick Reference Card - One-line command cheatsheet

  • 技术参考 - OpenTelemetry格式、数据模型、解析逻辑、AI配置
  • 响应模板 - 适用于不同场景的8种响应模板
  • 相关文档 - 工具实现、使用示例、LangChain客户端
  • 快速参考卡 - 单行命令速查表

Purpose

用途

Intelligent log analysis for any project using OpenTelemetry trace reconstruction and AI-powered error diagnosis. Parses OTEL-formatted logs, reconstructs execution traces, extracts errors with call chain context, and provides root cause analysis.
针对使用OpenTelemetry的项目提供智能日志分析,包括链路重建和AI驱动的错误诊断。解析OTEL格式的日志,重建执行链路,提取带有调用链上下文的错误信息,并提供根因分析。

What This Skill Does

该技能的功能

Intelligent log analysis for any project using OpenTelemetry trace reconstruction and AI-powered error diagnosis. Works with projects that generate OpenTelemetry-formatted logs in a configurable log directory.
Core capabilities:
  • Parse OpenTelemetry-formatted logs with trace/span IDs
  • Reconstruct complete execution traces
  • Extract errors with full call chain context
  • AI-powered root cause analysis
  • Multiple output formats (summary, markdown, JSON)
  • Advanced filtering (by error ID, trace ID, file)
针对使用OpenTelemetry的项目提供智能日志分析,包括链路重建和AI驱动的错误诊断。适用于在可配置日志目录中生成OpenTelemetry格式日志的项目。
核心能力:
  • 解析带有trace/span ID的OpenTelemetry格式日志
  • 重建完整的执行链路
  • 提取包含完整调用链上下文的错误信息
  • AI驱动的根因分析
  • 多种输出格式(摘要、markdown、JSON)
  • 高级过滤(按错误ID、链路ID、文件)

When to Use This Skill

何时使用该技能

Invoke this skill when users mention:
  • "check the logs"
  • "look at the logs"
  • "analyze errors"
  • "what's failing?"
  • "debug this issue"
  • "show me the traces"
  • "investigate the error"
  • "view log file"
  • Any mention of project log files ({{LOG_DIR}}/{{LOG_FILE}}.log)
当用户提及以下内容时调用该技能:
  • "查看日志"
  • "分析错误"
  • "哪里出问题了?"
  • "调试这个问题"
  • "展示执行链路"
  • "排查该错误"
  • "查看日志文件"
  • 任何提及项目日志文件的内容({{LOG_DIR}}/{{LOG_FILE}}.log)

Quick Start

快速开始

Most common usage (quick health check):
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
This gives you a summary table with error IDs and trace IDs, perfect for quick health checks. Replace
{{LOG_DIR}}
with your project's log directory (e.g.,
logs
) and
{{LOG_FILE}}
with the log filename (e.g.,
app.log
).
最常用方式(快速健康检查):
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
该命令会生成包含错误ID和链路ID的摘要表格,非常适合快速健康检查。将
{{LOG_DIR}}
替换为项目的日志目录(例如
logs
),将
{{LOG_FILE}}
替换为日志文件名(例如
app.log
)。

Instructions

操作指南

Step 1: Determine User's Need

步骤1:确定用户需求

Quick Health Check:
  • User asks: "Are there any errors?" or "What's happening in the logs?"
  • Action: Run summary mode (default)
Specific Error Investigation:
  • User mentions specific error or asks for details
  • Action: Get error ID from summary, then use --error-id
Trace-Based Debugging:
  • User asks "what led to this error?" or wants execution flow
  • Action: Use --trace with trace ID
File-Specific Analysis:
  • User mentions specific file or module
  • Action: Use --file filter
Real-Time Monitoring:
  • User wants to watch logs as they happen
  • Action: Use tail -f
快速健康检查:
  • 用户提问:"有错误吗?"或"日志里有什么情况?"
  • 操作:运行默认的摘要模式
特定错误排查:
  • 用户提及特定错误或要求查看详情
  • 操作:从摘要中获取错误ID,然后使用--error-id参数
基于链路的调试:
  • 用户提问"这个错误是怎么产生的?"或想要了解执行流
  • 操作:使用--trace参数搭配链路ID
特定文件分析:
  • 用户提及特定文件
  • 操作:使用--file过滤器
实时监控:
  • 用户想要实时查看日志
  • 操作:使用tail -f命令

Step 2: Choose Analysis Mode

步骤2:选择分析模式

Mode 1: Summary (Default) - Start here 90% of the time
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
Output: Compact table with error IDs, trace IDs, file:line, function, and message preview.
Use when:
  • Initial investigation
  • Quick health check
  • Getting error IDs for deeper analysis
  • User asks "what errors do we have?"
Mode 2: Error Detail - Deep dive into specific error
bash
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
Output: Full error details including complete message, call chain, stack trace, related errors.
Use when:
  • User asks about specific error from summary
  • Need full error message (summary truncates)
  • Want to see complete stack trace
  • Investigating single failure
Mode 3: Trace Analysis - Understand execution flow
bash
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
Output: All errors in that trace with full execution context.
Use when:
  • Multiple related errors in same trace
  • Need to understand execution sequence
  • Debugging distributed operations
  • User asks "what happened in this execution?"
Mode 4: File Filter - Find all errors in specific file
bash
python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdown
Output: All errors from that file with trace context.
Use when:
  • User mentions specific file
  • Investigating module-specific issues
  • Finding patterns in one component
Mode 5: Fast Parsing (No AI) - Quick and free
bash
python3 .claude/tools/utils/log_analyzer.py --no-ai
Output: Same as summary but skips AI analysis (faster, no cost).
Use when:
  • Quick checks during development
  • Want to avoid LLM costs
  • Just need parsed errors without analysis
  • Automated scripts or frequent polling
Mode 6: Real-Time Monitoring
bash
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
Output: Live log stream (Ctrl+C to exit).
Use when:
  • Watching logs during testing
  • Monitoring server startup
  • Debugging in real-time
  • User runs operations and wants to see results
模式1:摘要(默认)- 90%的场景从这里开始
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
输出:包含错误ID、链路ID、文件:行号、函数、消息预览的紧凑表格。
适用场景:
  • 初步排查
  • 快速健康检查
  • 获取用于深度分析的错误ID
  • 用户提问"我们有哪些错误?"
模式2:错误详情 - 深入分析特定错误
bash
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
输出:完整的错误详情,包括完整消息、调用链、堆栈跟踪、相关错误。
适用场景:
  • 用户询问摘要中的特定错误
  • 需要完整的错误消息(摘要会截断)
  • 想要查看完整的堆栈跟踪
  • 排查单个故障
模式3:链路分析 - 理解执行流
bash
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
输出:该链路中的所有错误及完整执行上下文。
适用场景:
  • 同一链路中存在多个相关错误
  • 需要理解执行顺序
  • 调试分布式操作
  • 用户提问"这次执行发生了什么?"
模式4:文件过滤 - 查找特定文件中的所有错误
bash
python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdown
输出:该文件中的所有错误及链路上下文。
适用场景:
  • 用户提及特定文件
  • 排查模块特定问题
  • 查找单个组件中的错误模式
模式5:快速解析(无AI)- 快速且免费
bash
python3 .claude/tools/utils/log_analyzer.py --no-ai
输出:与摘要模式相同,但跳过AI分析(更快、无成本)。
适用场景:
  • 开发过程中的快速检查
  • 想要避免大语言模型成本
  • 仅需要解析后的错误信息,不需要分析
  • 自动化脚本或频繁轮询
模式6:实时监控
bash
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
输出:实时日志流(按Ctrl+C退出)。
适用场景:
  • 测试期间查看日志
  • 监控服务器启动
  • 实时调试
  • 用户执行操作后想要查看结果

Step 3: Execute Analysis

步骤3:执行分析

Execute the appropriate command using the Bash tool:
bash
undefined
使用Bash工具执行相应命令:
bash
undefined

Example for summary

摘要示例

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefined
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefined

Step 4: Interpret Results

步骤4:解读结果

For Summary Output:
  1. Check "Total errors" count
  2. Scan error table for patterns (same file, same trace)
  3. Note error IDs for deeper investigation
  4. Use provided Quick Commands to drill down
For Detailed Output:
  1. Read Full Message (not truncated)
  2. Review Call Chain (execution flow leading to error)
  3. Check Related Errors (other failures in same trace)
  4. Examine Stack Trace (if available)
  5. Look for Recovery Attempts (logs after error)
For AI Analysis (markdown with --no-ai not set):
  1. Read Root Causes section
  2. Check Patterns (recurring issues)
  3. Review Priority (what to fix first)
  4. Follow Fixes (specific file:line changes)
  5. Consider Systemic Issues (larger architectural problems)
对于摘要输出:
  1. 查看"总错误数"
  2. 扫描错误表格寻找模式(相同文件、相同链路)
  3. 记录用于深度排查的错误ID
  4. 使用提供的快速命令进行深入分析
对于详细输出:
  1. 阅读完整消息(未截断)
  2. 查看调用链(导致错误的执行流)
  3. 检查相关错误(同一链路中的其他故障)
  4. 分析堆栈跟踪(如果有)
  5. 查看恢复尝试(错误后的日志)
对于AI分析(未设置--no-ai的markdown格式):
  1. 阅读根因分析部分
  2. 检查错误模式(重复出现的问题)
  3. 查看优先级(优先修复哪些问题)
  4. 遵循修复建议(特定文件:行号的修改)
  5. 考虑系统性问题(更大的架构问题)

Step 5: Report Findings

步骤5:报告发现

Always provide:
  1. Summary of error count and severity
  2. Most critical issues (from AI analysis or your judgment)
  3. Specific file:line references for user to investigate
  4. Suggested next steps or commands to run
Example response format:
Found 5 errors in the logs:

Critical Issues:
1. Neo4j connection failure in database.py:123 (trace: abc123)
   - Appears 3 times across different operations
   - Root cause: Connection timeout after 5s

2. Invalid config in settings.py:45 (trace: def456)
   - Missing required parameter 'api_key'

To investigate further:
- View error 1 details: python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
- See all Neo4j errors: python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdown
始终需要提供:
  1. 错误数量和严重程度的摘要
  2. 最关键的问题(来自AI分析或你的判断)
  3. 供用户排查的特定文件:行号参考
  4. 建议的下一步操作或可运行的命令
示例响应格式:
在日志中发现5个错误:

关键问题:
1. database.py:123中Neo4j连接失败(链路:abc123)
   - 在不同操作中出现3次
   - 根因:5秒后连接超时

2. settings.py:45中配置无效(链路:def456)
   - 缺少必填参数'api_key'

进一步排查建议:
- 查看错误1详情:python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
- 查看所有Neo4j错误:python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdown

Command Reference

命令参考

Basic Commands

基础命令

bash
undefined
bash
undefined

Quick summary (default)

快速摘要(默认)

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

Detailed markdown with AI analysis

带AI分析的详细markdown格式输出

python3 .claude/tools/utils/log_analyzer.py --format markdown
python3 .claude/tools/utils/log_analyzer.py --format markdown

JSON output for programmatic use

用于程序化调用的JSON输出

python3 .claude/tools/utils/log_analyzer.py --format json
undefined
python3 .claude/tools/utils/log_analyzer.py --format json
undefined

Filtering Commands

过滤命令

bash
undefined
bash
undefined

View specific error details (get ID from summary)

查看特定错误详情(从摘要中获取ID)

python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown

View all errors in a trace

查看某一链路中的所有错误

python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown

Filter by file name

按文件名过滤

python3 .claude/tools/utils/log_analyzer.py --file database.py
python3 .claude/tools/utils/log_analyzer.py --file database.py

Combine filters

组合过滤条件

python3 .claude/tools/utils/log_analyzer.py --file database.py --trace abc123 --format markdown
undefined
python3 .claude/tools/utils/log_analyzer.py --file database.py --trace abc123 --format markdown
undefined

Performance Commands

性能优化命令

bash
undefined
bash
undefined

Skip AI analysis (faster, free)

跳过AI分析(更快、免费)

python3 .claude/tools/utils/log_analyzer.py --no-ai
python3 .claude/tools/utils/log_analyzer.py --no-ai

Save to file instead of stdout

保存到文件而非标准输出

python3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md
python3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md

Real-time log monitoring

实时日志监控

tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log

Last 50 lines

最后50行日志

tail -n 50 {{LOG_DIR}}/{{LOG_FILE}}.log
tail -n 50 {{LOG_DIR}}/{{LOG_FILE}}.log

Follow and filter for errors

实时监控并过滤错误日志

tail -f {{LOG_DIR}}/{{LOG_FILE}}.log | grep ERROR
undefined
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log | grep ERROR
undefined

Understanding the Output

理解输出内容

Log Format

日志格式

OpenTelemetry format with trace/span IDs:
2025-10-16 14:32:15 - [trace:abc123 | span:def456] - module.name - ERROR - [file.py:123] - function() - Error message
Key fields:
  • timestamp: When the log occurred
  • trace: Unique ID for entire execution (groups related logs)
  • span: Unique ID for operation within trace
  • module: Python module path
  • level: ERROR, WARNING, INFO, DEBUG
  • file:line: Source location
  • function: Function name
  • message: Log message
带有trace/span ID的OpenTelemetry格式:
2025-10-16 14:32:15 - [trace:abc123 | span:def456] - module.name - ERROR - [file.py:123] - function() - Error message
关键字段:
  • timestamp: 日志生成时间
  • trace: 整个执行的唯一ID(用于关联相关日志)
  • span: 链路中单个操作的唯一ID
  • module: Python模块路径
  • level: ERROR、WARNING、INFO、DEBUG
  • file:line: 代码位置
  • function: 函数名
  • message: 日志消息

Summary Table

摘要表格

ID   | Trace      | File:Line              | Function    | Message
-----|------------|------------------------|-------------|------------------
1    | abc123     | database.py:45         | connect     | Connection failed
2    | abc123     | database.py:67         | query       | No active connection
Columns explained:
  • ID: Error number (use with --error-id)
  • Trace: First 8 chars of trace ID (same trace = related errors)
  • File:Line: Where error occurred
  • Function: Function that logged error
  • Message: Preview (truncated, use --error-id for full message)
ID   | Trace      | File:Line              | Function    | Message
-----|------------|------------------------|-------------|------------------
1    | abc123     | database.py:45         | connect     | Connection failed
2    | abc123     | database.py:67         | query       | No active connection
列说明:
  • ID: 错误编号(与--error-id参数配合使用)
  • Trace: 链路ID的前8个字符(相同链路ID表示错误相关)
  • File:Line: 错误发生位置
  • Function: 记录错误的函数
  • Message: 消息预览(已截断,使用--error-id查看完整消息)

Trace Execution

执行链路

A trace represents one complete execution flow:
  • Starts at entry point (e.g., MCP tool call)
  • Includes all operations in that execution
  • Ends when execution completes
  • Multiple errors in same trace are related
Example trace flow:
Trace abc123:
  1. automatic_indexing() [deprecated] called
  2. connect_to_neo4j() called
  3. ERROR: Connection timeout
  4. retry_connection() called
  5. ERROR: Retry failed
All errors share trace ID
abc123
, so they're related failures.
链路代表一次完整的执行流:
  • 从入口点开始(例如MCP工具调用)
  • 包含该执行中的所有操作
  • 执行完成后结束
  • 同一链路中的多个错误是相关的
示例链路流:
Trace abc123:
  1. 调用了automatic_indexing() [已弃用]
  2. 调用了connect_to_neo4j()
  3. 错误:连接超时
  4. 调用了retry_connection()
  5. 错误:重试失败
所有错误共享链路ID
abc123
,因此它们是相关的故障。

Best Practices

最佳实践

1. Always Start with Summary

1. 始终从摘要模式开始

Don't jump straight to markdown or detail view. Get the overview first:
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
不要直接跳转到markdown或详情视图。先获取整体概览:
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

2. Use Error IDs for Deep Dives

2. 使用错误ID进行深度排查

Summary gives you error IDs. Use them:
bash
undefined
摘要会提供错误ID,合理使用它们:
bash
undefined

From summary, identify interesting error (e.g., ID 3)

从摘要中找到感兴趣的错误(例如ID 3)

python3 .claude/tools/utils/log_analyzer.py --error-id 3 --format markdown
undefined
python3 .claude/tools/utils/log_analyzer.py --error-id 3 --format markdown
undefined

3. Group Related Errors by Trace

3. 按链路ID分组相关错误

If multiple errors share a trace ID in summary, view the whole trace:
bash
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown
如果摘要中多个错误共享同一链路ID,查看整个链路:
bash
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown

4. Use --no-ai for Quick Checks

4. 使用--no-ai进行快速检查

During active development, skip AI to save time and cost:
bash
python3 .claude/tools/utils/log_analyzer.py --no-ai
在开发过程中,跳过AI分析以节省时间和成本:
bash
python3 .claude/tools/utils/log_analyzer.py --no-ai

5. Combine with Real-Time Monitoring

5. 结合实时监控

When running tests or operations:
bash
undefined
运行测试或操作时:
bash
undefined

Terminal 1: Run operation

终端1:运行操作

uv run pytest tests/integration/
uv run pytest tests/integration/

Terminal 2: Watch logs

终端2:查看日志

tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
undefined
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
undefined

6. Save Reports for Later

6. 保存报告供后续使用

Generate markdown reports for documentation:
bash
python3 .claude/tools/utils/log_analyzer.py --format markdown --output reports/$(date +%Y-%m-%d)-errors.md
生成markdown报告用于文档记录:
bash
python3 .claude/tools/utils/log_analyzer.py --format markdown --output reports/$(date +%Y-%m-%d)-errors.md

7. Check Logs BEFORE Making Changes

7. 先检查日志再做修改

When user reports an issue, check logs first:
  1. Run summary to see current state
  2. Identify error patterns
  3. Then make code changes
  4. Re-run summary to verify fix
当用户报告问题时,先检查日志:
  1. 运行摘要模式查看当前状态
  2. 识别错误模式
  3. 然后进行代码修改
  4. 重新运行摘要模式验证修复效果

Common Scenarios

常见场景

Scenario 1: User says "check the logs"

场景1:用户说“查看日志”

bash
undefined
bash
undefined

Step 1: Run summary

步骤1:运行摘要模式

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

Step 2: Report findings

步骤2:报告发现

"Found 3 errors. Most critical is Neo4j connection failure..."

“发现3个错误。最关键的是Neo4j连接失败...”

Step 3: Offer deeper analysis

步骤3:提供深度分析选项

"Would you like me to investigate error #1 in detail?"

“需要我详细排查错误#1吗?”

undefined
undefined

Scenario 2: User says "why did X fail?"

场景2:用户说“为什么X失败了?”

bash
undefined
bash
undefined

Step 1: Run summary to find X-related errors

步骤1:运行摘要模式查找与X相关的错误

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

Step 2: Get error ID for X

步骤2:获取X的错误ID

Step 3: Analyze that error

步骤3:分析该错误

python3 .claude/tools/utils/log_analyzer.py --error-id 2 --format markdown
python3 .claude/tools/utils/log_analyzer.py --error-id 2 --format markdown

Step 4: Explain root cause from call chain and AI analysis

步骤4:根据调用链和AI分析解释根因

undefined
undefined

Scenario 3: Debugging test failures

场景3:调试测试失败

bash
undefined
bash
undefined

Step 1: Monitor logs during test

步骤1:测试期间监控日志

tail -f {{LOG_DIR}}/{{LOG_FILE}}.log &
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log &

Step 2: Run tests

步骤2:运行测试

uv run pytest tests/integration/test_xyz.py -v
uv run pytest tests/integration/test_xyz.py -v

Step 3: Analyze results

步骤3:分析结果

python3 .claude/tools/utils/log_analyzer.py --file test_xyz.py --format markdown
undefined
python3 .claude/tools/utils/log_analyzer.py --file test_xyz.py --format markdown
undefined

Scenario 4: Production issue investigation

场景4:生产环境问题排查

bash
undefined
bash
undefined

Step 1: Get full report with AI analysis

步骤1:生成带AI分析的完整报告

python3 .claude/tools/utils/log_analyzer.py --format markdown --output incident-report.md
python3 .claude/tools/utils/log_analyzer.py --format markdown --output incident-report.md

Step 2: Review AI Root Causes and Fixes

步骤2:查看AI根因分析和修复建议

Step 3: Identify trace IDs for critical errors

步骤3:识别关键错误的链路ID

Step 4: Drill into specific traces

步骤4:深入分析特定链路

python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
undefined
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
undefined

Usage Examples

使用示例

Example 1: Quick Health Check

示例1:快速健康检查

bash
undefined
bash
undefined

User says: "Check the logs"

用户说:“查看日志”

Run summary mode:

运行摘要模式:

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

Output: Summary table showing 3 errors

输出:显示3个错误的摘要表格

Action: Report findings, offer detailed investigation

操作:报告发现,提供详细分析选项

undefined
undefined

Example 2: Detailed Error Investigation

示例2:详细错误排查

bash
undefined
bash
undefined

User says: "What caused error #1?"

用户说:“错误#1是什么原因导致的?”

From summary, get error ID, then:

从摘要中获取错误ID,然后运行:

python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown

Output: Full error details with call chain, stack trace, AI analysis

输出:包含调用链、堆栈跟踪、AI分析的完整错误详情

Action: Explain root cause, suggest fix with file:line references

操作:解释根因,提供带文件:行号参考的修复建议

undefined
undefined

Example 3: Trace-Based Debugging

示例3:基于链路的调试

bash
undefined
bash
undefined

User says: "Show me everything that happened in that execution"

用户说:“展示那次执行的所有情况”

From summary, get trace ID, then:

从摘要中获取链路ID,然后运行:

python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown

Output: All errors in trace with execution context

输出:链路中的所有错误及执行上下文

Action: Explain execution flow, identify failure cascade

操作:解释执行流,识别故障连锁反应

undefined
undefined

Expected Outcomes

预期结果

Successful Analysis

成功分析

✓ Log Analysis Summary
  Total entries: 1,523
  Total traces: 12
  Errors found: 7
  Files affected: 4

  Error table with IDs, traces, files, functions, messages
  Quick commands for drill-down investigation
✓ 日志分析摘要
  总条目数:1,523
  总链路数:12
  发现错误数:7
  受影响文件数:4

  包含ID、链路、文件、函数、消息的错误表格
  用于深度排查的快速命令

Detailed Investigation

详细排查

✓ Markdown report with:
  - AI-powered root cause analysis
  - Complete execution traces
  - Call chains leading to errors
  - Related errors grouped by trace
  - File:line references for fixes
  - Stack traces (if available)
✓ Markdown报告包含:
  - AI驱动的根因分析
  - 完整的执行链路
  - 导致错误的调用链
  - 按链路分组的相关错误
  - 用于修复的文件:行号参考
  - 堆栈跟踪(如果有)

Integration Points

集成点

With CLAUDE.md Convention

与CLAUDE.md规范的集成

This skill implements the CLAUDE.md logging convention:
CLAUDE.md says:
When users say "Logs", analyze
{{LOG_DIR}}/{{LOG_FILE}}.log
from your project
This skill provides:
  1. Automatic log file location
  2. Multiple analysis modes
  3. AI-powered diagnosis
  4. Trace reconstruction
  5. Interactive debugging workflow
Recommended workflow:
  1. User says "Logs" → Run this skill
  2. Start with summary mode
  3. Identify critical errors
  4. Drill down with --error-id or --trace
  5. Report findings with file:line references
该技能实现了CLAUDE.md日志规范:
CLAUDE.md规定:
当用户说“Logs”时,分析项目中的
{{LOG_DIR}}/{{LOG_FILE}}.log
该技能提供:
  1. 自动定位日志文件
  2. 多种分析模式
  3. AI驱动的诊断
  4. 链路重建
  5. 交互式调试工作流
推荐工作流:
  1. 用户说“Logs” → 运行该技能
  2. 从摘要模式开始
  3. 识别关键错误
  4. 使用--error-id或--trace进行深度分析
  5. 提供带文件:行号参考的分析结果

With Debug Workflows

与调试工作流的集成

Integrates with test debugging and issue investigation:
Test fails → Check logs → Identify error → Fix code → Re-run → Verify
与测试调试和问题排查工作流集成:
测试失败 → 检查日志 → 识别错误 → 修复代码 → 重新运行 → 验证

With Quality Gates

与质量门禁的集成

Use in pre-commit checks and CI/CD:
bash
undefined
在提交前检查和CI/CD中使用:
bash
undefined

Before committing, check for new errors

提交前检查是否有新错误

python3 .claude/tools/utils/log_analyzer.py --no-ai
python3 .claude/tools/utils/log_analyzer.py --no-ai

If errors found, investigate and fix

如果发现错误,排查并修复

undefined
undefined

Expected Benefits

预期收益

MetricWithout SkillWith SkillImprovement
Error Diagnosis Time15-30 min (manual parsing)2-5 min (automated)6-10x faster
Root Cause Accuracy~60% (assumptions)~90% (AI analysis)50% improvement
Trace ReconstructionManual, error-proneAutomatic100% coverage
Context AwarenessLimited (single logs)Full (trace grouping)Complete context
Report GenerationManual markdownAutomatedInstant reports
指标无该技能时使用该技能后提升效果
错误诊断时间15-30分钟(手动解析)2-5分钟(自动化)快6-10倍
根因准确性~60%(基于假设)~90%(AI分析)提升50%
链路重建手动、易出错自动化100%覆盖
上下文感知有限(单条日志)完整(链路分组)完整上下文
报告生成手动编写markdown自动化即时生成

Success Metrics

成功指标

After implementing this skill:
  • <5 second analysis time - Fast log parsing and error extraction
  • 100% trace reconstruction - All related errors grouped by trace
  • 90% root cause accuracy - AI-powered diagnosis with high confidence
  • Markdown/JSON export - Automated report generation
  • Zero manual log parsing - Automated OpenTelemetry parsing
实施该技能后:
  • 分析时间<5秒 - 快速日志解析和错误提取
  • 100%链路重建 - 所有相关错误按链路分组
  • 90%根因准确性 - AI驱动的高可信度诊断
  • Markdown/JSON导出 - 自动化报告生成
  • 零手动日志解析 - 自动化OpenTelemetry解析

Validation Process

验证流程

Step 1: Quick Health Check

步骤1:快速健康检查

bash
undefined
bash
undefined

Summary mode for initial assessment

摘要模式用于初步评估

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefined
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefined

Step 2: Identify Critical Errors

步骤2:识别关键错误

bash
undefined
bash
undefined

Get error IDs from summary

从摘要中获取错误ID

Focus on errors in critical paths

重点关注关键路径中的错误

undefined
undefined

Step 3: Deep Dive Analysis

步骤3:深度分析

bash
undefined
bash
undefined

Use error IDs for detailed investigation

使用错误ID进行详细排查

python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
undefined
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
undefined

Step 4: Trace Reconstruction

步骤4:链路重建

bash
undefined
bash
undefined

Group related errors by trace

按链路分组相关错误

python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
undefined
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
undefined

Step 5: Report Findings

步骤5:报告发现

bash
undefined
bash
undefined

Generate markdown report for documentation

生成markdown报告用于文档记录

python3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md
undefined
python3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md
undefined

Red Flags to Avoid

需要避免的误区

DON'T:
  • Parse logs manually with grep/sed (use analyzer tool)
  • Ignore trace IDs (they group related errors)
  • Look at single error in isolation (check full trace)
  • Skip AI analysis when investigating production issues
  • Assume errors are unrelated (verify with trace grouping)
  • Make changes without seeing actual log output
DO:
  • Always start with summary mode
  • Use error IDs for detailed investigation
  • Group errors by trace ID
  • Use --no-ai for quick checks, AI for production debugging
  • Generate markdown reports for documentation
  • Combine with real-time monitoring (tail -f)
不要:
  • 使用grep/sed手动解析日志(使用分析器工具)
  • 忽略链路ID(它们用于关联相关错误)
  • 孤立地查看单个错误(检查完整链路)
  • 排查生产环境问题时跳过AI分析
  • 假设错误无关(使用链路分组验证)
  • 未查看实际日志输出就进行修改
要:
  • 始终从摘要模式开始
  • 使用错误ID进行详细排查
  • 按链路ID分组错误
  • 快速检查使用--no-ai,生产环境调试使用AI
  • 生成markdown报告用于文档记录
  • 结合实时监控(tail -f)

Troubleshooting

故障排除

Error: Log file not found
bash
undefined
错误:日志文件未找到
bash
undefined

Check if log file exists

检查日志文件是否存在

ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log
ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log

If missing, run the MCP server first to generate logs

如果不存在,先运行MCP服务器生成日志

./run-mcp-server.sh

**Error: Permission denied**
```bash
./run-mcp-server.sh

**错误:权限被拒绝**
```bash

Make script executable

为脚本添加执行权限

chmod +x .claude/tools/utils/log_analyzer.py

**Error: Module not found (langchain_client)**
```bash
chmod +x .claude/tools/utils/log_analyzer.py

**错误:模块未找到(langchain_client)**
```bash

Ensure you're in project root

确保处于项目根目录

pwd # Should be your project root directory
pwd # 应该是你的项目根目录

Script adds parent dir to path automatically

脚本会自动将父目录添加到路径中


**AI analysis fails or skipped**
```bash

**AI分析失败或被跳过**
```bash

Use --no-ai flag to skip AI analysis

使用--no-ai标志跳过AI分析

python3 .claude/tools/utils/log_analyzer.py --no-ai
python3 .claude/tools/utils/log_analyzer.py --no-ai

Or check LangChain client configuration

或检查LangChain客户端配置

cat .claude/tools/langchain_client.py

**Empty log file**
```bash
cat .claude/tools/langchain_client.py

**日志文件为空**
```bash

Check log file size

检查日志文件大小

ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log
ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log

If empty, server hasn't run yet or logging not configured

如果为空,说明服务器尚未运行或未配置日志

Run server to generate logs

运行服务器生成日志

./run-mcp-server.sh
undefined
./run-mcp-server.sh
undefined

Advanced Usage

进阶用法

Custom Analysis Scripts

自定义分析脚本

See scripts/example_usage.py for advanced patterns:
The example script demonstrates:
  1. Basic log analysis with default settings and LangChain client
  2. Custom LLM client with budget tracking (monthly/daily limits)
  3. Parsing only (no LLM) - faster, free error extraction
  4. Filtering specific errors - finding Neo4j-related issues
  5. Trace-based analysis - grouping errors by execution trace
python
undefined
查看scripts/example_usage.py获取进阶模式:
示例脚本演示:
  1. 基础日志分析 - 使用默认设置和LangChain客户端
  2. 自定义LLM客户端 - 带预算跟踪(每月/每日限额)
  3. 仅解析(无LLM) - 更快、免费的错误提取
  4. 过滤特定错误 - 查找与Neo4j相关的问题
  5. 基于链路的分析 - 按执行链路分组错误
python
undefined

Example: Parse without LLM (from scripts/example_usage.py)

示例:不使用LLM进行解析(来自scripts/example_usage.py)

from utils.log_analyzer import OTelLogParser, ErrorExtractor
parser = OTelLogParser() entries = parser.parse_file(Path("{{LOG_DIR}}/{{LOG_FILE}}.log"))
extractor = ErrorExtractor(context_lines=3) errors = extractor.extract_errors(entries)
from utils.log_analyzer import OTelLogParser, ErrorExtractor
parser = OTelLogParser() entries = parser.parse_file(Path("{{LOG_DIR}}/{{LOG_FILE}}.log"))
extractor = ErrorExtractor(context_lines=3) errors = extractor.extract_errors(entries)

Filter specific patterns

过滤特定模式

neo4j_errors = [e for e in errors if "Neo4j" in e.error.message]

**Run the examples:**
```bash
python3 .claude/skills/observability-analyze-logs/scripts/example_usage.py
neo4j_errors = [e for e in errors if "Neo4j" in e.error.message]

**运行示例:**
```bash
python3 .claude/skills/observability-analyze-logs/scripts/example_usage.py

Programmatic Access

程序化访问

bash
undefined
bash
undefined

Get JSON output for scripts

获取JSON输出供脚本使用

python3 .claude/tools/utils/log_analyzer.py --format json > errors.json
python3 .claude/tools/utils/log_analyzer.py --format json > errors.json

Parse with jq

使用jq解析

python3 .claude/tools/utils/log_analyzer.py --format json | jq '.errors[] | select(.file == "database.py")'
undefined
python3 .claude/tools/utils/log_analyzer.py --format json | jq '.errors[] | select(.file == "database.py")'
undefined

Budget Tracking

预算跟踪

The analyzer uses LangChain client with budget tracking. See example_usage.py for details on monitoring costs.
分析器使用带预算跟踪的LangChain客户端。查看example_usage.py了解成本监控详情。

Requirements

要求

Tools needed:
  • Python 3.12+ (already in project)
  • Log analyzer tool:
    .claude/tools/utils/log_analyzer.py
    (bundled)
  • LangChain client:
    .claude/tools/langchain_client.py
    (bundled)
Log file:
  • Default location:
    {{LOG_DIR}}/{{LOG_FILE}}.log
    (customize for your project)
  • Generated by running your project's server startup script (e.g.,
    ./run-server.sh
    )
Dependencies:
  • Standard library modules (no additional installation needed)
  • LangChain client uses project's existing dependencies
Verification:
bash
undefined
所需工具:
  • Python 3.12+(已包含在项目中)
  • 日志分析器工具:
    .claude/tools/utils/log_analyzer.py
    (已捆绑)
  • LangChain客户端:
    .claude/tools/langchain_client.py
    (已捆绑)
日志文件:
  • 默认位置:
    {{LOG_DIR}}/{{LOG_FILE}}.log
    (可根据项目自定义)
  • 运行项目的服务器启动脚本生成(例如
    ./run-server.sh
依赖项:
  • 标准库模块(无需额外安装)
  • LangChain客户端使用项目现有依赖
验证:
bash
undefined

Verify log analyzer exists

验证日志分析器是否存在

ls .claude/tools/utils/log_analyzer.py
ls .claude/tools/utils/log_analyzer.py

Verify log file exists (or generate by running server)

验证日志文件是否存在(或运行服务器生成)

ls {{LOG_DIR}}/{{LOG_FILE}}.log || ./run-server.sh
ls {{LOG_DIR}}/{{LOG_FILE}}.log || ./run-server.sh

Test the analyzer

测试分析器

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefined
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefined

Supporting Files

支持文件

This skill follows progressive disclosure with supporting files for depth:
  • references/reference.md - Technical documentation:
    • OpenTelemetry log format specification
    • Data models (LogEntry, TraceExecution, ErrorContext)
    • Parsing logic and trace reconstruction algorithm
    • AI analysis configuration and cost tracking
    • Performance characteristics and scaling limits
  • references/log-rotation-guide.md - Log rotation configuration and management
  • references/log-rotation-analysis.md - Log rotation analysis and patterns
  • references/logging-and-rotation-guide.md - Complete logging and rotation guide
  • templates/response-template.md - Response formatting:
    • 8 response templates for different contexts
    • Summary, detailed error, trace analysis, file-specific
    • AI-powered pattern analysis, real-time monitoring
    • No errors found, before/after comparison
该技能采用渐进式披露,提供深度支持文件:
  • references/reference.md - 技术文档:
    • OpenTelemetry日志格式规范
    • 数据模型(LogEntry、TraceExecution、ErrorContext)
    • 解析逻辑和链路重建算法
    • AI分析配置和成本跟踪
    • 性能特征和扩展限制
  • references/log-rotation-guide.md - 日志轮转配置和管理
  • references/log-rotation-analysis.md - 日志轮转分析和模式
  • references/logging-and-rotation-guide.md - 完整的日志和轮转指南
  • templates/response-template.md - 响应格式:
    • 适用于不同场景的8种响应模板
    • 摘要、详细错误、链路分析、特定文件
    • AI驱动的模式分析、实时监控
    • 未发现错误、前后对比

Expected Outcomes

预期结果

Successful Analysis

成功分析

Summary mode output:
✓ Log Analysis Summary
  Total entries: 1,523
  Total traces: 12
  Errors found: 7
  Files affected: 4

  Error table with IDs, traces, files, functions, messages
  Quick commands for drill-down investigation
Detailed mode output:
✓ Markdown report with:
  - AI-powered root cause analysis
  - Complete execution traces
  - Call chains leading to errors
  - Related errors grouped by trace
  - File:line references for fixes
  - Stack traces (if available)
摘要模式输出:
✓ 日志分析摘要
  总条目数:1,523
  总链路数:12
  发现错误数:7
  受影响文件数:4

  包含ID、链路、文件、函数、消息的错误表格
  用于深度排查的快速命令
详细模式输出:
✓ Markdown报告包含:
  - AI驱动的根因分析
  - 完整的执行链路
  - 导致错误的调用链
  - 按链路分组的相关错误
  - 用于修复的文件:行号参考
  - 堆栈跟踪(如果有)

Common Outcomes

常见结果

  • No errors found: "✓ HEALTHY - Analyzed 1,234 entries, no errors found"
  • Configuration issues: Clear error with fix steps (e.g., "Start Neo4j:
    neo4j start
    ")
  • Test failures: Trace showing execution flow to failure with root cause
  • Production issues: AI analysis with priority ranking and systemic issues
  • 未发现错误:"✓ 健康状态 - 分析了1,234条条目,未发现错误"
  • 配置问题:清晰的错误信息及修复步骤(例如"启动Neo4j:
    neo4j start
    ")
  • 测试失败:显示执行流到故障点的链路及根因
  • 生产环境问题:带优先级排序和系统性问题分析的AI结果

Related Documentation

相关文档

  • Tool implementation:
    .claude/tools/utils/log_analyzer.py
  • Usage examples:
    .claude/tools/utils/example_usage.py
  • LangChain client:
    .claude/tools/langchain_client.py
  • CLAUDE.md conventions:
    CLAUDE.md
    (section on "Logs")
  • Log file location:
    {{LOG_DIR}}/{{LOG_FILE}}.log
  • 工具实现:
    .claude/tools/utils/log_analyzer.py
  • 使用示例:
    .claude/tools/utils/example_usage.py
  • LangChain客户端:
    .claude/tools/langchain_client.py
  • CLAUDE.md规范:
    CLAUDE.md
    (“Logs”部分)
  • 日志文件位置:
    {{LOG_DIR}}/{{LOG_FILE}}.log

Quick Reference Card

快速参考卡

GoalCommand
Health check
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
Detailed error
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
Trace analysis
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
File-specific
python3 .claude/tools/utils/log_analyzer.py --file database.py
Fast & free
python3 .claude/tools/utils/log_analyzer.py --no-ai
Real-time
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
Save report
python3 .claude/tools/utils/log_analyzer.py --format markdown -o report.md
JSON output
python3 .claude/tools/utils/log_analyzer.py --format json

Remember: Always start with summary mode, then drill down using error IDs or trace IDs based on what you find.
目标命令
健康检查
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
详细错误分析
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
链路分析
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
特定文件分析
python3 .claude/tools/utils/log_analyzer.py --file database.py
快速免费检查
python3 .claude/tools/utils/log_analyzer.py --no-ai
实时监控
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
保存报告
python3 .claude/tools/utils/log_analyzer.py --format markdown -o report.md
JSON输出
python3 .claude/tools/utils/log_analyzer.py --format json

注意: 始终从摘要模式开始,然后根据发现的内容使用错误ID或链路ID进行深度分析。