observability-analyze-logs

Analyze Logs

What This Skill Does - Intelligent OpenTelemetry log analysis with trace reconstruction
When to Use This Skill - Trigger phrases and common scenarios
Quick Start - Most common usage for quick health checks
Analysis Workflow - Complete step-by-step implementation guide
- Step 1: Determine User's Need - Identify analysis type (health check, error investigation, trace debugging)
- Step 2: Choose Analysis Mode - 6 modes: Summary, Error Detail, Trace Analysis, File Filter, Fast Parsing, Real-Time
- Step 3: Execute Analysis - Running commands with Bash tool
- Step 4: Interpret Results - Understanding summary tables, detailed output, AI analysis
- Step 5: Report Findings - Communicating results to users
Command Reference - All analyzer commands with examples
- Basic Commands (summary, markdown, JSON)
- Filtering Commands (--error-id, --trace, --file)
- Performance Commands (--no-ai, --output, tail)
Understanding the Output - Log format, summary tables, trace execution
Best Practices - 7 essential practices for effective log analysis
Common Scenarios - Real-world examples with commands
Integration with CLAUDE.md - How this skill implements CLAUDE.md conventions

Advanced Topics

Troubleshooting - Common issues and fixes
Advanced Usage - Custom scripts, programmatic access, budget tracking
Expected Outcomes - Success criteria and common results
Requirements - Tools, files, dependencies, verification

Supporting Resources

Technical Reference - OpenTelemetry format, data models, parsing logic, AI configuration
Response Templates - 8 response templates for different contexts
Related Documentation - Tool implementation, usage examples, LangChain client
Quick Reference Card - One-line command cheatsheet

Purpose

Intelligent log analysis for any project using OpenTelemetry trace reconstruction and AI-powered error diagnosis. Parses OTEL-formatted logs, reconstructs execution traces, extracts errors with call chain context, and provides root cause analysis.

What This Skill Does

Works with any project that writes OpenTelemetry-formatted logs to a configurable log directory.

Core capabilities:

Parse OpenTelemetry-formatted logs with trace/span IDs
Reconstruct complete execution traces
Extract errors with full call chain context
AI-powered root cause analysis
Multiple output formats (summary, markdown, JSON)
Advanced filtering (by error ID, trace ID, file)

When to Use This Skill

Invoke this skill when users mention:

"check the logs"
"look at the logs"
"analyze errors"
"what's failing?"
"debug this issue"
"show me the traces"
"investigate the error"
"view log file"
Any mention of project log files ({{LOG_DIR}}/{{LOG_FILE}}.log)

Quick Start

Most common usage (quick health check):

bash

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

This gives you a summary table with error IDs and trace IDs, perfect for quick health checks. Replace `{{LOG_DIR}}` with your project's log directory (e.g., `logs`) and `{{LOG_FILE}}` with the log file's base name (e.g., `app` for `logs/app.log`, since the command already appends `.log`).

Instructions

Step 1: Determine User's Need

Quick Health Check:

User asks: "Are there any errors?" or "What's happening in the logs?"
Action: Run summary mode (default)

Specific Error Investigation:

User mentions specific error or asks for details
Action: Get error ID from summary, then use --error-id

Trace-Based Debugging:

User asks "what led to this error?" or wants execution flow
Action: Use --trace with trace ID

File-Specific Analysis:

User mentions specific file or module
Action: Use --file filter

Real-Time Monitoring:

User wants to watch logs as they happen
Action: Use tail -f
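
The mapping from need to action can be sketched as a small lookup that turns an identified need into a command line. This is only an illustration: `build_command` and the need labels (`health`, `error`, `trace`, `file`, `realtime`) are hypothetical names, the flags are the ones listed in this guide, and `logs/app.log` stands in for `{{LOG_DIR}}/{{LOG_FILE}}.log`.

```python
import shlex
from typing import Optional

ANALYZER = ".claude/tools/utils/log_analyzer.py"  # path used throughout this guide


def build_command(need: str, log_path: str, target: Optional[str] = None) -> str:
    """Map a user need (hypothetical labels) to a shell command string."""
    if need == "realtime":
        return shlex.join(["tail", "-f", log_path])
    args = ["python3", ANALYZER]
    if need == "health":
        args.append(log_path)  # summary mode is the default
    elif need in ("error", "trace", "file"):
        if target is None:
            raise ValueError(f"need {need!r} requires a target value")
        flag = {"error": "--error-id", "trace": "--trace", "file": "--file"}[need]
        args += [flag, target, "--format", "markdown"]
    else:
        raise ValueError(f"unknown need: {need!r}")
    return shlex.join(args)


print(build_command("health", "logs/app.log"))
print(build_command("error", "logs/app.log", target="1"))
```

Building the argument list first and joining it with `shlex.join` keeps file names with spaces safely quoted.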

Step 2: Choose Analysis Mode

Mode 1: Summary (Default) - Start here 90% of the time

bash

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

Output: Compact table with error IDs, trace IDs, file:line, function, and message preview.

Use when:

Initial investigation
Quick health check
Getting error IDs for deeper analysis
User asks "what errors do we have?"

Mode 2: Error Detail - Deep dive into specific error

bash

python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown

Output: Full error details including complete message, call chain, stack trace, related errors.

Use when:

User asks about specific error from summary
Need full error message (summary truncates)
Want to see complete stack trace
Investigating single failure

Mode 3: Trace Analysis - Understand execution flow

bash

python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown

Output: All errors in that trace with full execution context.

Use when:

Multiple related errors in same trace
Need to understand execution sequence
Debugging distributed operations
User asks "what happened in this execution?"

Mode 4: File Filter - Find all errors in specific file

bash

python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdown

Output: All errors from that file with trace context.

Use when:

User mentions specific file
Investigating module-specific issues
Finding patterns in one component

Mode 5: Fast Parsing (No AI) - Quick and free

bash

python3 .claude/tools/utils/log_analyzer.py --no-ai

Output: Same as summary but skips AI analysis (faster, no cost).

Use when:

Quick checks during development
Want to avoid LLM costs
Just need parsed errors without analysis
Automated scripts or frequent polling

Mode 6: Real-Time Monitoring

bash

tail -f {{LOG_DIR}}/{{LOG_FILE}}.log

Output: Live log stream (Ctrl+C to exit).

Use when:

Watching logs during testing
Monitoring server startup
Debugging in real-time
User runs operations and wants to see results

Step 3: Execute Analysis

Execute the appropriate command using the Bash tool:

bash

# Example for summary
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

Step 4: Interpret Results

For Summary Output:

Check "Total errors" count
Scan error table for patterns (same file, same trace)
Note error IDs for deeper investigation
Use provided Quick Commands to drill down

For Detailed Output:

Read Full Message (not truncated)
Review Call Chain (execution flow leading to error)
Check Related Errors (other failures in same trace)
Examine Stack Trace (if available)
Look for Recovery Attempts (logs after error)

For AI Analysis (markdown format, when --no-ai is not set):

Read Root Causes section
Check Patterns (recurring issues)
Review Priority (what to fix first)
Follow Fixes (specific file:line changes)
Consider Systemic Issues (larger architectural problems)

Step 5: Report Findings

Always provide:

Summary of error count and severity
Most critical issues (from AI analysis or your judgment)
Specific file:line references for user to investigate
Suggested next steps or commands to run

Example response format:

Found 5 errors in the logs:

Critical Issues:
1. Neo4j connection failure in database.py:123 (trace: abc123)
   - Appears 3 times across different operations
   - Root cause: Connection timeout after 5s

2. Invalid config in settings.py:45 (trace: def456)
   - Missing required parameter 'api_key'

To investigate further:
- View error 1 details: python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
- See all Neo4j errors: python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdown
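
If findings are collected programmatically, the response format above could be produced by a small formatter. This is a minimal sketch; the `Finding` structure and `format_report` function are hypothetical and not part of the analyzer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    """One critical issue to report (hypothetical structure)."""
    title: str
    location: str  # file:line reference
    trace: str
    notes: List[str] = field(default_factory=list)


def format_report(total_errors: int, findings: List[Finding]) -> str:
    """Render the error count and a numbered list of critical issues."""
    lines = [f"Found {total_errors} errors in the logs:", "", "Critical Issues:"]
    for number, item in enumerate(findings, start=1):
        lines.append(f"{number}. {item.title} in {item.location} (trace: {item.trace})")
        lines.extend(f"   - {note}" for note in item.notes)
    return "\n".join(lines)


report = format_report(
    5,
    [Finding("Neo4j connection failure", "database.py:123", "abc123",
             ["Appears 3 times across different operations"])],
)
print(report)
```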

Command Reference

Basic Commands

bash

# Quick summary (default)
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

# Detailed markdown with AI analysis
python3 .claude/tools/utils/log_analyzer.py --format markdown

# JSON output for programmatic use
python3 .claude/tools/utils/log_analyzer.py --format json

Filtering Commands

bash

# View specific error details (get ID from summary)
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown

# View all errors in a trace
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown

# Filter by file name
python3 .claude/tools/utils/log_analyzer.py --file database.py

# Combine filters
python3 .claude/tools/utils/log_analyzer.py --file database.py --trace abc123 --format markdown

Performance Commands

bash

# Skip AI analysis (faster, free)
python3 .claude/tools/utils/log_analyzer.py --no-ai

# Save to file instead of stdout
python3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md

# Real-time log monitoring
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log

# Last 50 lines
tail -n 50 {{LOG_DIR}}/{{LOG_FILE}}.log

# Follow and filter for errors
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log | grep ERROR

Understanding the Output

Log Format

OpenTelemetry format with trace/span IDs:

2025-10-16 14:32:15 - [trace:abc123 | span:def456] - module.name - ERROR - [file.py:123] - function() - Error message

Key fields:

timestamp: When the log occurred
trace: Unique ID for entire execution (groups related logs)
span: Unique ID for operation within trace
module: Python module path
level: ERROR, WARNING, INFO, DEBUG
file:line: Source location
function: Function name
message: Log message
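
To make the field layout concrete, the line format above can be matched with a single regular expression. This is a minimal sketch written from the sample line only; the analyzer's actual parsing logic may differ, and `LOG_LINE` and `parse_line` are illustrative names.

```python
import re
from typing import Optional

# One named group per key field, in the order they appear in the log line.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" - \[trace:(?P<trace>\w+) \| span:(?P<span>\w+)\]"
    r" - (?P<module>[\w.]+)"
    r" - (?P<level>[A-Z]+)"
    r" - \[(?P<file>[^:\]]+):(?P<line>\d+)\]"
    r" - (?P<function>\w+)\(\)"
    r" - (?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


sample = (
    "2025-10-16 14:32:15 - [trace:abc123 | span:def456] - module.name"
    " - ERROR - [file.py:123] - function() - Error message"
)
fields = parse_line(sample)
print(fields["trace"], fields["level"], fields["file"], fields["line"])
```

Lines that do not fit the pattern (for example, continuation lines of a stack trace) return `None`, so a caller can decide whether to attach them to the previous entry.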

Summary Table

ID   | Trace      | File:Line              | Function    | Message
-----|------------|------------------------|-------------|------------------
1    | abc123     | database.py:45         | connect     | Connection failed
2    | abc123     | database.py:67         | query       | No active connection

Columns explained:

ID: Error number (use with --error-id)
Trace: First 8 chars of trace ID (same trace = related errors)
File:Line: Where error occurred
Function: Function that logged error
Message: Preview (truncated, use --error-id for full message)

Trace Execution

A trace represents one complete execution flow:

Starts at entry point (e.g., MCP tool call)
Includes all operations in that execution
Ends when execution completes
Multiple errors in same trace are related

Example trace flow:

Trace abc123:
  1. automatic_indexing() [deprecated] called
  2. connect_to_neo4j() called
  3. ERROR: Connection timeout
  4. retry_connection() called
  5. ERROR: Retry failed

All errors share trace ID abc123, so they're related failures.
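
Grouping by trace ID is what makes related failures visible. The sketch below shows the idea on already-parsed entries; the dictionary keys and the second trace ID (`zzz999`) are made up for illustration, and the analyzer's own data models may look different.

```python
from collections import defaultdict

# Hypothetical parsed entries, in log order.
entries = [
    {"trace": "abc123", "level": "INFO", "message": "connect_to_neo4j() called"},
    {"trace": "abc123", "level": "ERROR", "message": "Connection timeout"},
    {"trace": "zzz999", "level": "INFO", "message": "health check ok"},
    {"trace": "abc123", "level": "ERROR", "message": "Retry failed"},
]


def errors_by_trace(entries):
    """Group ERROR messages by trace ID, preserving log order within each trace."""
    grouped = defaultdict(list)
    for entry in entries:
        if entry["level"] == "ERROR":
            grouped[entry["trace"]].append(entry["message"])
    return dict(grouped)


print(errors_by_trace(entries))
```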

Best Practices

1. Always Start with Summary

Don't jump straight to markdown or detail view. Get the overview first:

bash

python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

2. Use Error IDs for Deep Dives

Summary gives you error IDs. Use them:

bash

# From summary, identify interesting error (e.g., ID 3)
python3 .claude/tools/utils/log_analyzer.py --error-id 3 --format markdown

3. Group Related Errors by Trace

If multiple errors share a trace ID in summary, view the whole trace:

bash

python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown

4. Use --no-ai for Quick Checks

During active development, skip AI to save time and cost:

bash

python3 .claude/tools/utils/log_analyzer.py --no-ai

5. Combine with Real-Time Monitoring

When running tests or operations:

bash

# Terminal 1: Run operation
uv run pytest tests/integration/

# Terminal 2: Watch logs
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log

6. Save Reports for Later

Generate markdown reports for documentation:

bash

python3 .claude/tools/utils/log_analyzer.py --format markdown --output reports/$(date +%Y-%m-%d)-errors.md

7. Check Logs BEFORE Making Changes

When user reports an issue, check logs first:

Run summary to see current state
Identify error patterns
Then make code changes
Re-run summary to verify fix
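
The before/after check in this practice amounts to a set comparison between two summaries. A minimal sketch, assuming each error has already been reduced to a dictionary with `file`, `function`, and `message` keys (hypothetical field names); line numbers are deliberately left out of the key because they shift when code is edited.

```python
def error_key(error: dict) -> tuple:
    """Identify an error by where it was logged and what it said."""
    return (error["file"], error["function"], error["message"])


def compare_runs(before: list, after: list) -> dict:
    """Classify errors as fixed, remaining, or new between two runs."""
    before_keys = {error_key(e) for e in before}
    after_keys = {error_key(e) for e in after}
    return {
        "fixed": sorted(before_keys - after_keys),
        "remaining": sorted(before_keys & after_keys),
        "new": sorted(after_keys - before_keys),
    }


before = [
    {"file": "database.py", "function": "connect", "message": "Connection failed"},
    {"file": "database.py", "function": "query", "message": "No active connection"},
]
after = [
    {"file": "database.py", "function": "query", "message": "No active connection"},
]
print(compare_runs(before, after))
```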

Common Scenarios

Scenario 1: User says "check the logs"

bash

# Step 1: Run summary
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

# Step 2: Report findings
# "Found 3 errors. Most critical is Neo4j connection failure..."

# Step 3: Offer deeper analysis
# "Would you like me to investigate error #1 in detail?"

Scenario 2: User says "why did X fail?"

bash

# Step 1: Run summary to find X-related errors
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

# Step 2: Get error ID for X

# Step 3: Analyze that error
python3 .claude/tools/utils/log_analyzer.py --error-id 2 --format markdown

# Step 4: Explain root cause from call chain and AI analysis

Scenario 3: Debugging test failures

bash

# Step 1: Monitor logs during test
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log &

# Step 2: Run tests
uv run pytest tests/integration/test_xyz.py -v

# Step 3: Analyze results
python3 .claude/tools/utils/log_analyzer.py --file test_xyz.py --format markdown

Scenario 4: Production issue investigation

bash

# Step 1: Get full report with AI analysis
python3 .claude/tools/utils/log_analyzer.py --format markdown --output incident-report.md

# Step 2: Review AI Root Causes and Fixes

# Step 3: Identify trace IDs for critical errors

# Step 4: Drill into specific traces
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown

Usage Examples

Example 1: Quick Health Check

bash

# User says: "Check the logs"

# Run summary mode:
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

# Output: Summary table showing 3 errors
# Action: Report findings, offer detailed investigation

Example 2: Detailed Error Investigation

bash

# User says: "What caused error #1?"

# From summary, get error ID, then:
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown

# Output: Full error details with call chain, stack trace, AI analysis
# Action: Explain root cause, suggest fix with file:line references

Example 3: Trace-Based Debugging

bash

# User says: "Show me everything that happened in that execution"

# From summary, get trace ID, then:
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown

# Output: All errors in trace with execution context
# Action: Explain execution flow, identify failure cascade

Expected Outcomes

Successful Analysis

✓ Log Analysis Summary
  Total entries: 1,523
  Total traces: 12
  Errors found: 7
  Files affected: 4

  Error table with IDs, traces, files, functions, messages
  Quick commands for drill-down investigation

Detailed Investigation

✓ Markdown report with:
  - AI-powered root cause analysis
  - Complete execution traces
  - Call chains leading to errors
  - Related errors grouped by trace
  - File:line references for fixes
  - Stack traces (if available)

Integration Points

With CLAUDE.md Convention

This skill implements the CLAUDE.md logging convention:

CLAUDE.md says:

When users say "Logs", analyze {{LOG_DIR}}/{{LOG_FILE}}.log from your project

This skill provides:

Automatic log file location
Multiple analysis modes
AI-powered diagnosis
Trace reconstruction
Interactive debugging workflow

Recommended workflow:

User says "Logs" → Run this skill
Start with summary mode
Identify critical errors
Drill down with --error-id or --trace
Report findings with file:line references

With Debug Workflows

Integrates with test debugging and issue investigation:

Test fails → Check logs → Identify error → Fix code → Re-run → Verify

With Quality Gates

Use in pre-commit checks and CI/CD:

bash

# Before committing, check for new errors
python3 .claude/tools/utils/log_analyzer.py --no-ai

# If errors found, investigate and fix
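
For an automated gate, the check needs to end in an exit code. The sketch below assumes the analyzer's JSON output contains a top-level `errors` list (suggested by the jq example in this guide, but not confirmed) and that `--no-ai` can be combined with `--format json`; `gate_exit_code` is a hypothetical helper.

```python
import json


def gate_exit_code(report_json: str, max_errors: int = 0) -> int:
    """Return 0 if the report has at most max_errors errors, else 1."""
    report = json.loads(report_json)
    error_count = len(report.get("errors", []))
    return 0 if error_count <= max_errors else 1


# Stand-ins for the analyzer's JSON output.
clean = '{"errors": []}'
failing = '{"errors": [{"file": "database.py", "message": "Connection failed"}]}'
print(gate_exit_code(clean), gate_exit_code(failing))
```

In a hook, the JSON text would come from the analyzer's stdout and the result would be passed to `sys.exit`.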

Expected Benefits

Metric               | Without Skill             | With Skill            | Improvement
---------------------|---------------------------|-----------------------|------------------
Error Diagnosis Time | 15-30 min (manual parsing) | 2-5 min (automated)   | 6-10x faster
Root Cause Accuracy  | ~60% (assumptions)        | ~90% (AI analysis)    | 50% improvement
Trace Reconstruction | Manual, error-prone       | Automatic             | 100% coverage
Context Awareness    | Limited (single logs)     | Full (trace grouping) | Complete context
Report Generation    | Manual markdown           | Automated             | Instant reports

Success Metrics

After implementing this skill:

<5 second analysis time - Fast log parsing and error extraction
100% trace reconstruction - All related errors grouped by trace
90% root cause accuracy - AI-powered diagnosis with high confidence
Markdown/JSON export - Automated report generation
Zero manual log parsing - Automated OpenTelemetry parsing

Validation Process

Step 1: Quick Health Check

bash

# Summary mode for initial assessment
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

Step 2: Identify Critical Errors

bash

# Get error IDs from summary
# Focus on errors in critical paths

Step 3: Deep Dive Analysis

bash

# Use error IDs for detailed investigation
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown

Step 4: Trace Reconstruction

bash

# Group related errors by trace
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown

Step 5: Report Findings

bash

# Generate markdown report for documentation
python3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md

Red Flags to Avoid

❌ DON'T:

Parse logs manually with grep/sed (use analyzer tool)
Ignore trace IDs (they group related errors)
Look at single error in isolation (check full trace)
Skip AI analysis when investigating production issues
Assume errors are unrelated (verify with trace grouping)
Make changes without seeing actual log output

✅ DO:

Always start with summary mode
Use error IDs for detailed investigation
Group errors by trace ID
Use --no-ai for quick checks, AI for production debugging
Generate markdown reports for documentation
Combine with real-time monitoring (tail -f)

Troubleshooting

**Error: Log file not found**
```bash
# Check if log file exists
ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log

# If missing, run the MCP server first to generate logs
./run-mcp-server.sh
```

**Error: Permission denied**
```bash
# Make script executable
chmod +x .claude/tools/utils/log_analyzer.py
```

**Error: Module not found (langchain_client)**
```bash
# Ensure you're in project root
pwd  # Should be your project root directory

# Script adds parent dir to path automatically
```

**AI analysis fails or skipped**
```bash
# Use --no-ai flag to skip AI analysis
python3 .claude/tools/utils/log_analyzer.py --no-ai

# Or check LangChain client configuration
cat .claude/tools/langchain_client.py
```

**Empty log file**
```bash
# Check log file size
ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log

# If empty, server hasn't run yet or logging not configured
# Run server to generate logs
./run-mcp-server.sh
```

Advanced Usage

Custom Analysis Scripts

See scripts/example_usage.py for advanced patterns:

The example script demonstrates:

Basic log analysis with default settings and LangChain client
Custom LLM client with budget tracking (monthly/daily limits)
Parsing only (no LLM) - faster, free error extraction
Filtering specific errors - finding Neo4j-related issues
Trace-based analysis - grouping errors by execution trace

```python
# Example: Parse without LLM (from scripts/example_usage.py)
from pathlib import Path

from utils.log_analyzer import OTelLogParser, ErrorExtractor

parser = OTelLogParser()
entries = parser.parse_file(Path("{{LOG_DIR}}/{{LOG_FILE}}.log"))

extractor = ErrorExtractor(context_lines=3)
errors = extractor.extract_errors(entries)

# Filter specific patterns
neo4j_errors = [e for e in errors if "Neo4j" in e.error.message]
```

**Run the examples:**
```bash
python3 .claude/skills/observability-analyze-logs/scripts/example_usage.py
```

Programmatic Access

bash

# Get JSON output for scripts
python3 .claude/tools/utils/log_analyzer.py --format json > errors.json

# Parse with jq
python3 .claude/tools/utils/log_analyzer.py --format json | jq '.errors[] | select(.file == "database.py")'
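
The same selection as the jq filter can be done in Python once the JSON has been saved or captured. This sketch assumes the shape implied by that filter (a top-level `errors` list whose items have a `file` field); the `id` and `message` fields in the sample are illustrative, and the real schema may carry more or differently named fields.

```python
import json

# Stand-in for the contents of errors.json.
report_json = """
{"errors": [
  {"id": 1, "file": "database.py", "message": "Connection failed"},
  {"id": 2, "file": "settings.py", "message": "Missing required parameter 'api_key'"},
  {"id": 3, "file": "database.py", "message": "No active connection"}
]}
"""


def errors_in_file(report_text: str, filename: str) -> list:
    """Return the errors whose 'file' field equals filename."""
    report = json.loads(report_text)
    return [e for e in report["errors"] if e.get("file") == filename]


for error in errors_in_file(report_json, "database.py"):
    print(error["id"], error["message"])
```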

Budget Tracking

The analyzer uses LangChain client with budget tracking. See example_usage.py for details on monitoring costs.

Requirements

Tools needed:

Python 3.12+ (already in project)
Log analyzer tool: `.claude/tools/utils/log_analyzer.py` (bundled)
LangChain client: `.claude/tools/langchain_client.py` (bundled)

Log file:

Default location: `{{LOG_DIR}}/{{LOG_FILE}}.log` (customize for your project)
Generated by running your project's server startup script (e.g., `./run-server.sh`)

Dependencies:

Standard library modules (no additional installation needed)
LangChain client uses project's existing dependencies

Verification:

bash

# Verify log analyzer exists
ls .claude/tools/utils/log_analyzer.py

# Verify log file exists (or generate by running server)
ls {{LOG_DIR}}/{{LOG_FILE}}.log || ./run-server.sh

# Test the analyzer
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log

Supporting Files

This skill follows progressive disclosure with supporting files for depth:

references/reference.md - Technical documentation:
- OpenTelemetry log format specification
- Data models (LogEntry, TraceExecution, ErrorContext)
- Parsing logic and trace reconstruction algorithm
- AI analysis configuration and cost tracking
- Performance characteristics and scaling limits
references/log-rotation-guide.md - Log rotation configuration and management
references/log-rotation-analysis.md - Log rotation analysis and patterns
references/logging-and-rotation-guide.md - Complete logging and rotation guide
templates/response-template.md - Response formatting:
- 8 response templates for different contexts
- Summary, detailed error, trace analysis, file-specific
- AI-powered pattern analysis, real-time monitoring
- No errors found, before/after comparison
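
To give a feel for what "trace reconstruction" means before opening the reference, here is a rough sketch under stated assumptions. The `LogEntry` fields below are hypothetical stand-ins (the real data models are documented in references/reference.md), and `group_by_trace` illustrates only the general idea of grouping entries by trace ID and ordering them in time. It is not the analyzer's actual algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical fields for illustration; see references/reference.md for the real model.
    timestamp: str  # ISO-8601 in one consistent format/timezone, so string order is time order
    level: str
    trace_id: str
    file: str
    function: str
    message: str


def group_by_trace(entries):
    """Group entries by trace ID and sort each group by timestamp."""
    traces = defaultdict(list)
    for entry in entries:
        traces[entry.trace_id].append(entry)
    # Sorting inside each trace recovers the execution order leading up to an error.
    return {
        trace_id: sorted(group, key=lambda e: e.timestamp)
        for trace_id, group in traces.items()
    }
```

Reading a single trace's entries in order is what lets an error be traced back through the calls that preceded it.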

Expected Outcomes

Successful Analysis

Summary mode output:

✓ Log Analysis Summary
  Total entries: 1,523
  Total traces: 12
  Errors found: 7
  Files affected: 4

  Error table with IDs, traces, files, functions, messages
  Quick commands for drill-down investigation

Detailed mode output:

✓ Markdown report with:
  - AI-powered root cause analysis
  - Complete execution traces
  - Call chains leading to errors
  - Related errors grouped by trace
  - File:line references for fixes
  - Stack traces (if available)

Common Outcomes

No errors found: "✓ HEALTHY - Analyzed 1,234 entries, no errors found"
Configuration issues: Clear error with fix steps (e.g., "Start Neo4j: `neo4j start`")
Test failures: Trace showing execution flow to failure with root cause
Production issues: AI analysis with priority ranking and systemic issues
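
The headline numbers in the summary output can be thought of as simple aggregates over parsed entries. The sketch below is illustrative only: the dictionary keys (`level`, `trace_id`, `file`) are an assumed schema, and counting "files affected" as the files that contain at least one error is an assumption about what that figure means, not a statement of how the analyzer computes it.

```python
def summarize(entries):
    """Compute the four headline counts from an iterable of entry dicts.

    Each entry is assumed to carry "level", "trace_id" and "file" keys (hypothetical schema).
    """
    entries = list(entries)
    errors = [e for e in entries if e["level"] == "ERROR"]
    return {
        "total_entries": len(entries),
        "total_traces": len({e["trace_id"] for e in entries}),
        "errors_found": len(errors),
        # Assumption: a file is "affected" when at least one error originates from it.
        "files_affected": len({e["file"] for e in errors}),
    }


def health_line(summary):
    """Render a one-line status in the style of the healthy-log message."""
    if summary["errors_found"] == 0:
        return f"✓ HEALTHY - Analyzed {summary['total_entries']:,} entries, no errors found"
    return f"{summary['errors_found']:,} errors in {summary['files_affected']:,} files"
```

The `:,` format specifier produces the thousands separators seen in the sample output (for example `1,234`).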

Quick Reference Card

| Goal | Command |
| --- | --- |
| Health check | `python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log` |
| Detailed error | `python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown` |
| Trace analysis | `python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown` |
| File-specific | `python3 .claude/tools/utils/log_analyzer.py --file database.py` |
| Fast & free | `python3 .claude/tools/utils/log_analyzer.py --no-ai` |
| Real-time | `tail -f {{LOG_DIR}}/{{LOG_FILE}}.log` |
| Save report | `python3 .claude/tools/utils/log_analyzer.py --format markdown -o report.md` |
| JSON output | `python3 .claude/tools/utils/log_analyzer.py --format json` |

Remember: Always start with summary mode, then drill down using error IDs or trace IDs based on what you find.
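
When driving the analyzer from a script, it can help to assemble the command from named options rather than concatenating strings. The helper below is a hypothetical convenience wrapper, not part of the skill. It uses only the flags listed in the table above and assumes, without confirmation from this document, that they can be freely combined.

```python
ANALYZER = ".claude/tools/utils/log_analyzer.py"


def build_command(log_file=None, *, error_id=None, trace=None, file=None,
                  output_format=None, no_ai=False, output=None):
    """Assemble an argv list for the analyzer from the flags in the reference card."""
    cmd = ["python3", ANALYZER]
    if log_file is not None:
        cmd.append(log_file)
    if error_id is not None:
        cmd += ["--error-id", str(error_id)]
    if trace is not None:
        cmd += ["--trace", trace]
    if file is not None:
        cmd += ["--file", file]
    if output_format is not None:
        cmd += ["--format", output_format]
    if no_ai:
        cmd.append("--no-ai")
    if output is not None:
        cmd += ["-o", output]
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems when a log path contains spaces.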
