observability-analyze-logs
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseAnalyze Logs
日志分析
Table of Contents
目录
Core Sections
核心章节
- What This Skill Does - Intelligent OpenTelemetry log analysis with trace reconstruction
- When to Use This Skill - Trigger phrases and common scenarios
- Quick Start - Most common usage for quick health checks
- Analysis Workflow - Complete step-by-step implementation guide
- Step 1: Determine User's Need - Identify analysis type (health check, error investigation, trace debugging)
- Step 2: Choose Analysis Mode - 6 modes: Summary, Error Detail, Trace Analysis, File Filter, Fast Parsing, Real-Time
- Step 3: Execute Analysis - Running commands with Bash tool
- Step 4: Interpret Results - Understanding summary tables, detailed output, AI analysis
- Step 5: Report Findings - Communicating results to users
- Command Reference - All analyzer commands with examples
- Basic Commands (summary, markdown, JSON)
- Filtering Commands (--error-id, --trace, --file)
- Performance Commands (--no-ai, --output, tail)
- Understanding the Output - Log format, summary tables, trace execution
- Best Practices - 7 essential practices for effective log analysis
- Common Scenarios - Real-world examples with commands
- Integration with CLAUDE.md - How this skill implements CLAUDE.md conventions
- 该技能的功能 - 具备链路重建能力的智能OpenTelemetry日志分析
- 何时使用该技能 - 触发短语与常见场景
- 快速开始 - 用于快速健康检查的最常用方式
- 分析工作流 - 完整的分步实施指南
- 步骤1:确定用户需求 - 识别分析类型(健康检查、错误排查、链路调试)
- 步骤2:选择分析模式 - 6种模式:摘要、错误详情、链路分析、文件过滤、快速解析、实时监控
- 步骤3:执行分析 - 使用Bash工具运行命令
- 步骤4:解读结果 - 理解摘要表格、详细输出、AI分析内容
- 步骤5:报告发现 - 向用户传达分析结果
- 命令参考 - 所有分析器命令及示例
- 基础命令(summary、markdown、JSON)
- 过滤命令(--error-id、--trace、--file)
- 性能优化命令(--no-ai、--output、tail)
- 理解输出内容 - 日志格式、摘要表格、执行链路
- 最佳实践 - 7项有效日志分析的关键实践
- 常见场景 - 带命令的真实案例
- 与CLAUDE.md的集成 - 该技能如何实现CLAUDE.md规范
Advanced Topics
进阶主题
- Troubleshooting - Common issues and fixes
- Advanced Usage - Custom scripts, programmatic access, budget tracking
- Expected Outcomes - Success criteria and common results
- Requirements - Tools, files, dependencies, verification
Supporting Resources
支持资源
- Technical Reference - OpenTelemetry format, data models, parsing logic, AI configuration
- Response Templates - 8 response templates for different contexts
- Related Documentation - Tool implementation, usage examples, LangChain client
- Quick Reference Card - One-line command cheatsheet
Purpose
用途
Intelligent log analysis for any project using OpenTelemetry trace reconstruction and AI-powered error diagnosis. Parses OTEL-formatted logs, reconstructs execution traces, extracts errors with call chain context, and provides root cause analysis.
针对使用OpenTelemetry的项目提供智能日志分析,包括链路重建和AI驱动的错误诊断。解析OTEL格式的日志,重建执行链路,提取带有调用链上下文的错误信息,并提供根因分析。
What This Skill Does
该技能的功能
Intelligent log analysis for any project using OpenTelemetry trace reconstruction and AI-powered error diagnosis. Works with projects that generate OpenTelemetry-formatted logs in a configurable log directory.
Core capabilities:
- Parse OpenTelemetry-formatted logs with trace/span IDs
- Reconstruct complete execution traces
- Extract errors with full call chain context
- AI-powered root cause analysis
- Multiple output formats (summary, markdown, JSON)
- Advanced filtering (by error ID, trace ID, file)
针对使用OpenTelemetry的项目提供智能日志分析,包括链路重建和AI驱动的错误诊断。适用于在可配置日志目录中生成OpenTelemetry格式日志的项目。
核心能力:
- 解析带有trace/span ID的OpenTelemetry格式日志
- 重建完整的执行链路
- 提取包含完整调用链上下文的错误信息
- AI驱动的根因分析
- 多种输出格式(摘要、markdown、JSON)
- 高级过滤(按错误ID、链路ID、文件)
When to Use This Skill
何时使用该技能
Invoke this skill when users mention:
- "check the logs"
- "look at the logs"
- "analyze errors"
- "what's failing?"
- "debug this issue"
- "show me the traces"
- "investigate the error"
- "view log file"
- Any mention of project log files ({{LOG_DIR}}/{{LOG_FILE}}.log)
当用户提及以下内容时调用该技能:
- "查看日志"
- "分析错误"
- "哪里出问题了?"
- "调试这个问题"
- "展示执行链路"
- "排查该错误"
- "查看日志文件"
- 任何提及项目日志文件的内容({{LOG_DIR}}/{{LOG_FILE}}.log)
Quick Start
快速开始
Most common usage (quick health check):
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.logThis gives you a summary table with error IDs and trace IDs, perfect for quick health checks. Replace with your project's log directory (e.g., ) and with the log filename (e.g., ).
{{LOG_DIR}}logs{{LOG_FILE}}app.log最常用方式(快速健康检查):
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log该命令会生成包含错误ID和链路ID的摘要表格,非常适合快速健康检查。将替换为项目的日志目录(例如),将替换为日志文件名(例如)。
{{LOG_DIR}}logs{{LOG_FILE}}app.logInstructions
操作指南
Step 1: Determine User's Need
步骤1:确定用户需求
Quick Health Check:
- User asks: "Are there any errors?" or "What's happening in the logs?"
- Action: Run summary mode (default)
Specific Error Investigation:
- User mentions specific error or asks for details
- Action: Get error ID from summary, then use --error-id
Trace-Based Debugging:
- User asks "what led to this error?" or wants execution flow
- Action: Use --trace with trace ID
File-Specific Analysis:
- User mentions specific file or module
- Action: Use --file filter
Real-Time Monitoring:
- User wants to watch logs as they happen
- Action: Use tail -f
快速健康检查:
- 用户提问:"有错误吗?"或"日志里有什么情况?"
- 操作:运行默认的摘要模式
特定错误排查:
- 用户提及特定错误或要求查看详情
- 操作:从摘要中获取错误ID,然后使用--error-id参数
基于链路的调试:
- 用户提问"这个错误是怎么产生的?"或想要了解执行流
- 操作:使用--trace参数搭配链路ID
特定文件分析:
- 用户提及特定文件
- 操作:使用--file过滤器
实时监控:
- 用户想要实时查看日志
- 操作:使用tail -f命令
Step 2: Choose Analysis Mode
步骤2:选择分析模式
Mode 1: Summary (Default) - Start here 90% of the time
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.logOutput: Compact table with error IDs, trace IDs, file:line, function, and message preview.
Use when:
- Initial investigation
- Quick health check
- Getting error IDs for deeper analysis
- User asks "what errors do we have?"
Mode 2: Error Detail - Deep dive into specific error
bash
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdownOutput: Full error details including complete message, call chain, stack trace, related errors.
Use when:
- User asks about specific error from summary
- Need full error message (summary truncates)
- Want to see complete stack trace
- Investigating single failure
Mode 3: Trace Analysis - Understand execution flow
bash
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdownOutput: All errors in that trace with full execution context.
Use when:
- Multiple related errors in same trace
- Need to understand execution sequence
- Debugging distributed operations
- User asks "what happened in this execution?"
Mode 4: File Filter - Find all errors in specific file
bash
python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdownOutput: All errors from that file with trace context.
Use when:
- User mentions specific file
- Investigating module-specific issues
- Finding patterns in one component
Mode 5: Fast Parsing (No AI) - Quick and free
bash
python3 .claude/tools/utils/log_analyzer.py --no-aiOutput: Same as summary but skips AI analysis (faster, no cost).
Use when:
- Quick checks during development
- Want to avoid LLM costs
- Just need parsed errors without analysis
- Automated scripts or frequent polling
Mode 6: Real-Time Monitoring
bash
tail -f {{LOG_DIR}}/{{LOG_FILE}}.logOutput: Live log stream (Ctrl+C to exit).
Use when:
- Watching logs during testing
- Monitoring server startup
- Debugging in real-time
- User runs operations and wants to see results
模式1:摘要(默认)- 90%的场景从这里开始
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log输出:包含错误ID、链路ID、文件:行号、函数、消息预览的紧凑表格。
适用场景:
- 初步排查
- 快速健康检查
- 获取用于深度分析的错误ID
- 用户提问"我们有哪些错误?"
模式2:错误详情 - 深入分析特定错误
bash
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown输出:完整的错误详情,包括完整消息、调用链、堆栈跟踪、相关错误。
适用场景:
- 用户询问摘要中的特定错误
- 需要完整的错误消息(摘要会截断)
- 想要查看完整的堆栈跟踪
- 排查单个故障
模式3:链路分析 - 理解执行流
bash
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown输出:该链路中的所有错误及完整执行上下文。
适用场景:
- 同一链路中存在多个相关错误
- 需要理解执行顺序
- 调试分布式操作
- 用户提问"这次执行发生了什么?"
模式4:文件过滤 - 查找特定文件中的所有错误
bash
python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdown输出:该文件中的所有错误及链路上下文。
适用场景:
- 用户提及特定文件
- 排查模块特定问题
- 查找单个组件中的错误模式
模式5:快速解析(无AI)- 快速且免费
bash
python3 .claude/tools/utils/log_analyzer.py --no-ai输出:与摘要模式相同,但跳过AI分析(更快、无成本)。
适用场景:
- 开发过程中的快速检查
- 想要避免大语言模型成本
- 仅需要解析后的错误信息,不需要分析
- 自动化脚本或频繁轮询
模式6:实时监控
bash
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log输出:实时日志流(按Ctrl+C退出)。
适用场景:
- 测试期间查看日志
- 监控服务器启动
- 实时调试
- 用户执行操作后想要查看结果
Step 3: Execute Analysis
步骤3:执行分析
Execute the appropriate command using the Bash tool:
bash
undefined使用Bash工具执行相应命令:
bash
undefinedExample for summary
摘要示例
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefinedpython3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefinedStep 4: Interpret Results
步骤4:解读结果
For Summary Output:
- Check "Total errors" count
- Scan error table for patterns (same file, same trace)
- Note error IDs for deeper investigation
- Use provided Quick Commands to drill down
For Detailed Output:
- Read Full Message (not truncated)
- Review Call Chain (execution flow leading to error)
- Check Related Errors (other failures in same trace)
- Examine Stack Trace (if available)
- Look for Recovery Attempts (logs after error)
For AI Analysis (markdown with --no-ai not set):
- Read Root Causes section
- Check Patterns (recurring issues)
- Review Priority (what to fix first)
- Follow Fixes (specific file:line changes)
- Consider Systemic Issues (larger architectural problems)
对于摘要输出:
- 查看"总错误数"
- 扫描错误表格寻找模式(相同文件、相同链路)
- 记录用于深度排查的错误ID
- 使用提供的快速命令进行深入分析
对于详细输出:
- 阅读完整消息(未截断)
- 查看调用链(导致错误的执行流)
- 检查相关错误(同一链路中的其他故障)
- 分析堆栈跟踪(如果有)
- 查看恢复尝试(错误后的日志)
对于AI分析(未设置--no-ai的markdown格式):
- 阅读根因分析部分
- 检查错误模式(重复出现的问题)
- 查看优先级(优先修复哪些问题)
- 遵循修复建议(特定文件:行号的修改)
- 考虑系统性问题(更大的架构问题)
Step 5: Report Findings
步骤5:报告发现
Always provide:
- Summary of error count and severity
- Most critical issues (from AI analysis or your judgment)
- Specific file:line references for user to investigate
- Suggested next steps or commands to run
Example response format:
Found 5 errors in the logs:
Critical Issues:
1. Neo4j connection failure in database.py:123 (trace: abc123)
- Appears 3 times across different operations
- Root cause: Connection timeout after 5s
2. Invalid config in settings.py:45 (trace: def456)
- Missing required parameter 'api_key'
To investigate further:
- View error 1 details: python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
- See all Neo4j errors: python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdown始终需要提供:
- 错误数量和严重程度的摘要
- 最关键的问题(来自AI分析或你的判断)
- 供用户排查的特定文件:行号参考
- 建议的下一步操作或可运行的命令
示例响应格式:
在日志中发现5个错误:
关键问题:
1. database.py:123中Neo4j连接失败(链路:abc123)
- 在不同操作中出现3次
- 根因:5秒后连接超时
2. settings.py:45中配置无效(链路:def456)
- 缺少必填参数'api_key'
进一步排查建议:
- 查看错误1详情:python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
- 查看所有Neo4j错误:python3 .claude/tools/utils/log_analyzer.py --file database.py --format markdownCommand Reference
命令参考
Basic Commands
基础命令
bash
undefinedbash
undefinedQuick summary (default)
快速摘要(默认)
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
Detailed markdown with AI analysis
带AI分析的详细markdown格式输出
python3 .claude/tools/utils/log_analyzer.py --format markdown
python3 .claude/tools/utils/log_analyzer.py --format markdown
JSON output for programmatic use
用于程序化调用的JSON输出
python3 .claude/tools/utils/log_analyzer.py --format json
undefinedpython3 .claude/tools/utils/log_analyzer.py --format json
undefinedFiltering Commands
过滤命令
bash
undefinedbash
undefinedView specific error details (get ID from summary)
查看特定错误详情(从摘要中获取ID)
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
View all errors in a trace
查看某一链路中的所有错误
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown
Filter by file name
按文件名过滤
python3 .claude/tools/utils/log_analyzer.py --file database.py
python3 .claude/tools/utils/log_analyzer.py --file database.py
Combine filters
组合过滤条件
python3 .claude/tools/utils/log_analyzer.py --file database.py --trace abc123 --format markdown
undefinedpython3 .claude/tools/utils/log_analyzer.py --file database.py --trace abc123 --format markdown
undefinedPerformance Commands
性能优化命令
bash
undefinedbash
undefinedSkip AI analysis (faster, free)
跳过AI分析(更快、免费)
python3 .claude/tools/utils/log_analyzer.py --no-ai
python3 .claude/tools/utils/log_analyzer.py --no-ai
Save to file instead of stdout
保存到文件而非标准输出
python3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md
python3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md
Real-time log monitoring
实时日志监控
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
Last 50 lines
最后50行日志
tail -n 50 {{LOG_DIR}}/{{LOG_FILE}}.log
tail -n 50 {{LOG_DIR}}/{{LOG_FILE}}.log
Follow and filter for errors
实时监控并过滤错误日志
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log | grep ERROR
undefinedtail -f {{LOG_DIR}}/{{LOG_FILE}}.log | grep ERROR
undefinedUnderstanding the Output
理解输出内容
Log Format
日志格式
OpenTelemetry format with trace/span IDs:
2025-10-16 14:32:15 - [trace:abc123 | span:def456] - module.name - ERROR - [file.py:123] - function() - Error messageKey fields:
- timestamp: When the log occurred
- trace: Unique ID for entire execution (groups related logs)
- span: Unique ID for operation within trace
- module: Python module path
- level: ERROR, WARNING, INFO, DEBUG
- file:line: Source location
- function: Function name
- message: Log message
带有trace/span ID的OpenTelemetry格式:
2025-10-16 14:32:15 - [trace:abc123 | span:def456] - module.name - ERROR - [file.py:123] - function() - Error message关键字段:
- timestamp: 日志生成时间
- trace: 整个执行的唯一ID(用于关联相关日志)
- span: 链路中单个操作的唯一ID
- module: Python模块路径
- level: ERROR、WARNING、INFO、DEBUG
- file:line: 代码位置
- function: 函数名
- message: 日志消息
Summary Table
摘要表格
ID | Trace | File:Line | Function | Message
-----|------------|------------------------|-------------|------------------
1 | abc123 | database.py:45 | connect | Connection failed
2 | abc123 | database.py:67 | query | No active connectionColumns explained:
- ID: Error number (use with --error-id)
- Trace: First 8 chars of trace ID (same trace = related errors)
- File:Line: Where error occurred
- Function: Function that logged error
- Message: Preview (truncated, use --error-id for full message)
ID | Trace | File:Line | Function | Message
-----|------------|------------------------|-------------|------------------
1 | abc123 | database.py:45 | connect | Connection failed
2 | abc123 | database.py:67 | query | No active connection列说明:
- ID: 错误编号(与--error-id参数配合使用)
- Trace: 链路ID的前8个字符(相同链路ID表示错误相关)
- File:Line: 错误发生位置
- Function: 记录错误的函数
- Message: 消息预览(已截断,使用--error-id查看完整消息)
Trace Execution
执行链路
A trace represents one complete execution flow:
- Starts at entry point (e.g., MCP tool call)
- Includes all operations in that execution
- Ends when execution completes
- Multiple errors in same trace are related
Example trace flow:
Trace abc123:
1. automatic_indexing() [deprecated] called
2. connect_to_neo4j() called
3. ERROR: Connection timeout
4. retry_connection() called
5. ERROR: Retry failedAll errors share trace ID , so they're related failures.
abc123链路代表一次完整的执行流:
- 从入口点开始(例如MCP工具调用)
- 包含该执行中的所有操作
- 执行完成后结束
- 同一链路中的多个错误是相关的
示例链路流:
Trace abc123:
1. 调用了automatic_indexing() [已弃用]
2. 调用了connect_to_neo4j()
3. 错误:连接超时
4. 调用了retry_connection()
5. 错误:重试失败所有错误共享链路ID ,因此它们是相关的故障。
abc123Best Practices
最佳实践
1. Always Start with Summary
1. 始终从摘要模式开始
Don't jump straight to markdown or detail view. Get the overview first:
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log不要直接跳转到markdown或详情视图。先获取整体概览:
bash
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log2. Use Error IDs for Deep Dives
2. 使用错误ID进行深度排查
Summary gives you error IDs. Use them:
bash
undefined摘要会提供错误ID,合理使用它们:
bash
undefinedFrom summary, identify interesting error (e.g., ID 3)
从摘要中找到感兴趣的错误(例如ID 3)
python3 .claude/tools/utils/log_analyzer.py --error-id 3 --format markdown
undefinedpython3 .claude/tools/utils/log_analyzer.py --error-id 3 --format markdown
undefined3. Group Related Errors by Trace
3. 按链路ID分组相关错误
If multiple errors share a trace ID in summary, view the whole trace:
bash
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown如果摘要中多个错误共享同一链路ID,查看整个链路:
bash
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown4. Use --no-ai for Quick Checks
4. 使用--no-ai进行快速检查
During active development, skip AI to save time and cost:
bash
python3 .claude/tools/utils/log_analyzer.py --no-ai在开发过程中,跳过AI分析以节省时间和成本:
bash
python3 .claude/tools/utils/log_analyzer.py --no-ai5. Combine with Real-Time Monitoring
5. 结合实时监控
When running tests or operations:
bash
undefined运行测试或操作时:
bash
undefinedTerminal 1: Run operation
终端1:运行操作
uv run pytest tests/integration/
uv run pytest tests/integration/
Terminal 2: Watch logs
终端2:查看日志
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log
undefinedtail -f {{LOG_DIR}}/{{LOG_FILE}}.log
undefined6. Save Reports for Later
6. 保存报告供后续使用
Generate markdown reports for documentation:
bash
python3 .claude/tools/utils/log_analyzer.py --format markdown --output reports/$(date +%Y-%m-%d)-errors.md生成markdown报告用于文档记录:
bash
python3 .claude/tools/utils/log_analyzer.py --format markdown --output reports/$(date +%Y-%m-%d)-errors.md7. Check Logs BEFORE Making Changes
7. 先检查日志再做修改
When user reports an issue, check logs first:
- Run summary to see current state
- Identify error patterns
- Then make code changes
- Re-run summary to verify fix
当用户报告问题时,先检查日志:
- 运行摘要模式查看当前状态
- 识别错误模式
- 然后进行代码修改
- 重新运行摘要模式验证修复效果
Common Scenarios
常见场景
Scenario 1: User says "check the logs"
场景1:用户说“查看日志”
bash
undefinedbash
undefinedStep 1: Run summary
步骤1:运行摘要模式
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
Step 2: Report findings
步骤2:报告发现
"Found 3 errors. Most critical is Neo4j connection failure..."
“发现3个错误。最关键的是Neo4j连接失败...”
Step 3: Offer deeper analysis
步骤3:提供深度分析选项
"Would you like me to investigate error #1 in detail?"
“需要我详细排查错误#1吗?”
undefinedundefinedScenario 2: User says "why did X fail?"
场景2:用户说“为什么X失败了?”
bash
undefinedbash
undefinedStep 1: Run summary to find X-related errors
步骤1:运行摘要模式查找与X相关的错误
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
Step 2: Get error ID for X
步骤2:获取X的错误ID
Step 3: Analyze that error
步骤3:分析该错误
python3 .claude/tools/utils/log_analyzer.py --error-id 2 --format markdown
python3 .claude/tools/utils/log_analyzer.py --error-id 2 --format markdown
Step 4: Explain root cause from call chain and AI analysis
步骤4:根据调用链和AI分析解释根因
undefinedundefinedScenario 3: Debugging test failures
场景3:调试测试失败
bash
undefinedbash
undefinedStep 1: Monitor logs during test
步骤1:测试期间监控日志
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log &
tail -f {{LOG_DIR}}/{{LOG_FILE}}.log &
Step 2: Run tests
步骤2:运行测试
uv run pytest tests/integration/test_xyz.py -v
uv run pytest tests/integration/test_xyz.py -v
Step 3: Analyze results
步骤3:分析结果
python3 .claude/tools/utils/log_analyzer.py --file test_xyz.py --format markdown
undefinedpython3 .claude/tools/utils/log_analyzer.py --file test_xyz.py --format markdown
undefinedScenario 4: Production issue investigation
场景4:生产环境问题排查
bash
undefinedbash
undefinedStep 1: Get full report with AI analysis
步骤1:生成带AI分析的完整报告
python3 .claude/tools/utils/log_analyzer.py --format markdown --output incident-report.md
python3 .claude/tools/utils/log_analyzer.py --format markdown --output incident-report.md
Step 2: Review AI Root Causes and Fixes
步骤2:查看AI根因分析和修复建议
Step 3: Identify trace IDs for critical errors
步骤3:识别关键错误的链路ID
Step 4: Drill into specific traces
步骤4:深入分析特定链路
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
undefinedpython3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
undefinedUsage Examples
使用示例
Example 1: Quick Health Check
示例1:快速健康检查
bash
undefinedbash
undefinedUser says: "Check the logs"
用户说:“查看日志”
Run summary mode:
运行摘要模式:
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
Output: Summary table showing 3 errors
输出:显示3个错误的摘要表格
Action: Report findings, offer detailed investigation
操作:报告发现,提供详细分析选项
undefinedundefinedExample 2: Detailed Error Investigation
示例2:详细错误排查
bash
undefinedbash
undefinedUser says: "What caused error #1?"
用户说:“错误#1是什么原因导致的?”
From summary, get error ID, then:
从摘要中获取错误ID,然后运行:
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
Output: Full error details with call chain, stack trace, AI analysis
输出:包含调用链、堆栈跟踪、AI分析的完整错误详情
Action: Explain root cause, suggest fix with file:line references
操作:解释根因,提供带文件:行号参考的修复建议
undefinedundefinedExample 3: Trace-Based Debugging
示例3:基于链路的调试
bash
undefinedbash
undefinedUser says: "Show me everything that happened in that execution"
用户说:“展示那次执行的所有情况”
From summary, get trace ID, then:
从摘要中获取链路ID,然后运行:
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown
python3 .claude/tools/utils/log_analyzer.py --trace abc123def456 --format markdown
Output: All errors in trace with execution context
输出:链路中的所有错误及执行上下文
Action: Explain execution flow, identify failure cascade
操作:解释执行流,识别故障连锁反应
undefinedundefinedExpected Outcomes
预期结果
Successful Analysis
成功分析
✓ Log Analysis Summary
Total entries: 1,523
Total traces: 12
Errors found: 7
Files affected: 4
Error table with IDs, traces, files, functions, messages
Quick commands for drill-down investigation✓ 日志分析摘要
总条目数:1,523
总链路数:12
发现错误数:7
受影响文件数:4
包含ID、链路、文件、函数、消息的错误表格
用于深度排查的快速命令Detailed Investigation
详细排查
✓ Markdown report with:
- AI-powered root cause analysis
- Complete execution traces
- Call chains leading to errors
- Related errors grouped by trace
- File:line references for fixes
- Stack traces (if available)✓ Markdown报告包含:
- AI驱动的根因分析
- 完整的执行链路
- 导致错误的调用链
- 按链路分组的相关错误
- 用于修复的文件:行号参考
- 堆栈跟踪(如果有)Integration Points
集成点
With CLAUDE.md Convention
与CLAUDE.md规范的集成
This skill implements the CLAUDE.md logging convention:
CLAUDE.md says:
When users say "Logs", analyzefrom your project{{LOG_DIR}}/{{LOG_FILE}}.log
This skill provides:
- Automatic log file location
- Multiple analysis modes
- AI-powered diagnosis
- Trace reconstruction
- Interactive debugging workflow
Recommended workflow:
- User says "Logs" → Run this skill
- Start with summary mode
- Identify critical errors
- Drill down with --error-id or --trace
- Report findings with file:line references
该技能实现了CLAUDE.md日志规范:
CLAUDE.md规定:
当用户说“Logs”时,分析项目中的{{LOG_DIR}}/{{LOG_FILE}}.log
该技能提供:
- 自动定位日志文件
- 多种分析模式
- AI驱动的诊断
- 链路重建
- 交互式调试工作流
推荐工作流:
- 用户说“Logs” → 运行该技能
- 从摘要模式开始
- 识别关键错误
- 使用--error-id或--trace进行深度分析
- 提供带文件:行号参考的分析结果
With Debug Workflows
与调试工作流的集成
Integrates with test debugging and issue investigation:
Test fails → Check logs → Identify error → Fix code → Re-run → Verify与测试调试和问题排查工作流集成:
测试失败 → 检查日志 → 识别错误 → 修复代码 → 重新运行 → 验证With Quality Gates
与质量门禁的集成
Use in pre-commit checks and CI/CD:
bash
undefined在提交前检查和CI/CD中使用:
bash
undefinedBefore committing, check for new errors
提交前检查是否有新错误
python3 .claude/tools/utils/log_analyzer.py --no-ai
python3 .claude/tools/utils/log_analyzer.py --no-ai
If errors found, investigate and fix
如果发现错误,排查并修复
undefinedundefinedExpected Benefits
预期收益
| Metric | Without Skill | With Skill | Improvement |
|---|---|---|---|
| Error Diagnosis Time | 15-30 min (manual parsing) | 2-5 min (automated) | 6-10x faster |
| Root Cause Accuracy | ~60% (assumptions) | ~90% (AI analysis) | 50% improvement |
| Trace Reconstruction | Manual, error-prone | Automatic | 100% coverage |
| Context Awareness | Limited (single logs) | Full (trace grouping) | Complete context |
| Report Generation | Manual markdown | Automated | Instant reports |
| 指标 | 无该技能时 | 使用该技能后 | 提升效果 |
|---|---|---|---|
| 错误诊断时间 | 15-30分钟(手动解析) | 2-5分钟(自动化) | 快6-10倍 |
| 根因准确性 | ~60%(基于假设) | ~90%(AI分析) | 提升50% |
| 链路重建 | 手动、易出错 | 自动化 | 100%覆盖 |
| 上下文感知 | 有限(单条日志) | 完整(链路分组) | 完整上下文 |
| 报告生成 | 手动编写markdown | 自动化 | 即时生成 |
Success Metrics
成功指标
After implementing this skill:
- <5 second analysis time - Fast log parsing and error extraction
- 100% trace reconstruction - All related errors grouped by trace
- 90% root cause accuracy - AI-powered diagnosis with high confidence
- Markdown/JSON export - Automated report generation
- Zero manual log parsing - Automated OpenTelemetry parsing
实施该技能后:
- 分析时间<5秒 - 快速日志解析和错误提取
- 100%链路重建 - 所有相关错误按链路分组
- 90%根因准确性 - AI驱动的高可信度诊断
- Markdown/JSON导出 - 自动化报告生成
- 零手动日志解析 - 自动化OpenTelemetry解析
Validation Process
验证流程
Step 1: Quick Health Check
步骤1:快速健康检查
bash
undefinedbash
undefinedSummary mode for initial assessment
摘要模式用于初步评估
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefinedpython3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefinedStep 2: Identify Critical Errors
步骤2:识别关键错误
bash
undefinedbash
undefinedGet error IDs from summary
从摘要中获取错误ID
Focus on errors in critical paths
重点关注关键路径中的错误
undefinedundefinedStep 3: Deep Dive Analysis
步骤3:深度分析
bash
undefinedbash
undefinedUse error IDs for detailed investigation
使用错误ID进行详细排查
python3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
undefinedpython3 .claude/tools/utils/log_analyzer.py --error-id 1 --format markdown
undefinedStep 4: Trace Reconstruction
步骤4:链路重建
bash
undefinedbash
undefinedGroup related errors by trace
按链路分组相关错误
python3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
undefinedpython3 .claude/tools/utils/log_analyzer.py --trace TRACE_ID --format markdown
undefinedStep 5: Report Findings
步骤5:报告发现
bash
undefinedbash
undefinedGenerate markdown report for documentation
生成markdown报告用于文档记录
python3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md
undefinedpython3 .claude/tools/utils/log_analyzer.py --format markdown --output report.md
undefinedRed Flags to Avoid
需要避免的误区
❌ DON'T:
- Parse logs manually with grep/sed (use analyzer tool)
- Ignore trace IDs (they group related errors)
- Look at single error in isolation (check full trace)
- Skip AI analysis when investigating production issues
- Assume errors are unrelated (verify with trace grouping)
- Make changes without seeing actual log output
✅ DO:
- Always start with summary mode
- Use error IDs for detailed investigation
- Group errors by trace ID
- Use --no-ai for quick checks, AI for production debugging
- Generate markdown reports for documentation
- Combine with real-time monitoring (tail -f)
❌ 不要:
- 使用grep/sed手动解析日志(使用分析器工具)
- 忽略链路ID(它们用于关联相关错误)
- 孤立地查看单个错误(检查完整链路)
- 排查生产环境问题时跳过AI分析
- 假设错误无关(使用链路分组验证)
- 未查看实际日志输出就进行修改
✅ 要:
- 始终从摘要模式开始
- 使用错误ID进行详细排查
- 按链路ID分组错误
- 快速检查使用--no-ai,生产环境调试使用AI
- 生成markdown报告用于文档记录
- 结合实时监控(tail -f)
Troubleshooting
故障排除
Error: Log file not found
bash
undefined错误:日志文件未找到
bash
undefinedCheck if log file exists
检查日志文件是否存在
ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log
ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log
If missing, run the MCP server first to generate logs
如果不存在,先运行MCP服务器生成日志
./run-mcp-server.sh
**Error: Permission denied**
```bash./run-mcp-server.sh
**错误:权限被拒绝**
```bashMake script executable
为脚本添加执行权限
chmod +x .claude/tools/utils/log_analyzer.py
**Error: Module not found (langchain_client)**
```bashchmod +x .claude/tools/utils/log_analyzer.py
**错误:模块未找到(langchain_client)**
```bashEnsure you're in project root
确保处于项目根目录
pwd # Should be your project root directory
pwd # 应该是你的项目根目录
Script adds parent dir to path automatically
脚本会自动将父目录添加到路径中
**AI analysis fails or skipped**
```bash
**AI分析失败或被跳过**
```bashUse --no-ai flag to skip AI analysis
使用--no-ai标志跳过AI分析
python3 .claude/tools/utils/log_analyzer.py --no-ai
python3 .claude/tools/utils/log_analyzer.py --no-ai
Or check LangChain client configuration
或检查LangChain客户端配置
cat .claude/tools/langchain_client.py
**Empty log file**
```bashcat .claude/tools/langchain_client.py
**日志文件为空**
```bashCheck log file size
检查日志文件大小
ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log
ls -lh {{LOG_DIR}}/{{LOG_FILE}}.log
If empty, server hasn't run yet or logging not configured
如果为空,说明服务器尚未运行或未配置日志
Run server to generate logs
运行服务器生成日志
./run-mcp-server.sh
undefined./run-mcp-server.sh
undefinedAdvanced Usage
进阶用法
Custom Analysis Scripts
自定义分析脚本
See scripts/example_usage.py for advanced patterns:
The example script demonstrates:
- Basic log analysis with default settings and LangChain client
- Custom LLM client with budget tracking (monthly/daily limits)
- Parsing only (no LLM) - faster, free error extraction
- Filtering specific errors - finding Neo4j-related issues
- Trace-based analysis - grouping errors by execution trace
python
undefined查看scripts/example_usage.py获取进阶模式:
示例脚本演示:
- 基础日志分析 - 使用默认设置和LangChain客户端
- 自定义LLM客户端 - 带预算跟踪(每月/每日限额)
- 仅解析(无LLM) - 更快、免费的错误提取
- 过滤特定错误 - 查找与Neo4j相关的问题
- 基于链路的分析 - 按执行链路分组错误
python
undefinedExample: Parse without LLM (from scripts/example_usage.py)
示例:不使用LLM进行解析(来自scripts/example_usage.py)
from utils.log_analyzer import OTelLogParser, ErrorExtractor
parser = OTelLogParser()
entries = parser.parse_file(Path("{{LOG_DIR}}/{{LOG_FILE}}.log"))
extractor = ErrorExtractor(context_lines=3)
errors = extractor.extract_errors(entries)
from utils.log_analyzer import OTelLogParser, ErrorExtractor
parser = OTelLogParser()
entries = parser.parse_file(Path("{{LOG_DIR}}/{{LOG_FILE}}.log"))
extractor = ErrorExtractor(context_lines=3)
errors = extractor.extract_errors(entries)
Filter specific patterns
过滤特定模式
neo4j_errors = [e for e in errors if "Neo4j" in e.error.message]
**Run the examples:**
```bash
python3 .claude/skills/observability-analyze-logs/scripts/example_usage.pyneo4j_errors = [e for e in errors if "Neo4j" in e.error.message]
**运行示例:**
```bash
python3 .claude/skills/observability-analyze-logs/scripts/example_usage.pyProgrammatic Access
程序化访问
bash
undefinedbash
undefinedGet JSON output for scripts
获取JSON输出供脚本使用
python3 .claude/tools/utils/log_analyzer.py --format json > errors.json
python3 .claude/tools/utils/log_analyzer.py --format json > errors.json
Parse with jq
使用jq解析
python3 .claude/tools/utils/log_analyzer.py --format json | jq '.errors[] | select(.file == "database.py")'
undefinedpython3 .claude/tools/utils/log_analyzer.py --format json | jq '.errors[] | select(.file == "database.py")'
undefinedBudget Tracking
预算跟踪
The analyzer uses LangChain client with budget tracking. See example_usage.py for details on monitoring costs.
分析器使用带预算跟踪的LangChain客户端。查看example_usage.py了解成本监控详情。
Requirements
要求
Tools needed:
- Python 3.12+ (already in project)
- Log analyzer tool: (bundled)
.claude/tools/utils/log_analyzer.py - LangChain client: (bundled)
.claude/tools/langchain_client.py
Log file:
- Default location: (customize for your project)
{{LOG_DIR}}/{{LOG_FILE}}.log - Generated by running your project's server startup script (e.g., )
./run-server.sh
Dependencies:
- Standard library modules (no additional installation needed)
- LangChain client uses project's existing dependencies
Verification:
bash
undefined所需工具:
- Python 3.12+(已包含在项目中)
- 日志分析器工具:(已捆绑)
.claude/tools/utils/log_analyzer.py - LangChain客户端:(已捆绑)
.claude/tools/langchain_client.py
日志文件:
- 默认位置:(可根据项目自定义)
{{LOG_DIR}}/{{LOG_FILE}}.log - 运行项目的服务器启动脚本生成(例如)
./run-server.sh
依赖项:
- 标准库模块(无需额外安装)
- LangChain客户端使用项目现有依赖
验证:
bash
undefinedVerify log analyzer exists
验证日志分析器是否存在
ls .claude/tools/utils/log_analyzer.py
ls .claude/tools/utils/log_analyzer.py
Verify log file exists (or generate by running server)
验证日志文件是否存在(或运行服务器生成)
ls {{LOG_DIR}}/{{LOG_FILE}}.log || ./run-server.sh
ls {{LOG_DIR}}/{{LOG_FILE}}.log || ./run-server.sh
Test the analyzer
测试分析器
python3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefinedpython3 .claude/tools/utils/log_analyzer.py {{LOG_DIR}}/{{LOG_FILE}}.log
undefinedSupporting Files
支持文件
This skill follows progressive disclosure with supporting files for depth:
-
references/reference.md - Technical documentation:
- OpenTelemetry log format specification
- Data models (LogEntry, TraceExecution, ErrorContext)
- Parsing logic and trace reconstruction algorithm
- AI analysis configuration and cost tracking
- Performance characteristics and scaling limits
-
references/log-rotation-guide.md - Log rotation configuration and management
-
references/log-rotation-analysis.md - Log rotation analysis and patterns
-
references/logging-and-rotation-guide.md - Complete logging and rotation guide
-
templates/response-template.md - Response formatting:
- 8 response templates for different contexts
- Summary, detailed error, trace analysis, file-specific
- AI-powered pattern analysis, real-time monitoring
- No errors found, before/after comparison
该技能采用渐进式披露,提供深度支持文件:
-
references/reference.md - 技术文档:
- OpenTelemetry日志格式规范
- 数据模型(LogEntry、TraceExecution、ErrorContext)
- 解析逻辑和链路重建算法
- AI分析配置和成本跟踪
- 性能特征和扩展限制
-
references/log-rotation-guide.md - 日志轮转配置和管理
-
references/log-rotation-analysis.md - 日志轮转分析和模式
-
references/logging-and-rotation-guide.md - 完整的日志和轮转指南
-
templates/response-template.md - 响应格式:
- 适用于不同场景的8种响应模板
- 摘要、详细错误、链路分析、特定文件
- AI驱动的模式分析、实时监控
- 未发现错误、前后对比
Expected Outcomes
预期结果
Successful Analysis
成功分析
Summary mode output:
✓ Log Analysis Summary
Total entries: 1,523
Total traces: 12
Errors found: 7
Files affected: 4
Error table with IDs, traces, files, functions, messages
Quick commands for drill-down investigationDetailed mode output:
✓ Markdown report with:
- AI-powered root cause analysis
- Complete execution traces
- Call chains leading to errors
- Related errors grouped by trace
- File:line references for fixes
- Stack traces (if available)摘要模式输出:
✓ 日志分析摘要
总条目数:1,523
总链路数:12
发现错误数:7
受影响文件数:4
包含ID、链路、文件、函数、消息的错误表格
用于深度排查的快速命令详细模式输出:
✓ Markdown报告包含:
- AI驱动的根因分析
- 完整的执行链路
- 导致错误的调用链
- 按链路分组的相关错误
- 用于修复的文件:行号参考
- 堆栈跟踪(如果有)Common Outcomes
常见结果
- No errors found: "✓ HEALTHY - Analyzed 1,234 entries, no errors found"
- Configuration issues: Clear error with fix steps (e.g., "Start Neo4j: ")
neo4j start - Test failures: Trace showing execution flow to failure with root cause
- Production issues: AI analysis with priority ranking and systemic issues
- 未发现错误:"✓ 健康状态 - 分析了1,234条条目,未发现错误"
- 配置问题:清晰的错误信息及修复步骤(例如"启动Neo4j:")
neo4j start - 测试失败:显示执行流到故障点的链路及根因
- 生产环境问题:带优先级排序和系统性问题分析的AI结果
Related Documentation
相关文档
- Tool implementation:
.claude/tools/utils/log_analyzer.py - Usage examples:
.claude/tools/utils/example_usage.py - LangChain client:
.claude/tools/langchain_client.py - CLAUDE.md conventions: (section on "Logs")
CLAUDE.md - Log file location:
{{LOG_DIR}}/{{LOG_FILE}}.log
- 工具实现:
.claude/tools/utils/log_analyzer.py - 使用示例:
.claude/tools/utils/example_usage.py - LangChain客户端:
.claude/tools/langchain_client.py - CLAUDE.md规范: (“Logs”部分)
CLAUDE.md - 日志文件位置:
{{LOG_DIR}}/{{LOG_FILE}}.log
Quick Reference Card
快速参考卡
| Goal | Command |
|---|---|
| Health check | |
| Detailed error | |
| Trace analysis | |
| File-specific | |
| Fast & free | |
| Real-time | |
| Save report | |
| JSON output | |
Remember: Always start with summary mode, then drill down using error IDs or trace IDs based on what you find.
| 目标 | 命令 |
|---|---|
| 健康检查 | |
| 详细错误分析 | |
| 链路分析 | |
| 特定文件分析 | |
| 快速免费检查 | |
| 实时监控 | |
| 保存报告 | |
| JSON输出 | |
注意: 始终从摘要模式开始,然后根据发现的内容使用错误ID或链路ID进行深度分析。