reading-logs

Reading Logs

IRON LAW: Filter first, then read. Never open a large log file without narrowing it first.

Core Principles

  1. Filter first - Search/filter before reading
  2. Iterative narrowing - Start broad (severity), refine with patterns/time
  3. Small context windows - Fetch 5-10 lines around matches, not entire files
  4. Summaries over dumps - Present findings concisely, not raw output

Tool Strategy

1. Find Logs (Glob)

**/*.log
**/logs/**
**/*.log.*  # Rotated logs
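
The patterns above are written for a Glob tool. In a plain shell, a roughly equivalent listing can be sketched with `find`, printing sizes so that large files are filtered rather than opened. The helper name `list_logs` is hypothetical, and the `-path '*/logs/*'` test assumes the search starts above the `logs` directory.

```shell
#!/usr/bin/env bash
# list_logs (hypothetical helper): list candidate log files under a directory,
# largest first, with sizes in bytes so big files are spotted before opening.
list_logs() {
  local root="${1:-.}"
  find "$root" -type f \( -name '*.log' -o -name '*.log.*' -o -path '*/logs/*' \) \
    -exec wc -c {} + 2>/dev/null |
    grep -v ' total$' |   # drop the summary line wc prints for multiple files
    sort -nr
}

# Usage: list_logs .
```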

2. Filter with Grep

Severity search

grep -Ei "error|warn" app.log

Exclude noise

grep -i "ERROR" app.log | grep -v "known-benign"

Context around matches

grep -C 5 "ERROR" app.log # 5 lines before/after

Time window

grep "2025-12-04T11:" app.log | grep "ERROR"
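
A prefix grep like the one above covers a single hour. For a window that spans several hours, one option is to compare timestamps as strings, since ISO 8601 timestamps sort chronologically as text. The sketch below assumes the timestamp is the first whitespace-separated field of every line and that all lines share one format and time zone; `time_window` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# time_window (hypothetical helper): print lines whose leading ISO 8601
# timestamp falls in [start, end). Assumes the timestamp is the first
# whitespace-separated field and all lines use the same format and zone.
time_window() {
  local file="$1" start="$2" end="$3"
  awk -v s="$start" -v e="$end" '$1 >= s && $1 < e' "$file"
}

# Example: errors between 11:30 and 13:00 on 2025-12-04
# time_window app.log 2025-12-04T11:30 2025-12-04T13:00 | grep "ERROR"
```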

Count occurrences (matching lines)

grep -c "connection refused" app.log

3. Chain with Bash

Recent only

tail -n 2000 app.log | grep -Ei "error"

Top recurring

grep -i "ERROR" app.log | sort | uniq -c | sort -nr | head -20
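
Note that `sort | uniq -c` only groups byte-identical lines, so when every line carries its own timestamp or request ID the counts tend to collapse to 1. A minimal sketch of masking the variable parts first follows. The masks assume ISO 8601 timestamps and IDs shaped like `req-<alphanumerics>`, and `top_errors` is a hypothetical helper, not the contents of any script mentioned in this document.

```shell
#!/usr/bin/env bash
# top_errors (hypothetical helper): group matching lines after masking the
# parts that vary per line. The masks assume ISO 8601 timestamps and IDs of
# the form req-<alphanumerics>; adjust them to the actual log format.
top_errors() {
  local file="$1" pattern="${2:-ERROR}" limit="${3:-20}"
  grep -i -- "$pattern" "$file" |
    sed -E \
      -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:.]+(Z|[+-][0-9:]+)?/<ts>/g' \
      -e 's/req-[[:alnum:]]+/<id>/g' \
      -e 's/[0-9]+/<n>/g' |
    sort | uniq -c | sort -nr | head -n "$limit"
}

# Usage: top_errors app.log "ERROR" 20
```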

4. Read Last

Only after narrowing with Grep. Use context flags (-C, -A, -B) to grab targeted chunks.
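
When a match has already been located by line number (for example with `grep -n`), a fixed range around it can be read without paging through the rest of the file. The following is a sketch only; `read_around` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# read_around (hypothetical helper): print lines [N-before, N+after] of a file,
# each prefixed with its line number, and stop reading once past the range.
read_around() {
  local file="$1" line="$2" before="${3:-5}" after="${4:-5}"
  local start=$(( line - before ))
  local end=$(( line + after ))
  if (( start < 1 )); then start=1; fi
  awk -v s="$start" -v e="$end" \
    'NR > e { exit } NR >= s { printf "%d: %s\n", NR, $0 }' "$file"
}

# Typical use: locate the first match, then read only its surroundings.
# n=$(grep -n -m 1 "ERROR" app.log | cut -d: -f1)
# read_around app.log "$n" 3 3
```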

Investigation Workflows

Single Incident

  1. Get time window, error text, correlation IDs
  2. Find logs covering that time (Glob)
  3. Time-window grep:
    grep "2025-12-04T11:" service.log | grep -i "timeout"
  4. Trace by ID:
    grep "req-abc123" *.log
  5. Expand context:
    grep -C 10 "req-abc123" app.log
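
The trace-by-ID step can be taken a little further by merging hits from several files into one chronological view. The sketch below assumes every line starts with an ISO 8601 timestamp (so a plain text sort is chronological) and that file names contain no colon; `trace_id` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# trace_id (hypothetical helper): collect every line mentioning an ID from the
# given files and order them by their leading timestamp. Assumes each line
# starts with an ISO 8601 timestamp and that file names contain no ":".
trace_id() {
  local id="$1"; shift
  # -H forces the file name prefix even for one file; -F treats the ID literally.
  grep -HF -- "$id" "$@" |
    awk '{ i = index($0, ":"); print substr($0, i + 1) "\t[" substr($0, 1, i - 1) "]" }' |
    sort
}

# Usage: trace_id req-abc123 *.log
```

Each output line is the original log line followed by a tab and the source file in brackets, which keeps the timestamp at the start so the sort stays chronological.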

Recurring Patterns

  1. Filter by severity:
    grep -Ei "error|warn" app.log
  2. Group and count:
    grep -i "ERROR" app.log | sort | uniq -c | sort -nr | head
  3. Exclude known noise
  4. Drill into top patterns with context

Red Flags

  • Opening >10MB file without filtering
  • Using Read before Grep
  • Dumping raw output without summarizing
  • Searching without time bounds on multi-day logs
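
The first red flag can be checked mechanically before a file is opened. A minimal sketch, using the 10 MB threshold from the list above as the default; `needs_filtering` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# needs_filtering (hypothetical helper): succeed (exit 0) when a file is larger
# than a size limit in bytes. Default limit: 10 MB (10 * 1024 * 1024).
needs_filtering() {
  local file="$1" limit="${2:-10485760}"
  local size
  size=$(wc -c < "$file") || return 2   # unreadable or missing file
  (( size > limit ))
}

# if needs_filtering app.log; then
#   echo "app.log is over 10 MB: narrow it with grep before reading"
# fi
```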

Utility Scripts

For complex operations, use the scripts in scripts/:

Aggregate errors by frequency (normalizes timestamps/IDs)

bash scripts/aggregate-errors.sh app.log "ERROR" 20

Extract and group stack traces by type

bash scripts/extract-stack-traces.sh app.log "NullPointer"

Parse JSON logs with jq filter

bash scripts/parse-json-logs.sh app.log 'select(.level == "error")'

Show error distribution over time (hourly/minute buckets)

bash scripts/timeline.sh app.log "ERROR" hour

Trace a request ID across multiple log files

bash scripts/trace-request.sh req-abc123 logs/

Find slow operations by duration (threshold 1000 ms, show the top 20)

bash scripts/slow-requests.sh app.log 1000 20
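
The contents of these scripts are not shown here. As a rough illustration of the kind of bucketing a timeline view involves, and explicitly not the actual `scripts/timeline.sh`, hourly counts can be sketched with standard tools. This assumes lines begin with an ISO 8601 timestamp; `hourly_counts` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# hourly_counts (hypothetical helper, not the actual scripts/timeline.sh):
# count matching lines per hour. Assumes lines start with an ISO 8601
# timestamp, so the first 13 characters are "YYYY-MM-DDTHH".
hourly_counts() {
  local file="$1" pattern="${2:-ERROR}"
  grep -- "$pattern" "$file" | cut -c1-13 | sort | uniq -c
}

# Usage: hourly_counts app.log "ERROR"
```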

Output Format

  1. State what you searched (files, patterns)
  2. Provide short snippets illustrating the issue
  3. Explain what likely happened and why
  4. Suggest next steps