Observability

Implement the three pillars of observability: logs, metrics, and traces.

The Three Pillars

| Pillar | Purpose | Key Question |
|--------|---------|--------------|
| Logs | Discrete events with context | What happened? |
| Metrics | Aggregated measurements | How much? How many? |
| Traces | Request flow across services | Where did the time go? |
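
To make the table concrete, here is a minimal sketch of what each pillar might record for one request. All three records are linked by a shared trace ID. The record shapes, field names, metric name, and placeholder IDs are illustrative assumptions, not the schema of any particular library.

```typescript
// Illustrative record shapes; field names are assumptions, not a library schema.
interface LogRecord {
  level: "info" | "error";
  msg: string;
  traceId: string; // correlation key shared with the span
  [key: string]: unknown; // any extra structured context
}

interface MetricSample {
  name: string;
  labels: Record<string, string>;
  increment: number; // added to a counter that aggregates many requests
}

interface SpanRecord {
  traceId: string;
  spanId: string;
  operation: string;
  durationMs: number;
}

// One request yields one record per pillar, all linked by the same trace ID.
function recordRequest(traceId: string, route: string, durationMs: number, ok: boolean) {
  const log: LogRecord = {
    level: ok ? "info" : "error",
    msg: ok ? "request completed" : "request failed",
    traceId,
    route,
  };
  const metric: MetricSample = {
    name: "http_requests_total",
    labels: { route, outcome: ok ? "success" : "error" },
    increment: 1,
  };
  // Placeholder span ID; a real tracer would generate one per span.
  const span: SpanRecord = { traceId, spanId: "00f067aa0ba902b7", operation: `handle ${route}`, durationMs };
  return { log, metric, span };
}

const sample = recordRequest("4bf92f3577b34da6a3ce929d0e0e4736", "/orders", 42, true);
// sample.log answers "what happened?", sample.metric feeds "how many?",
// and sample.span answers "where did the time go?".
```

The log and span stay per request. The metric increment is meant to be summed into a counter across many requests, and that aggregation is what lets metrics answer "how many?" cheaply.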

Quick Pick

  • Debugging a specific request? → Logs + Traces
  • Alerting on thresholds? → Metrics
  • Understanding overall system health? → All three
  • Starting from zero? → Logs first, then metrics, then traces

Key Principles

  • Use structured logging (JSON) with correlation IDs propagated across all services
  • Instrument the four golden signals: latency, traffic, errors, and saturation
  • Define SLIs and SLOs before building dashboards or alerts
  • Alert on symptoms (user impact), not on causes (such as high CPU usage)
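
The four golden signals can be derived from raw request data. The sketch below assumes a hypothetical in-memory window of completed requests, plus an in-flight count and a known capacity. In practice these values would come from a metrics backend. Reporting latency as a nearest-rank p95 is also an assumption, since any high percentile works.

```typescript
// Hypothetical record of one request completed inside the observation window.
interface CompletedRequest {
  durationMs: number;
  failed: boolean;
}

interface GoldenSignals {
  latencyP95Ms: number; // latency: 95th percentile duration
  trafficRps: number; // traffic: requests per second
  errorRate: number; // errors: fraction of requests that failed
  saturation: number; // saturation: in-flight work relative to capacity
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Assumes windowSeconds > 0 and capacity > 0.
function goldenSignals(
  requests: CompletedRequest[],
  windowSeconds: number,
  inFlight: number,
  capacity: number,
): GoldenSignals {
  const failures = requests.filter((r) => r.failed).length;
  return {
    latencyP95Ms: percentile(requests.map((r) => r.durationMs), 95),
    trafficRps: requests.length / windowSeconds,
    errorRate: requests.length === 0 ? 0 : failures / requests.length,
    saturation: inFlight / capacity,
  };
}

// Usage: 20 requests in a 10 s window (10 ms to 200 ms, first one failed), 30 of 100 workers busy.
const observed: CompletedRequest[] = Array.from({ length: 20 }, (_, i) => ({
  durationMs: (i + 1) * 10,
  failed: i === 0,
}));
const signals = goldenSignals(observed, 10, 30, 100);
// signals → { latencyP95Ms: 190, trafficRps: 2, errorRate: 0.05, saturation: 0.3 }
```

What "capacity" means (worker pool, connection pool, CPU) depends on the service, so the saturation input here is a placeholder for whichever resource runs out first.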
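
Defining the SLO first gives alerts a user-facing target. This sketch computes an availability SLI and a burn rate, then maps the burn rate to a severity. The burn rate is the observed error rate divided by the error rate the SLO allows. Because the input is failed user requests rather than CPU or memory, the resulting alert fires on a symptom. The 99.9% target and both thresholds are assumptions, taken from common multi-window burn-rate practice rather than from this document.

```typescript
type Severity = "page" | "ticket" | "none";

interface SloReport {
  sli: number; // fraction of good events in the window
  burnRate: number; // observed error rate divided by the error rate the SLO allows
  severity: Severity;
}

// Assumed thresholds: 14.4x spends 2% of a 30-day budget in one hour (page);
// 1x, if sustained, would use exactly the whole budget over the SLO period (ticket).
const PAGE_BURN_RATE = 14.4;
const TICKET_BURN_RATE = 1;

function evaluateSlo(goodEvents: number, totalEvents: number, sloTarget: number): SloReport {
  // An empty window is treated as healthy.
  const sli = totalEvents === 0 ? 1 : goodEvents / totalEvents;
  const allowedErrorRate = 1 - sloTarget; // e.g. 0.001 for a 99.9% target
  const burnRate = (1 - sli) / allowedErrorRate;
  let severity: Severity = "none";
  if (burnRate >= PAGE_BURN_RATE) severity = "page";
  else if (burnRate >= TICKET_BURN_RATE) severity = "ticket";
  return { sli, burnRate, severity };
}

// Usage: 9800 good requests out of 10000 against a 99.9% availability SLO.
const report = evaluateSlo(9800, 10000, 0.999);
// report.burnRate is roughly 20, so report.severity is "page".
```

A production setup usually evaluates a long and a short window together. That way a brief spike does not page, and a recovered service stops paging quickly. This refinement is omitted here.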

Quick Start Checklist

  1. Set up a structured logger (Pino is recommended for Node.js)
  2. Add request correlation IDs (via middleware)
  3. Instrument key metrics (RED: Rate, Errors, Duration)
  4. Configure distributed tracing (OpenTelemetry)
  5. Create dashboards for the golden signals
  6. Set up alerts with appropriate severity levels
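
Checklist items 1 and 2 can be sketched without any dependency. One piece is a logger that writes one JSON object per line. The other is middleware that attaches a correlation ID to every line for a request. This mirrors the idea behind Pino's child loggers. However, `JsonLogger`, `withCorrelationId`, the `x-request-id` header name, and the request/handler shapes are hypothetical stand-ins, not Pino's or any framework's API.

```typescript
type Level = "debug" | "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Minimal structured logger: every call writes one JSON object per line via `sink`.
class JsonLogger {
  private readonly sink: (line: string) => void;
  private readonly bindings: Fields;

  constructor(sink: (line: string) => void, bindings: Fields = {}) {
    this.sink = sink;
    this.bindings = bindings;
  }

  // Returns a logger whose entries always include the extra bindings (e.g. a request ID).
  child(extra: Fields): JsonLogger {
    return new JsonLogger(this.sink, { ...this.bindings, ...extra });
  }

  log(level: Level, msg: string, fields: Fields = {}): void {
    // Core keys go last so caller-supplied fields cannot overwrite them.
    const entry = { ...this.bindings, ...fields, time: new Date().toISOString(), level, msg };
    this.sink(JSON.stringify(entry));
  }

  info(msg: string, fields?: Fields): void {
    this.log("info", msg, fields);
  }

  error(msg: string, fields?: Fields): void {
    this.log("error", msg, fields);
  }
}

// Hypothetical, framework-agnostic request and handler shapes.
interface IncomingReq {
  headers: Record<string, string | undefined>;
  path: string;
}
type Handler = (req: IncomingReq, log: JsonLogger) => number; // returns an HTTP status code

// Assumption: upstream callers may pass an existing ID in this header.
const REQUEST_ID_HEADER = "x-request-id";

// 16 random hex chars; not cryptographically strong (crypto.randomUUID() is a sturdier choice).
function newRequestId(): string {
  let id = "";
  for (let i = 0; i < 16; i++) id += Math.floor(Math.random() * 16).toString(16);
  return id;
}

// Middleware: reuse or create a correlation ID and bind it to a per-request child logger.
function withCorrelationId(root: JsonLogger, handler: Handler): (req: IncomingReq) => number {
  return (req) => {
    const requestId = req.headers[REQUEST_ID_HEADER] ?? newRequestId();
    const log = root.child({ requestId });
    log.info("request received", { path: req.path });
    const statusCode = handler(req, log);
    log.info("request completed", { statusCode });
    return statusCode;
  };
}

// Usage: collect lines in memory instead of writing to stdout.
const lines: string[] = [];
const rootLogger = new JsonLogger((line) => lines.push(line), { service: "orders" });
const handle = withCorrelationId(rootLogger, (_req, log) => {
  log.info("loading order", { orderId: "42" });
  return 200;
});
handle({ headers: { "x-request-id": "abc123" }, path: "/orders/42" });
```

Forwarding the same ID downstream, for example in an outgoing request header, lets logs from different services be joined on one `requestId`.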
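
For item 3, each RED metric maps onto a Prometheus metric type:

  • Rate: a request counter
  • Errors: an error counter
  • Duration: a histogram

The sketch keeps them in memory and renders the Prometheus text exposition format, in which histogram buckets are cumulative and labelled by their upper bound `le`. The metric names and bucket bounds are assumptions. In a Node.js service, a client library such as prom-client would normally do this work.

```typescript
// Minimal Prometheus-style histogram: buckets are cumulative and labelled by upper bound `le`.
class DurationHistogram {
  private readonly bounds: number[];
  private readonly counts: number[];
  private sum = 0;
  private count = 0;

  constructor(bounds: number[]) {
    this.bounds = [...bounds].sort((a, b) => a - b);
    this.counts = this.bounds.map(() => 0);
  }

  observe(seconds: number): void {
    this.sum += seconds;
    this.count += 1;
    // A value counts toward every bucket whose upper bound it does not exceed.
    this.bounds.forEach((bound, i) => {
      if (seconds <= bound) this.counts[i] += 1;
    });
  }

  render(metric: string): string[] {
    const out = [`# TYPE ${metric} histogram`];
    this.bounds.forEach((bound, i) => out.push(`${metric}_bucket{le="${bound}"} ${this.counts[i]}`));
    out.push(`${metric}_bucket{le="+Inf"} ${this.count}`);
    out.push(`${metric}_sum ${this.sum}`, `${metric}_count ${this.count}`);
    return out;
  }
}

// RED: Rate (request counter), Errors (error counter), Duration (histogram).
// Metric names and bucket bounds (in seconds) are assumptions.
class RedMetrics {
  private requests = 0;
  private errors = 0;
  private readonly duration = new DurationHistogram([0.05, 0.1, 0.25, 0.5, 1]);

  record(seconds: number, failed: boolean): void {
    this.requests += 1;
    if (failed) this.errors += 1;
    this.duration.observe(seconds);
  }

  exposition(): string {
    return [
      "# TYPE http_requests_total counter",
      `http_requests_total ${this.requests}`,
      "# TYPE http_request_errors_total counter",
      `http_request_errors_total ${this.errors}`,
      ...this.duration.render("http_request_duration_seconds"),
    ].join("\n");
  }
}

// Usage: three requests, one of which failed.
const red = new RedMetrics();
red.record(0.03, false);
red.record(0.2, false);
red.record(0.7, true);
const expo = red.exposition();
// expo includes the line: http_request_duration_seconds_bucket{le="0.25"} 2
```

Rate is not stored directly. A PromQL query such as `rate(http_requests_total[5m])` derives it from the counter at query time. Some setups use one counter with a status label instead of a separate error counter, and either approach satisfies RED.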
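
For item 4, distributed tracing relies on propagating trace identity across service boundaries, so that spans from different services join into one trace. OpenTelemetry supports the W3C Trace Context `traceparent` header for this, laid out as `version-traceid-parentid-flags`. The sketch below parses and emits that header and starts a child span. It is a hand-rolled illustration of what travels on the wire, not the OpenTelemetry API. The type and function names are local to this sketch, even where they resemble OpenTelemetry concepts.

```typescript
interface SpanContext {
  traceId: string; // 32 lowercase hex chars, shared by every span in the trace
  spanId: string; // 16 lowercase hex chars, unique to one span
  sampled: boolean; // the "sampled" bit of trace-flags
}

interface ChildSpan {
  context: SpanContext;
  parentSpanId: string | undefined; // undefined for a root span
  operation: string;
}

// Version-00 traceparent: version-traceid-parentid-flags.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZEROS = /^0+$/;

// Random lowercase hex ID; not cryptographically strong. All-zero IDs are invalid, so retry.
function randomHex(chars: number): string {
  while (true) {
    let out = "";
    for (let i = 0; i < chars; i++) out += Math.floor(Math.random() * 16).toString(16);
    if (!ALL_ZEROS.test(out)) return out;
  }
}

// Returns undefined when the header is missing or malformed, so the caller starts a new trace.
function parseTraceparent(header: string | undefined): SpanContext | undefined {
  if (header === undefined) return undefined;
  const m = TRACEPARENT_RE.exec(header.trim());
  if (m === null) return undefined;
  const [, traceId, spanId, flags] = m;
  if (ALL_ZEROS.test(traceId) || ALL_ZEROS.test(spanId)) return undefined;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 };
}

function formatTraceparent(ctx: SpanContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Continues the upstream trace if a valid header arrived; otherwise starts a new one.
function startChildSpan(incoming: string | undefined, operation: string): ChildSpan {
  const upstream = parseTraceparent(incoming);
  return {
    context: {
      traceId: upstream?.traceId ?? randomHex(32),
      spanId: randomHex(16),
      sampled: upstream?.sampled ?? true, // assumption: sample new traces
    },
    parentSpanId: upstream?.spanId,
    operation,
  };
}

// Usage with the example header from the W3C Trace Context specification.
const incomingHeader = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const serverSpan = startChildSpan(incomingHeader, "GET /orders");
const outgoingHeader = formatTraceparent(serverSpan.context);
// outgoingHeader keeps the same trace ID and carries serverSpan's own ID as the next hop's parent.
```

With the OpenTelemetry SDK, a configured propagator extracts and injects this header automatically. The sketch only shows the data it carries.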

References

| Reference | Description |
|-----------|-------------|
| logging-patterns.md | Structured logging, log levels, Pino/Winston setup |
| metrics-guide.md | Prometheus, counters/gauges/histograms, golden signals |
| tracing-basics.md | OpenTelemetry, distributed tracing, span design |
| alerting-guide.md | Alert design, SLIs/SLOs, severity levels, dashboards |