Observability — OpenTelemetry & Distributed Tracing

Modern instrumentation with OpenTelemetry for metrics, traces, and structured logs.

Pillars of Observability

| Pillar | Technologies | Key metrics |
| --- | --- | --- |
| Metrics | Prometheus, Grafana, Datadog | RED (Rate, Errors, Duration), USE (Utilization, Saturation, Errors) |
| Traces | OpenTelemetry, Jaeger, Tempo | P95 latency, span duration, error rate |
| Logs | Loki, Elasticsearch, Datadog | Structured JSON, correlation IDs |
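
As a rough illustration of the RED method listed for the metrics pillar, the sketch below derives rate, error ratio, and mean duration from a handful of request records. The record shape (`status`, `durationMs`), the helper name `redMetrics`, and the two-second window are assumptions made for this example. A production setup would normally get these values from Prometheus or a similar backend rather than compute them by hand.

```javascript
// Hypothetical request records; the field names are assumptions for illustration.
const requests = [
  { status: 200, durationMs: 120 },
  { status: 200, durationMs: 80 },
  { status: 500, durationMs: 300 },
  { status: 200, durationMs: 100 },
];

// Compute RED metrics for the records observed in a window of `windowSeconds`.
function redMetrics(records, windowSeconds) {
  const total = records.length;
  // Treat 5xx responses as errors, as in the Golden Signals table.
  const failed = records.filter((r) => r.status >= 500).length;
  const totalDuration = records.reduce((sum, r) => sum + r.durationMs, 0);
  return {
    ratePerSecond: total / windowSeconds,                    // Rate
    errorRatio: total === 0 ? 0 : failed / total,            // Errors
    meanDurationMs: total === 0 ? 0 : totalDuration / total, // Duration
  };
}

const red = redMetrics(requests, 2);
console.log(red);
```

A mean hides tail latency, which is why the traces row tracks P95 rather than an average.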

OpenTelemetry (OTel) Stack

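```javascript
// Node.js: auto-instrumentation
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
// The OTLP/HTTP trace exporter ships in its own package and must be imported explicitly.
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  // With no options, the exporter targets the default local OTLP endpoint.
  traceExporter: new OTLPTraceExporter(),
  // Enables the bundled instrumentations (HTTP, Express, database clients, etc.).
  instrumentations: [getNodeAutoInstrumentations()],
});

// Start the SDK before loading application code so modules get patched.
sdk.start();
```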
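
Auto-instrumentation carries trace context between services in HTTP headers, by default the W3C Trace Context `traceparent` header. The following is a minimal, stdlib-only sketch of how such a header breaks down into the trace ID and span ID that later appear in logs. The function name `parseTraceparent` is hypothetical, and the OpenTelemetry SDK does this parsing internally, so the code is for understanding rather than something to ship.

```javascript
// Parse a W3C Trace Context `traceparent` header:
//   version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
// Returns null when the header is malformed.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // Version ff and all-zero IDs are invalid per the specification.
  if (version === 'ff') return null;
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return {
    version,
    traceId,
    parentId,
    // The lowest flag bit marks the trace as sampled.
    sampled: (parseInt(flags, 16) & 1) === 1,
  };
}

const ctx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(ctx);
```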

Golden Signals (Google SRE)

| Signal | Description | Typical threshold |
| --- | --- | --- |
| Latency | P50, P95, P99 response time | P95 < 200 ms |
| Traffic | Requests per second | Baseline + alerting |
| Errors | Error rate (5xx, exceptions) | < 0.1% |
| Saturation | CPU, memory, disk | < 80% sustained |
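
To make the latency row concrete, here is a small sketch that computes percentiles from raw samples and checks P95 against the 200 ms threshold. It uses the nearest-rank definition, which is one of several common percentile definitions. The sample values and the helper name `percentile` are made up for illustration, and metrics backends usually estimate percentiles from histograms instead of sorting raw samples.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical response times in milliseconds.
const latenciesMs = [90, 110, 95, 180, 130, 105, 250, 120, 100, 115];

const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
const p95WithinThreshold = p95 < 200;

console.log({ p50, p95, p95WithinThreshold });
```

With only ten samples, a single slow request decides the P95, so small sample windows make this threshold noisy.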

Structured Logging (JSON)

```json
{
  "timestamp": "2026-04-17T10:30:00Z",
  "level": "error",
  "message": "Payment processing failed",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "service.name": "payment-api",
  "error.type": "PaymentGatewayTimeout"
}
```
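
A record of this shape can be produced with very little code. The sketch below uses only the Node.js standard library. The `logLine` helper is hypothetical, and the random IDs stand in for values that a real service would read from the active span, so treat it as an outline rather than a replacement for a logging library.

```javascript
const { randomBytes } = require('crypto');

// Build one JSON log line; the field names mirror the example record.
function logLine(level, message, context = {}) {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    ...context,
  });
}

// Placeholder IDs for demonstration only (assumption): a real service
// would take these from the current trace context.
const traceId = randomBytes(16).toString('hex'); // 32 hex characters
const spanId = randomBytes(8).toString('hex');   // 16 hex characters

const line = logLine('error', 'Payment processing failed', {
  trace_id: traceId,
  span_id: spanId,
  'service.name': 'payment-api',
  'error.type': 'PaymentGatewayTimeout',
});

console.log(line);
```

Emitting one JSON object per line keeps the output easy to ingest in Loki or Elasticsearch, and the shared `trace_id` lets a log entry be joined to its trace.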

SLI / SLO / SLA

| Concept | Example |
| --- | --- |
| SLI (Indicator) | 99.5% of requests complete in < 200 ms |
| SLO (Objective) | 99.9% monthly uptime |
| SLA (Agreement) | 99.95% uptime + penalties |
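
An uptime target translates directly into an error budget, meaning the downtime that can be tolerated before the target is missed. The sketch below works this out for the two uptime figures in the table, assuming a 30-day month. The function name `errorBudgetMinutes` and the 30-day period are choices made for this example.

```javascript
// Allowed downtime, in minutes, for an availability target over a period.
function errorBudgetMinutes(targetPercent, periodDays) {
  const periodMinutes = periodDays * 24 * 60;
  return periodMinutes * (1 - targetPercent / 100);
}

// 30 days = 43,200 minutes.
const sloBudget = errorBudgetMinutes(99.9, 30);  // roughly 43.2 minutes
const slaBudget = errorBudgetMinutes(99.95, 30); // roughly 21.6 minutes

// Round for display; floating-point arithmetic leaves tiny residues.
console.log(`99.9% over 30 days: ${sloBudget.toFixed(1)} min`);
console.log(`99.95% over 30 days: ${slaBudget.toFixed(1)} min`);
```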

For detailed, stack-specific instrumentation, invoke @observability-engineer.