observability
Observability: OpenTelemetry & Distributed Tracing
Modern instrumentation with OpenTelemetry for metrics, traces, and structured logs.
Pillars of Observability
| Pillar | Technologies | Key metrics |
|---|---|---|
| Metrics | Prometheus, Grafana, Datadog | RED (Rate, Errors, Duration), USE (Utilization, Saturation, Errors) |
| Traces | OpenTelemetry, Jaeger, Tempo | P95 latency, span duration, error rate |
| Logs | Loki, Elasticsearch, Datadog | Structured JSON, correlation IDs |
OpenTelemetry (OTel) Stack
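```javascript
// Node.js: auto-instrumentation
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
// OTLP exporter over HTTP (assumed transport; a gRPC variant also exists)
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  // With no options, the exporter sends to http://localhost:4318/v1/traces
  traceExporter: new OTLPTraceExporter(),
  // Patches common libraries (http, express, database clients, ...) at load time
  instrumentations: [getNodeAutoInstrumentations()],
});
// Start the SDK before the application code loads the libraries to be instrumented
sdk.start();
```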
Golden Signals (Google SRE)
| Signal | Description | Typical threshold |
|---|---|---|
| Latency | P50, P95, P99 response time | P95 < 200ms |
| Traffic | Requests per second | Baseline + alerting |
| Errors | Error rate (5xx, exceptions) | < 0.1% |
| Saturation | CPU, Memory, Disk | < 80% sustained |
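
The thresholds in the table can be checked mechanically once request durations and status codes are collected. The following is a minimal sketch, not a production alerting rule. `percentile` uses the nearest-rank method, and `checkGoldenSignals`, the sample shape `{ durationMs, status }`, and the `cpuUtilization` parameter are hypothetical names introduced for illustration. The limits are the typical thresholds listed above.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Evaluate one window of requests against the typical thresholds.
// `requests` is an array of { durationMs, status };
// `cpuUtilization` is a fraction between 0 and 1.
function checkGoldenSignals(requests, windowSeconds, cpuUtilization) {
  const p95 = percentile(requests.map((r) => r.durationMs), 95);
  const errors = requests.filter((r) => r.status >= 500).length;
  const errorRate = requests.length === 0 ? 0 : errors / requests.length;
  return {
    latencyP95Ms: p95,
    trafficRps: requests.length / windowSeconds,
    errorRate,
    saturation: cpuUtilization,
    alerts: {
      latency: p95 >= 200,               // target: P95 < 200ms
      errors: errorRate >= 0.001,        // target: < 0.1%
      saturation: cpuUtilization >= 0.8, // target: < 80%
    },
  };
}
```

Traffic has no fixed limit in the table, so the sketch only reports it; comparing it against a baseline is left to the alerting system.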
Structured Logging (JSON)
```json
{
  "timestamp": "2026-04-17T10:30:00Z",
  "level": "error",
  "message": "Payment processing failed",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "service.name": "payment-api",
  "error.type": "PaymentGatewayTimeout"
}
```
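
The `trace_id` and `span_id` fields are what tie a log line to a trace. One way to populate them, sketched here with only the Node.js standard library, is to parse the W3C `traceparent` header of the incoming request. `parseTraceparent` and `buildLogRecord` are hypothetical helper names, not part of any OpenTelemetry package, and the hard-coded service name is an assumption taken from the sample record. In an instrumented service the active span context would normally supply these IDs.

```javascript
// W3C Trace Context header: version-traceid-parentid-flags, lowercase hex.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns { traceId, spanId }, or null when the header is missing or malformed.
function parseTraceparent(header) {
  const match = TRACEPARENT.exec(header || '');
  if (!match) return null;
  const [, , traceId, spanId] = match;
  // All-zero IDs are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId };
}

// Builds a log record shaped like the sample above.
function buildLogRecord(level, message, header, attributes = {}) {
  const ctx = parseTraceparent(header);
  return {
    timestamp: new Date().toISOString(),
    level,
    message,
    // Correlation fields are omitted when no valid trace context is present.
    ...(ctx ? { trace_id: ctx.traceId, span_id: ctx.spanId } : {}),
    'service.name': 'payment-api', // assumed service name
    ...attributes,
  };
}

// One JSON object per line so log shippers can parse each line independently.
const record = buildLogRecord(
  'error',
  'Payment processing failed',
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01',
  { 'error.type': 'PaymentGatewayTimeout' }
);
console.log(JSON.stringify(record));
```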
SLI / SLO / SLA
| Concept | Example |
|---|---|
| SLI (Indicator) | 99.5% of requests < 200ms |
| SLO (Objective) | 99.9% monthly uptime |
| SLA (Agreement) | 99.95% uptime + penalties |
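
An availability target translates directly into an error budget: the downtime the target tolerates over a period. As a rough sketch, assuming a 30-day month (the table does not specify the measurement window) and a hypothetical helper named `errorBudgetMinutes`:

```javascript
// Allowed downtime, in minutes, for an availability target over a period.
// `targetPercent` is e.g. 99.9; `periodDays` defaults to an assumed 30-day month.
function errorBudgetMinutes(targetPercent, periodDays = 30) {
  const totalMinutes = periodDays * 24 * 60;
  const budget = (totalMinutes * (100 - targetPercent)) / 100;
  return Math.round(budget * 100) / 100; // round away floating-point noise
}

console.log(errorBudgetMinutes(99.9));  // budget for the SLO example
console.log(errorBudgetMinutes(99.95)); // budget for the SLA example
```

Under these assumptions, 99.9% leaves about 43.2 minutes of downtime per month and 99.95% about 21.6 minutes.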
For detailed, stack-specific instrumentation, invoke @observability-engineer