opentelemetry

OpenTelemetry Implementation Guide

Overview

OpenTelemetry (OTel) is a vendor-neutral observability framework for instrumenting applications and for generating, collecting, and exporting telemetry data (traces, metrics, and logs). This skill provides guidance for implementing OTel in Kubernetes environments.

Quick Start

Deploy the OTel Collector on Kubernetes

bash
# Add Helm repo
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Install with basic config
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring --create-namespace \
  --set mode=daemonset

Send Test Data via OTLP

bash
# gRPC endpoint: 4317, HTTP endpoint: 4318
curl -X POST http://otel-collector:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'

Core Concepts

Signals: Three types of telemetry data:
  • Traces: Distributed request flows across services
  • Metrics: Numerical measurements (counters, gauges, histograms)
  • Logs: Event records with structured/unstructured data
Collector Components:
  • Receivers: Accept data (OTLP, Prometheus, Jaeger, Zipkin)
  • Processors: Transform data (batch, memory_limiter, k8sattributes)
  • Exporters: Send data (prometheusremotewrite, loki, otlp)
  • Extensions: Add capabilities (health_check, pprof, zpages)
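
As a minimal sketch of how these component types are wired, the fragment below enables two of the extensions listed above. It is written as plain collector configuration (not nested under the Helm `config:` key), and the ports are the ones these extensions conventionally use; treat them as assumptions for your environment. Receivers, processors, and exporters follow the same declare-then-reference pattern, but they are referenced from `service.pipelines` rather than `service.extensions`.

```yaml
# Declare each extension once at the top level...
extensions:
  health_check:
    endpoint: 0.0.0.0:13133   # assumed port for liveness/readiness probes
  zpages:
    endpoint: localhost:55679 # assumed port for the in-process debug pages

# ...then activate it by name under service.extensions.
service:
  extensions: [health_check, zpages]
```

An extension that is declared but not listed under `service.extensions` is not started.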

Collector Configuration

Basic Pipeline Structure

yaml
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318

  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25

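  exporters:
    prometheusremotewrite:
      endpoint: "http://prometheus:9090/api/v1/write"
    loki:
      endpoint: "http://loki:3100/loki/api/v1/push"
    # Required by the traces pipeline below, which references otlp/tempo.
    # The Tempo address and plaintext gRPC are assumptions; adjust to your backend.
    otlp/tempo:
      endpoint: "tempo:4317"
      tls:
        insecure: true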

  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [prometheusremotewrite]
      logs:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [loki]
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [otlp/tempo]

Kubernetes Attributes Enrichment

yaml
processors:
  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    filter:
      node_from_env_var: K8S_NODE_NAME  # name of the env var holding the node name, not its expanded value
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.node.name
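
Declaring the processor is not enough on its own: it also has to be listed in a pipeline, and the node-name environment variable has to exist in the collector pod. The following is a sketch in Helm values form, assuming the chart's `extraEnvs` value and the `otlp/tempo` exporter name used in the pipeline example; adapt both to your chart version and backend.

```yaml
# Expose the node name to the collector through the Kubernetes downward API
# so the node filter can restrict lookups to pods on the local node.
extraEnvs:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

config:
  service:
    pipelines:
      traces:
        receivers: [otlp]
        # memory_limiter first, then enrichment, then batching.
        processors: [memory_limiter, k8sattributes, batch]
        exporters: [otlp/tempo]
```

With `auth_type: "serviceAccount"`, the collector's service account also needs read access to the Kubernetes objects it resolves (pods and namespaces at minimum), so check the RBAC rules that your deployment method creates.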

Deployment Modes

| Mode | Use Case | Pros | Cons |
| --- | --- | --- | --- |
| DaemonSet | Node-level collection | Full coverage, host metrics | Higher resource usage |
| Deployment | Centralized gateway | Scalable, easier management | Single point of failure |
| Sidecar | Per-pod collection | Isolated, fine-grained | Resource overhead per pod |
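
As a rough illustration, the first two modes map onto Helm values as shown below. The two YAML documents are alternatives, not one file, and the replica count is an assumed example value. Sidecar injection is not covered by this sketch.

```yaml
# Alternative 1: node-level agent, one collector pod per node.
mode: daemonset
---
# Alternative 2: centralized gateway.
# Running more than one replica (assumed value) reduces the
# single-point-of-failure risk noted in the table.
mode: deployment
replicaCount: 2
```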

Common Patterns

Development Environment

  • Enable debug exporter for visibility
  • Lower resource limits (250m CPU, 512Mi memory)
  • Include spot instance tolerations for cost savings
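
The development settings above could look roughly like this in Helm values. This is a sketch, not a recommended baseline: the taint key is hypothetical, and the pipeline reuses the `memory_limiter` and `batch` processors from the basic pipeline example.

```yaml
config:
  exporters:
    # Prints received telemetry to the collector's own log output.
    debug:
      verbosity: detailed
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [debug]

# Lower limits for a development cluster.
resources:
  limits:
    cpu: 250m
    memory: 512Mi

tolerations:
  # Hypothetical taint key; use the taint your spot/preemptible node pool applies.
  - key: "example.com/spot"
    operator: "Exists"
    effect: "NoSchedule"
```

Collector logs can then be inspected with `kubectl logs` to confirm that spans are arriving.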

Production Environment

  • Implement sampling (10-50% for traces)
  • Higher batch sizes (2048-4096)
  • Enable autoscaling and PodDisruptionBudget
  • Use TLS for all endpoints
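
A possible shape for these production settings in Helm values is sketched below. The sampling rate, replica bounds, and certificate paths are assumed example values, the `autoscaling` and `podDisruptionBudget` keys are assumed to match your chart version, and the `otlp/tempo` exporter name follows the pipeline example earlier in this guide.

```yaml
# Gateway mode, so the collector can be scaled horizontally.
mode: deployment

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
          tls:
            # Hypothetical mount paths for the server certificate and key.
            cert_file: /etc/otel/tls/tls.crt
            key_file: /etc/otel/tls/tls.key
  processors:
    # Keep roughly a quarter of traces (within the 10-50% range above).
    probabilistic_sampler:
      sampling_percentage: 25
    batch:
      timeout: 10s
      send_batch_size: 2048
      send_batch_max_size: 4096
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, probabilistic_sampler, batch]
        exporters: [otlp/tempo]

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 6
  targetCPUUtilizationPercentage: 80

podDisruptionBudget:
  enabled: true
  minAvailable: 1
```

The certificate files still have to be mounted into the pod (for example from a Kubernetes Secret), and clients must be configured to trust the issuing CA.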

Detailed References

For in-depth guidance, see:
  • Collector Configuration: COLLECTOR.md
  • Kubernetes Deployment: KUBERNETES.md
  • Troubleshooting: TROUBLESHOOTING.md
  • Instrumentation: INSTRUMENTATION.md

Validation Commands

bash
# Check collector pods
# The selector assumes the pods carry app.kubernetes.io/name=otel-collector
# (for example via nameOverride); adjust it to match your release's labels.
kubectl get pods -n monitoring -l app.kubernetes.io/name=otel-collector

# View collector logs
kubectl logs -n monitoring -l app.kubernetes.io/name=otel-collector --tail=100

# Test OTLP endpoint
kubectl run test-otlp --image=curlimages/curl:latest --rm -it -- \
  curl -v http://otel-collector.monitoring:4318/v1/traces

# Validate config syntax
otelcol validate --config=config.yaml

Key Helm Chart Values

yaml
mode: "daemonset"  # or "deployment"
presets:
  logsCollection:
    enabled: true
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
  kubeletMetrics:
    enabled: true
useGOMEMLIMIT: true
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 256Mi