OpenTelemetry Implementation Guide
Overview

OpenTelemetry (OTel) is a vendor-neutral observability framework for instrumenting, generating, collecting, and exporting telemetry data (traces, metrics, and logs). This skill provides guidance for implementing OTel in Kubernetes environments.
Quick Start

Deploy the OTel Collector on Kubernetes

```bash
# Add the Helm repo
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Install with a basic config
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring --create-namespace \
  --set mode=daemonset
```
Send Test Data via OTLP

```bash
# gRPC endpoint: 4317, HTTP endpoint: 4318
curl -X POST http://otel-collector:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'
```
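The empty `resourceSpans` array above only confirms that the endpoint accepts requests. A real trace follows the OTLP/HTTP JSON shape sketched below using only the Python standard library; the service and span names are placeholders:

```python
import json
import os
import time

def minimal_trace_payload(service_name: str, span_name: str) -> dict:
    """Build a minimal OTLP/HTTP JSON trace payload containing one span."""
    now_ns = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [{
                    "key": "service.name",
                    "value": {"stringValue": service_name},
                }]
            },
            "scopeSpans": [{
                "spans": [{
                    # Trace/span IDs are hex-encoded: 16 bytes and 8 bytes.
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit nanosecond timestamps are JSON-encoded as strings.
                    "startTimeUnixNano": str(now_ns),
                    "endTimeUnixNano": str(now_ns + 1_000_000),
                }]
            }],
        }]
    }

payload = minimal_trace_payload("demo-service", "demo-span")
print(json.dumps(payload)[:80])
```

POST the serialized payload to `http://otel-collector:4318/v1/traces` with `Content-Type: application/json`, exactly as in the curl command above.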
Core Concepts

Signals: Three types of telemetry data:
- Traces: Distributed request flows across services
- Metrics: Numerical measurements (counters, gauges, histograms)
- Logs: Event records with structured/unstructured data

Collector Components:
- Receivers: Accept data (OTLP, Prometheus, Jaeger, Zipkin)
- Processors: Transform data (batch, memory_limiter, k8sattributes)
- Exporters: Send data (prometheusremotewrite, loki, otlp)
- Extensions: Add capabilities (health_check, pprof, zpages)
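The component roles above form a dataflow: receivers hand data to a chain of processors, and the result goes to exporters. The toy pipeline below (an illustration of the dataflow only, not the collector's actual code) shows how a batch processor groups spans before export:

```python
from typing import Callable

def batch_processor(batch_size: int) -> Callable:
    """Group incoming items into fixed-size batches, like the `batch` processor."""
    def process(items: list) -> list:
        return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    return process

def run_pipeline(received: list, processors: list, export: Callable) -> list:
    """Push received data through each processor, then export every batch."""
    data = received
    for proc in processors:
        data = proc(data)
    return [export(batch) for batch in data]

exported = run_pipeline(
    received=[f"span-{i}" for i in range(5)],
    processors=[batch_processor(2)],
    export=len,  # stand-in exporter: just report each batch's size
)
print(exported)  # → [2, 2, 1]
```

Batching fewer, larger export calls is why the real `batch` processor is recommended in every pipeline.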
Collector Configuration

Basic Pipeline Structure

```yaml
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318
  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25
  exporters:
    prometheusremotewrite:
      endpoint: "http://prometheus:9090/api/v1/write"
    loki:
      endpoint: "http://loki:3100/loki/api/v1/push"
    # The traces pipeline below references otlp/tempo, so it must be
    # defined here; adjust the endpoint to your tracing backend.
    otlp/tempo:
      endpoint: "tempo:4317"
      tls:
        insecure: true
  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [prometheusremotewrite]
      logs:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [loki]
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [otlp/tempo]
```
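A frequent failure mode is a pipeline referencing a component that is never defined, such as listing `otlp/tempo` in a pipeline's `exporters` without a matching top-level entry. The collector rejects such configs at startup; the stdlib-only sketch below shows the check, using a Python dict in place of the parsed YAML:

```python
def undefined_components(config: dict) -> list:
    """Find pipeline references to receivers/processors/exporters
    that have no matching top-level definition."""
    problems = []
    for name, pipeline in config["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section, {})
            for ref in pipeline.get(section, []):
                if ref not in defined:
                    problems.append(
                        f"pipeline {name!r}: {section[:-1]} {ref!r} not defined"
                    )
    return problems

# A traces pipeline that points at an exporter missing from `exporters:`.
config = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}, "memory_limiter": {}},
    "exporters": {"prometheusremotewrite": {}, "loki": {}},
    "service": {"pipelines": {
        "traces": {
            "receivers": ["otlp"],
            "processors": ["memory_limiter", "batch"],
            "exporters": ["otlp/tempo"],  # not defined above
        },
    }},
}
print(undefined_components(config))
```

`otelcol validate --config=...` (see Validation Commands) performs this check, among others, against a real config file.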
Kubernetes Attributes Enrichment

```yaml
processors:
  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    filter:
      # node_from_env_var takes the *name* of an env var holding the node name
      node_from_env_var: K8S_NODE_NAME
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.node.name
```
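Conceptually, the processor looks up the pod that emitted each piece of telemetry (by pod IP, by default) and copies the configured metadata keys onto the telemetry's resource attributes. A toy sketch of that merge, with made-up pod names:

```python
def enrich_resource(resource: dict, pod_metadata: dict, keys: list) -> dict:
    """Mimic what k8sattributes does conceptually: copy the configured
    metadata keys onto the telemetry's resource attributes."""
    enriched = dict(resource)
    for key in keys:
        if key in pod_metadata:
            enriched[key] = pod_metadata[key]
    return enriched

# What the processor would look up from the Kubernetes API (hypothetical pod).
pod_metadata = {
    "k8s.pod.name": "checkout-7d4b9c-xk2lp",
    "k8s.namespace.name": "shop",
    "k8s.deployment.name": "checkout",
    "k8s.node.name": "node-a",
}
resource = {"service.name": "checkout"}
print(enrich_resource(resource, pod_metadata,
                      ["k8s.pod.name", "k8s.namespace.name"]))
```

Only the keys listed under `extract.metadata` are copied, so keep the list short to limit attribute cardinality.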
Deployment Modes

| Mode | Use Case | Pros | Cons |
|---|---|---|---|
| DaemonSet | Node-level collection | Full coverage, host metrics | Higher resource usage |
| Deployment | Centralized gateway | Scalable, easier management | Single point of failure |
| Sidecar | Per-pod collection | Isolated, fine-grained | Resource overhead per pod |
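The resource trade-offs in the table scale differently: DaemonSet cost grows with node count, sidecar cost with pod count, and a Deployment is a fixed replica count. A back-of-envelope comparison (all numbers illustrative, not benchmarks):

```python
def daemonset_footprint(nodes: int, per_collector_mib: int) -> int:
    """DaemonSet: one collector per node."""
    return nodes * per_collector_mib

def deployment_footprint(replicas: int, per_collector_mib: int) -> int:
    """Deployment: a fixed, centrally managed replica count."""
    return replicas * per_collector_mib

def sidecar_footprint(pods: int, per_sidecar_mib: int) -> int:
    """Sidecar: one collector container per instrumented pod."""
    return pods * per_sidecar_mib

# Hypothetical cluster: 50 nodes, 400 pods, 256 MiB per collector, 64 MiB per sidecar.
print(daemonset_footprint(50, 256),   # MiB across the cluster
      deployment_footprint(3, 256),
      sidecar_footprint(400, 64))
```

This is why large clusters often combine a DaemonSet agent layer with a small Deployment gateway rather than per-pod sidecars.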
Common Patterns

Development Environment

- Enable the debug exporter for visibility
- Lower resource limits (250m CPU, 512Mi memory)
- Include spot instance tolerations for cost savings

Production Environment

- Implement sampling (10-50% for traces)
- Use higher batch sizes (2048-4096)
- Enable autoscaling and a PodDisruptionBudget
- Use TLS for all endpoints
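The trace sampling recommendation above is usually implemented head-based, with the keep/drop decision derived deterministically from the trace ID so that every service reaches the same decision for the same trace. A stdlib-only sketch of the idea behind `TraceIdRatioBased` sampling (not the SDK's exact algorithm):

```python
import os

def sample_trace(trace_id_hex: str, ratio: float) -> bool:
    """Deterministic head sampling: keep a trace if the low 8 bytes
    (16 hex chars) of its 16-byte trace ID fall below ratio * 2**64."""
    value = int(trace_id_hex[-16:], 16)
    return value < ratio * 2**64

# Roughly 25% of random trace IDs pass a 0.25 ratio.
kept = sum(sample_trace(os.urandom(16).hex(), 0.25) for _ in range(10_000))
print(f"kept {kept} of 10000")
```

Because the decision is a pure function of the trace ID, no cross-service coordination is needed; tail-based sampling (deciding after the whole trace arrives) requires a gateway-mode collector instead.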
Detailed References

For in-depth guidance, see:
- Collector Configuration: COLLECTOR.md
- Kubernetes Deployment: KUBERNETES.md
- Troubleshooting: TROUBLESHOOTING.md
- Instrumentation: INSTRUMENTATION.md
Validation Commands

```bash
# Check collector pods
kubectl get pods -n monitoring -l app.kubernetes.io/name=otel-collector

# View collector logs
kubectl logs -n monitoring -l app.kubernetes.io/name=otel-collector --tail=100

# Test the OTLP endpoint from inside the cluster
kubectl run test-otlp --image=curlimages/curl:latest --rm -it -- \
  curl -v http://otel-collector.monitoring:4318/v1/traces

# Validate config syntax
otelcol validate --config=config.yaml
```
Key Helm Chart Values

```yaml
mode: "daemonset"  # or "deployment"
presets:
  logsCollection:
    enabled: true
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
  kubeletMetrics:
    enabled: true
useGOMEMLIMIT: true
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 256Mi
```
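`useGOMEMLIMIT: true` has the chart derive the Go runtime's soft memory limit (`GOMEMLIMIT`) from `resources.limits.memory`, which helps the collector trigger garbage collection before hitting the container limit. The factor is commonly around 80% of the limit, though the exact value depends on the chart version; the arithmetic under that 80% assumption:

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def gomemlimit_bytes(k8s_memory_limit: str, headroom: float = 0.8) -> int:
    """Derive a Go soft memory limit (bytes) from a Kubernetes memory
    limit string. The 80% headroom factor is a common heuristic, not a
    fixed rule; check your chart version for the value it applies."""
    for suffix, factor in UNITS.items():
        if k8s_memory_limit.endswith(suffix):
            return int(float(k8s_memory_limit[:-len(suffix)]) * factor * headroom)
    raise ValueError(f"unsupported unit in {k8s_memory_limit!r}")

print(gomemlimit_bytes("1Gi"))  # 80% of 1 GiB, in bytes
```

Keep the `memory_limiter` processor enabled as well; the two mechanisms complement each other.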