distributed-tracing

Distributed Tracing

Implement distributed tracing with Jaeger and OpenTelemetry for request flow visibility.

Trace Structure

Trace (Request ID: abc123)
Span (frontend) [100ms]
Span (api-gateway) [80ms]
  ├→ Span (auth-service) [10ms]
  └→ Span (user-service) [60ms]
      └→ Span (database) [40ms]
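
The nesting in the diagram can be modeled as a small tree, which makes it easy to see where time is actually spent. The sketch below is illustrative only: the `Span` dataclass and the `self_time_ms` and `walk` helpers are hypothetical names, not part of Jaeger or OpenTelemetry, and it assumes that the frontend span is the parent of the api-gateway span and that child spans run sequentially.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Span:
    """One timed operation; children are the spans it directly caused."""
    name: str
    duration_ms: int
    children: List["Span"] = field(default_factory=list)


def self_time_ms(span: Span) -> int:
    """Time spent in the span itself, excluding its direct children.

    Assumes children run one after another, as in the example trace.
    """
    return span.duration_ms - sum(c.duration_ms for c in span.children)


def walk(span: Span, depth: int = 0) -> Iterator[Tuple[int, Span]]:
    """Yield (depth, span) pairs in depth-first order."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)


# The example trace from the diagram (hypothetical parent/child wiring).
database = Span("database", 40)
user_service = Span("user-service", 60, [database])
auth_service = Span("auth-service", 10)
api_gateway = Span("api-gateway", 80, [auth_service, user_service])
frontend = Span("frontend", 100, [api_gateway])

for depth, span in walk(frontend):
    indent = "  " * depth
    print(f"{indent}{span.name}: total={span.duration_ms}ms self={self_time_ms(span)}ms")
```

Under these assumptions, the database span accounts for 40ms of the 100ms request, while the api-gateway itself contributes only 10ms.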

Key Components

Component   Description
Trace       End-to-end request journey
Span        Single operation within a trace
Context     Metadata propagated between services
Tags        Key-value pairs for filtering
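
To make the role of tags concrete, here is a minimal sketch of filtering spans by key-value pairs. `SpanRecord` and `filter_by_tags` are hypothetical names used for illustration, and the tag keys are assumptions rather than a fixed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpanRecord:
    trace_id: str   # shared by every span in one trace
    name: str       # the single operation this span represents
    tags: Dict[str, str] = field(default_factory=dict)


def filter_by_tags(spans: List[SpanRecord], **wanted: str) -> List[SpanRecord]:
    """Return the spans whose tags contain every requested key-value pair."""
    return [s for s in spans if all(s.tags.get(k) == v for k, v in wanted.items())]


spans = [
    SpanRecord("abc123", "GET /api/users", {"service": "api-gateway", "error": "false"}),
    SpanRecord("abc123", "SELECT users", {"service": "database", "error": "false"}),
    SpanRecord("def456", "GET /api/users", {"service": "api-gateway", "error": "true"}),
]

# Find failed api-gateway spans, then use their trace IDs to pull the full traces.
failed = filter_by_tags(spans, service="api-gateway", error="true")
print([s.trace_id for s in failed])
```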

OpenTelemetry Setup (Python)

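python
from flask import Flask
from opentelemetry import trace
# Note: the Jaeger Thrift exporter is deprecated in newer OpenTelemetry releases
# in favor of OTLP export; this import assumes a version that still ships it.
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.flask import FlaskInstrumentor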

# Initialize: batch finished spans and send them to the Jaeger agent
provider = TracerProvider()
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Instrument Flask so each incoming request gets a span automatically
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

# Custom spans
@app.route('/api/users')
def get_users():
    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("get_users") as span:
        span.set_attribute("user.count", 100)
        # fetch_users() is assumed to be defined elsewhere in the application
        return fetch_users()

OpenTelemetry Setup (Node.js)

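javascript
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');

// Batch finished spans and send them to the Jaeger collector's HTTP endpoint
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(
    new JaegerExporter({ endpoint: 'http://jaeger:14268/api/traces' })
));
// Register this provider as the global tracer provider
provider.register();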

Context Propagation

python
# Inject trace context into HTTP headers
import requests
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # Adds the traceparent header for the current span
response = requests.get('http://downstream/api', headers=headers)
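
For reference, the `traceparent` header follows the W3C Trace Context format `version-traceid-parentid-flags`. The sketch below builds and parses that header with the standard library to show what travels between services. `build_traceparent` and `parse_traceparent` are hypothetical helpers, not OpenTelemetry APIs, and the parser assumes a lowercase header key and version `00` only.

```python
import re
from typing import Dict, Optional, Tuple

# version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Format a version-00 traceparent value."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(headers: Dict[str, str]) -> Optional[Tuple[int, int, bool]]:
    """Return (trace_id, parent_span_id, sampled), or None if absent or malformed."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = (int(group, 16) for group in match.groups())
    if trace_id == 0 or span_id == 0:   # all-zero IDs are invalid
        return None
    return trace_id, span_id, bool(flags & 0x01)


# Upstream service: attach the context to an outgoing request.
outgoing = {"traceparent": build_traceparent(
    0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7)}
print(outgoing["traceparent"])

# Downstream service: recover the same trace ID and continue the trace.
print(parse_traceparent(outgoing))
```

A downstream service that parses the header starts its own span under the same trace ID, which is what stitches the per-service spans into one trace.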

Sampling Strategies

yaml
# Probabilistic - sample 1%
sampler:
  type: probabilistic
  param: 0.01

# Rate limiting - max 100/sec
sampler:
  type: ratelimiting
  param: 100
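
The two strategies behave differently under load, which a small sketch can make concrete. This is an illustration of the general techniques (a trace-ID threshold and a token bucket), not Jaeger's own implementation. `probabilistic_sample` and `RateLimitingSampler` are hypothetical names.

```python
import time


def probabilistic_sample(trace_id: int, rate: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below rate * 2**64.

    Deciding from the trace ID, rather than a fresh random number, lets every
    service that uses the same rule reach the same decision for a given trace.
    """
    threshold = int(rate * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < threshold


class RateLimitingSampler:
    """Token bucket that admits at most `max_per_second` traces per second."""

    def __init__(self, max_per_second: float, clock=time.monotonic):
        self.rate = max_per_second
        self.tokens = max_per_second   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def should_sample(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


sampler = RateLimitingSampler(max_per_second=100)
decisions = [sampler.should_sample() for _ in range(1000)]
print(sum(decisions))  # close to 100 when the loop finishes almost instantly
```

Probabilistic sampling scales the number of kept traces with traffic, while rate limiting caps it, so the former suits steady traffic and the latter protects the backend during spikes.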

Jaeger Queries

# Find slow requests
service=my-service
duration > 1s

# Find errors
service=my-service
error=true
tags.http.status_code >= 500
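
To clarify what these two filters select, they can be expressed as predicates over span records. The record layout below is a hypothetical export format chosen for illustration, not Jaeger's storage schema, and `slow_requests` and `server_errors` are illustrative helper names.

```python
from typing import Dict, List

# Hypothetical exported span records; the field names are illustrative only.
records: List[Dict] = [
    {"service": "my-service", "duration_ms": 1500, "error": False, "http.status_code": 200},
    {"service": "my-service", "duration_ms": 120, "error": True, "http.status_code": 503},
    {"service": "other-service", "duration_ms": 2400, "error": True, "http.status_code": 500},
]


def slow_requests(spans: List[Dict], service: str, min_duration_ms: int = 1000) -> List[Dict]:
    """Equivalent of: service=<service> duration > 1s"""
    return [s for s in spans
            if s["service"] == service and s["duration_ms"] > min_duration_ms]


def server_errors(spans: List[Dict], service: str) -> List[Dict]:
    """Equivalent of: service=<service> error=true tags.http.status_code >= 500"""
    return [s for s in spans
            if s["service"] == service and s["error"] and s["http.status_code"] >= 500]


print(slow_requests(records, "my-service"))
print(server_errors(records, "my-service"))
```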

Correlated Logging

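python
import logging

from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    # Read the trace ID of the active span and attach it to the log record
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id
    # Format as 32 lowercase hex characters, matching the ID shown in trace UIs
    logger.info("Processing", extra={"trace_id": format(trace_id, '032x')})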
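
Passing `extra` on every call is easy to forget, so one option is a logging filter that stamps the trace ID on every record. The sketch below uses only the standard library. `TraceIdFilter` and `fake_trace_id` are hypothetical names, and in a traced service the provider function would read the ID from the current span's context instead of returning a constant.

```python
import io
import logging
from typing import Callable


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every log record as `record.trace_id`."""

    def __init__(self, get_trace_id: Callable[[], int]):
        super().__init__()
        self._get_trace_id = get_trace_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = format(self._get_trace_id(), "032x")
        return True   # never drop the record, only annotate it


def fake_trace_id() -> int:
    """Hypothetical provider standing in for the active span's trace ID."""
    return 0xABC123


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter(fake_trace_id))

demo_logger = logging.getLogger("correlated-logging-demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False

demo_logger.info("Processing")
print(stream.getvalue(), end="")
```

With the ID in every log line, a slow or failed trace found in the tracing UI can be matched to its log entries by searching for the same 32-character value.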

Best Practices

  1. Sample appropriately (1-10% in production)
  2. Add meaningful tags (user_id, request_id)
  3. Propagate context across all boundaries
  4. Log exceptions in spans
  5. Use consistent naming for operations
  6. Monitor tracing overhead (<1% CPU impact)
  7. Correlate with logs using trace IDs