Distributed Tracing
Implement distributed tracing with Jaeger and OpenTelemetry to get end-to-end visibility into how requests flow across services.
Trace Structure
```
Trace (Request ID: abc123)
  ↓
Span (frontend) [100ms]
  ↓
Span (api-gateway) [80ms]
  ├→ Span (auth-service) [10ms]
  └→ Span (user-service) [60ms]
       └→ Span (database) [40ms]
```
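The tree becomes more concrete in code. The following minimal Python sketch stores the trace above as a flat list of spans linked by parent IDs, which is roughly how tracing backends store them. It then computes each span's self time: its duration minus the time spent in its direct children. The `Span` class and the span IDs are hypothetical. The calculation also assumes that a span's children run one after another. If children overlap (run in parallel), their durations are double-counted and the parent's self time comes out too small.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    duration_ms: float


def self_times(spans: list[Span]) -> dict[str, float]:
    """Each span's duration minus the time spent in its direct children.

    Assumes children run sequentially; overlapping children would be
    double-counted and make a parent's self time too small.
    """
    child_total = {s.span_id: 0.0 for s in spans}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] += s.duration_ms
    return {s.name: s.duration_ms - child_total[s.span_id] for s in spans}


# The trace from the diagram, flattened; span IDs are made up for the example
spans = [
    Span("a1", None, "frontend", 100),
    Span("b2", "a1", "api-gateway", 80),
    Span("c3", "b2", "auth-service", 10),
    Span("d4", "b2", "user-service", 60),
    Span("e5", "d4", "database", 40),
]
print(self_times(spans))
# {'frontend': 20.0, 'api-gateway': 10.0, 'auth-service': 10.0, 'user-service': 20.0, 'database': 40.0}
```

Here `user-service` spends 20 ms in its own code and 40 ms waiting on the database. This per-span breakdown is what lets a trace pinpoint where latency actually comes from.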
Key Components
| Component | Description |
|---|---|
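| Trace | End-to-end journey of one request across services |
| Span | Single timed operation within a trace |
| Context | Trace ID, span ID and sampling flags propagated between services |
| Tags | Key-value pairs on a span, used for filtering (called attributes in OpenTelemetry) |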
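The sketch below shows how these components fit together. It is hypothetical and library-free, not the OpenTelemetry API. Every span carries three IDs: the trace ID it belongs to, its own span ID, and its parent's span ID. Together these form the context that must travel between services. Each span also has a tag dictionary used for filtering. ID sizes follow the W3C Trace Context format: 16 random bytes for a trace ID and 8 for a span ID.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpanRecord:
    name: str
    trace_id: str                    # shared by every span of one request (32 hex chars)
    span_id: str                     # unique to this operation (16 hex chars)
    parent_id: Optional[str] = None  # caller's span ID; None for the root span
    tags: Dict[str, object] = field(default_factory=dict)


def start_span(name: str, parent: Optional[SpanRecord] = None, **tags: object) -> SpanRecord:
    # A root span starts a new trace; a child inherits its parent's trace ID
    trace_id = parent.trace_id if parent is not None else secrets.token_hex(16)
    parent_id = parent.span_id if parent is not None else None
    return SpanRecord(name, trace_id, secrets.token_hex(8), parent_id, dict(tags))


def find_spans(spans: List[SpanRecord], **filters: object) -> List[SpanRecord]:
    # Keep spans whose tags exactly match every filter
    return [s for s in spans if all(s.tags.get(k) == v for k, v in filters.items())]


root = start_span("api-gateway", http_method="GET")
child = start_span("user-service", parent=root, error=True)
print(child.trace_id == root.trace_id)                           # True
print([s.name for s in find_spans([root, child], error=True)])  # ['user-service']
```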
OpenTelemetry Setup (Python)
```python
from flask import Flask
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Initialize: set the service name Jaeger will display, and export finished
# spans in batches to the Jaeger agent (UDP, port 6831).
# Note: the Jaeger-specific exporter packages are deprecated upstream;
# recent Jaeger releases also accept OTLP directly.
provider = TracerProvider(resource=Resource.create({"service.name": "my-service"}))
processor = BatchSpanProcessor(JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Instrument Flask: creates a server span for every incoming request
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

# Custom spans
@app.route('/api/users')
def get_users():
    tracer = trace.get_tracer(__name__)
    # Becomes a child of the Flask request span, which is current here
    with tracer.start_as_current_span("get_users") as span:
        span.set_attribute("user.count", 100)
        return fetch_users()  # fetch_users() is assumed to be defined elsewhere
```
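`BatchSpanProcessor` in the setup above keeps span export off the request path. Finished spans go into a bounded in-memory queue, and a background worker exports them in batches. The sketch below is a simplified, synchronous model of that behavior. `SimpleBatcher` is hypothetical. Its parameter names mirror the SDK's `max_queue_size` and `max_export_batch_size`. The real processor also exports on a timer (`schedule_delay_millis`), which this sketch leaves out.

```python
from collections import deque
from typing import Any, Callable, Deque, List


class SimpleBatcher:
    """Hypothetical, synchronous model of a batch span processor."""

    def __init__(self, export: Callable[[List[Any]], None],
                 max_queue_size: int = 2048, max_export_batch_size: int = 512):
        self.export = export
        self.max_queue_size = max_queue_size
        self.max_export_batch_size = max_export_batch_size
        self.queue: Deque[Any] = deque()
        self.dropped = 0

    def on_end(self, span: Any) -> None:
        # Called when a span finishes: enqueue only, so the request never waits on the network
        if len(self.queue) >= self.max_queue_size:
            self.dropped += 1  # queue full: drop the span rather than slow the application
            return
        self.queue.append(span)

    def flush(self) -> None:
        # What the background worker does: drain the queue in bounded batches
        while self.queue:
            n = min(self.max_export_batch_size, len(self.queue))
            self.export([self.queue.popleft() for _ in range(n)])


exported: List[List[Any]] = []
batcher = SimpleBatcher(exported.append, max_queue_size=5, max_export_batch_size=2)
for span_id in range(7):
    batcher.on_end(span_id)
batcher.flush()
print(exported, batcher.dropped)  # [[0, 1], [2, 3], [4]] 2
```

The design choice is deliberate. Dropping spans under pressure trades some trace completeness for the guarantee that tracing never stalls the application.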
OpenTelemetry Setup (Node.js)
```javascript
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');

// Send batched spans to the Jaeger collector's HTTP endpoint
const provider = new NodeTracerProvider();
// In OpenTelemetry JS SDK 2.x, addSpanProcessor() was removed; pass
// { spanProcessors: [...] } to the NodeTracerProvider constructor instead.
provider.addSpanProcessor(new BatchSpanProcessor(
  new JaegerExporter({ endpoint: 'http://jaeger:14268/api/traces' })
));
// Register globally: sets the tracer provider, context manager and propagators
provider.register();
```

Context Propagation
```python
import requests
from opentelemetry.propagate import inject

# Inject trace context into HTTP headers
headers = {}
inject(headers)  # Adds the W3C traceparent header for the current span
response = requests.get('http://downstream/api', headers=headers)
```
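The `traceparent` header that `inject` writes follows the W3C Trace Context layout `version-trace_id-parent_id-trace_flags`. The sketch below formats and parses version-00 headers using only the standard library. It shows what a downstream service does on receipt: keep the trace ID, create a new span ID for its own work, and carry the sampled flag forward. In real code, `opentelemetry.propagate.extract` does the parsing. `format_traceparent`, `parse_traceparent` and `child_traceparent` are hypothetical helpers.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: version-trace_id-parent_id-trace_flags, all lowercase hex
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str  # span ID of the caller
    sampled: bool


def format_traceparent(trace_id: str, span_id: str, sampled: bool) -> str:
    # Bit 0 of trace-flags is the "sampled" flag
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header: str) -> Optional[TraceParent]:
    m = TRACEPARENT_RE.match(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(incoming: str) -> Optional[str]:
    """Header a downstream service would send on its own outgoing calls."""
    parent = parse_traceparent(incoming)
    if parent is None:
        return None  # a real service would start a new trace here
    # Same trace ID, new span ID for this service's work, sampling decision carried forward
    return format_traceparent(parent.trace_id, secrets.token_hex(8), parent.sampled)


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(incoming))
print(child_traceparent(incoming))
```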
Sampling Strategies
```yaml
# Choose one sampler; the two YAML documents below are alternatives.

# Probabilistic - sample 1% of traces
sampler:
  type: probabilistic
  param: 0.01
---
# Rate limiting - at most 100 traces per second
sampler:
  type: ratelimiting
  param: 100
```
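The logic behind these two strategies can be sketched in plain Python. A probabilistic sampler should decide from the trace ID itself, so that every service seeing the same trace reaches the same verdict. A rate-limiting sampler is commonly built as a token bucket. Both are hypothetical illustrations. The ID-threshold comparison is similar in spirit to OpenTelemetry's `TraceIdRatioBased` sampler, but it is not presented as that library's code. The injectable `clock` parameter exists only to make the example testable.

```python
import time
from typing import Callable

TRACE_ID_LOW_64 = (1 << 64) - 1


def probabilistic_sample(trace_id: int, rate: float) -> bool:
    """Keep a trace iff the low 64 bits of its ID fall below rate * 2**64.

    Deciding from the trace ID (instead of a fresh random number) means every
    service sees the same verdict for the same trace.
    """
    bound = round(rate * (1 << 64))
    return (trace_id & TRACE_ID_LOW_64) < bound


class RateLimitingSampler:
    """Token bucket: on average at most max_per_sec sampled traces per second,
    with bursts of up to max_per_sec."""

    def __init__(self, max_per_sec: float, clock: Callable[[], float] = time.monotonic):
        self.rate = max_per_sec
        self.capacity = max_per_sec
        self.tokens = max_per_sec  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def should_sample(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_now = [0.0]
sampler = RateLimitingSampler(2, clock=lambda: fake_now[0])
print([sampler.should_sample() for _ in range(3)])  # [True, True, False]
fake_now[0] = 0.5  # half a second later, one token has been refilled
print(sampler.should_sample())  # True
```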
Jaeger Queries
```
# Find slow requests (fields of the Jaeger UI search form)
Service:      my-service
Min Duration: 1s

# Find errors; tag filters are exact key=value matches (logfmt),
# so a range such as status_code >= 500 needs one search per code
Service: my-service
Tags:    error=true http.status_code=500
```
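These searches can also be scripted. The Jaeger UI fetches its results from the query service's `/api/traces` HTTP endpoint (port 16686 by default), and the sketch below only builds such a URL with the standard library. Treat the endpoint and its `service`, `minDuration`, `tags` and `limit` parameters as an assumption. This is the UI's internal API rather than a documented stable interface, so check it against your Jaeger version. `jaeger_search_url` is a hypothetical helper.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlencode


def jaeger_search_url(base_url: str, service: str,
                      min_duration: Optional[str] = None,
                      tags: Optional[Dict[str, str]] = None,
                      limit: int = 20) -> str:
    """Build a trace-search URL for the Jaeger query service (assumed endpoint)."""
    params = {"service": service, "limit": str(limit)}
    if min_duration:
        params["minDuration"] = min_duration  # e.g. "1s", "250ms"
    if tags:
        # Tags are sent as a JSON object of exact-match key/value pairs
        params["tags"] = json.dumps(tags, separators=(",", ":"))
    return f"{base_url.rstrip('/')}/api/traces?{urlencode(params)}"


print(jaeger_search_url("http://jaeger:16686", "my-service", min_duration="1s"))
# http://jaeger:16686/api/traces?service=my-service&limit=20&minDuration=1s
```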
Correlated Logging
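```python
import logging

from opentelemetry import trace

logger = logging.getLogger(__name__)

def process_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id
    # Format the 128-bit trace ID as 32 lowercase hex chars, as in the traceparent header.
    # The formatter must reference %(trace_id)s for it to appear in the output.
    logger.info("Processing", extra={"trace_id": format(trace_id, '032x')})
```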
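Passing `extra=` on every call is easy to forget. A common alternative, sketched here with the standard library only, is a `logging.Filter` on the handler. The filter stamps every record with the current trace ID so the format string can use `%(trace_id)s`. The `current_trace_id` context variable is a hypothetical stand-in for the active span. In a real service, the filter would read `trace.get_current_span().get_span_context().trace_id` as in the example above. The `opentelemetry-instrumentation-logging` package offers a ready-made version of this idea.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for the active span: the current 128-bit trace ID (0 = none)
current_trace_id = contextvars.ContextVar("current_trace_id", default=0)


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every record passing through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = format(current_trace_id.get(), "032x")
        return True  # never drop records, only enrich them


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
demo_logger = logging.getLogger("demo")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)

current_trace_id.set(0x4bf92f3577b34da6a3ce929d0e0e4736)
demo_logger.info("Processing")
print(stream.getvalue().strip())
# INFO trace_id=4bf92f3577b34da6a3ce929d0e0e4736 Processing
```

The filter is attached to the handler rather than the logger. Filters on a logger only see records logged directly through that logger, not records propagated up from child loggers.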
Best Practices
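- Sample appropriately (typically 1-10% of traces in production)
- Add meaningful tags (user_id, request_id)
- Propagate context across all service boundaries (HTTP, gRPC, message queues)
- Record exceptions on spans and mark the span status as an error
- Use consistent, low-cardinality span names (e.g. `GET /api/users`, not one name per user ID)
- Monitor tracing overhead (aim for <1% CPU impact)
- Correlate with logs using trace IDs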
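The advice to capture exceptions in spans can be illustrated with a stdlib-only context manager. It mimics what a tracer does when a traced block raises. First it attaches an `exception` event using OpenTelemetry's semantic-convention attribute names (`exception.type`, `exception.message`, `exception.stacktrace`). Then it marks the span as an error and re-raises. `MiniSpan`, `traced` and `finished` are hypothetical stand-ins. With the real Python SDK, `tracer.start_as_current_span` already does this by default, because its `record_exception` and `set_status_on_exception` parameters default to `True`. For exceptions you catch and handle yourself, `span.record_exception(exc)` is available.

```python
import time
import traceback
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class MiniSpan:
    """Hypothetical stand-in for a real span."""
    name: str
    status: str = "UNSET"
    events: List[dict] = field(default_factory=list)


finished: List[MiniSpan] = []  # hypothetical "exported" spans


@contextmanager
def traced(name: str) -> Iterator[MiniSpan]:
    span = MiniSpan(name)
    try:
        yield span
    except Exception as exc:
        # Attach the error as a span event using OpenTelemetry's semantic-convention keys
        span.events.append({
            "name": "exception",
            "time_unix_nano": time.time_ns(),
            "attributes": {
                "exception.type": type(exc).__name__,
                "exception.message": str(exc),
                "exception.stacktrace": traceback.format_exc(),
            },
        })
        span.status = "ERROR"
        raise  # tracing must not swallow the error
    finally:
        finished.append(span)


try:
    with traced("charge_card"):
        raise ValueError("card declined")
except ValueError:
    pass
print(finished[-1].status, finished[-1].events[0]["attributes"]["exception.type"])  # ERROR ValueError
```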