Grafana Tempo Skill
A comprehensive guide to Grafana Tempo, the cost-effective, high-scale distributed tracing backend designed for OpenTelemetry.
What is Tempo?
Tempo is a high-scale distributed tracing backend with the following characteristics:
- Trace-ID lookup model - Does not index every attribute, which keeps ingestion fast and storage costs low
- OpenTelemetry native - First-class support for the OTLP protocol
- Object storage backed - Stores traces in affordable object storage: S3, GCS, or Azure Blob Storage
- TraceQL query language - A powerful query language inspired by PromQL and LogQL
- Apache Parquet format - 5-10x less data pulled per query than legacy formats
- Multi-tenancy - Built-in tenant isolation via the X-Scope-OrgID header (disabled by default; enabled with multitenancy_enabled)
Architecture Overview
Core Components
| Component | Purpose |
|---|---|
| Distributor | Entry point for trace data; routes spans to ingesters via a consistent hash ring keyed on trace ID |
| Ingester | Buffers traces in memory, creates Parquet blocks, and flushes them to object storage |
| Query Frontend | Orchestrates queries: shards the blockID space and coordinates queriers |
| Querier | Locates traces in ingesters or object storage, using bloom filters to skip blocks |
| Compactor | Compacts (merges) blocks, deduplicates data, and enforces retention |
| Metrics Generator | Optional: derives metrics from traces |
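The Distributor row says spans are routed to ingesters through a consistent hash ring. The bash sketch below shows how such a ring works when keyed on trace ID. It is a simplified illustration and does not reproduce Tempo's implementation. The ingester names and token count are hypothetical. cksum (CRC-32) stands in for Tempo's real hash function, and the sketch ignores replication and instance health.

Each ingester owns several tokens on the ring. A trace belongs to the owner of the first token at or after the trace ID's hash. If no token is larger, the lookup wraps around to the lowest token.

```bash
# Simplified consistent-hash-ring sketch, NOT Tempo's implementation.
# Assumptions: cksum (CRC-32) stands in for Tempo's hash function, and the
# ingester names and token count are hypothetical. Replication is ignored.
INGESTERS=(ingester-0 ingester-1 ingester-2)
TOKENS_PER_INGESTER=4

# Print an unsigned 32-bit hash of the first argument.
hash32() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# Print the ring as "token owner" lines, sorted numerically by token.
build_ring() {
  local ing i
  for ing in "${INGESTERS[@]}"; do
    for (( i = 0; i < TOKENS_PER_INGESTER; i++ )); do
      printf '%s %s\n' "$(hash32 "${ing}-token-${i}")" "$ing"
    done
  done | sort -n
}

# Print the ingester that owns a trace ID: the owner of the first token
# >= hash(traceID), wrapping around to the lowest token if none is larger.
route_trace() {
  local h token owner first_owner=""
  h=$(hash32 "$1")
  while read -r token owner; do
    if [[ -z $first_owner ]]; then
      first_owner=$owner
    fi
    if (( token >= h )); then
      echo "$owner"
      return 0
    fi
  done < <(build_ring)
  echo "$first_owner"
}
```

For example, route_trace 4bf92f3577b34da6a3ce929d0e0e4736 prints one of the hypothetical ingester names, and repeated calls print the same one. Each ingester owns many small ranges of the ring. Adding or removing an ingester therefore moves only the traces in the ranges it gains or loses, whereas hash-modulo routing would reshuffle nearly every trace.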
Data Flow
```
Write path:
Applications → Collector → Distributor → Ingester → Object Storage
                                ↓
                      Consistent Hash Ring
                       (routes by traceID)

Read path:
Query Request → Query Frontend → Queriers → Ingesters (recent data)
                       ↓              ↓
                Block Sharding     Object Storage (historical data)
                       ↓              ↓
           Parallel Querier Work   Bloom Filters + Indexes
```
Deployment Modes
1. Monolithic Mode (-target=all)
- All components run in a single process
- Best for: local testing and small-scale deployments
- Individual components cannot be scaled independently
- Scale by increasing replicas
2. Scalable Monolithic Mode (-target=scalable-single-binary)
- All components run in one process, with horizontal scaling
- Each instance runs all components
- Good for development environments with scaling needs
3. Microservices Mode (Distributed) - Recommended for Production
```yaml
# Using the tempo-distributed Helm chart
distributor:
  replicas: 3
ingester:
  replicas: 3
querier:
  replicas: 2
queryFrontend:
  replicas: 2
compactor:
  replicas: 1
```
Helm Deployment
Add Repository
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```
Install Distributed Tempo
```bash
# --create-namespace creates "monitoring" if it does not exist yet
helm install tempo grafana/tempo-distributed \
  --namespace monitoring \
  --create-namespace \
  --values values.yaml
```
Production Values Example
```yaml
# Storage configuration
storage:
  trace:
    backend: azure  # or s3, gcs
    azure:
      container_name: tempo-traces
      storage_account_name: mystorageaccount
      use_federated_token: true  # Workload Identity

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      memory: 8Gi  # Usage spikes to 8Gi periodically
  persistence:
    enabled: true
    size: 20Gi

# Querier
querier:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 4Gi

# Query Frontend
queryFrontend:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      memory: 2Gi

# Compactor
compactor:
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 6Gi
  # Block retention
  config:
    compaction:
      block_retention: 336h  # 14 days

# Gateway for external access
gateway:
  enabled: true
  replicas: 1

# Metrics Generator (optional)
metricsGenerator:
  enabled: false
```
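block_retention takes a Go-style duration. Go durations have no day unit, so 14 days is written as 336h (14 × 24). The small bash helper below makes the conversion explicit. The function name is hypothetical and not part of Tempo or the chart.

```bash
# Hypothetical helper: convert whole days into an hours-based duration string
# suitable for block_retention (Go durations do not support a "d" unit).
retention_for_days() {
  local days=$1
  if ! [[ $days =~ ^[0-9]+$ ]]; then
    echo "usage: retention_for_days <whole-days>" >&2
    return 1
  fi
  # 10# forces base 10 so inputs like "08" are not parsed as octal
  echo "$(( 10#$days * 24 ))h"
}
```

retention_for_days 14 prints 336h, which matches the retention in the example values, and retention_for_days 30 prints 720h.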
Storage Configuration
Azure Blob Storage (Recommended for Azure)
```yaml
storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces
      storage_account_name: <storage-account-name>
      # Option 1: Workload Identity (Recommended)
      use_federated_token: true
      # Option 2: User-Assigned Managed Identity (use instead of Option 1)
      # use_managed_identity: true
      # user_assigned_id: <identity-client-id>
      # Option 3: Account Key (Dev only)
      # storage_account_key: <account-key>
      endpoint_suffix: blob.core.windows.net
      # Issue a second (hedged) request if a request takes longer than 400ms
      hedge_requests_at: 400ms
      # Maximum number of requests when hedging
      hedge_requests_up_to: 2
```
AWS S3
```yaml
storage:
  trace:
    backend: s3
    s3:
      bucket: my-tempo-bucket
      region: us-east-1
      endpoint: s3.us-east-1.amazonaws.com
      # Use IAM roles (omit the keys below) or static access keys
      access_key: <access-key>
      secret_key: <secret-key>
```
Google Cloud Storage
```yaml
storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: my-tempo-bucket
      # Uses Workload Identity or a service account
```
TraceQL Query Language
Basic Queries
```traceql
# Simplest query - all spans
{ }

# Filter by service
{ resource.service.name = "frontend" }

# Filter by operation
{ span:name = "GET /api/orders" }

# Filter by status
{ span:status = error }

# Filter by duration
{ span:duration > 500ms }

# Multiple conditions
{ resource.service.name = "api" && span:status = error }
```
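The multi-condition example joins filters with && inside one pair of braces. If you generate queries from scripts, a hypothetical bash helper like the one below can assemble that form. It only concatenates strings and does not validate TraceQL syntax.

```bash
# Hypothetical helper: join TraceQL conditions with && inside a single
# spanset filter. It does not check that the conditions are valid TraceQL.
traceql_filter() {
  if (( $# == 0 )); then
    echo '{ }'   # no conditions: match all spans
    return 0
  fi
  local query=$1 cond
  shift
  for cond in "$@"; do
    query+=" && $cond"
  done
  printf '{ %s }\n' "$query"
}
```

traceql_filter 'resource.service.name = "api"' 'span:status = error' prints the multi-condition query shown above. Calling it with no arguments prints { }.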
Structural Operators
```traceql
# Direct parent-child relationship
{ resource.service.name = "frontend" } > { resource.service.name = "api" }

# Ancestor-descendant relationship
{ span:name = "GET /api/products" } >> { span.db.system = "postgresql" }

# Sibling relationship
{ span:name = "span-a" } ~ { span:name = "span-b" }
```
Aggregation Functions
```traceql
# Count spans: spansets with more than 10 spans
{ } | count() > 10

# Average duration: spansets whose average span duration exceeds 20ms
{ } | avg(span:duration) > 20ms

# Max duration: error spansets whose slowest span exceeds 1s
{ span:status = error } | max(span:duration) > 1s
```
Metrics Functions
```traceql
# Rate of errors
{ span:status = error } | rate()

# Count over time
{ span:name = "GET /:endpoint" } | count_over_time()

# Percentile latency (p99)
{ span:name = "GET /:endpoint" } | quantile_over_time(span:duration, .99)

# Group by service
{ span:status = error } | rate() by(resource.service.name)

# Top 10 services by error rate
{ span:status = error } | rate() by(resource.service.name) | topk(10)
```
Trace Structure
Intrinsic Fields (colon separator)
| Field | Description |
|---|---|
| span:name | Operation name |
| span:duration | Elapsed time (e.g., "10ms", "1.5s") |
| span:status | Span status: ok, error, or unset |
| span:kind | Span kind: server, client, producer, consumer, internal, or unspecified |
| trace:duration | Total trace duration |
| trace:rootName | Root span name |
| trace:rootService | Root span service |
Attribute Scopes (period separator)
| Scope | Example | Description |
|---|---|---|
| span. | span.db.system | Span-level attributes |
| resource. | resource.service.name | Resource attributes |
| event. | event.exception.message | Event attributes |
| link. | link.opentracing.ref_type | Link attributes |
Receiver Endpoints
| Protocol | Port | Endpoint |
|---|---|---|
| OTLP gRPC | 4317 | - |
| OTLP HTTP | 4318 | /v1/traces |
| Jaeger gRPC | 14250 | - |
| Jaeger Thrift HTTP | 14268 | /api/traces |
| Jaeger Thrift Compact | 6831 | UDP |
| Jaeger Thrift Binary | 6832 | UDP |
| Zipkin | 9411 | /api/v2/spans |
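To check the OTLP HTTP receiver end to end, you can post one span as OTLP/JSON and then look it up by trace ID. The sketch below assumes the receiver is reachable at http://localhost:4318, for example through kubectl port-forward to the distributor. The function and service names are hypothetical.

In the OTLP/JSON encoding:
- traceId and spanId are hex strings of 32 and 16 characters
- timestamps are nanoseconds since the Unix epoch

```bash
# Print $1 random bytes as lowercase hex (2 characters per byte).
random_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Hypothetical smoke-test helper: build a one-span OTLP/JSON payload.
# The service name must not contain characters that need JSON escaping.
make_otlp_payload() {
  local trace_id=$1 service=${2:-smoke-test} span_id start end
  span_id=$(random_hex 8)
  start=$(( $(date +%s) * 1000000000 ))   # seconds -> nanoseconds
  end=$(( start + 1000000000 ))           # 1-second span
  # kind 2 = SPAN_KIND_SERVER in the OTLP enum
  cat <<EOF
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"$service"}}]},"scopeSpans":[{"spans":[{"traceId":"$trace_id","spanId":"$span_id","name":"smoke-test-span","kind":2,"startTimeUnixNano":"$start","endTimeUnixNano":"$end"}]}]}]}
EOF
}

# POST one span to the OTLP HTTP receiver and print its trace ID on success.
send_test_span() {
  local endpoint=${1:-http://localhost:4318/v1/traces} trace_id
  trace_id=$(random_hex 16)
  curl -fsS --max-time 10 -X POST -H 'Content-Type: application/json' \
    --data "$(make_otlp_payload "$trace_id")" "$endpoint" >/dev/null || return 1
  echo "$trace_id"
}
```

Run trace_id=$(send_test_span). Allow a short delay for ingestion, then request /api/traces/$trace_id on the query API (port 3200 in this guide's examples) to confirm that both the write and read paths work.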
Multi-Tenancy
```yaml
# Enable multi-tenancy
multitenancy_enabled: true

# All requests must then include the X-Scope-OrgID header.
# Example:
# curl -H "X-Scope-OrgID: tenant-1" http://tempo:3200/api/traces/<traceID>
```
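With multi-tenancy enabled, forgetting the header is an easy mistake. The minimal bash wrapper below refuses to send a request without a tenant and otherwise adds X-Scope-OrgID to every call. It assumes hypothetical TEMPO_URL and TEMPO_TENANT environment variables.

```bash
# Hypothetical wrapper: TEMPO_URL (default http://localhost:3200) and
# TEMPO_TENANT are illustrative variable names, not Tempo settings.
tempo_get() {
  local path=$1
  local base=${TEMPO_URL:-http://localhost:3200}
  if [[ -z ${TEMPO_TENANT:-} ]]; then
    echo "tempo_get: TEMPO_TENANT is not set" >&2
    return 1
  fi
  # -f makes curl exit non-zero on HTTP error responses (4xx/5xx)
  curl -fsS --max-time 10 -H "X-Scope-OrgID: ${TEMPO_TENANT}" "${base}${path}"
}
```

For example, TEMPO_TENANT=tenant-1 TEMPO_URL=http://tempo:3200 tempo_get /api/traces/<traceID> sends the same request as the curl example in the configuration comments.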
Azure Identity Configuration
Workload Identity Federation (Recommended)
1. Enable Workload Identity on AKS:
```bash
az aks update \
  --name <aks-cluster> \
  --resource-group <rg> \
  --enable-oidc-issuer \
  --enable-workload-identity
```
2. Create User-Assigned Managed Identity:
```bash
az identity create \
  --name tempo-identity \
  --resource-group <rg>
IDENTITY_CLIENT_ID=$(az identity show --name tempo-identity --resource-group <rg> --query clientId -o tsv)
# The role assignment in step 3 needs the principal (object) ID, not the client ID
PRINCIPAL_ID=$(az identity show --name tempo-identity --resource-group <rg> --query principalId -o tsv)
```
3. Assign Storage Permission:
```bash
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id "$PRINCIPAL_ID" \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
```
4. Create Federated Credential:
```bash
# OIDC issuer URL of the AKS cluster
AKS_OIDC_ISSUER=$(az aks show --name <aks-cluster> --resource-group <rg> --query oidcIssuerProfile.issuerUrl -o tsv)
# --subject must match the namespace and service account used by the Tempo pods
az identity federated-credential create \
  --name tempo-federated \
  --identity-name tempo-identity \
  --resource-group <rg> \
  --issuer "$AKS_OIDC_ISSUER" \
  --subject system:serviceaccount:monitoring:tempo \
  --audiences api://AzureADTokenExchange
```
5. Configure Helm Values:
```yaml
serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>
podLabels:
  azure.workload.identity/use: "true"
storage:
  trace:
    azure:
      use_federated_token: true
```
Troubleshooting
Common Issues
1. Container Not Found (Azure)
```bash
# --auth-mode login authenticates with your Azure AD identity (RBAC) instead of an account key
az storage container create --name tempo-traces --account-name <storage> --auth-mode login
```
2. Authorization Failure (Azure)
```bash
# Verify RBAC assignment
az role assignment list --scope <storage-scope>

# Assign if missing
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>
```
3. Ingester OOM
```yaml
ingester:
  resources:
    limits:
      memory: 16Gi  # Increase from 8Gi
```
4. Query Timeout
```yaml
querier:
  query_timeout: 5m
  max_concurrent_queries: 20
```
Diagnostic Commands
```bash
# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo

# Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100

# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100

# Verify readiness
kubectl exec -it <tempo-pod> -n monitoring -- wget -qO- http://localhost:3200/ready

# Check ring status (port-forward runs in the background so curl can run)
kubectl port-forward svc/tempo-distributor 3200:3200 -n monitoring &
sleep 2  # give the port-forward a moment to start
curl http://localhost:3200/distributor/ring
```
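The readiness check is a one-shot request. During a rollout, polling is often more useful. The bash sketch below assumes /ready is reachable from where it runs, for example through a port-forward. It treats any successful HTTP response as ready. The function name is hypothetical.

```bash
# Poll a Tempo /ready endpoint until it succeeds or the attempts run out.
# Arguments (all optional): URL, number of attempts, seconds between attempts.
wait_for_tempo_ready() {
  local url=${1:-http://localhost:3200/ready}
  local attempts=${2:-30} delay=${3:-2} i
  for (( i = 1; i <= attempts; i++ )); do
    # -f: treat HTTP error statuses as failure; --max-time bounds each try
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "not ready after ${attempts} attempt(s): ${url}" >&2
  return 1
}
```

wait_for_tempo_ready http://localhost:3200/ready 30 2 polls every 2 seconds for roughly a minute. It returns non-zero if Tempo never becomes ready, so it can gate later steps in a script.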
API Reference
Trace Retrieval
```text
# Get trace by ID
GET /api/traces/<traceID>

# Search traces (TraceQL); the q value must be URL-encoded in real requests
GET /api/search?q={resource.service.name="api"}

# Search tags
GET /api/search/tags
GET /api/search/tag/<tag>/values
```
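The q parameter in the search request contains braces, quotes, and spaces, which must be percent-encoded in a real URL. The dependency-free bash sketch below performs that encoding: RFC 3986 unreserved characters pass through, and every other byte becomes %XX. It also defines a hypothetical tempo_search_url helper.

```bash
# Percent-encode a string: RFC 3986 unreserved characters pass through,
# every other byte becomes %XX.
urlencode() {
  local LC_ALL=C   # iterate over bytes, not locale-dependent characters
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: build a /api/search URL for a TraceQL query.
tempo_search_url() {
  local query=$1 base=${2:-http://localhost:3200}
  printf '%s/api/search?q=%s\n' "$base" "$(urlencode "$query")"
}
```

You can also let curl do the encoding: curl -G --data-urlencode 'q={resource.service.name="api"}' http://localhost:3200/api/search.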
Health
```text
GET /ready
GET /metrics
```
Reference Documentation
For detailed configuration by topic:
- Storage Configuration: Object stores, retention, caching
- TraceQL Reference: Query syntax and examples
- Configuration Reference: Full configuration manifest