Grafana Tempo Skill

Comprehensive guide for Grafana Tempo - the cost-effective, high-scale distributed tracing backend designed for OpenTelemetry.

What is Tempo?

Tempo is a high-scale distributed tracing backend with the following characteristics:
  • Trace-ID lookup model - Does not index every attribute, which keeps ingestion fast and storage costs low
  • OpenTelemetry native - First-class support for the OTLP protocol
  • Object storage backed - Stores traces in affordable S3, GCS, or Azure Blob Storage
  • TraceQL query language - Powerful query language inspired by PromQL and LogQL
  • Apache Parquet format - 5-10x less data pulled per query vs legacy formats
  • Multi-tenant - Built-in tenant isolation via the X-Scope-OrgID header once multitenancy_enabled is set

Architecture Overview

Core Components

| Component | Purpose |
| --- | --- |
| Distributor | Entry point for trace data, routes to ingesters via consistent hash ring |
| Ingester | Buffers traces in memory, creates Parquet blocks, flushes to storage |
| Query Frontend | Query orchestration, shards blockID space, coordinates queriers |
| Querier | Locates traces in ingesters or storage using bloom filters |
| Compactor | Compresses blocks, deduplicates data, manages retention |
| Metrics Generator | Optional: derives metrics from traces |
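
The querier row above mentions bloom filters. As a rough illustration of why they help with trace-ID lookups, here is a minimal Python sketch of a Bloom filter used to decide which blocks are worth opening. It is not Tempo's implementation: the `BloomFilter` class, the block names, the trace IDs, and the sizing parameters are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Hypothetical blocks, each with a filter over the trace IDs it holds.
blocks = {"block-a": BloomFilter(), "block-b": BloomFilter()}
blocks["block-a"].add("4bf92f3577b34da6a3ce929d0e0e4736")
blocks["block-b"].add("00f067aa0ba902b7a3ce929d0e0e4736")


def candidate_blocks(trace_id: str) -> list:
    """Return the names of blocks that might contain this trace ID."""
    return [name for name, bf in blocks.items() if bf.might_contain(trace_id)]
```

A block whose filter answers "no" can be skipped without reading it from object storage, while a "yes" only means the block has to be checked.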

Data Flow

Write Path:
Applications → Collector → Distributor → Ingester → Object Storage
                           Consistent Hash Ring
                           (routes by traceID)
Read Path:
Query Request → Query Frontend → Queriers → Ingesters (recent data)
                      ↓                            ↓
                 Block Sharding          Object Storage (historical data)
                      ↓                            ↓
              Parallel Querier Work      Bloom Filters + Indexes
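
The write path routes each trace to ingesters through a consistent hash ring keyed by trace ID. The Python sketch below shows the general technique only. The `HashRing` class, the token count, and the `replication_factor` parameter are assumptions made for illustration and do not reflect Tempo's ring code.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, ingesters, tokens_per_ingester: int = 64):
        # Each ingester owns several tokens so load spreads evenly.
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [token for token, _ in self._ring]

    def owners(self, trace_id: str, replication_factor: int = 1):
        """Walk clockwise from the trace ID's hash, collecting distinct ingesters."""
        start = bisect.bisect_right(self._tokens, _hash(trace_id))
        found = []
        for offset in range(len(self._ring)):
            name = self._ring[(start + offset) % len(self._ring)][1]
            if name not in found:
                found.append(name)
                if len(found) == replication_factor:
                    break
        return found


ring = HashRing(["ingester-0", "ingester-1", "ingester-2"])
print(ring.owners("4bf92f3577b34da6a3ce929d0e0e4736", replication_factor=2))
```

Because the owner depends only on the trace ID and the ring membership, every span of a trace lands on the same ingesters regardless of which distributor receives it.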

Deployment Modes

1. Monolithic Mode (-target=all)

  • All components in single process
  • Best for: Local testing, small-scale deployments
  • Cannot horizontally scale component count
  • Scale by increasing replicas

2. Scalable Monolithic (-target=scalable-single-binary)

  • All components in one process with horizontal scaling
  • Each instance runs all components
  • Good for development with scaling needs

3. Microservices Mode (Distributed) - Recommended for Production

yaml
# Using tempo-distributed Helm chart
distributor:
  replicas: 3
ingester:
  replicas: 3
querier:
  replicas: 2
queryFrontend:
  replicas: 2
compactor:
  replicas: 1

Helm Deployment

Add Repository

bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Install Distributed Tempo

bash
helm install tempo grafana/tempo-distributed \
  --namespace monitoring \
  --values values.yaml

Production Values Example

yaml
# Storage configuration
storage:
  trace:
    backend: azure  # or s3, gcs
    azure:
      container_name: tempo-traces
      storage_account_name: mystorageaccount
      use_federated_token: true  # Workload Identity

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      memory: 8Gi  # Spikes to 8GB periodically
  persistence:
    enabled: true
    size: 20Gi

# Querier
querier:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 4Gi

# Query Frontend
queryFrontend:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      memory: 2Gi

# Compactor (block retention lives under the same key so it is not overwritten)
compactor:
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 6Gi
  # Block retention
  compaction:
    block_retention: 336h  # 14 days

# Gateway for external access
gateway:
  enabled: true
  replicas: 1

# Metrics Generator (optional)
metricsGenerator:
  enabled: false
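
The retention value in the example is written in hours. A small worked conversion, as a Python sketch, confirms that `336h` corresponds to 14 days. The `parse_hours` helper is hypothetical and only handles hour-suffixed values.

```python
from datetime import timedelta


def parse_hours(value: str) -> timedelta:
    """Parse an hours-only duration such as '336h' (other units are not handled)."""
    if not value.endswith("h"):
        raise ValueError(f"expected an hours value like '336h', got {value!r}")
    return timedelta(hours=int(value[:-1]))


retention = parse_hours("336h")
print(retention.days)  # 14
```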

Storage Configuration

Azure Blob Storage (Recommended for Azure)

yaml
storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces
      storage_account_name: <storage-account-name>
      # Pick one authentication option and leave the others commented out.
      # Option 1: Workload Identity (Recommended)
      use_federated_token: true
      # Option 2: User-Assigned Managed Identity
      # use_managed_identity: true
      # user_assigned_id: <identity-client-id>
      # Option 3: Account Key (Dev only)
      # storage_account_key: <account-key>
      endpoint_suffix: blob.core.windows.net
      hedge_requests_at: 400ms
      hedge_requests_up_to: 2
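
The `hedge_requests_at` and `hedge_requests_up_to` settings above configure request hedging: if a storage request is slow, a duplicate is issued and the first response wins. The Python sketch below illustrates that general idea and is not Tempo's implementation. `hedged_call` and `fetch` are hypothetical names, and the sketch assumes `up_to` counts the original request as well as the duplicates.

```python
import concurrent.futures as cf
import time


def hedged_call(fetch, hedge_at: float = 0.4, up_to: int = 2):
    """Call fetch(attempt); start another attempt each time hedge_at seconds
    pass without a result, until up_to attempts are in flight.
    Returns the result of the first attempt to finish."""
    pool = cf.ThreadPoolExecutor(max_workers=up_to)
    try:
        futures = [pool.submit(fetch, 0)]
        while True:
            done, _ = cf.wait(futures, timeout=hedge_at,
                              return_when=cf.FIRST_COMPLETED)
            if done:
                return next(iter(done)).result()
            if len(futures) < up_to:
                futures.append(pool.submit(fetch, len(futures)))
    finally:
        # Do not block on the slower attempts; they finish in the background.
        pool.shutdown(wait=False)


def slow_then_fast(attempt: int) -> str:
    """Simulated storage read: the first attempt is slow, the hedge is fast."""
    time.sleep(0.3 if attempt == 0 else 0.01)
    return f"attempt-{attempt}"


print(hedged_call(slow_then_fast, hedge_at=0.05, up_to=2))
```

Hedging trades a small amount of extra request volume for lower tail latency, which is why the threshold is usually set near the slow end of normal response times.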

AWS S3

yaml
storage:
  trace:
    backend: s3
    s3:
      bucket: my-tempo-bucket
      region: us-east-1
      endpoint: s3.us-east-1.amazonaws.com
      # Use IAM roles or access keys
      access_key: <access-key>
      secret_key: <secret-key>

Google Cloud Storage

yaml
storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: my-tempo-bucket
      # Uses Workload Identity or service account

TraceQL Query Language

Basic Queries

traceql
# Simplest query - all spans
{ }

# Filter by service
{ resource.service.name = "frontend" }

# Filter by operation
{ span:name = "GET /api/orders" }

# Filter by status
{ span:status = error }

# Filter by duration
{ span:duration > 500ms }

# Multiple conditions
{ resource.service.name = "api" && span:status = error }

Structural Operators

traceql
# Direct parent-child relationship
{ resource.service.name = "frontend" } > { resource.service.name = "api" }

# Ancestor-descendant relationship
{ span:name = "GET /api/products" } >> { span.db.system = "postgresql" }

# Sibling relationship
{ span:name = "span-a" } ~ { span:name = "span-b" }
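
To make the three structural relationships concrete, the following Python sketch checks them on a small in-memory span tree. It models the relationships only and does not show how Tempo evaluates TraceQL. The `Span` type, the helper functions, and the sample spans are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str


def is_child(parent: Span, child: Span) -> bool:
    """Models `parent > child`: child's direct parent is `parent`."""
    return child.parent_id == parent.span_id


def is_descendant(ancestor: Span, span: Span, by_id: dict) -> bool:
    """Models `ancestor >> span`: `ancestor` appears somewhere up the parent chain."""
    current = span
    while current.parent_id is not None:
        if current.parent_id == ancestor.span_id:
            return True
        current = by_id.get(current.parent_id)
        if current is None:
            return False  # parent span is missing from the trace
    return False


def is_sibling(a: Span, b: Span) -> bool:
    """Models `a ~ b`: two different spans that share the same parent."""
    return (a.span_id != b.span_id
            and a.parent_id is not None
            and a.parent_id == b.parent_id)


root = Span("1", None, "GET /api/products")
load = Span("2", "1", "load-products")
query = Span("3", "2", "SELECT products")
render = Span("4", "1", "render")
by_id = {s.span_id: s for s in (root, load, query, render)}

print(is_child(root, load), is_descendant(root, query, by_id), is_sibling(load, render))
```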

Aggregation Functions

traceql
# Count spans
{ } | count() > 10

# Average duration
{ } | avg(span:duration) > 20ms

# Max duration
{ span:status = error } | max(span:duration)

Metrics Functions

traceql
# Rate of errors
{ span:status = error } | rate()

# Count over time
{ span:name = "GET /:endpoint" } | count_over_time()

# Percentile latency
{ span:name = "GET /:endpoint" } | quantile_over_time(span:duration, .99)

# Group by service
{ span:status = error } | rate() by(resource.service.name)

# Top 10 by error rate
{ span:status = error } | rate() by(resource.service.name) | topk(10)
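
As a rough mental model for the grouped rate and `topk` queries above, the Python sketch below computes errors per second by service over a single window and keeps the highest values. It is a simplification: Tempo evaluates these functions as time series over query steps, and the function names and sample data here are hypothetical.

```python
from collections import Counter


def error_rate_by_service(error_spans, window_seconds: float) -> dict:
    """Errors per second for each service over one fixed window."""
    counts = Counter(service for service, _timestamp in error_spans)
    return {service: n / window_seconds for service, n in counts.items()}


def topk(rates: dict, k: int) -> list:
    """The k (service, rate) pairs with the highest rate, highest first."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:k]


# (service, timestamp in seconds) for spans with status = error
error_spans = (
    [("api", float(t)) for t in range(0, 60, 10)]    # 6 errors
    + [("db", float(t)) for t in range(5, 60, 20)]   # 3 errors
    + [("cart", 42.0)]                               # 1 error
)

rates = error_rate_by_service(error_spans, window_seconds=60)
print(topk(rates, 2))  # [('api', 0.1), ('db', 0.05)]
```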

Trace Structure

Intrinsic Fields (colon separator)

| Field | Description |
| --- | --- |
| span:name | Operation name |
| span:duration | Elapsed time (e.g., "10ms", "1.5s") |
| span:status | ok, error, or unset |
| span:kind | server, client, producer, consumer, internal |
| trace:duration | Total trace duration |
| trace:rootName | Root span name |
| trace:rootService | Root span service |

Attribute Scopes (period separator)

| Scope | Example | Description |
| --- | --- | --- |
| span. | span.http.method | Span-level attributes |
| resource. | resource.service.name | Resource attributes |
| event. | event.exception.message | Event attributes |
| link. | link.traceID | Link attributes |

Receiver Endpoints

| Protocol | Port | Endpoint |
| --- | --- | --- |
| OTLP gRPC | 4317 | /v1/traces |
| OTLP HTTP | 4318 | /v1/traces |
| Jaeger gRPC | 14250 | - |
| Jaeger Thrift HTTP | 14268 | /api/traces |
| Jaeger Thrift Compact | 6831 | UDP |
| Jaeger Thrift Binary | 6832 | UDP |
| Zipkin | 9411 | /api/v2/spans |

Multi-Tenancy

yaml
# Enable multi-tenancy
multitenancy_enabled: true

# All requests must include X-Scope-OrgID header
# Example:
# curl -H "X-Scope-OrgID: tenant-1" http://tempo:3200/api/traces/<traceID>
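
The same tenant header can be attached from application code. The Python sketch below builds the equivalent of the curl example with the standard library, without sending it. The `tempo_trace_request` helper and the trace ID are hypothetical, and the base URL is taken from the example above.

```python
import urllib.request


def tempo_trace_request(base_url: str, tenant: str, trace_id: str) -> urllib.request.Request:
    """Build a GET request for one trace, scoped to a tenant."""
    return urllib.request.Request(
        f"{base_url}/api/traces/{trace_id}",
        headers={"X-Scope-OrgID": tenant},
    )


req = tempo_trace_request(
    "http://tempo:3200", "tenant-1", "4bf92f3577b34da6a3ce929d0e0e4736"
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```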

Azure Identity Configuration

Workload Identity Federation (Recommended)

1. Enable Workload Identity on AKS:
bash
az aks update \
  --name <aks-cluster> \
  --resource-group <rg> \
  --enable-oidc-issuer \
  --enable-workload-identity
2. Create User-Assigned Managed Identity:
bash
az identity create \
  --name tempo-identity \
  --resource-group <rg>

IDENTITY_CLIENT_ID=$(az identity show --name tempo-identity --resource-group <rg> --query clientId -o tsv)
3. Assign Storage Permission:
bash
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
4. Create Federated Credential:
bash
az identity federated-credential create \
  --name tempo-federated \
  --identity-name tempo-identity \
  --resource-group <rg> \
  --issuer <aks-oidc-issuer-url> \
  --subject system:serviceaccount:monitoring:tempo \
  --audiences api://AzureADTokenExchange
5. Configure Helm Values:
yaml
serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>

podLabels:
  azure.workload.identity/use: "true"

storage:
  trace:
    azure:
      use_federated_token: true
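
The `--subject` value in step 4 has to match the namespace and service account that the Tempo pods actually run under, which is a common source of authorization failures. A small Python sketch of the subject format used above is shown here. The helper name is hypothetical, and the service account name `tempo` is taken from the example and may differ for a given Helm release.

```python
def federated_subject(namespace: str, service_account: str) -> str:
    """Build the subject string in the form used by the federated credential above."""
    return f"system:serviceaccount:{namespace}:{service_account}"


print(federated_subject("monitoring", "tempo"))
# system:serviceaccount:monitoring:tempo
```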

Troubleshooting

Common Issues

1. Container Not Found (Azure)
bash
az storage container create --name tempo-traces --account-name <storage>
2. Authorization Failure (Azure)
bash
# Verify RBAC assignment
az role assignment list --scope <storage-scope>

# Assign if missing
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>
3. Ingester OOM
yaml
ingester:
  resources:
    limits:
      memory: 16Gi  # Increase from 8Gi
4. Query Timeout
yaml
querier:
  query_timeout: 5m
  max_concurrent_queries: 20

Diagnostic Commands

bash
# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo

# Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100

# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100

# Verify readiness
kubectl exec -it <tempo-pod> -n monitoring -- wget -qO- http://localhost:3200/ready

# Check ring status
kubectl port-forward svc/tempo-distributor 3200:3200 -n monitoring
curl http://localhost:3200/distributor/ring

API Reference

Trace Retrieval

bash
# Get trace by ID
GET /api/traces/<traceID>

# Search traces (TraceQL)
GET /api/search?q={resource.service.name="api"}

# Search tags
GET /api/search/tags
GET /api/search/tag/<tag>/values
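
A TraceQL expression contains braces, quotes, and equals signs, so it needs URL encoding before it is placed in the `q` parameter. The Python sketch below builds a search URL with the standard library. The `search_url` helper and the base URL are assumptions for illustration, and the `limit` parameter is included on the assumption that the search endpoint accepts it.

```python
from urllib.parse import urlencode


def search_url(base_url: str, traceql: str, limit: int = 20) -> str:
    """Return a /api/search URL with the TraceQL query percent-encoded."""
    query = urlencode({"q": traceql, "limit": limit})
    return f"{base_url}/api/search?{query}"


url = search_url("http://tempo:3200", '{resource.service.name="api"}')
print(url)
# http://tempo:3200/api/search?q=%7Bresource.service.name%3D%22api%22%7D&limit=20
```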

Health

bash
GET /ready
GET /metrics

Reference Documentation

For detailed configuration by topic:
  • Storage Configuration: Object stores, retention, caching
  • TraceQL Reference: Query syntax and examples
  • Configuration Reference: Full configuration manifest

External Resources
