app-observability

Grafana Cloud Application Observability Skill

Overview

Grafana Cloud provides three tightly related application monitoring products:
  1. Application Observability (APM) - RED metrics from OTel traces, service inventory, service maps
  2. Frontend Observability - RUM/Faro SDK for browser apps, session replay, web vitals
  3. AI Observability - LLM/model monitoring via OpenLIT + OTel, token/cost/latency metrics
All three integrate with Grafana Tempo (traces), Loki (logs), and Pyroscope (profiles) for full-stack correlation.

Application Observability (APM)

What It Is

Application Observability is a pre-built APM experience in Grafana Cloud, built on OpenTelemetry. It generates RED (Rate, Error, Duration) metrics from distributed traces via span metrics, then surfaces them in:
  • Service Inventory - table of all services with RED metrics at a glance
  • Service Overview - per-service RED metrics, top operations, error breakdown
  • Service Map - node graph of service dependencies with flow visualization
  • Operations view - per-endpoint RED metrics with p50/p95/p99 latency

How Metrics Are Generated

Application Observability does NOT rely on traditional Prometheus scraping. Metrics come from span metrics - aggregations computed from OTel trace data:
  • Source: OTel traces sent to Grafana Tempo or Grafana Alloy
  • Generation method: Tempo's metrics-generator OR the `spanmetrics` connector in Alloy/OTel Collector
  • Result: Prometheus-compatible metrics stored in Grafana Mimir
Key generated metric names:
  • Via Tempo metrics-generator: `traces_spanmetrics_calls_total`, `traces_spanmetrics_duration_seconds`
  • Via OTel Collector spanmetrics connector: `traces_span_metrics_calls_total`, `traces_span_metrics_duration_seconds`
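Conceptually, span-metrics generation is just an aggregation over finished spans. A minimal Python sketch of the idea (toy data; the real metrics-generator also attaches labels such as span name and status code, and buckets durations into histograms):

```python
from collections import defaultdict

# Toy span records: (service, duration_seconds, is_error)
spans = [
    ("checkout", 0.12, False),
    ("checkout", 0.31, True),
    ("cart", 0.05, False),
]

calls_total = defaultdict(int)      # ~ traces_spanmetrics_calls_total
duration_sum = defaultdict(float)   # ~ traces_spanmetrics_duration_seconds (sum)
errors_total = defaultdict(int)     # ~ calls_total filtered by error status

for service, duration, is_error in spans:
    calls_total[service] += 1
    duration_sum[service] += duration
    if is_error:
        errors_total[service] += 1

print(calls_total["checkout"])   # 2
print(errors_total["checkout"])  # 1
```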

Required OTel Resource Attributes

These attributes MUST be present on all spans for Application Observability to work:

| Attribute | Grafana Label | Purpose |
|---|---|---|
| `service.name` | `service_name` / part of `job` | Identifies the service |
| `service.namespace` | part of `job` label | Groups services; `job = namespace/service.name` |
| `deployment.environment` | `deployment_environment` | Env filter (prod/dev/staging) |

The `job` label is constructed as:
  • `service.namespace/service.name` when namespace is set
  • `service.name` alone when no namespace
Additional recommended attributes:
  • `service.version` - shown in service overview
  • `k8s.cluster.name` - for K8s environments
  • `k8s.namespace.name` - Kubernetes namespace
  • `cloud.region` - for multi-region setups
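The `job`-label rule above can be sketched as a small function (the name `job_label` is illustrative, not a Grafana API):

```python
def job_label(service_name, service_namespace=None):
    # job = "<service.namespace>/<service.name>" when a namespace is set,
    # otherwise just "<service.name>"
    if service_namespace:
        return f"{service_namespace}/{service_name}"
    return service_name

print(job_label("my-api", "myteam"))  # myteam/my-api
print(job_label("my-api"))            # my-api
```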

Setting Environment Variables for OTel SDK

```bash
export OTEL_SERVICE_NAME="my-api"
export OTEL_RESOURCE_ATTRIBUTES="service.namespace=myteam,deployment.environment=production,service.version=1.2.3"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```
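`OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs; a simplified sketch of how an SDK reads it (real SDKs also handle percent-encoded values, per the OTel environment-variable spec):

```python
import os

def parse_resource_attributes(raw):
    # "k1=v1,k2=v2" -> {"k1": "v1", "k2": "v2"}
    attrs = {}
    for pair in raw.split(","):
        if "=" in pair:
            key, _, value = pair.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs

raw = os.environ.get(
    "OTEL_RESOURCE_ATTRIBUTES",
    "service.namespace=myteam,deployment.environment=production,service.version=1.2.3",
)
print(parse_resource_attributes(raw))
```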

Grafana Alloy Configuration (River syntax)

Alloy acts as a local OTel Collector and forwards data to Grafana Cloud:
```river
// Receive traces, metrics, logs from instrumented apps
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    metrics = [otelcol.processor.resourcedetection.default.input]
    logs    = [otelcol.processor.resourcedetection.default.input]
    traces  = [otelcol.processor.resourcedetection.default.input]
  }
}

// Auto-detect host/cloud metadata
otelcol.processor.resourcedetection "default" {
  detectors = ["env", "system", "gcp", "aws", "azure"]
  output {
    metrics = [otelcol.processor.batch.default.input]
    logs    = [otelcol.processor.batch.default.input]
    traces  = [otelcol.processor.batch.default.input]
  }
}

// Batch for efficiency
otelcol.processor.batch "default" {
  output {
    metrics = [otelcol.exporter.otlphttp.grafana_cloud.input]
    logs    = [otelcol.exporter.otlphttp.grafana_cloud.input]
    traces  = [otelcol.exporter.otlphttp.grafana_cloud.input]
  }
}

// Auth
otelcol.auth.basic "grafana_cloud" {
  username = env("GRAFANA_CLOUD_INSTANCE_ID")
  password = env("GRAFANA_CLOUD_API_KEY")
}

// Export to Grafana Cloud OTLP endpoint
otelcol.exporter.otlphttp "grafana_cloud" {
  client {
    endpoint = env("GRAFANA_CLOUD_OTLP_ENDPOINT")
    auth     = otelcol.auth.basic.grafana_cloud.handler
  }
}
```

Required environment variables for Alloy:

```bash
GRAFANA_CLOUD_OTLP_ENDPOINT=https://otlp-gateway-<region>.grafana.net/otlp
GRAFANA_CLOUD_INSTANCE_ID=<your-instance-id>
GRAFANA_CLOUD_API_KEY=<your-api-key>
```

Service Map

The Service Map uses Tempo's metrics-generator to produce service graph metrics:
  • Node graph shows services as nodes, HTTP/gRPC calls as edges
  • Edge thickness indicates request rate; color indicates error rate
  • Clicking a node navigates to Service Overview
  • Requires `span.kind` (CLIENT/SERVER) on spans for directional edges
Enable in Tempo (managed by Grafana Cloud automatically):
  • The `service-graphs` metrics generator is enabled by default in Grafana Cloud Tempo
  • Uses the `traces_service_graph_request_total` and `traces_service_graph_request_failed_total` metrics

Integration with Traces, Logs, Profiles

Application Observability provides one-click correlation:
  • Traces: Click any metric spike to open exemplar traces in Grafana Tempo
  • Logs: Service logs shown in Service Overview; correlated via the `service.name` label
  • Profiles: "Go to profiles" button in Service Overview when Pyroscope is configured
  • Frontend: Link from Application Observability to Frontend Observability for the same service

Frontend Observability (Faro)

What It Is

Grafana Faro is an open-source JavaScript/TypeScript SDK for Real User Monitoring (RUM). It instruments browser applications to capture:
  • Web vitals: Core Web Vitals (LCP, CLS, INP) and additional performance metrics
  • Errors: Unhandled exceptions, rejected promises with stack traces
  • Sessions: User journeys, page views, navigation timing
  • Logs: Custom log messages from frontend code
  • Traces: Distributed traces via OpenTelemetry-JS (correlates with backend spans)
  • Session replay: rrweb-based DOM recording for reproducing user issues
Data flow: Faro SDK -> Grafana Alloy (faro receiver) OR Grafana Cloud OTLP endpoint -> Loki (logs) + Tempo (traces) + Mimir (metrics)

Faro SDK Packages

@grafana/faro-core          # Core SDK - signals, transports, API
@grafana/faro-web-sdk       # Web instrumentations + transports
@grafana/faro-web-tracing   # OpenTelemetry-JS distributed tracing
@grafana/faro-react         # React-specific integrations (error boundary, router)

Basic JavaScript Setup (npm)

```bash
npm install @grafana/faro-web-sdk
```

or:

```bash
yarn add @grafana/faro-web-sdk
```

```javascript
import {
  initializeFaro,
  getWebInstrumentations,
} from '@grafana/faro-web-sdk';

const faro = initializeFaro({
  url: 'https://faro-collector-prod-<region>.grafana.net/collect/<app-key>',
  app: {
    name: 'my-frontend-app',
    version: '1.0.0',
    environment: 'production',
  },
  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: true,
    }),
  ],
});

// Manual API usage
faro.api.pushLog(['User clicked checkout button']);
faro.api.pushError(new Error('Payment failed'));
faro.api.pushEvent('button_click', { button: 'checkout' });
```

CDN Setup (no bundler)

```html
<script src="https://unpkg.com/@grafana/faro-web-sdk@latest/dist/library/faro-web-sdk.iife.js"></script>
<script>
  const { initializeFaro, getWebInstrumentations } = GrafanaFaroWebSdk;

  initializeFaro({
    url: 'https://faro-collector-prod-<region>.grafana.net/collect/<app-key>',
    app: { name: 'my-app', version: '1.0.0' },
    instrumentations: [...getWebInstrumentations()],
  });
</script>
```

React Setup with Tracing

```bash
npm install @grafana/faro-react @grafana/faro-web-tracing
```

```javascript
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';
import {
  createReactRouterV6DataOptions,
  ReactIntegration,
  withFaroRouterInstrumentation,
} from '@grafana/faro-react';
import { createBrowserRouter, RouterProvider } from 'react-router-dom';

const faro = initializeFaro({
  url: 'https://faro-collector-prod-<region>.grafana.net/collect/<app-key>',
  app: {
    name: 'my-react-app',
    version: '1.0.0',
    environment: 'production',
  },
  instrumentations: [
    ...getWebInstrumentations({ captureConsole: true }),
    new TracingInstrumentation(),
    new ReactIntegration({
      router: createReactRouterV6DataOptions({}),
    }),
  ],
});

const router = withFaroRouterInstrumentation(
  createBrowserRouter([
    { path: '/', element: <Home /> },
    { path: '/about', element: <About /> },
  ])
);

function App() {
  return <RouterProvider router={router} />;
}
```

Session Configuration

```javascript
initializeFaro({
  url: '...',
  app: { name: 'my-app' },
  sessionTracking: {
    enabled: true,
    persistent: true,
    maxSessionPersistenceTime: 4 * 60 * 60 * 1000, // 4 hours in ms
    samplingRate: 1,           // 1 = 100%, 0.5 = 50% of sessions
    onSessionChange: (oldSession, newSession) => {
      console.log('Session changed', newSession.id);
    },
  },
  instrumentations: [...getWebInstrumentations()],
});
```

Getting the Collector URL

  1. In Grafana Cloud, go to Connections (left menu) > search "Frontend Observability"
  2. Click the Frontend Observability card
  3. Navigate to the Web SDK Configuration tab
  4. Copy the `url` value - this is your unique collector endpoint
  5. Paste it into your `initializeFaro({ url: '...' })` call

What Faro Captures Automatically

When using `getWebInstrumentations()`:
  • Page views and navigation timing
  • Core Web Vitals (LCP, CLS, INP - replaces FID in Faro v2)
  • JavaScript errors and unhandled rejections
  • Console errors/warnings (when `captureConsole: true`)
  • Resource loading performance
  • User interactions (clicks, form events)
  • Fetch/XHR request timing

Correlation with Backend Traces

When `TracingInstrumentation` is included, Faro:
  • Injects `traceparent`/`tracestate` headers into outgoing fetch/XHR requests
  • Creates spans for each HTTP call
  • Links the browser session to backend traces in Tempo
  • Enables the "Frontend to Backend" trace waterfall in Grafana
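The injected `traceparent` header follows the W3C Trace Context format `version-traceid-spanid-flags`. A sketch of building and parsing one (helper names are illustrative, not Faro APIs):

```python
import secrets

def make_traceparent(sampled=True):
    # W3C Trace Context: 2-hex version, 32-hex trace ID, 16-hex span ID, 2-hex flags
    trace_id = secrets.token_hex(16)  # 32 hex chars
    span_id = secrets.token_hex(8)    # 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def parse_traceparent(header):
    version, trace_id, span_id, flags = header.split("-")
    return {"version": version, "trace_id": trace_id,
            "span_id": span_id, "sampled": flags == "01"}

print(parse_traceparent(make_traceparent()))
```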

AI Observability

What It Is

AI Observability monitors generative AI and LLM applications in production. Built on OTel GenAI semantic conventions and the OpenLIT instrumentation library.
Monitors:
  • LLM API calls (OpenAI, Anthropic, Cohere, Google, etc.)
  • Vector databases (Pinecone, Weaviate, Chroma, etc.)
  • AI frameworks (LangChain, CrewAI, LlamaIndex)
  • Model Context Protocol (MCP) servers
  • GPU utilization
  • AI evaluation quality (hallucination, toxicity, bias)

Key Metrics (OTel GenAI Semantic Conventions)

| Metric | Description |
|---|---|
| `gen_ai_usage_input_tokens_total` | Total input/prompt tokens consumed |
| `gen_ai_usage_output_tokens_total` | Total output/completion tokens consumed |
| `gen_ai_usage_cost_USD_sum` | Total cost in USD |
| `gen_ai_client_operation_duration` | Latency per LLM call (histogram) |
| `gen_ai_client_token_usage` | Token usage histogram |

Trace spans capture:
  • Model name (`gen_ai.request.model`)
  • Temperature and top_p parameters
  • Full prompts and completions (configurable)
  • Provider (`gen_ai.system`: `openai`, `anthropic`, etc.)
  • Time to first token (TTFT)
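Cost is derived from token counts and a per-model price table; a hypothetical sketch of the arithmetic (the prices below are made up for illustration, not OpenLIT's actual pricing data):

```python
# Hypothetical per-1K-token prices in USD - illustrative only
PRICING = {
    "gpt-4": {"input": 0.03, "output": 0.06},
}

def call_cost_usd(model, input_tokens, output_tokens):
    # cost = input_tokens * input_price + output_tokens * output_price
    p = PRICING[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

print(round(call_cost_usd("gpt-4", 1500, 500), 4))  # 0.075
```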

Python Setup with OpenLIT

```bash
pip install openlit openai anthropic cohere
```

```python
import openlit
import openai

# One-line initialization - auto-instruments all supported LLM libraries
openlit.init()
```

Optional parameters:

```python
openlit.init(
    application_name="my-ai-app",
    environment="production",
)
```

Your existing code works unchanged - OpenLIT intercepts all LLM calls:

```python
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

OTel Environment Variables

```bash
export OTEL_SERVICE_NAME="my-ai-app"
export OTEL_DEPLOYMENT_ENVIRONMENT="production"
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp-gateway-<region>.grafana.net/otlp"
# Base64 encode "instanceID:apiToken"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic base64-encoded-instanceid:apitoken"
```

To get the credentials:
1. In Grafana Cloud, go to **My Account** > **Stack** > **OpenTelemetry**
2. Generate a token and copy the OTLP endpoint
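The Basic credential is just the base64 encoding of `<instanceID>:<apiToken>`; a quick way to produce it (the values below are placeholders):

```python
import base64

instance_id = "123456"            # placeholder - your Grafana Cloud instance ID
api_token = "glc_example_token"   # placeholder - your Cloud Access Policy token

credential = base64.b64encode(f"{instance_id}:{api_token}".encode()).decode()
header = f"Authorization=Basic {credential}"
print(header)
```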

AI Evaluations and Guards

Hallucination detection

```python
import os
import openlit

# user_message and llm_answer come from your application code
evals = openlit.evals.Hallucination(
    provider="openai",
    api_key=os.getenv("OPENAI_API_KEY"),
)
result = evals.measure(
    prompt=user_message,
    contexts=["Your knowledge base content here"],
    text=llm_answer,
)
```

Content safety guard

```python
guard = openlit.guard.All(
    provider="openai",
    api_key=os.getenv("OPENAI_API_KEY"),
)
guard.detect(text=user_message)
```

Prebuilt Dashboards

Once metrics arrive, Grafana Cloud auto-populates five dashboards:
  1. GenAI Observability - request rates, latency percentiles, costs
  2. GenAI Evaluations - hallucination, bias, toxicity scores
  3. Vector Database Observability - query latency, index ops
  4. MCP Observability - tool call rates, errors
  5. GPU Monitoring - utilization, memory, temperature

Setup Path

  1. In Grafana Cloud: Connections > search "AI Observability" > click the card
  2. Follow the UI wizard to get your OTLP endpoint and API key
  3. Set the environment variables
  4. Run `pip install openlit` and call `openlit.init()` at app startup
  5. Deploy - dashboards populate automatically within minutes

Full-Stack Correlation Summary

| Signal | Product | Storage | Query Language |
|---|---|---|---|
| Metrics (RED) | App Observability | Mimir | PromQL |
| Traces | Tempo | Tempo | TraceQL |
| Logs | Loki | Loki | LogQL |
| Profiles | Pyroscope | Pyroscope | - |
| Browser RUM | Faro/Frontend Obs | Loki + Tempo | - |
| LLM metrics | AI Observability | Mimir | PromQL |

Correlation keys:
  • `service.name`/`service_name` links all signals for a service
  • Trace exemplars embed trace IDs in metric data points (RED metrics -> traces)
  • A `traceID` in logs enables log-to-trace correlation
  • A `profileID` / time range enables trace-to-profile correlation
  • Faro injects `traceparent` headers to link browser sessions to backend traces

Common Tasks

Find Why a Service Has High Latency

  1. App Observability > Service Inventory > click service
  2. In Service Overview: check p95/p99 latency trend in Operations panel
  3. Click a high-latency operation > "View traces" to open exemplar traces in Tempo
  4. In Tempo trace: use "Go to profiles" to see CPU profile at that time
  5. Check correlated logs in the Logs panel of Service Overview

Debug a Frontend Error

  1. Frontend Observability > Errors panel > click error
  2. View stack trace, browser, OS, session info
  3. Click "View session replay" to see what the user did
  4. Check the correlated backend trace if `TracingInstrumentation` is configured

Monitor LLM Cost Drift

  1. AI Observability dashboard > GenAI Observability
  2. Use the `gen_ai_usage_cost_USD_sum` metric to see cost by model/provider
  3. Set an alert on a cost threshold or a token-usage spike
  4. Drill into traces to see which prompts are consuming the most tokens
