app-observability
Grafana Cloud Application Observability Skill
Overview
Grafana Cloud provides three tightly related application monitoring products:
- Application Observability (APM) - RED metrics from OTel traces, service inventory, service maps
- Frontend Observability - RUM/Faro SDK for browser apps, session replay, web vitals
- AI Observability - LLM/model monitoring via OpenLIT + OTel, token/cost/latency metrics
All three integrate with Grafana Tempo (traces), Loki (logs), and Pyroscope (profiles) for full-stack correlation.
Application Observability (APM)
What It Is
Application Observability is a pre-built APM experience in Grafana Cloud built on top of OpenTelemetry. It generates RED (Rate, Error, Duration) metrics from distributed traces via span metrics, then surfaces them in:
- Service Inventory - table of all services with RED metrics at a glance
- Service Overview - per-service RED metrics, top operations, error breakdown
- Service Map - node graph of service dependencies with flow visualization
- Operations view - per-endpoint RED metrics with p50/p95/p99 latency
How Metrics Are Generated
Application Observability does NOT rely on traditional Prometheus scraping. Metrics come from span metrics - aggregations computed from OTel trace data:
- Source: OTel traces sent to Grafana Tempo or Grafana Alloy
- Generation method: Tempo's metrics-generator OR the `spanmetrics` connector in Alloy/OTel Collector
- Result: Prometheus-compatible metrics stored in Grafana Mimir
Key generated metric names:
- Via Tempo metrics-generator: `traces_spanmetrics_calls_total`, `traces_spanmetrics_duration_seconds`
- Via OTel Collector spanmetrics connector: `traces_span_metrics_calls_total`, `traces_span_metrics_duration_seconds`
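As an illustration, PromQL for the R, E, and D of RED can be assembled from these metric names. This is a sketch only: the `service` label name, the `status_code` value, and the 5m window are assumptions that may differ per setup.

```python
# Illustrative PromQL strings over Tempo-generated span metrics.
# Label names ("service", "status_code") are assumptions, not guaranteed.
service = "my-api"

# Rate: requests per second across all operations of the service
rate_query = f'sum(rate(traces_spanmetrics_calls_total{{service="{service}"}}[5m]))'

# Errors: rate of spans that ended with an error status
error_query = (
    f'sum(rate(traces_spanmetrics_calls_total'
    f'{{service="{service}",status_code="STATUS_CODE_ERROR"}}[5m]))'
)

# Duration: p95 latency from the duration histogram buckets
p95_query = (
    f'histogram_quantile(0.95, '
    f'sum(rate(traces_spanmetrics_duration_seconds_bucket{{service="{service}"}}[5m])) by (le))'
)
print(rate_query)
```

These are the same shapes of query the Application Observability views run for you; they are useful when building custom dashboards on the raw metrics.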
Required OTel Resource Attributes
These attributes MUST be present on all spans for Application Observability to work:
| Attribute | Grafana Label | Purpose |
|---|---|---|
| `service.name` | part of `job` | Identifies the service |
| `service.namespace` | part of `job` | Groups services |
| `deployment.environment` | `deployment_environment` | Env filter (prod/dev/staging) |
The `job` label is constructed as:
- `service.namespace/service.name` when namespace is set
- `service.name` alone when no namespace
Additional recommended attributes:
- `service.version` - shown in service overview
- `k8s.cluster.name` - for K8s environments
- `k8s.namespace.name` - Kubernetes namespace
- `cloud.region` - for multi-region setups
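The `job` construction above can be sketched as a tiny helper (a hypothetical function, not part of any SDK; it just mirrors the rule stated above):

```python
def grafana_job_label(attrs: dict) -> str:
    """Mirror the job-label rule: "namespace/name" when service.namespace
    is set, otherwise service.name alone."""
    name = attrs["service.name"]
    namespace = attrs.get("service.namespace")
    return f"{namespace}/{name}" if namespace else name

print(grafana_job_label({"service.name": "my-api", "service.namespace": "myteam"}))  # myteam/my-api
print(grafana_job_label({"service.name": "my-api"}))  # my-api
```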
Setting Environment Variables for OTel SDK
```bash
export OTEL_SERVICE_NAME="my-api"
export OTEL_RESOURCE_ATTRIBUTES="service.namespace=myteam,deployment.environment=production,service.version=1.2.3"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```

Grafana Alloy Configuration (River syntax)
Alloy acts as a local OTel Collector and forwards data to Grafana Cloud:
```river
// Receive traces, metrics, logs from instrumented apps
otelcol.receiver.otlp "default" {
grpc {
endpoint = "0.0.0.0:4317"
}
http {
endpoint = "0.0.0.0:4318"
}
output {
metrics = [otelcol.processor.resourcedetection.default.input]
logs = [otelcol.processor.resourcedetection.default.input]
traces = [otelcol.processor.resourcedetection.default.input]
}
}
// Auto-detect host/cloud metadata
otelcol.processor.resourcedetection "default" {
detectors = ["env", "system", "gcp", "aws", "azure"]
output {
metrics = [otelcol.processor.batch.default.input]
logs = [otelcol.processor.batch.default.input]
traces = [otelcol.processor.batch.default.input]
}
}
// Batch for efficiency
otelcol.processor.batch "default" {
output {
metrics = [otelcol.exporter.otlphttp.grafana_cloud.input]
logs = [otelcol.exporter.otlphttp.grafana_cloud.input]
traces = [otelcol.exporter.otlphttp.grafana_cloud.input]
}
}
// Auth
otelcol.auth.basic "grafana_cloud" {
username = env("GRAFANA_CLOUD_INSTANCE_ID")
password = env("GRAFANA_CLOUD_API_KEY")
}
// Export to Grafana Cloud OTLP endpoint
otelcol.exporter.otlphttp "grafana_cloud" {
client {
endpoint = env("GRAFANA_CLOUD_OTLP_ENDPOINT")
auth = otelcol.auth.basic.grafana_cloud.handler
}
}
```

Required environment variables for Alloy:
```bash
GRAFANA_CLOUD_OTLP_ENDPOINT=https://otlp-gateway-<region>.grafana.net/otlp
GRAFANA_CLOUD_INSTANCE_ID=<your-instance-id>
GRAFANA_CLOUD_API_KEY=<your-api-key>
```
Service Map
The Service Map uses Tempo's metrics-generator to produce service graph metrics:
- Node graph shows services as nodes, HTTP/gRPC calls as edges
- Edge thickness indicates request rate; color indicates error rate
- Clicking a node navigates to Service Overview
- Requires `span.kind` (CLIENT/SERVER) on spans for directional edges
Enable in Tempo (managed by Grafana Cloud automatically):
- `service-graphs` metrics generator enabled by default in Grafana Cloud Tempo
- Uses `traces_service_graph_request_total` and `traces_service_graph_request_failed_total` metrics
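For reference, the request-rate and error-rate edges of the map can be approximated with queries like these (a sketch; the `client`/`server` label names follow Tempo's service-graph conventions but should be checked against your tenant):

```python
# Illustrative PromQL over Tempo service-graph metrics for one edge
# (client -> server). Label names are assumptions.
client, server = "frontend", "checkout"
edge = f'{{client="{client}",server="{server}"}}'

request_rate = f"sum(rate(traces_service_graph_request_total{edge}[5m]))"
failed_rate = f"sum(rate(traces_service_graph_request_failed_total{edge}[5m]))"
error_ratio = f"({failed_rate}) / ({request_rate})"
print(error_ratio)
```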
Integration with Traces, Logs, Profiles
Application Observability provides one-click correlation:
- Traces: Click any metric spike to open exemplar traces in Grafana Tempo
- Logs: Service logs shown in Service Overview; correlated via the `service.name` label
- Profiles: "Go to profiles" button in Service Overview when Pyroscope is configured
- Frontend: Link from Application Observability to Frontend Observability for the same service
Frontend Observability (Faro)
What It Is
Grafana Faro is an open-source JavaScript/TypeScript SDK for Real User Monitoring (RUM). It instruments browser applications to capture:
- Web vitals: Core Web Vitals (LCP, CLS, INP) and additional performance metrics
- Errors: Unhandled exceptions, rejected promises with stack traces
- Sessions: User journeys, page views, navigation timing
- Logs: Custom log messages from frontend code
- Traces: Distributed traces via OpenTelemetry-JS (correlates with backend spans)
- Session replay: rrweb-based DOM recording for reproducing user issues
Data flows: Faro SDK -> Grafana Alloy (faro receiver) OR Grafana Cloud OTLP endpoint -> Loki (logs) + Tempo (traces) + Mimir (metrics)
Faro SDK Packages
```
@grafana/faro-core         # Core SDK - signals, transports, API
@grafana/faro-web-sdk      # Web instrumentations + transports
@grafana/faro-web-tracing  # OpenTelemetry-JS distributed tracing
@grafana/faro-react        # React-specific integrations (error boundary, router)
```

Basic JavaScript Setup (npm)
```bash
npm install @grafana/faro-web-sdk
```

or

```bash
yarn add @grafana/faro-web-sdk
```
```javascript
import {
initializeFaro,
getWebInstrumentations,
} from '@grafana/faro-web-sdk';
const faro = initializeFaro({
url: 'https://faro-collector-prod-<region>.grafana.net/collect/<app-key>',
app: {
name: 'my-frontend-app',
version: '1.0.0',
environment: 'production',
},
instrumentations: [
...getWebInstrumentations({
captureConsole: true,
}),
],
});
// Manual API usage
faro.api.pushLog(['User clicked checkout button']);
faro.api.pushError(new Error('Payment failed'));
faro.api.pushEvent('button_click', { button: 'checkout' });
```
CDN Setup (no bundler)
```html
<script src="https://unpkg.com/@grafana/faro-web-sdk@latest/dist/library/faro-web-sdk.iife.js"></script>
<script>
const { initializeFaro, getWebInstrumentations } = GrafanaFaroWebSdk;
initializeFaro({
url: 'https://faro-collector-prod-<region>.grafana.net/collect/<app-key>',
app: { name: 'my-app', version: '1.0.0' },
instrumentations: [...getWebInstrumentations()],
});
</script>
```
React Setup with Tracing
```bash
npm install @grafana/faro-react @grafana/faro-web-tracing
```

```javascript
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';
import {
createReactRouterV6DataOptions,
ReactIntegration,
withFaroRouterInstrumentation,
} from '@grafana/faro-react';
import { createBrowserRouter, RouterProvider } from 'react-router-dom';
const faro = initializeFaro({
url: 'https://faro-collector-prod-<region>.grafana.net/collect/<app-key>',
app: {
name: 'my-react-app',
version: '1.0.0',
environment: 'production',
},
instrumentations: [
...getWebInstrumentations({ captureConsole: true }),
new TracingInstrumentation(),
new ReactIntegration({
router: createReactRouterV6DataOptions({}),
}),
],
});
const router = withFaroRouterInstrumentation(
createBrowserRouter([
{ path: '/', element: <Home /> },
{ path: '/about', element: <About /> },
])
);
function App() {
return <RouterProvider router={router} />;
}
```
Session Configuration
```javascript
initializeFaro({
url: '...',
app: { name: 'my-app' },
sessionTracking: {
enabled: true,
persistent: true,
maxSessionPersistenceTime: 4 * 60 * 60 * 1000, // 4 hours in ms
samplingRate: 1, // 1 = 100%, 0.5 = 50% of sessions
onSessionChange: (oldSession, newSession) => {
console.log('Session changed', newSession.id);
},
},
instrumentations: [...getWebInstrumentations()],
});
```
Getting the Collector URL
- In Grafana Cloud, go to Connections (left menu) > search "Frontend Observability"
- Click the Frontend Observability card
- Navigate to Web SDK Configuration tab
- Copy the `url` value - this is your unique collector endpoint
- Paste it into your `initializeFaro({ url: '...' })` call
What Faro Captures Automatically
When using `getWebInstrumentations()`:
- Page views and navigation timing
- Core Web Vitals (LCP, CLS, and INP, which replaces FID in Faro v2)
- JavaScript errors and unhandled rejections
- Console errors/warnings (when `captureConsole: true`)
- Resource loading performance
- User interactions (clicks, form events)
- Fetch/XHR request timing
Correlation with Backend Traces
When `TracingInstrumentation` is included, Faro:
- Injects `traceparent`/`tracestate` headers into outgoing fetch/XHR requests
- Creates spans for each HTTP call
- Links browser session to backend traces in Tempo
- Enables "Frontend to Backend" trace waterfall in Grafana
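The injected header follows the W3C Trace Context format, which can be sketched like this (the header value below is a made-up example):

```python
import re

# W3C "traceparent" header: version-traceid-spanid-flags, all lowercase hex
header = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
m = re.fullmatch(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
version, trace_id, parent_span_id, flags = m.groups()

# Backend spans join the browser trace through this shared trace_id
print(trace_id)  # 0af7651916cd43dd8448eb211c80319c
```

Because the browser and backend spans share the same `trace_id`, Tempo can render the full frontend-to-backend waterfall.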
AI Observability
What It Is
AI Observability monitors generative AI and LLM applications in production. Built on OTel GenAI semantic conventions and the OpenLIT instrumentation library.
Monitors:
- LLM API calls (OpenAI, Anthropic, Cohere, Google, etc.)
- Vector databases (Pinecone, Weaviate, Chroma, etc.)
- AI frameworks (LangChain, CrewAI, LlamaIndex)
- Model Context Protocol (MCP) servers
- GPU utilization
- AI evaluation quality (hallucination, toxicity, bias)
Key Metrics (OTel GenAI Semantic Conventions)
| Metric | Description |
|---|---|
| | Total input/prompt tokens consumed |
| | Total output/completion tokens consumed |
| | Total cost in USD |
| | Latency per LLM call (histogram) |
| | Token usage histogram |
Trace spans capture:
- Model name (`gen_ai.request.model`)
- Temperature and top_p parameters
- Full prompts and completions (configurable)
- Provider (`gen_ai.system`: `openai`, `anthropic`, etc.)
- Time to first token (TTFT)
Python Setup with OpenLIT
```bash
pip install openlit openai anthropic cohere
```

```python
import openlit
import openai

# One-line initialization - auto-instruments all supported LLM libraries
openlit.init()
```

Optional parameters:

```python
openlit.init(
    application_name="my-ai-app",
    environment="production",
)
```

Your existing code works unchanged - OpenLIT intercepts all LLM calls:

```python
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
OTel Environment Variables
```bash
export OTEL_SERVICE_NAME="my-ai-app"
export OTEL_DEPLOYMENT_ENVIRONMENT="production"
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp-gateway-<region>.grafana.net/otlp"

# Base64 encode "instanceID:apiToken"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic base64-encoded-instanceid:apitoken"
```

To get the credentials:
1. In Grafana Cloud, go to **My Account** > **Stack** > **OpenTelemetry**
2. Generate a token and copy the OTLP endpoint

AI Evaluations and Guards
Hallucination detection:

```python
import os

evals = openlit.evals.Hallucination(
    provider="openai",
    api_key=os.getenv("OPENAI_API_KEY")
)
result = evals.measure(
    prompt=user_message,
    contexts=["Your knowledge base content here"],
    text=llm_answer
)
```

Content safety guard:

```python
guard = openlit.guard.All(
    provider="openai",
    api_key=os.getenv("OPENAI_API_KEY")
)
guard.detect(text=user_message)
```
Prebuilt Dashboards
Once metrics arrive, Grafana Cloud auto-populates five dashboards:
- GenAI Observability - request rates, latency percentiles, costs
- GenAI Evaluations - hallucination, bias, toxicity scores
- Vector Database Observability - query latency, index ops
- MCP Observability - tool call rates, errors
- GPU Monitoring - utilization, memory, temperature
Setup Path
- In Grafana Cloud: Connections > search "AI Observability" > click the card
- Follow the UI wizard to get your OTLP endpoint and API key
- Set the environment variables
- `pip install openlit` and call `openlit.init()` at app startup
- Deploy - dashboards populate automatically within minutes
Full-Stack Correlation Summary
| Signal | Product | Storage | Query Language |
|---|---|---|---|
| Metrics (RED) | App Observability | Mimir | PromQL |
| Traces | Tempo | Tempo | TraceQL |
| Logs | Loki | Loki | LogQL |
| Profiles | Pyroscope | Pyroscope | - |
| Browser RUM | Faro/Frontend Obs | Loki + Tempo | - |
| LLM metrics | AI Observability | Mimir | PromQL |
Correlation keys:
- `service.name`/`service_name` links all signals for a service
- Trace exemplars embed trace IDs in metric data points (RED metrics -> traces)
- `traceID` in logs enables log-to-trace correlation
- `profileID`/time range enables trace-to-profile correlation
- Faro injects `traceparent` headers to link browser sessions to backend traces
Common Tasks
Find Why a Service Has High Latency
- App Observability > Service Inventory > click service
- In Service Overview: check p95/p99 latency trend in Operations panel
- Click a high-latency operation > "View traces" to open exemplar traces in Tempo
- In Tempo trace: use "Go to profiles" to see CPU profile at that time
- Check correlated logs in the Logs panel of Service Overview
Debug a Frontend Error
- Frontend Observability > Errors panel > click error
- View stack trace, browser, OS, session info
- Click "View session replay" to see what the user did
- Check the correlated backend trace if `TracingInstrumentation` is configured
Monitor LLM Cost Drift
- AI Observability dashboard > GenAI Observability
- Use the `gen_ai_usage_cost_USD_sum` metric to see cost by model/provider
- Set an alert on a cost threshold or token usage spike
- Drill into traces to see which prompts are consuming the most tokens
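A sketch of the kind of PromQL expression such an alert could use, built around the metric named in the steps above (the `model` label name and the 1h window are assumptions):

```python
# Illustrative cost-drift expression: hourly LLM spend per model.
window = "1h"
cost_by_model = f"sum by (model) (increase(gen_ai_usage_cost_USD_sum[{window}]))"

# Alert when hourly spend for any model exceeds a budget (in USD)
budget_usd = 50
alert_expr = f"{cost_by_model} > {budget_usd}"
print(alert_expr)
```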
References
- App Observability docs: https://grafana.com/docs/grafana-cloud/monitor-applications/application-observability/
- Frontend Observability docs: https://grafana.com/docs/grafana-cloud/monitor-applications/frontend-observability/
- Faro Web SDK GitHub: https://github.com/grafana/faro-web-sdk
- AI Observability docs: https://grafana.com/docs/grafana-cloud/monitor-applications/ai-observability/
- Alloy for App Observability: https://grafana.com/docs/opentelemetry/collector/grafana-alloy/
- OpenLIT: https://openlit.io/