observability-llm-obs
# LLM and Agentic Observability
Answer user questions about monitoring LLMs and agentic components using data ingested into Elastic only. Focus on
LLM performance, cost and token utilization, response quality, and call chaining or agentic workflow orchestration. Use
ES|QL, Elasticsearch APIs, and (where needed) Kibana APIs. Do not rely on Kibana UI; the skill works without it. A
given deployment typically uses one or more ingestion paths (APM/OTLP traces and/or integration metrics/logs)—
discover what is available before querying.
## Where to look
- **Trace and metrics data (APM / OTel):** Trace data in Elastic is stored in `traces-apm*` when collected by the Elastic APM Agent, and in `traces-generic.otel-default` (and similar) when collected by OpenTelemetry. Use the generic pattern `traces*` to find all trace data regardless of source. When the application is instrumented with OpenTelemetry (e.g. Elastic Distributions of OpenTelemetry (EDOT), OpenLLMetry, OpenLIT, Langtrace exporting to OTLP), LLM and agent spans land in these trace data streams; metrics may land in `metrics-apm*` or metrics-generic data streams. Query `traces*` and `metrics*` data streams for per-request and aggregated LLM signals.
- **Integration metrics and logs:** When the user collects data via Elastic LLM integrations (OpenAI, Azure OpenAI, Azure AI Foundry, Amazon Bedrock, Bedrock AgentCore, GCP Vertex AI, etc.), metrics and logs go to integration data streams (e.g. `metrics*`, `logs*` with dataset/namespace per integration). Check which data streams exist.
- **Discover first:** Use Elasticsearch to list data streams or indices (e.g. `GET _data_stream`, `GET traces*/_mapping`, or `GET metrics*/_mapping`) and optionally sample a document to see which LLM-related fields are present. Do not assume both APM and integration data exist.
- **ES|QL:** Use the elasticsearch-esql skill for ES|QL syntax, commands, and query patterns when building queries against `traces*` or metrics data streams.
- **Alerts and SLOs:** Use the Observability SLOs API (Stack | Serverless) and Alerting API (Stack | Serverless) to find SLOs and alerting rules that target LLM-related data (e.g. services backed by `traces*`, or integration metrics). Firing alerts or violated/degrading SLOs point to potential degraded performance.
## Data available in Elastic
### From traces and metrics (`traces*`, `metrics-apm*` / `metrics-generic`)
Spans from OTel/EDOT (and compatible SDKs) carry span attributes that may follow
OpenTelemetry GenAI semantic conventions or
provider-specific names. In Elasticsearch, attributes typically appear under `span.attributes` (exact key names depend
on ingestion). Common attributes:

| Purpose | Example attribute names (OTel GenAI) |
|---|---|
| Operation / provider | `gen_ai.operation.name`, `gen_ai.provider.name` |
| Model | `gen_ai.request.model`, `gen_ai.response.model` |
| Token usage | `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens` |
| Request config | `gen_ai.request.temperature`, `gen_ai.request.max_tokens` |
| Errors | `error.type`, `gen_ai.response.finish_reasons` |
| Conversation / agent | `gen_ai.conversation.id`, `gen_ai.agent.name` |

Cost is not in the OTel spec; some instrumentations add custom attributes (e.g. `llm.response.cost.usd_estimate`).
Discover actual field names from the index mapping or a sample document (e.g. `span.attributes.*` or flattened keys).

Use `duration` and `event.outcome` on spans for latency and success/failure. Use `trace.id`, `span.id`, and
parent/child span relationships to analyze call chaining and agentic workflows (e.g. one root span, multiple LLM or
tool-call child spans).
### From LLM integrations
Integrations (OpenAI, Azure OpenAI, Azure AI Foundry, Bedrock, Bedrock AgentCore, Vertex AI, etc.) ship metrics (and
where supported logs) to Elastic. Metrics typically include token usage, request counts, latency, and—where the
integration supports it—cost-related fields. Logs may include prompt/response or guardrail events. Exact field names and
data streams are defined by each integration package; discover them from the integration docs or from the target data
stream mapping.
## Determine what data is available
- **List data streams:** `GET _data_stream` and filter for `traces*`, `metrics-apm*` (or `metrics*`), and `metrics-*`/`logs-*` that match known LLM integration datasets (e.g. from Elastic LLM observability).
- **Inspect trace indices:** For `traces*`, run a small search or use mapping to see if spans contain `gen_ai.*` or `llm.*` (or similar) attributes. Confirm presence of token, model, and duration fields.
- **Inspect integration indices:** For metrics/logs data streams, check mapping or one document to see token, cost, latency, and model dimensions.
- **Use one source per use case:** If both APM and integration data exist, prefer one consistent source for a given question (e.g. use traces for per-request chain analysis, integration metrics for aggregate token/cost).
- **Check alerts and SLOs:** Use the SLOs API and Alerting API to list SLOs and alerting rules that target LLM-related services or integration metrics, and to get open or recently fired alerts. Firing alerts or SLOs in degrading/violated status point to potential degraded performance.
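As a sketch, the discovery step can be a couple of requests in Elasticsearch API (Dev Tools) syntax; the `span.attributes.gen_ai.operation.name` path here is an assumption to verify against your own mapping:

```text
# List data streams for traces and integration metrics
GET _data_stream/traces*
GET _data_stream/metrics-*

# Sample one span to see which LLM-related attributes are present
GET traces*/_search
{
  "size": 1,
  "query": { "exists": { "field": "span.attributes.gen_ai.operation.name" } },
  "_source": ["@timestamp", "service.name", "span.attributes"]
}
```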
## Use cases and query patterns
### LLM performance (latency, throughput, errors)
- **Traces:** ES|QL on `traces*` filtered by span attributes (e.g. `gen_ai.operation.name` or `gen_ai.provider.name` when present). Compute throughput (count per time bucket), latency (e.g. `duration.us` or span duration), and error rate (`event.outcome == "failure"`) by model, service, or time.
- **Integrations:** Query integration metrics for request rate, latency, and error metrics by model/dimension as exposed by the integration.
### Cost and token utilization
- **Traces:** Aggregate from spans in `traces*`: sum `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` (or equivalent attribute names) by time, model, or service. If a cost attribute exists (e.g. custom `llm.response.cost.*`), sum it for cost views.
- **Integrations:** Use integration metrics that expose token counts and/or cost; aggregate by time and model.
### Response quality and safety
- **Traces:** Use `event.outcome`, `error.type`, and span attributes (e.g. `gen_ai.response.finish_reasons`) in `traces*` to identify failures, timeouts, or content filters. Correlate with prompts/responses if captured in attributes (e.g. `gen_ai.input.messages`, `gen_ai.output.messages`) and not redacted.
- **Integrations:** Query integration logs for guardrail blocks, content filter events, or policy violations (e.g. Bedrock Guardrails) using the fields defined by that integration.
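As an illustrative sketch, finish reasons can be tallied directly from traces; the `span.attributes.gen_ai.response.finish_reasons` path is an assumption to confirm against the mapping:

```esql
FROM traces*
| WHERE @timestamp >= NOW() - 24 hours
  AND span.attributes.gen_ai.response.finish_reasons IS NOT NULL
| STATS requests = COUNT(*)
  BY span.attributes.gen_ai.response.finish_reasons, event.outcome
| SORT requests DESC
| LIMIT 50
```

Spikes in filter- or truncation-related finish reasons (e.g. `content_filter`, `length` for OpenAI-style providers), or in `failure` outcomes, are the rows to investigate.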
### Call chaining and agentic workflow orchestration
- **Traces only:** Use trace hierarchy in `traces*`. Filter by root service or trace attributes; group by `trace.id` and use parent/child span relationships (e.g. `parent.id`, `span.id`) to reconstruct chains (e.g. orchestration span → multiple LLM or tool-call spans). Aggregate by span name or `gen_ai.operation.name` to see distribution of steps (e.g. retrieval, LLM, tool use). Duration per span and per trace gives bottleneck and end-to-end latency.
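A sketch of the step-distribution idea, assuming the `gen_ai.operation.name` attribute is populated (the actual values, such as `chat` or `execute_tool`, depend on the instrumentation):

```esql
FROM traces*
| WHERE @timestamp >= NOW() - 24 hours
  AND span.attributes.gen_ai.operation.name IS NOT NULL
| STATS step_count = COUNT(*), avg_duration_us = AVG(span.duration.us)
  BY span.attributes.gen_ai.operation.name
| SORT step_count DESC
| LIMIT 20
```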
## Using ES|QL for LLM data
- **Availability:** ES|QL is available in Elasticsearch 8.11+ (GA in 8.14) and in Elastic Observability Serverless.
- **Scoping:** Always restrict by time range (`@timestamp`). When present, add `service.name` and optionally `service.environment`. For LLM-specific spans, filter by span attributes once you know the field names (e.g. a keyword field for `gen_ai.provider.name` or `gen_ai.operation.name`).
- **Performance:** Use `LIMIT`, coarse time buckets when only trends are needed, and avoid full scans over large windows.
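Putting the scoping rules together, a skeleton query might look like this (`my-agent-service` is a hypothetical service name):

```esql
FROM traces*
| WHERE @timestamp >= NOW() - 1 hour                      // always bound the time range
  AND service.name == "my-agent-service"                  // hypothetical service name
  AND span.attributes.gen_ai.provider.name IS NOT NULL    // LLM spans only
| STATS requests = COUNT(*) BY bucket = BUCKET(@timestamp, 5 minutes)
| SORT bucket
| LIMIT 100
```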
## Workflow
```text
LLM observability progress:
- [ ] Step 1: Determine available data (traces*, metrics-apm* or metrics*, or integration data streams)
- [ ] Step 2: Discover LLM-related field names (mapping or sample doc)
- [ ] Step 3: Run ES|QL or Elasticsearch queries for the user's question (performance, cost, quality, orchestration)
- [ ] Step 4: Check for active alerts or SLOs defined on LLM-related data (Alerting API, SLOs API); field names from Step 2 help identify related rules; firing alerts or violated/degrading SLOs indicate potential degraded performance
- [ ] Step 5: Summarize findings from ingested data only; include alert/SLO status when relevant
```
## Examples
### Example: Token usage over time from traces
Assume span attributes are available as `span.attributes.gen_ai.usage.input_tokens` and `span.attributes.gen_ai.usage.output_tokens` (adjust to actual field names from mapping):

```esql
FROM traces*
| WHERE @timestamp >= "2025-03-01T00:00:00Z" AND @timestamp <= "2025-03-01T23:59:59Z"
  AND span.attributes.gen_ai.provider.name IS NOT NULL
| STATS
  input_tokens = SUM(span.attributes.gen_ai.usage.input_tokens),
  output_tokens = SUM(span.attributes.gen_ai.usage.output_tokens)
  BY hour = BUCKET(@timestamp, 1 hour), span.attributes.gen_ai.request.model
| SORT hour
| LIMIT 500
```

### Example: Latency and error rate by model
```esql
FROM traces*
| WHERE @timestamp >= "2025-03-01T00:00:00Z" AND @timestamp <= "2025-03-01T23:59:59Z"
  AND span.attributes.gen_ai.request.model IS NOT NULL
| STATS
  request_count = COUNT(*),
  failures = COUNT(*) WHERE event.outcome == "failure",
  avg_duration_us = AVG(span.duration.us)
  BY span.attributes.gen_ai.request.model
| EVAL error_rate = TO_DOUBLE(failures) / request_count
| LIMIT 100
```

### Example: Agentic workflow (trace-level view)
Get trace IDs that contain at least one LLM span and count spans per trace to see chain length:
```esql
FROM traces*
| WHERE @timestamp >= "2025-03-01T00:00:00Z" AND @timestamp <= "2025-03-01T23:59:59Z"
  AND span.attributes.gen_ai.operation.name IS NOT NULL
| STATS span_count = COUNT(*), total_duration_us = SUM(span.duration.us) BY trace.id
| WHERE span_count > 1
| SORT total_duration_us DESC
| LIMIT 50
```

### Example: Integration metrics (Amazon Bedrock AgentCore)
The Amazon Bedrock AgentCore integration ships metrics to the `metrics-aws_bedrock_agentcore.metrics-*` data stream (a time series index). Use `TS` for aggregations on time series data streams (Elasticsearch 9.2+); use a time range with `TRANGE` (9.3+). The integration also ships dashboards and alerting rule templates.

Example: token usage (counter), invocations (counter), and average latency (gauge) by hour and agent:

```esql
TS metrics-aws_bedrock_agentcore.metrics-*
| WHERE TRANGE(7 days)
  AND aws.dimensions.Operation == "InvokeAgentRuntime"
| STATS
  total_tokens = SUM(RATE(aws.bedrock_agentcore.metrics.TokenCount.sum)),
  total_invocations = SUM(RATE(aws.bedrock_agentcore.metrics.Invocations.sum)),
  avg_latency_ms = AVG(AVG_OVER_TIME(aws.bedrock_agentcore.metrics.Latency.avg))
  BY TBUCKET(1 hour), aws.bedrock_agentcore.agent_name
| SORT TBUCKET(1 hour) DESC
```

For Elasticsearch 8.x or when `TS` is not available, use `FROM` with `BUCKET(@timestamp, 1 hour)` and `SUM`/`AVG` over the metric fields (as in the integration's alert rule templates). For other LLM integrations (OpenAI, Azure OpenAI, Vertex AI, etc.), use that integration's data stream index pattern and field names from its package (see Elastic LLM observability).

## Guidelines
- **Data only in Elastic:** Use only data collected and stored in Elastic (traces in `traces*`, metrics, or integration metrics/logs). Do not describe or rely on other vendors' UIs or products.
- **One technology per customer:** Assume a single ingestion path per deployment when answering; discover which (traces vs integration) exists and use it consistently for the question.
- **Discover field names:** Before writing ES|QL or Query DSL, confirm LLM-related attribute or metric names from `_mapping` or a sample document; naming may differ (e.g. `gen_ai.*` vs `llm.*` or integration-specific fields).
- **No Kibana UI dependency:** Prefer ES|QL and Elasticsearch APIs; use Kibana APIs only when needed (e.g. SLO, alerting). Do not instruct the user to open Kibana UI.
- **References:** LLM and agentic AI observability, Observability Labs – LLM Observability, OpenTelemetry GenAI spans. For ES|QL syntax and query patterns, use the elasticsearch-esql skill, or look through the ES|QL `TS` command reference for Elastic v9.3+ and Serverless, and the ES|QL `FROM` command reference for other Elastic versions.