google-agents-cli-observability

ADK Observability Guide

Cloud Trace works out of the box, with no infrastructure needed. Prompt-response logging and BigQuery Agent Analytics require Terraform-provisioned infrastructure (service account, GCS bucket, BigQuery dataset). Run `agents-cli infra single-project --project PROJECT_ID` to provision these resources. See `references/cloud-trace-and-logging.md` for details, env vars, and verification commands. If your project isn't scaffolded yet, see `/google-agents-cli-scaffold` first.

Order of operations for `agent_runtime` deployments

For `deployment_target = agent_runtime`, run `agents-cli infra single-project` before the first `agents-cli deploy`. The Terraform module owns the entire Reasoning Engine resource (display_name, service account, deployment spec, env vars), so applying it after an SDK-based deploy creates a state mismatch: Terraform has no record of the SDK-deployed instance and cannot layer env vars onto it without taking ownership of the whole resource.
If you have already run `agents-cli deploy`, you have two options:
  1. Switch to Terraform-managed. Delete the SDK-deployed Reasoning Engine, then run `agents-cli infra single-project` followed by `agents-cli deploy`. Sessions and any in-flight state on the previous instance are lost.
  2. Keep the SDK-deployed instance. Skip `infra single-project` and set the observability env vars on the running instance directly via the `vertexai` client `update` API. You will also need to grant the instance's service account the IAM permissions required to emit telemetry: writing to the logs GCS bucket, BigQuery dataset access, log writer, etc. See `deployment/terraform/single-project/iam.tf` and `telemetry.tf` in your scaffolded project for the full set of bindings the Terraform module would otherwise provision. Terraform-managed env vars are not available in this mode.
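
The ordering rule and the two recovery options can be summarized as a small decision helper. This is an illustrative sketch only: `plan_steps` and its parameters are hypothetical names, not part of agents-cli, and the function does not run anything. It just returns the steps described above in the order they should happen.

```python
def plan_steps(already_deployed: bool, keep_sdk_instance: bool = False) -> list[str]:
    """Return the ordered steps for an agent_runtime deployment.

    Hypothetical helper for illustration; it only encodes the ordering
    described in this guide and never invokes agents-cli itself.
    """
    if not already_deployed:
        # Fresh project: Terraform must own the resource before the first deploy.
        return ["agents-cli infra single-project", "agents-cli deploy"]
    if keep_sdk_instance:
        # Option 2: skip Terraform; env vars and IAM are configured by hand.
        return [
            "set observability env vars via the vertexai client update API",
            "grant the IAM bindings listed in iam.tf and telemetry.tf",
        ]
    # Option 1: hand ownership to Terraform (state on the old instance is lost).
    return [
        "delete the SDK-deployed Reasoning Engine",
        "agents-cli infra single-project",
        "agents-cli deploy",
    ]


for step in plan_steps(already_deployed=True):
    print(step)
```

Keeping the two recovery paths in separate branches makes the trade-off explicit: only the first path ends with Terraform-managed env vars.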

Reference Files

| File | Contents |
| --- | --- |
| `references/cloud-trace-and-logging.md` | Scaffolded project details: Terraform-provisioned resources, environment variables, verification commands, enabling/disabling locally |
| `references/bigquery-agent-analytics.md` | BQ Agent Analytics plugin: enabling, key features, GCS offloading, tool provenance |

Observability Tiers

Choose the right level of observability based on your needs:

| Tier | What It Does | Scope | Default State | Best For |
| --- | --- | --- | --- | --- |
| Cloud Trace | Distributed tracing: execution flow, latency, errors via OpenTelemetry spans | All templates, all environments | Always enabled | Debugging latency, understanding agent execution flow |
| Prompt-Response Logging | GenAI interactions exported to GCS, BigQuery, and Cloud Logging | ADK agents only | Disabled locally, enabled when deployed | Auditing LLM interactions, compliance |
| BigQuery Agent Analytics | Structured agent events (LLM calls, tool use, outcomes) to BigQuery | ADK agents with plugin enabled | Opt-in (`--bq-analytics` at scaffold time) | Conversational analytics, custom dashboards, LLM-as-judge evals |
| Third-Party Integrations | External observability platforms (AgentOps, Phoenix, MLflow, etc.) | Any ADK agent | Opt-in, per-provider setup | Team collaboration, specialized visualization, prompt management |

Ask the user which tier(s) they need; they can be combined. Cloud Trace is always on; the others are additive.

Cloud Trace

ADK uses OpenTelemetry to emit distributed traces. Every agent invocation produces spans that track the full execution flow.

Span Hierarchy

invocation
  └── agent_run (one per agent in the chain)
        ├── call_llm (model request/response)
        └── execute_tool (tool execution)
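
To make the nesting concrete, the hierarchy above can be modelled as a small tree and searched by span name. This is a minimal sketch using a stand-in `Span` dataclass, not an OpenTelemetry type; `find_spans` and `render` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Minimal stand-in for a trace span (not an OpenTelemetry type)."""
    name: str
    children: list["Span"] = field(default_factory=list)


def find_spans(root: Span, name: str) -> list[Span]:
    """Depth-first search for every span with the given name."""
    matches = [root] if root.name == name else []
    for child in root.children:
        matches.extend(find_spans(child, name))
    return matches


def render(span: Span, depth: int = 0) -> str:
    """Render the tree as indented text, one span per line."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


# One invocation that passes through two agents in a chain.
trace = Span("invocation", [
    Span("agent_run", [Span("call_llm"), Span("execute_tool")]),
    Span("agent_run", [Span("call_llm")]),
])

print(render(trace))
print(len(find_spans(trace, "execute_tool")), "tool span(s)")
```

Searching by name mirrors how you would filter for `execute_tool` in the trace explorer: tool spans sit under an `agent_run` span rather than directly under the invocation.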

Setup by Deployment Type

| Deployment | Setup |
| --- | --- |
| Agent Runtime | Automatic: traces are exported to Cloud Trace by default |
| Cloud Run (scaffolded) | Automatic: `otel_to_cloud=True` in the FastAPI app |
| GKE (scaffolded) | Automatic: `otel_to_cloud=True` in the FastAPI app |
| Cloud Run / GKE (manual) | Configure OpenTelemetry exporter in your app |
| Local dev | Works with `agents-cli playground`; traces visible in Cloud Console |

View traces: Cloud Console → Trace → Trace explorer
For detailed setup instructions (Agent Runtime CLI/SDK, Cloud Run, custom deployments), fetch https://adk.dev/integrations/cloud-trace/index.md.

Prompt-Response Logging

Captures GenAI interactions (model name, tokens, timing) and exports to GCS (JSONL) and BigQuery (via direct log sinks and external tables). Privacy-preserving by default: only metadata is logged unless explicitly configured otherwise.
Key env var: `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`. Set it to `NO_CONTENT` (metadata only, default in deployed envs), `true` (full content), or `false` (disabled). Logging is disabled locally unless `LOGS_BUCKET_NAME` is set.
For scaffolded project details (Terraform resources, env vars, privacy modes, enabling/disabling, verification commands), see `references/cloud-trace-and-logging.md`.
For ADK logging docs (log levels, configuration, debugging), fetch https://adk.dev/observability/logging/index.md.
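
The rules above can be read as a small lookup. The sketch below is illustrative only: it mirrors the behaviour stated in this guide, not ADK's actual implementation. `describe_logging_mode` and its `deployed` flag are hypothetical names, and exact-match comparison of the values is an assumption, since case handling is not specified here.

```python
import os

CAPTURE_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"

# Values documented in this guide and what each one means.
MODES = {
    "NO_CONTENT": "metadata only",
    "true": "full content",
    "false": "disabled",
}


def describe_logging_mode(env=None, deployed=False):
    """Summarize prompt-response logging for a mapping of env vars.

    Hypothetical helper: `deployed` is an illustrative flag, and values
    are matched exactly (an assumption about case handling).
    """
    env = os.environ if env is None else env
    # Locally, logging stays off unless a logs bucket is configured.
    if not deployed and not env.get("LOGS_BUCKET_NAME"):
        return "disabled (local, LOGS_BUCKET_NAME not set)"
    value = env.get(CAPTURE_VAR)
    if value is None:
        return "unset (deployed environments default to NO_CONTENT)"
    return MODES.get(value, f"unrecognized value {value!r}")


print(describe_logging_mode({"LOGS_BUCKET_NAME": "my-logs", CAPTURE_VAR: "NO_CONTENT"}))
```

Passing the environment as a plain mapping keeps the helper easy to try against different configurations without touching the real process environment.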

BigQuery Agent Analytics Plugin

Optional plugin that logs structured agent events to BigQuery. Enable with `--bq-analytics` at scaffold time. See `references/bigquery-agent-analytics.md` for details.

Third-Party Integrations

ADK supports several third-party observability platforms. Each uses OpenTelemetry or custom instrumentation to capture agent behavior.

| Platform | Key Differentiator | Setup Complexity | Self-Hosted Option |
| --- | --- | --- | --- |
| AgentOps | Session replays, 2-line setup, replaces native telemetry | Minimal | No (SaaS) |
| Arize AX | Commercial platform, production monitoring, evaluation dashboards | Low | No (SaaS) |
| Phoenix | Open-source, custom evaluators, experiment testing | Low | Yes |
| MLflow | OTel traces to MLflow Tracking Server, span tree visualization | Medium (needs SQL backend) | Yes |
| Monocle | 1-call setup, VS Code Gantt chart visualizer | Minimal | Yes (local files) |
| Weave | W&B platform, team collaboration, timeline views | Low | No (SaaS) |
| Freeplay | Prompt management + evals + observability in one platform | Low | No (SaaS) |

Ask the user which platform they prefer: present the trade-offs and let them choose. For setup details, fetch the relevant ADK docs page from the Deep Dive table below.
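
When narrowing the choice with a user, the two structured columns of the comparison (setup complexity and self-hosting) can be applied as filters. The sketch below copies those two columns from the comparison above; `Platform` and `shortlist` are hypothetical names for illustration, and the ordering Minimal < Low < Medium is an assumption about how the complexity labels rank.

```python
from typing import NamedTuple


class Platform(NamedTuple):
    name: str
    setup: str         # "Minimal", "Low", or "Medium", as listed above
    self_hosted: bool  # True where a self-hosted option is listed


PLATFORMS = [
    Platform("AgentOps", "Minimal", False),
    Platform("Arize AX", "Low", False),
    Platform("Phoenix", "Low", True),
    Platform("MLflow", "Medium", True),
    Platform("Monocle", "Minimal", True),
    Platform("Weave", "Low", False),
    Platform("Freeplay", "Low", False),
]

# Assumed ranking of the setup-complexity labels, easiest first.
SETUP_ORDER = ["Minimal", "Low", "Medium"]


def shortlist(self_hosted=None, max_setup="Medium"):
    """Return platform names matching the hosting and setup constraints."""
    limit = SETUP_ORDER.index(max_setup)
    return [
        p.name
        for p in PLATFORMS
        if (self_hosted is None or p.self_hosted == self_hosted)
        and SETUP_ORDER.index(p.setup) <= limit
    ]


print(shortlist(self_hosted=True))
```

A filter like this only trims the list; the key differentiators still need to be discussed with the user before picking one.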

Troubleshooting

| Issue | Solution |
| --- | --- |
| No traces in Cloud Trace | Verify `otel_to_cloud=True` in FastAPI app; check service account has `cloudtrace.agent` role |
| Prompt-response data not appearing | Check `LOGS_BUCKET_NAME` is set; verify SA has `storage.objectCreator` on the bucket; check app logs for telemetry setup warnings |
| Privacy mode misconfigured | Check `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` value: use `NO_CONTENT` for metadata-only, `false` to disable |
| BigQuery Analytics not logging | Verify plugin is configured in `app/agent.py`; check `BQ_ANALYTICS_DATASET_ID` env var is set |
| Third-party integration not capturing spans | Check provider-specific env vars (API keys, endpoints); some providers (AgentOps) replace native telemetry |
| Traces missing tool spans | Tool execution spans appear under `execute_tool`; check trace explorer filters |
| High telemetry costs | Switch to `NO_CONTENT` mode; reduce BigQuery retention; disable unused tiers |
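
Several of the checks above come down to whether an environment variable is set. A minimal sketch of that first pass follows; `missing_env_vars` is a hypothetical helper, not part of agents-cli or ADK, and it covers only the variables named above. IAM roles, plugin configuration, and provider-specific keys still need to be checked separately.

```python
import os

# Env vars named in the troubleshooting guidance, mapped to the related symptom.
CHECKS = {
    "LOGS_BUCKET_NAME": "Prompt-response data not appearing",
    "BQ_ANALYTICS_DATASET_ID": "BigQuery Analytics not logging",
    "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT": "Privacy mode misconfigured",
}


def missing_env_vars(env=None):
    """Return {variable: related symptom} for each variable that is unset or empty."""
    env = os.environ if env is None else env
    return {var: symptom for var, symptom in CHECKS.items() if not env.get(var)}


# Report against the current process environment.
for var, symptom in missing_env_vars().items():
    print(f"{var} is not set (related symptom: {symptom})")
```

An unset variable is not always an error: for example, `BQ_ANALYTICS_DATASET_ID` is only relevant when the BigQuery Agent Analytics plugin is enabled.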

Deep Dive: ADK Docs (WebFetch URLs)

For detailed documentation beyond what this skill covers, fetch these pages:

| Topic | URL |
| --- | --- |
| Observability overview | https://adk.dev/observability/index.md |
| Agent activity logging | https://adk.dev/observability/logging/index.md |
| Cloud Trace integration | https://adk.dev/integrations/cloud-trace/index.md |
| BigQuery Agent Analytics | https://adk.dev/integrations/bigquery-agent-analytics/index.md |
| AgentOps | https://adk.dev/integrations/agentops/index.md |
| Arize AX | https://adk.dev/integrations/arize-ax/index.md |
| Phoenix (Arize) | https://adk.dev/integrations/phoenix/index.md |
| MLflow tracing | https://adk.dev/integrations/mlflow-tracing/index.md |
| Monocle | https://adk.dev/integrations/monocle/index.md |
| W&B Weave | https://adk.dev/integrations/weave/index.md |
| Freeplay | https://adk.dev/integrations/freeplay/index.md |

Related Skills

  • `/google-agents-cli-deploy`: Deployment targets, CI/CD pipelines, and production workflows
  • `/google-agents-cli-workflow`: Development workflow, coding guidelines, and operational rules
  • `/google-agents-cli-adk-code`: ADK Python API quick reference for writing agent code