google-agents-cli-observability
ADK Observability Guide
Cloud Trace works out of the box: no infrastructure needed. Prompt-response logging and BigQuery Agent Analytics require Terraform-provisioned infrastructure (service account, GCS bucket, BigQuery dataset). Run `agents-cli infra single-project --project PROJECT_ID` to provision these resources. See `references/cloud-trace-and-logging.md` for details, env vars, and verification commands. If your project isn't scaffolded yet, see `/google-agents-cli-scaffold` first.
Order of operations for `agent_runtime` deployments
For `deployment_target = agent_runtime`, run `agents-cli infra single-project` before the first `agents-cli deploy`. The Terraform module owns the entire Reasoning Engine resource (display_name, service account, deployment spec, env vars), so applying it after an SDK-based deploy creates a state mismatch: Terraform has no record of the SDK-deployed instance and cannot layer env vars onto it without taking ownership of the whole resource.

If you have already run `agents-cli deploy`, you have two options:
- Switch to Terraform-managed. Delete the SDK-deployed Reasoning Engine, then run `agents-cli infra single-project` followed by `agents-cli deploy`. Sessions and any in-flight state on the previous instance are lost.
- Keep the SDK-deployed instance. Skip `infra single-project` and set the observability env vars on the running instance directly via the `vertexai` client `update` API. You will also need to grant the instance's service account the IAM permissions required to emit telemetry: writing to the logs GCS bucket, BigQuery dataset access, log writer, etc. See `deployment/terraform/single-project/iam.tf` and `telemetry.tf` in your scaffolded project for the full set of bindings the Terraform module would otherwise provision. Terraform-managed env vars are not available in this mode.
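The ordering rule for `agent_runtime` deployments can be sketched as a small decision helper. This is only an illustration under stated assumptions: `plan_commands` and its parameters are hypothetical names, not part of `agents-cli`, and the returned strings simply repeat the commands and manual steps named in this section.

```python
from __future__ import annotations


def plan_commands(already_deployed: bool, keep_sdk_instance: bool = False) -> list[str]:
    """Return the ordered steps for an agent_runtime deployment (illustrative only)."""
    if not already_deployed:
        # Fresh project: Terraform must create the Reasoning Engine first.
        return ["agents-cli infra single-project", "agents-cli deploy"]
    if keep_sdk_instance:
        # Keep the SDK-deployed instance: skip Terraform and configure it by hand.
        return [
            "set observability env vars via the vertexai client update API",
            "grant telemetry IAM permissions to the instance's service account",
        ]
    # Switch to Terraform-managed: sessions on the old instance are lost.
    return [
        "delete the SDK-deployed Reasoning Engine",
        "agents-cli infra single-project",
        "agents-cli deploy",
    ]
```

Calling `plan_commands(already_deployed=True)` returns the three-step "switch to Terraform-managed" path, which makes the cost of deploying first (a delete and redeploy) explicit.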
Reference Files
| File | Contents |
|---|---|
| `references/cloud-trace-and-logging.md` | Scaffolded project details: Terraform-provisioned resources, environment variables, verification commands, enabling/disabling locally |
| `references/bigquery-agent-analytics.md` | BQ Agent Analytics plugin: enabling, key features, GCS offloading, tool provenance |
Observability Tiers
Choose the right level of observability based on your needs:
| Tier | What It Does | Scope | Default State | Best For |
|---|---|---|---|---|
| Cloud Trace | Distributed tracing — execution flow, latency, errors via OpenTelemetry spans | All templates, all environments | Always enabled | Debugging latency, understanding agent execution flow |
| Prompt-Response Logging | GenAI interactions exported to GCS, BigQuery, and Cloud Logging | ADK agents only | Disabled locally, enabled when deployed | Auditing LLM interactions, compliance |
| BigQuery Agent Analytics | Structured agent events (LLM calls, tool use, outcomes) to BigQuery | ADK agents with plugin enabled | Opt-in (`--bq-analytics` at scaffold time) | Conversational analytics, custom dashboards, LLM-as-judge evals |
| Third-Party Integrations | External observability platforms (AgentOps, Phoenix, MLflow, etc.) | Any ADK agent | Opt-in, per-provider setup | Team collaboration, specialized visualization, prompt management |
Ask the user which tier(s) they need — they can be combined. Cloud Trace is always on; the others are additive.
Cloud Trace
ADK uses OpenTelemetry to emit distributed traces. Every agent invocation produces spans that track the full execution flow.
Span Hierarchy
invocation
└── agent_run (one per agent in the chain)
    ├── call_llm (model request/response)
    └── execute_tool (tool execution)
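To make the parent/child relationships in this hierarchy concrete, here is a minimal sketch that models it as a plain tree. It does not use OpenTelemetry or ADK; the `Span` class and `flatten` helper are hypothetical stand-ins, and only the span names come from the hierarchy shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """A toy stand-in for a trace span: a name plus child spans."""
    name: str
    children: list[Span] = field(default_factory=list)


def flatten(span: Span, depth: int = 0) -> list[tuple[int, str]]:
    """Walk the tree depth-first, returning (depth, name) pairs."""
    rows = [(depth, span.name)]
    for child in span.children:
        rows.extend(flatten(child, depth + 1))
    return rows


# One invocation with a single agent that calls the model and one tool.
trace = Span("invocation", [
    Span("agent_run", [Span("call_llm"), Span("execute_tool")]),
])

for depth, name in flatten(trace):
    print("  " * depth + name)
```

In a multi-agent chain, `invocation` would hold one `agent_run` child per agent, each with its own `call_llm` and `execute_tool` children.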
Setup by Deployment Type
| Deployment | Setup |
|---|---|
| Agent Runtime | Automatic — traces are exported to Cloud Trace by default |
| Cloud Run (scaffolded) | Automatic, configured in the FastAPI app |
| GKE (scaffolded) | Automatic, configured in the FastAPI app |
| Cloud Run / GKE (manual) | Configure OpenTelemetry exporter in your app |
| Local dev | Works with |
View traces: Cloud Console → Trace → Trace explorer
For detailed setup instructions (Agent Runtime CLI/SDK, Cloud Run, custom deployments), fetch `https://adk.dev/integrations/cloud-trace/index.md`.
Prompt-Response Logging
Captures GenAI interactions (model name, tokens, timing) and exports to GCS (JSONL) and BigQuery (via direct log sinks and external tables). Privacy-preserving by default — only metadata is logged unless explicitly configured otherwise.
Key env var: `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`. Set it to `NO_CONTENT` (metadata only, default in deployed envs), `true` (full content), or `false` (disabled). Logging is disabled locally unless `LOGS_BUCKET_NAME` is set.
For scaffolded project details (Terraform resources, env vars, privacy modes, enabling/disabling, verification commands), see `references/cloud-trace-and-logging.md`.
For ADK logging docs (log levels, configuration, debugging), fetch `https://adk.dev/observability/logging/index.md`.
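The env var semantics described in this section can be sketched as a small lookup. This is an illustration of the documented values, not ADK's actual implementation: `describe_logging` is a hypothetical helper, and treating an unset content variable as `NO_CONTENT` is an assumption based on the stated deployed default.

```python
from typing import Mapping

CONTENT_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"
BUCKET_VAR = "LOGS_BUCKET_NAME"

# Documented values of CONTENT_VAR and what each one means.
MODES = {
    "NO_CONTENT": "metadata only",
    "true": "full content",
    "false": "disabled",
}


def describe_logging(env: Mapping[str, str]) -> str:
    """Summarize the prompt-response logging mode implied by env (illustrative)."""
    if not env.get(BUCKET_VAR):
        # Logging stays off locally when no logs bucket is configured.
        return "disabled"
    # Assumption: an unset variable behaves like the deployed default, NO_CONTENT.
    value = env.get(CONTENT_VAR, "NO_CONTENT")
    if value not in MODES:
        raise ValueError(f"unexpected {CONTENT_VAR} value: {value!r}")
    return MODES[value]
```

Passing `os.environ` to `describe_logging` gives a quick check of which privacy mode a local shell or container would run with.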
BigQuery Agent Analytics Plugin
Optional plugin that logs structured agent events to BigQuery. Enable with `--bq-analytics` at scaffold time. See `references/bigquery-agent-analytics.md` for details.
Third-Party Integrations
ADK supports several third-party observability platforms. Each uses OpenTelemetry or custom instrumentation to capture agent behavior.
| Platform | Key Differentiator | Setup Complexity | Self-Hosted Option |
|---|---|---|---|
| AgentOps | Session replays, 2-line setup, replaces native telemetry | Minimal | No (SaaS) |
| Arize AX | Commercial platform, production monitoring, evaluation dashboards | Low | No (SaaS) |
| Phoenix | Open-source, custom evaluators, experiment testing | Low | Yes |
| MLflow | OTel traces to MLflow Tracking Server, span tree visualization | Medium (needs SQL backend) | Yes |
| Monocle | 1-call setup, VS Code Gantt chart visualizer | Minimal | Yes (local files) |
| Weave | W&B platform, team collaboration, timeline views | Low | No (SaaS) |
| Freeplay | Prompt management + evals + observability in one platform | Low | No (SaaS) |
Ask the user which platform they prefer — present the trade-offs and let them choose. For setup details, fetch the relevant ADK docs page from the Deep Dive table below.
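When presenting the trade-offs, it can help to narrow the list first. The sketch below encodes the setup complexity and self-hosting columns from the table in this section and filters on them. `PLATFORMS` and `shortlist` are hypothetical names, and the ordering of complexity levels (Minimal, then Low, then Medium) is an assumption.

```python
from __future__ import annotations

# (platform, setup complexity, self-hosted option), copied from the table.
PLATFORMS = [
    ("AgentOps", "Minimal", False),
    ("Arize AX", "Low", False),
    ("Phoenix", "Low", True),
    ("MLflow", "Medium", True),
    ("Monocle", "Minimal", True),
    ("Weave", "Low", False),
    ("Freeplay", "Low", False),
]

# Assumed ordering of the complexity labels, from easiest to hardest.
COMPLEXITY_ORDER = ["Minimal", "Low", "Medium"]


def shortlist(self_hosted_only: bool = False, max_complexity: str = "Medium") -> list[str]:
    """Return platform names that satisfy both constraints, in table order."""
    limit = COMPLEXITY_ORDER.index(max_complexity)
    return [
        name
        for name, complexity, self_hosted in PLATFORMS
        if COMPLEXITY_ORDER.index(complexity) <= limit
        and (self_hosted or not self_hosted_only)
    ]
```

For example, `shortlist(self_hosted_only=True)` leaves only the platforms the table marks as self-hostable, which is a useful first cut for teams that cannot send traces to a SaaS provider.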
Troubleshooting
| Issue | Solution |
|---|---|
| No traces in Cloud Trace | Verify |
| Prompt-response data not appearing | Check |
| Privacy mode misconfigured | Check |
| BigQuery Analytics not logging | Verify plugin is configured in |
| Third-party integration not capturing spans | Check provider-specific env vars (API keys, endpoints); some providers (AgentOps) replace native telemetry |
| Traces missing tool spans | Tool execution spans appear under |
| High telemetry costs | Switch to |
Deep Dive: ADK Docs (WebFetch URLs)
For detailed documentation beyond what this skill covers, fetch these pages:
| Topic | URL |
|---|---|
| Observability overview | |
| Agent activity logging | |
| Cloud Trace integration | |
| BigQuery Agent Analytics | |
| AgentOps | |
| Arize AX | |
| Phoenix (Arize) | |
| MLflow tracing | |
| Monocle | |
| W&B Weave | |
| Freeplay | |
Related Skills
- `/google-agents-cli-deploy`: Deployment targets, CI/CD pipelines, and production workflows
- `/google-agents-cli-workflow`: Development workflow, coding guidelines, and operational rules
- `/google-agents-cli-adk-code`: ADK Python API quick reference for writing agent code