adk-observability-guide
ADK Observability Guide
Scaffolded project? Cloud Trace and prompt-response logging are pre-configured by Terraform. See `references/cloud-trace-and-logging.md` for infrastructure details, env vars, and verification commands.
No scaffold? Follow the ADK docs links below for manual setup. For production infrastructure, scaffold with `/adk-scaffold`.
Reference Files
| File | Contents |
|---|---|
| `references/cloud-trace-and-logging.md` | Scaffolded project details: Terraform-provisioned resources, environment variables, verification commands, enabling/disabling locally |
| `references/third-party.md` | Third-party integration setup patterns, trade-offs, and ADK docs links for each provider |
Observability Tiers
Choose the right level of observability based on your needs:
| Tier | What It Does | Scope | Default State | Best For |
|---|---|---|---|---|
| Cloud Trace | Distributed tracing — execution flow, latency, errors via OpenTelemetry spans | All templates, all environments | Always enabled | Debugging latency, understanding agent execution flow |
| Prompt-Response Logging | GenAI interactions exported to GCS, BigQuery, and Cloud Logging | ADK agents only | Disabled locally, enabled when deployed | Auditing LLM interactions, compliance |
| BigQuery Agent Analytics | Structured agent events (LLM calls, tool use, outcomes) to BigQuery | ADK agents with plugin enabled | Opt-in (enabled at scaffold time or added manually afterwards) | Conversational analytics, custom dashboards, LLM-as-judge evals |
| Third-Party Integrations | External observability platforms (AgentOps, Phoenix, MLflow, etc.) | Any ADK agent | Opt-in, per-provider setup | Team collaboration, specialized visualization, prompt management |
Ask the user which tier(s) they need — they can be combined. Cloud Trace is always on; the others are additive.
Cloud Trace
ADK uses OpenTelemetry to emit distributed traces. Every agent invocation produces spans that track the full execution flow.
Span Hierarchy
invocation
└── agent_run (one per agent in the chain)
    ├── call_llm (model request/response)
    └── execute_tool (tool execution)
Setup by Deployment Type
| Deployment | Setup |
|---|---|
| Agent Engine | Automatic — traces are exported to Cloud Trace by default |
| Cloud Run (scaffolded) | Automatic: tracing is already set up in the scaffolded FastAPI app |
| Cloud Run (manual) | Configure OpenTelemetry exporter in your app |
| Local dev | Works with |
View traces: Cloud Console → Trace → Trace explorer
For detailed setup instructions (Agent Engine CLI/SDK, Cloud Run, custom deployments), fetch the ADK docs:
WebFetch: https://google.github.io/adk-docs/integrations/cloud-trace/index.md
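To make the span hierarchy in this section concrete, the sketch below models a trace as a small tree and walks it. The `Span` class, `render`, and `count_spans` are hypothetical helpers written for illustration only; they are not ADK or OpenTelemetry types, and a real trace would come from the exporter rather than being built by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for an exported span (not an ADK or OpenTelemetry type)."""
    name: str
    children: list["Span"] = field(default_factory=list)


def render(span: Span, depth: int = 0) -> list[str]:
    """Return one indented line per span, parents before children."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


def count_spans(span: Span, name: str) -> int:
    """Count spans with the given name anywhere in the tree."""
    own = 1 if span.name == name else 0
    return own + sum(count_spans(child, name) for child in span.children)


# One invocation with a single agent that calls the model twice and one tool.
trace = Span("invocation", [
    Span("agent_run", [
        Span("call_llm"),
        Span("execute_tool"),
        Span("call_llm"),
    ]),
])

print("\n".join(render(trace)))
print("tool spans:", count_spans(trace, "execute_tool"))
```

A count of zero for `execute_tool` in a tree like this would correspond to the "traces missing tool spans" symptom: the agent ran and called the model, but no tool execution was recorded.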
Prompt-Response Logging
Captures GenAI interactions (model name, tokens, timing) and exports to GCS (JSONL), BigQuery (external tables), and Cloud Logging (dedicated bucket).
Privacy Modes
Prompt-response logging is privacy-preserving by default: only metadata is logged. It is controlled by `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`:
| Value | Behavior |
|---|---|
| `false` | Logging disabled |
| `NO_CONTENT` | Enabled, metadata only: tokens, model name, timing (default in deployed environments) |
| `true` | Enabled with full prompt/response content (not recommended for production) |
For Agent Engine: the platform requires `true` during deployment, but the app overrides it to `NO_CONTENT` at runtime.
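As a rough illustration of the three behaviors described above, the sketch below maps the value of `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` to a mode. `CaptureMode` and `resolve_capture_mode` are hypothetical names, and treating an unset or unrecognized value as disabled is an assumption, not documented ADK behavior.

```python
import os
from enum import Enum

ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


class CaptureMode(Enum):
    DISABLED = "disabled"
    METADATA_ONLY = "metadata_only"
    FULL_CONTENT = "full_content"


def resolve_capture_mode(env=None) -> CaptureMode:
    """Map the variable's value to a mode.

    Assumption: an unset or unrecognized value is treated as disabled.
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "").strip()
    if raw == "NO_CONTENT":
        return CaptureMode.METADATA_ONLY
    if raw.lower() == "true":
        return CaptureMode.FULL_CONTENT
    return CaptureMode.DISABLED


# Inspect the current process environment.
print(resolve_capture_mode().name)
```

Passing an explicit dictionary instead of reading `os.environ` keeps the helper easy to check in isolation, for example `resolve_capture_mode({ENV_VAR: "NO_CONTENT"})`.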
Behavior by Environment
| Environment | Prompt-Response Logging | Why |
|---|---|---|
| Local dev | Disabled | Required env vars are not set |
| Dev (Terraform deployed) | Enabled (metadata only) | Terraform sets env vars |
| Staging / Production | Enabled (metadata only) | Terraform sets env vars |
To enable locally, set `LOGS_BUCKET_NAME` and `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=NO_CONTENT` before running `make playground`.
To disable in a deployed environment, set `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=false` in `deployment/terraform/service.tf` and re-apply.
For scaffolded project infrastructure details (Terraform resources, env vars, verification), see `references/cloud-trace-and-logging.md`.
For ADK logging docs (log levels, configuration, debugging):
WebFetch: https://google.github.io/adk-docs/observability/logging/index.md
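Before running locally, it can help to confirm that both variables named in this section are present. The sketch below is a hypothetical preflight check, not part of ADK or the scaffold; the exact gating logic the app uses is an assumption, and `my-logs-bucket` is a placeholder value.

```python
import os

CAPTURE_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def local_logging_problems(env) -> list[str]:
    """List reasons prompt-response logging would stay off locally.

    Assumption: logging needs a non-empty LOGS_BUCKET_NAME and a capture
    setting that is present and not 'false'.
    """
    problems = []
    if not env.get("LOGS_BUCKET_NAME"):
        problems.append("LOGS_BUCKET_NAME is not set")
    capture = env.get(CAPTURE_VAR, "")
    if not capture:
        problems.append(f"{CAPTURE_VAR} is not set")
    elif capture.lower() == "false":
        problems.append(f"{CAPTURE_VAR} is 'false', which disables logging")
    return problems


# Check the current shell environment.
for problem in local_logging_problems(os.environ):
    print("warning:", problem)

# A configuration matching the instructions above (placeholder bucket name).
configured = {"LOGS_BUCKET_NAME": "my-logs-bucket", CAPTURE_VAR: "NO_CONTENT"}
print(local_logging_problems(configured))
```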
BigQuery Agent Analytics Plugin
An optional plugin that logs structured agent events directly to BigQuery via the Storage Write API. Enables:
- Conversational analytics — session flows, user interaction patterns
- LLM-as-judge evals — structured data for evaluation pipelines
- Custom dashboards — Looker Studio integration
- Tool provenance tracking — LOCAL, MCP, SUB_AGENT, A2A, TRANSFER_AGENT
Enabling
| Method | How |
|---|---|
| At scaffold time | |
| Post-scaffold | Add the plugin manually to |
Infrastructure (BigQuery dataset, GCS offloading) is provisioned automatically by Terraform when enabled at scaffold time.
Key Features
- Auto-schema upgrade (new fields added without migration)
- GCS offloading for multimodal content (images, audio)
- Distributed tracing via OpenTelemetry span context
- SQL-queryable event log for all agent interactions
For full schema, SQL query examples, and Looker Studio setup:
WebFetch: https://google.github.io/adk-docs/integrations/bigquery-agent-analytics/index.md
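To show what a SQL-queryable event log makes possible, the sketch below uses an in-memory SQLite database as a local stand-in for BigQuery and groups tool calls by provenance. The table name, column names, and event type strings are hypothetical; the plugin's real schema is in the ADK docs linked above.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; the schema here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_events (session_id TEXT, event_type TEXT, tool_origin TEXT)"
)
conn.executemany(
    "INSERT INTO agent_events VALUES (?, ?, ?)",
    [
        ("s1", "LLM_REQUEST", None),
        ("s1", "TOOL_COMPLETED", "LOCAL"),
        ("s1", "TOOL_COMPLETED", "MCP"),
        ("s2", "TOOL_COMPLETED", "LOCAL"),
        ("s2", "TOOL_COMPLETED", "SUB_AGENT"),
    ],
)


def tool_calls_by_origin(connection) -> dict:
    """Count tool events per provenance value, ignoring non-tool events."""
    rows = connection.execute(
        "SELECT tool_origin, COUNT(*) FROM agent_events "
        "WHERE tool_origin IS NOT NULL "
        "GROUP BY tool_origin ORDER BY tool_origin"
    )
    return dict(rows.fetchall())


print(tool_calls_by_origin(conn))
```

The same `GROUP BY` shape would carry over to a BigQuery query once the real column names are substituted.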
Third-Party Integrations
ADK supports six third-party observability platforms. Each uses OpenTelemetry or custom instrumentation to capture agent behavior.
| Platform | Key Differentiator | Setup Complexity | Self-Hosted Option |
|---|---|---|---|
| AgentOps | Session replays, 2-line setup, replaces native telemetry | Minimal | No (SaaS) |
| Phoenix | Open-source, custom evaluators, experiment testing | Low | Yes |
| MLflow | OTel traces to MLflow Tracking Server, span tree visualization | Medium (needs SQL backend) | Yes |
| Monocle | 1-call setup, VS Code Gantt chart visualizer | Minimal | Yes (local files) |
| Weave | W&B platform, team collaboration, timeline views | Low | No (SaaS) |
| Freeplay | Prompt management + evals + observability in one platform | Low | No (SaaS) |
Ask the user which platform they prefer: present the trade-offs and let them choose. For setup details on each, see `references/third-party.md`.
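The platform comparison can be narrowed to a shortlist before presenting trade-offs to the user. The sketch below encodes only the setup-complexity and self-hosting columns from the table; the `Platform` class, `SETUP_RANK` ordering, and `shortlist` function are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    setup: str         # "Minimal", "Low", or "Medium", as in the comparison table
    self_hosted: bool  # True when a self-hosted option is listed


PLATFORMS = [
    Platform("AgentOps", "Minimal", False),
    Platform("Phoenix", "Low", True),
    Platform("MLflow", "Medium", True),
    Platform("Monocle", "Minimal", True),
    Platform("Weave", "Low", False),
    Platform("Freeplay", "Low", False),
]

# Assumed ordering of the setup-complexity labels, simplest first.
SETUP_RANK = {"Minimal": 0, "Low": 1, "Medium": 2}


def shortlist(self_hosted_required: bool, max_setup: str = "Medium") -> list[str]:
    """Return platform names that satisfy both constraints, in table order."""
    limit = SETUP_RANK[max_setup]
    return [
        p.name
        for p in PLATFORMS
        if (p.self_hosted or not self_hosted_required)
        and SETUP_RANK[p.setup] <= limit
    ]


print(shortlist(self_hosted_required=True, max_setup="Low"))
```

A filter like this only trims the list; the differentiators in the table still need to be discussed with the user before choosing.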
Troubleshooting
| Issue | Solution |
|---|---|
| No traces in Cloud Trace | Verify |
| Prompt-response data not appearing | Check |
| Privacy mode misconfigured | Check |
| BigQuery Analytics not logging | Verify plugin is configured in |
| Third-party integration not capturing spans | Check provider-specific env vars (API keys, endpoints); some providers (AgentOps) replace native telemetry |
| Traces missing tool spans | Tool execution spans appear under |
| High telemetry costs | Switch to |
Deep Dive: ADK Docs (WebFetch URLs)
For detailed documentation beyond what this skill covers, fetch these pages:
| Topic | URL |
|---|---|
| Observability overview | |
| Agent activity logging | |
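| Cloud Trace integration | https://google.github.io/adk-docs/integrations/cloud-trace/index.md |
| BigQuery Agent Analytics | https://google.github.io/adk-docs/integrations/bigquery-agent-analytics/index.md |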
| AgentOps | |
| Phoenix (Arize) | |
| MLflow tracing | |
| Monocle | |
| W&B Weave | |
| Freeplay | |