adk-deploy-guide

ADK Deployment Guide


Scaffolded project? Use the `make` commands throughout this guide; they wrap Terraform, Docker, and deployment into a tested pipeline.
No scaffold? See Quick Deploy below, or the ADK deployment docs. For production infrastructure, scaffold with `/adk-scaffold`.

Reference Files

For deeper details, consult these reference files in `references/`:
  • `cloud-run.md`: Scaling defaults, Dockerfile, session types, networking
  • `agent-engine.md`: deploy.py CLI, AdkApp pattern, Terraform resource, deployment metadata, CI/CD differences
  • `terraform-patterns.md`: Custom infrastructure, IAM, state management, importing resources
  • `event-driven.md`: Pub/Sub, Eventarc, BigQuery Remote Function triggers via custom `fast_api_app.py` endpoints
Observability: See the adk-observability-guide skill for Cloud Trace, prompt-response logging, BigQuery Analytics, and third-party integrations.

Deployment Target Decision Matrix

Choose the right deployment target based on your requirements:

| Criteria | Agent Engine | Cloud Run | GKE |
|---|---|---|---|
| Languages | Python | Python | Python (+ others via custom containers) |
| Scaling | Managed auto-scaling (configurable min/max, concurrency) | Fully configurable (min/max instances, concurrency, CPU allocation) | Full Kubernetes scaling (HPA, VPA, node auto-provisioning) |
| Networking | VPC-SC and PSC supported | Full VPC support, direct VPC egress, IAP, ingress rules | Full Kubernetes networking |
| Session state | Native `VertexAiSessionService` (persistent, managed) | In-memory (dev), Cloud SQL, or Agent Engine session backend | Custom (any Kubernetes-compatible store) |
| Batch/event processing | Not supported | `/invoke` endpoint for Pub/Sub, Eventarc, BigQuery | Custom (Kubernetes Jobs, Pub/Sub) |
| Cost model | vCPU-hours + memory-hours (not billed when idle) | Per-instance-second + min instance costs | Node pool costs (always-on or auto-provisioned) |
| Setup complexity | Lower (managed, purpose-built for agents) | Medium (Dockerfile, Terraform, networking) | Higher (Kubernetes expertise required) |
| Best for | Managed infrastructure, minimal ops | Custom infra, event-driven workloads | Full control, open models, GPU workloads |

Ask the user which deployment target fits their needs. Each is a valid production choice with different trade-offs.
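The decision matrix can be read as a small set of rules. The sketch below is one hypothetical way to encode its "Batch/event processing" and "Best for" rows; the function name `suggest_target` and its three flags are illustrative assumptions, not part of ADK, and the result is meant as a starting point for the conversation with the user rather than a verdict.

```python
def suggest_target(needs_event_processing=False,
                   needs_gpu_or_open_models=False,
                   wants_minimal_ops=True):
    """Suggest a deployment target from a few yes/no requirements.

    Hypothetical helper mirroring two rows of the decision matrix.
    """
    if needs_gpu_or_open_models:
        # GKE is listed as best for full control, open models, GPU workloads
        return "gke"
    if needs_event_processing:
        # Agent Engine does not support batch/event processing; Cloud Run does
        return "cloud_run"
    if wants_minimal_ops:
        # Agent Engine is listed as best for managed infrastructure, minimal ops
        return "agent_engine"
    return "cloud_run"


print(suggest_target(needs_event_processing=True))
```

The returned strings match the target names accepted by `adk deploy` (`agent_engine`, `cloud_run`, `gke`).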

Quick Deploy (ADK CLI)

For projects without Agent Starter Pack scaffolding. No Makefile, Terraform, or Dockerfile required.

```bash
# Cloud Run
adk deploy cloud_run --project=PROJECT --region=REGION path/to/agent/

# Agent Engine
adk deploy agent_engine --project=PROJECT --region=REGION path/to/agent/

# GKE (requires existing cluster)
adk deploy gke --project=PROJECT --cluster_name=CLUSTER --region=REGION path/to/agent/
```

All commands support `--with_ui` to deploy the ADK dev UI. Cloud Run also accepts extra `gcloud` flags after `--` (e.g., `-- --no-allow-unauthenticated`).

See `adk deploy --help` or the [ADK deployment docs](https://google.github.io/adk-docs/deploy/) for full flag reference.

> For CI/CD, observability, or production infrastructure, scaffold with `/adk-scaffold` and use the sections below.

---
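When the quick-deploy commands are driven from a script, it can help to assemble them in one place. The sketch below builds the argument list for `adk deploy` using only the flags shown in the Quick Deploy section; the helper name `build_adk_deploy_command` is hypothetical, and placing the `--` passthrough after the agent path is an assumption worth checking against `adk deploy --help`. Nothing is executed.

```python
import shlex


def build_adk_deploy_command(target, project, region, agent_path,
                             cluster_name=None, with_ui=False, gcloud_flags=()):
    """Build the argv list for `adk deploy <target>` (hypothetical helper)."""
    if target not in ("cloud_run", "agent_engine", "gke"):
        raise ValueError(f"unknown target: {target}")
    if gcloud_flags and target != "cloud_run":
        raise ValueError("extra gcloud flags are only accepted for cloud_run")
    cmd = ["adk", "deploy", target, f"--project={project}"]
    if target == "gke":
        if not cluster_name:
            raise ValueError("gke requires an existing cluster name")
        cmd.append(f"--cluster_name={cluster_name}")
    cmd.append(f"--region={region}")
    if with_ui:
        cmd.append("--with_ui")
    cmd.append(agent_path)
    if gcloud_flags:
        # Assumption: passthrough flags follow the agent path after `--`
        cmd += ["--", *gcloud_flags]
    return cmd


cmd = build_adk_deploy_command(
    "cloud_run", "PROJECT", "REGION", "path/to/agent/",
    gcloud_flags=["--no-allow-unauthenticated"],
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the values have been reviewed.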

Dev Environment Setup & Deploy (Scaffolded Projects)


Setting Up Dev Infrastructure (Optional)

`make setup-dev-env` runs `terraform apply` in `deployment/terraform/dev/`. This provisions supporting infrastructure:
  • Service accounts (`app_sa` for the agent, used for runtime permissions)
  • Artifact Registry repository (for container images)
  • IAM bindings (granting the app SA necessary roles)
  • Telemetry resources (Cloud Logging bucket, BigQuery dataset)
  • Any custom resources defined in `deployment/terraform/dev/`
This step is optional: `make deploy` works without it (Cloud Run creates the service on the fly via `gcloud run deploy --source .`). However, running it gives you proper service accounts, observability, and IAM setup.
```bash
make setup-dev-env
```

Deploying

  1. Notify the human: "Eval scores meet thresholds and tests pass. Ready to deploy to dev?"
  2. Wait for explicit approval
  3. Once approved: `make deploy`
IMPORTANT: Never run `make deploy` without explicit human approval.
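The approval rule can also be enforced in tooling rather than left to convention. The following is a minimal sketch, assuming an interactive terminal; the function name `deploy_with_approval` and the exact "yes" convention are hypothetical choices, not something the scaffold provides.

```python
import subprocess


def deploy_with_approval(ask=input, run=subprocess.run):
    """Run `make deploy` only after an explicit 'yes' from a human."""
    answer = ask(
        "Eval scores meet thresholds and tests pass. "
        "Ready to deploy to dev? [yes/no] "
    )
    if answer.strip().lower() != "yes":
        print("Deployment not approved; skipping make deploy.")
        return False
    # check=True raises CalledProcessError if make exits non-zero
    run(["make", "deploy"], check=True)
    return True
```

Calling `deploy_with_approval()` prompts once and does nothing unless the answer is exactly "yes". The `ask` and `run` parameters exist so the gate can be exercised without a terminal or a real deployment.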

Production Deployment — CI/CD Pipeline

Best for: Production applications, teams requiring staging → production promotion.
Prerequisites:
  1. Project must NOT be in a gitignored folder
  2. User must provide staging and production GCP project IDs
  3. GitHub repository name and owner
Steps:
  1. If prototype, first add Terraform/CI-CD files using the Agent Starter Pack CLI (see `/adk-scaffold` for full options):
    ```bash
    uvx agent-starter-pack enhance . --cicd-runner github_actions -y -s
    ```
  2. Ensure you're logged in to GitHub CLI:
    ```bash
    gh auth login  # (skip if already authenticated)
    ```
  3. Run setup-cicd:
    ```bash
    uvx agent-starter-pack setup-cicd \
      --staging-project YOUR_STAGING_PROJECT \
      --prod-project YOUR_PROD_PROJECT \
      --repository-name YOUR_REPO_NAME \
      --repository-owner YOUR_GITHUB_USERNAME \
      --auto-approve \
      --create-repository
    ```
  4. Push code to trigger deployments

Key `setup-cicd` Flags

| Flag | Description |
|---|---|
| `--staging-project` | GCP project ID for staging environment |
| `--prod-project` | GCP project ID for production environment |
| `--repository-name` / `--repository-owner` | GitHub repository name and owner |
| `--auto-approve` | Skip Terraform plan confirmation prompts |
| `--create-repository` | Create the GitHub repo if it doesn't exist |
| `--cicd-project` | Separate GCP project for CI/CD infrastructure. Defaults to prod project |
| `--local-state` | Store Terraform state locally instead of in GCS (see `references/terraform-patterns.md`) |

Run `uvx agent-starter-pack setup-cicd --help` for the full flag reference (Cloud Build options, dev project, region, etc.).

Choosing a CI/CD Runner

| Runner | Pros | Cons |
|---|---|---|
| `github_actions` (Default) | No PAT needed, uses `gh auth`, WIF-based, fully automated | Requires GitHub CLI authentication |
| `google_cloud_build` | Native GCP integration | Requires interactive browser authorization (or PAT + app installation ID for programmatic mode) |

How Authentication Works (WIF)

Both runners use Workload Identity Federation (WIF): GitHub/Cloud Build OIDC tokens are trusted by a GCP Workload Identity Pool, which grants `cicd_runner_sa` impersonation. No long-lived service account keys are needed. Terraform in `setup-cicd` creates the pool, provider, and SA bindings automatically. If auth fails, re-run `terraform apply` in the CI/CD Terraform directory.

CI/CD Pipeline Stages

The pipeline has three stages:
  1. CI (PR checks): Triggered on pull request. Runs unit and integration tests.
  2. Staging CD: Triggered on merge to `main`. Builds container, deploys to staging, runs load tests.
  3. Production CD: Triggered after successful staging deploy. Requires manual approval before deploying to production.
IMPORTANT: `setup-cicd` creates infrastructure but doesn't deploy automatically. Terraform configures all required GitHub secrets and variables (WIF credentials, project IDs, service accounts). Push code to trigger the pipeline:
```bash
git add . && git commit -m "Initial agent implementation"
git push origin main
```
To approve production deployment:
```bash
# GitHub Actions: Approve via repository Actions tab (environment protection rules)

# Cloud Build: Find pending build and approve
gcloud builds list --project=PROD_PROJECT --region=REGION --filter="status=PENDING"
gcloud builds approve BUILD_ID --project=PROD_PROJECT
```

---

Cloud Run Specifics

For detailed infrastructure configuration (scaling defaults, Dockerfile, FastAPI endpoints, session types, networking), see `references/cloud-run.md`. For ADK docs on Cloud Run deployment, fetch `https://google.github.io/adk-docs/deploy/cloud-run/index.md` via WebFetch.

Agent Engine Specifics

Agent Engine is a managed Vertex AI service for deploying Python ADK agents. It uses source-based deployment (no Dockerfile) via `deploy.py` and the `AdkApp` class.
No `gcloud` CLI exists for Agent Engine. Deploy via `deploy.py` or `adk deploy agent_engine`. Query via the Python `vertexai.Client` SDK.
Deployments can take 5-10 minutes. If `make deploy` times out, check if the engine was created and manually populate `deployment_metadata.json` with the engine resource ID (see reference for details).
For detailed infrastructure configuration (deploy.py flags, AdkApp pattern, Terraform resource, deployment metadata, session/artifact services, CI/CD differences), see `references/agent-engine.md`. For ADK docs on Agent Engine deployment, fetch `https://google.github.io/adk-docs/deploy/agent-engine/index.md` via WebFetch.
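For the timeout case, the manual fix to `deployment_metadata.json` can be sketched as below. The key `remote_agent_engine_id` is the one read by the Python test script later in this guide; the helper name `record_engine_id` is hypothetical, and any other fields the scaffold may expect in that file are not covered here, so treat `references/agent-engine.md` as the authority on the full format.

```python
import json
from pathlib import Path


def record_engine_id(engine_resource_id, path="deployment_metadata.json"):
    """Write the engine resource ID into the deployment metadata file.

    Existing keys are preserved; only `remote_agent_engine_id` is set.
    """
    metadata_file = Path(path)
    metadata = {}
    if metadata_file.exists():
        metadata = json.loads(metadata_file.read_text())
    metadata["remote_agent_engine_id"] = engine_resource_id
    metadata_file.write_text(json.dumps(metadata, indent=2) + "\n")
    return metadata
```

A call such as `record_engine_id("ENGINE_RESOURCE_ID")`, with the placeholder replaced by the ID of the engine that was actually created, updates the file in the current directory.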

Service Account Architecture

Scaffolded projects use two service accounts:
  • `app_sa` (per environment): Runtime identity for the deployed agent. Roles defined in `deployment/terraform/iam.tf`.
  • `cicd_runner_sa` (CI/CD project): CI/CD pipeline identity (GitHub Actions / Cloud Build). Lives in the CI/CD project (defaults to prod project) and needs permissions in both staging and prod projects.
Check `deployment/terraform/iam.tf` for exact role bindings. Cross-project permissions (Cloud Run service agents, Artifact Registry access) are also configured there.
Common 403 errors:
  • "Permission denied on Cloud Run" → `cicd_runner_sa` missing deployment role in the target project
  • "Cannot act as service account" → Missing `iam.serviceAccountUser` binding on `app_sa`
  • "Secret access denied" → `app_sa` missing `secretmanager.secretAccessor`
  • "Artifact Registry read denied" → Cloud Run service agent missing read access in CI/CD project
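The list of common 403 errors works as a lookup table, which makes it easy to apply to pipeline logs. The sketch below is a hypothetical helper (`diagnose_403` is not part of the scaffold), and the symptom strings are the paraphrases from the list; real error messages may be worded differently, so a miss does not rule out these causes.

```python
# Symptom text -> likely cause, copied from the list of common 403 errors.
COMMON_403_CAUSES = {
    "Permission denied on Cloud Run":
        "cicd_runner_sa is missing a deployment role in the target project",
    "Cannot act as service account":
        "missing iam.serviceAccountUser binding on app_sa",
    "Secret access denied":
        "app_sa is missing secretmanager.secretAccessor",
    "Artifact Registry read denied":
        "Cloud Run service agent is missing read access in the CI/CD project",
}


def diagnose_403(log_line):
    """Return the likely cause for a 403 log line, or None if unrecognized."""
    lowered = log_line.lower()
    for symptom, cause in COMMON_403_CAUSES.items():
        if symptom.lower() in lowered:
            return cause
    return None


print(diagnose_403("ERROR 403: Cannot act as service account"))
```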

Secret Manager (for API Credentials)

Instead of passing sensitive keys as environment variables, use GCP Secret Manager.
```bash
# Create a secret
echo -n "YOUR_API_KEY" | gcloud secrets create MY_SECRET_NAME --data-file=-

# Update an existing secret
echo -n "NEW_API_KEY" | gcloud secrets versions add MY_SECRET_NAME --data-file=-
```

**Grant access:** For Cloud Run, grant `secretmanager.secretAccessor` to `app_sa`. For Agent Engine, grant it to the platform-managed SA (`service-PROJECT_NUMBER@gcp-sa-aiplatform-re.iam.gserviceaccount.com`).

**Pass secrets at deploy time (Agent Engine):**
```bash
make deploy SECRETS="API_KEY=my-api-key,DB_PASS=db-password:2"
```
Format: `ENV_VAR=SECRET_ID` or `ENV_VAR=SECRET_ID:VERSION` (defaults to latest). Access in code via `os.environ.get("API_KEY")`.
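To make the `SECRETS` format concrete, here is a minimal sketch of a parser for it. It illustrates the `ENV_VAR=SECRET_ID` and `ENV_VAR=SECRET_ID:VERSION` grammar only; the function name `parse_secrets` is hypothetical and this is not the code the Makefile or `deploy.py` actually runs.

```python
def parse_secrets(spec):
    """Parse 'ENV_VAR=SECRET_ID[:VERSION],...' into a dict.

    Returns {env_var: (secret_id, version)}; version defaults to 'latest'.
    """
    parsed = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        env_var, sep, secret_ref = item.partition("=")
        if not sep or not env_var or not secret_ref:
            raise ValueError(f"expected ENV_VAR=SECRET_ID[:VERSION], got {item!r}")
        secret_id, _, version = secret_ref.partition(":")
        parsed[env_var] = (secret_id, version or "latest")
    return parsed


print(parse_secrets("API_KEY=my-api-key,DB_PASS=db-password:2"))
# {'API_KEY': ('my-api-key', 'latest'), 'DB_PASS': ('db-password', '2')}
```

Validating the string this way before a deploy catches a missing `=` early instead of several minutes into an Agent Engine deployment.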

Observability


See the adk-observability-guide skill for observability configuration (Cloud Trace, prompt-response logging, BigQuery Analytics, third-party integrations).


Testing Your Deployed Agent


Agent Engine Deployment

Option 1: Testing Notebook
```bash
jupyter notebook notebooks/adk_app_testing.ipynb
```
Option 2: Python Script
```python
import asyncio
import json

import vertexai

# Read the engine resource ID recorded at deploy time
with open("deployment_metadata.json") as f:
    engine_id = json.load(f)["remote_agent_engine_id"]

client = vertexai.Client(location="us-central1")
agent = client.agent_engines.get(name=engine_id)


async def main():
    # `async for` must run inside a coroutine when executed as a script
    async for event in agent.async_stream_query(message="Hello!", user_id="test"):
        print(event)


asyncio.run(main())
```
Option 3: Playground
```bash
make playground
```

Cloud Run Deployment

Auth required by default. Cloud Run deploys with `--no-allow-unauthenticated`, so all requests need an `Authorization: Bearer` header with an identity token. Getting a 403? You're likely missing this header. To allow public access, redeploy with `--allow-unauthenticated`.
```bash
# Test health endpoint
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  https://SERVICE_NAME-PROJECT_NUMBER.REGION.run.app/health

# Test SSE streaming endpoint (ADK HTTP mode)
curl -X POST "https://SERVICE_NAME-PROJECT_NUMBER.REGION.run.app/run_sse" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -d '{"message": "Hello!", "user_id": "test", "session_id": "test-session"}'
```
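The same authenticated health check can be scripted in Python, for example inside a smoke test. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the `gcloud` CLI is installed and logged in, as the curl commands do.

```python
import subprocess
import urllib.request


def identity_token():
    """Fetch an identity token from the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def authed_request(url, token):
    """Build a GET request carrying the identity token as a Bearer header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def check_health(base_url):
    """Call <base_url>/health and return the HTTP status code."""
    request = authed_request(f"{base_url.rstrip('/')}/health", identity_token())
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

`check_health("https://SERVICE_NAME-PROJECT_NUMBER.REGION.run.app")` returns the status code on success; a missing or rejected token surfaces as `urllib.error.HTTPError` with code 403.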

Load Tests

```bash
make load-test
```
See `tests/load_test/README.md` for configuration, default settings, and CI/CD integration details.

Deploying with a UI (IAP)

To expose your agent with a web UI protected by Google identity authentication:
```bash
# Deploy with IAP (built-in framework UI)
make deploy IAP=true

# Deploy with custom frontend on a different port
make deploy IAP=true PORT=5173
```

IAP (Identity-Aware Proxy) secures the Cloud Run service: only authorized Google accounts can access it. After deploying, grant user access via the [Cloud Console IAP settings](https://cloud.google.com/run/docs/securing/identity-aware-proxy-cloud-run#manage_user_or_group_access).

For Agent Engine with a custom frontend, use a **decoupled deployment**: deploy the frontend separately to Cloud Run or Cloud Storage, connecting to the Agent Engine backend API.

---

Rollback & Recovery

The primary rollback mechanism is git-based: fix the issue, commit, and push to `main`. The CI/CD pipeline will automatically build and deploy the new version through staging → production.
For immediate Cloud Run rollback without a new commit, use revision traffic shifting:
```bash
gcloud run revisions list --service=SERVICE_NAME --region=REGION
gcloud run services update-traffic SERVICE_NAME \
  --to-revisions=REVISION_NAME=100 --region=REGION
```
Agent Engine doesn't support revision-based rollback; fix and redeploy via `make deploy`.

Custom Infrastructure (Terraform)

For custom infrastructure patterns (Pub/Sub, BigQuery, Eventarc, Cloud SQL, IAM), consult `references/terraform-patterns.md` for:
  • Where to put custom Terraform files (dev vs CI/CD)
  • Resource examples (Pub/Sub, BigQuery, Eventarc triggers)
  • IAM bindings for custom resources
  • Terraform state management (remote vs local, importing resources)
  • Common infrastructure patterns

Troubleshooting

| Issue | Solution |
|---|---|
| Terraform state locked | `terraform force-unlock -force LOCK_ID` in deployment/terraform/ |
| GitHub Actions auth failed | Re-run `terraform apply` in CI/CD terraform dir; verify WIF pool/provider |
| Cloud Build authorization pending | Use `github_actions` runner instead |
| Resource already exists | `terraform import` (see `references/terraform-patterns.md`) |
| Agent Engine deploy timeout / hangs | Deployments take 5-10 min; check if engine was created (see Agent Engine Specifics) |
| Secret not available | Verify `secretAccessor` granted to `app_sa` (not the default compute SA) |
| 403 on deploy | Check `deployment/terraform/iam.tf`: `cicd_runner_sa` needs deployment + SA impersonation roles in the target project |
| 403 when testing Cloud Run | Default is `--no-allow-unauthenticated`; include `Authorization: Bearer $(gcloud auth print-identity-token)` header |
| Cold starts too slow | Set `min_instance_count > 0` in Cloud Run Terraform config |
| Cloud Run 503 errors | Check resource limits (memory/CPU), increase `max_instance_count`, or check container crash logs |