agent-platform-deploy

Agent Platform Model Garden Deploy Skill

This skill provides instructions for deploying Open Models from Agent Platform Model Garden to endpoints, and subsequently undeploying them to clean up resources.

1P Tuned Model Copy & Deployment

If you need to copy a 1P (First-Party) Tuned Model from a source project to a destination region or project and deploy it to a newly created endpoint, refer to the 1P Tuned Model Copy & Deployment Guide.

Safety & Confirmation Tiers (CRITICAL)

Before executing any commands on behalf of the user, you MUST adhere to the following safety tiers based on the action requested:
  1. Tier R: Read-only (`list`, `describe`, `list-deployment-config`)
    • Rule: No confirmation needed. You may execute these commands immediately to gather information for the user.
  2. Tier M: Mutating & Reversible (`deploy`, `undeploy-model`)
    • Rule: This requires explicit user confirmation. You MUST present a clear confirmation prompt to the user explaining the proposed command. You MUST wait for their explicit confirmation before executing. For `undeploy-model`, you MUST first verify that the endpoint and deployed model exist; if `describe` or `list` returns a 404 or empty result, you MUST halt and inform the user rather than attempting undeployment.
  3. Tier D: Destructive & Irreversible (`delete`)
    • Rule: This requires explicit typed confirmation. You MUST output a text message explaining the irreversible nature of endpoint or model deletion and asking the user to type "I confirm" or "Yes, delete it" before executing the deletion command.
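The tier rules above can be sketched as a small lookup that an agent wrapper might consult before running anything. The function names `classify_tier` and `required_confirmation` are illustrative assumptions, not part of the gcloud CLI, and treating unrecognized verbs as Tier M is a conservative choice made here rather than something the rules state.

```shell
# Hypothetical helper: map a gcloud verb to the safety tier described above.
# R = read-only, M = mutating and reversible, D = destructive and irreversible.
classify_tier() {
  case "$1" in
    list|describe|list-deployment-config) echo "R" ;;
    deploy|undeploy-model)                echo "M" ;;
    delete)                               echo "D" ;;
    # Assumption: anything unrecognized is treated as mutating, so it still
    # requires confirmation instead of running silently.
    *)                                    echo "M" ;;
  esac
}

# Print the kind of confirmation the agent must obtain for a given verb.
required_confirmation() {
  case "$(classify_tier "$1")" in
    R) echo "none" ;;
    M) echo "explicit" ;;
    D) echo "typed" ;;
  esac
}
```

For example, `required_confirmation undeploy-model` prints `explicit`, which signals that the proposed command must be shown to the user and approved before it runs.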

1. Prerequisites

Before deploying, ensure you have the correct project and region set. The commands below use placeholder variables `PROJECT_ID` and `LOCATION_ID`.
Ensure you are authenticated:
bash
gcloud auth login
gcloud auth application-default login
gcloud config set project $PROJECT_ID
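Because every later command depends on `PROJECT_ID` and `LOCATION_ID`, it can help to fail early when either is empty. The following is a minimal sketch of such a guard; `require_vars` is a hypothetical helper name, not a gcloud feature.

```shell
# Hypothetical helper: fail early when a required variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup via eval; names are expected to be plain identifiers.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Error: $name is not set." >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: stop before any gcloud call if the placeholders were never filled in.
# require_vars PROJECT_ID LOCATION_ID || exit 1
```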

2. Discovering Deployable Models

You can list models available in Model Garden and check if they can be self-deployed.
bash
gcloud ai model-garden models list
To see what machine types and accelerators are supported for a specific model (e.g., `google/gemma3@gemma-3-27b-it`):
bash
gcloud ai model-garden models list-deployment-config \
    --model="google/gemma3@gemma-3-27b-it"
[!NOTE] Some models, especially Hugging Face models, might require a Hugging Face Access Token for deployment.
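The example ID above appears to follow a `publisher/model@version` shape. Assuming that shape holds for the IDs you work with (Hugging Face IDs may not carry an `@version` part), the pieces can be separated with plain parameter expansion. The helper names below are hypothetical and only illustrate the ID layout.

```shell
# Split an ID shaped like "publisher/model@version" using parameter expansion.
model_publisher() {
  echo "${1%%/*}"            # everything before the first "/"
}

model_name() {
  rest="${1#*/}"             # drop "publisher/"
  echo "${rest%%@*}"         # drop "@version" if present
}

model_version() {
  case "$1" in
    *@*) echo "${1#*@}" ;;   # everything after the first "@"
    *)   echo "" ;;          # no version suffix present
  esac
}
```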
[!TIP] Model Recommendation Instructions: If a user asks to deploy a model but does not specify which one, you should recommend a model based on their use case (e.g., Llama 3.3 70B for general purpose or Gemma 3 for lightweight tasks).
* You MUST ensure you are recommending the latest version or a popular version of the suggested model family.
* You MUST verify the model is currently deployable using `gcloud ai model-garden models list` before suggesting it to the user.

3. Deploying a Model

[!WARNING] Deploying models, especially large ones, consumes significant compute resources and incurs costs.
  1. You MUST refer to Agent Platform prediction pricing to calculate a rough cost estimation based on the requested `--machine-type` and `--accelerator-type` (and count).
  2. You MUST present this cost estimation to the user and warn them that this is the list price, which may differ from their actual bill due to potential discounts or reservations.
  3. You MUST ALWAYS request explicit confirmation from the user agreeing to the estimated cost before executing any `deploy` command.
To deploy a model, use the `deploy` command. It is highly recommended to use the `--asynchronous` flag for long-running deployments, and then poll the status if necessary.
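The rough cost estimate required by the warning above is simple arithmetic once the hourly list price is known. The sketch below assumes you have already looked up a combined hourly rate (machine plus accelerators) on the pricing page; `estimate_cost` is a hypothetical helper, and the rate in the usage comment is made up for illustration, not a real price.

```shell
# Hypothetical helper: rough list-price estimate.
#   $1 = hourly list price per replica (look this up on the pricing page)
#   $2 = number of hours the deployment is expected to run
#   $3 = number of replicas (defaults to 1)
estimate_cost() {
  awk -v rate="$1" -v hours="$2" -v replicas="${3:-1}" \
    'BEGIN { printf "%.2f\n", rate * hours * replicas }'
}

# Example with a made-up rate of 4.00 per hour (NOT a real price):
# estimate_cost 4.00 24 1
```

The result is a list-price figure only, so it should be presented to the user together with the caveat about discounts and reservations.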

Example: Deploying Gemma 3

Here is a typical bash script to deploy a model. You can run this block directly after replacing the placeholder values (such as YOUR_HF_TOKEN and the reservation name).
bash
#!/bin/bash

# Example script to deploy a model from Model Garden

PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1" # Recommended default region
MODEL_ID="google/gemma3@gemma-3-27b-it" # Replace with your chosen model ID

echo "Deploying model $MODEL_ID to project $PROJECT_ID in $LOCATION_ID..."

# Model Garden can automatically select the required hardware based on the
# list-deployment-config if hardware params are omitted.
# Below is a comprehensive command with all supported parameters:
gcloud ai model-garden models deploy \
    --project="$PROJECT_ID" \
    --region="$LOCATION_ID" \
    --model="$MODEL_ID" \
    --machine-type="g2-standard-48" \
    --accelerator-type="NVIDIA_L4" \
    --accelerator-count=4 \
    --endpoint-display-name="my-gemma-deployment" \
    --hugging-face-access-token="YOUR_HF_TOKEN" \
    --reservation-affinity="reservation-affinity-type=specific-reservation,key=compute.googleapis.com/reservation-name,values=my-reservation" \
    --asynchronous

echo "Deployment initiated asynchronously."
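Since the hardware flags are optional, one way to handle both cases is to assemble the command as text first and add the hardware flags only when a machine type was chosen. Printing the command rather than running it also gives the user something concrete to confirm. This is a sketch under assumptions: `print_deploy_cmd` and the `MACHINE_TYPE`, `ACCELERATOR_TYPE`, and `ACCELERATOR_COUNT` variables are hypothetical names, and only flags that already appear in the script above are used.

```shell
# Hypothetical helper: print (not run) a deploy command, adding hardware flags
# only when MACHINE_TYPE is set. Printing first lets the user review and
# confirm the exact command before anything is executed.
print_deploy_cmd() {
  cmd="gcloud ai model-garden models deploy"
  cmd="$cmd --project=$PROJECT_ID --region=$LOCATION_ID --model=$MODEL_ID"
  if [ -n "${MACHINE_TYPE:-}" ]; then
    cmd="$cmd --machine-type=$MACHINE_TYPE"
    cmd="$cmd --accelerator-type=$ACCELERATOR_TYPE"
    cmd="$cmd --accelerator-count=$ACCELERATOR_COUNT"
  fi
  echo "$cmd --asynchronous"
}
```

With `MACHINE_TYPE` unset, the printed command omits the hardware flags, which leaves the hardware choice to Model Garden.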

Example: Deploying Custom Weights

To deploy a model using custom weights, you can use the exact same `deploy` command. Instead of providing the Model Garden model ID, provide the Google Cloud Storage (GCS) URI to your custom weights folder in the `--model` flag.
bash
#!/bin/bash

# Example script to deploy a model with custom weights from a GCS bucket

PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"

# Replace with the gs:// URI pointing to your custom weights
MODEL_GCS_URI="gs://your-bucket-name/path/to/custom-weights"

echo "Deploying custom model from $MODEL_GCS_URI to project $PROJECT_ID in $LOCATION_ID..."

gcloud ai model-garden models deploy \
    --project="$PROJECT_ID" \
    --region="$LOCATION_ID" \
    --model="$MODEL_GCS_URI" \
    --machine-type="g2-standard-12" \
    --accelerator-type="NVIDIA_L4" \
    --endpoint-display-name="my-custom-model" \
    --asynchronous

echo "Deployment initiated asynchronously."
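Because the same `--model` flag accepts either a Model Garden ID or a GCS URI, a quick shape check can catch a mistyped value before a deployment is attempted. The sketch below assumes the weights live in a folder inside the bucket, as described above, so a bare `gs://bucket` is rejected; `is_gcs_uri` is a hypothetical helper name.

```shell
# Hypothetical pre-flight check: accept only values shaped like gs://bucket/path.
is_gcs_uri() {
  case "$1" in
    gs://?*/?*) return 0 ;;   # bucket name, a slash, then a non-empty path
    *)          return 1 ;;
  esac
}

# Usage:
# is_gcs_uri "$MODEL_GCS_URI" || { echo "Expected a gs:// folder URI." >&2; exit 1; }
```

This only checks the shape of the string; it does not confirm that the bucket or folder exists.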

4. Checking Deployment Status

When you deploy a model asynchronously using the `--asynchronous` flag, the `deploy` command will return an operation ID. You can use this ID to check the ongoing status of the deployment.
bash
gcloud ai operations describe YOUR_OPERATION_ID \
    --region=$LOCATION_ID
[!NOTE] As an agent, you can also offer to check the status of a deployment for the user if they provide an operation ID or if they just initiated the deployment with you.
Alternatively, you can list your endpoints to see whether the new endpoint appears, or check the Cloud Console under the "Online prediction" tab.
bash
gcloud ai endpoints list \
    --region=$LOCATION_ID
Note: Large models (like Llama 3.1 8B or Gemma 27B) may take 15-20 minutes to fully deploy and start serving.
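Given that deployments can take many minutes, polling is usually done in a loop with a delay and an upper bound. The following is a minimal sketch of such a loop: `poll_until_done` is a hypothetical helper that reruns any status command until it prints `True`. The usage comment assumes that `--format="value(done)"` prints `True` once the operation finishes; check the actual output of your gcloud version before relying on it.

```shell
# Hypothetical helper: rerun a status command until it prints "True".
#   $1 = seconds to wait between attempts
#   $2 = maximum number of attempts
#   remaining arguments = the status command to run
poll_until_done() {
  poll_interval="$1"
  poll_max="$2"
  shift 2
  poll_attempt=1
  while [ "$poll_attempt" -le "$poll_max" ]; do
    if [ "$("$@")" = "True" ]; then
      return 0
    fi
    sleep "$poll_interval"
    poll_attempt=$((poll_attempt + 1))
  done
  return 1   # gave up; the operation may still be running
}

# Assumed usage (60 s x 30 attempts = 30 minutes):
# poll_until_done 60 30 gcloud ai operations describe "$OPERATION_ID" \
#     --region="$LOCATION_ID" --format="value(done)"
```

A finished operation is not necessarily a successful one, so the full `describe` output should still be inspected for an error before reporting success to the user.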

Verifying Deployment

If the model is successfully deployed, verify it by sending a test prediction request. Because Model Garden models are often deployed to Dedicated Endpoints, you shouldn't use `gcloud ai endpoints predict`. Instead, you must fetch the endpoint's dedicated DNS name and send a `curl` request.
[!TIP] Ask the user to try using their own prompt to see the results. Otherwise use the default.
Use the following script:
bash
#!/bin/bash
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"
ENDPOINT_ID="YOUR_ENDPOINT_ID"
PROMPT=${1:-"Explain quantum computing in simple terms."}

echo "Fetching dedicated Endpoint DNS..."
ENDPOINT_URL=$(gcloud ai endpoints describe $ENDPOINT_ID --project=$PROJECT_ID --region=$LOCATION_ID --format="value(dedicatedEndpointDns)")

if [ -z "$ENDPOINT_URL" ]; then
    echo "Error: Could not retrieve a dedicated endpoint URL. Verify your ENDPOINT_ID."
    exit 1
fi

echo "Sending prediction request to $ENDPOINT_URL..."
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/endpoints/${ENDPOINT_ID}/chat/completions" \
  -d '{
    "model": "'"$ENDPOINT_ID"'",
    "messages": [
      {
        "role": "user",
        "content": "'"$PROMPT"'"
      }
    ]
  }'
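The script above splices `$PROMPT` directly into the JSON body, which yields invalid JSON if a user-supplied prompt contains a double quote or a backslash. One way to guard against that is sketched below. The helper names `json_escape` and `build_chat_payload` are hypothetical, and the escaping covers only backslashes and double quotes, not newlines or control characters; a dedicated JSON tool is sturdier if one is available.

```shell
# Hypothetical helper: escape backslashes first, then double quotes.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the chat request body from an endpoint ID ($1) and a prompt ($2).
build_chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Usage with the curl call shown above:
# curl ... -d "$(build_chat_payload "$ENDPOINT_ID" "$PROMPT")"
```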

5. Undeploying and Cleaning Up

To stop incurring charges, you must undeploy the model from the endpoint. This is a multi-step process if you don't already have the exact endpoint and deployed model IDs.
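Before undeploying, the safety tiers require confirming that the deployed model actually exists on the endpoint. A minimal sketch of that check follows. `has_deployed_model` is a hypothetical helper that reads a list of IDs on stdin; the usage comment assumes the endpoint's deployed model IDs can be read with a `deployedModels[].id` format expression, and because the exact list separator is not certain, the helper normalizes newlines, semicolons, and commas.

```shell
# Hypothetical helper: succeed only if the ID in $1 appears in the list on stdin.
# Separators (newline, ";" or ",") are normalized because the exact list
# formatting depends on the --format expression used.
has_deployed_model() {
  tr ';,' '\n\n' | grep -Fxq -- "$1"
}

# Assumed usage: halt instead of undeploying when the ID is not found.
# gcloud ai endpoints describe "$ENDPOINT_ID" --region="$LOCATION_ID" \
#     --format="value(deployedModels[].id)" \
#   | has_deployed_model "$DEPLOYED_MODEL_ID" \
#   || { echo "Deployed model not found; halting." >&2; exit 1; }
```

An empty result fails the check as well, which matches the rule to halt and inform the user.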

Example: Finding and Undeploying a Model

Here is a bash script demonstrating how to find the IDs and undeploy the model. Replace the placeholder IDs with the values found in the list and describe output before running the undeploy and delete steps.
bash
#!/bin/bash

# Example script to undeploy a model

PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"

# The model ID used during deployment (without the provider prefix sometimes, or exactly as listed in describe)
# It's usually easier to find the specific ID via `gcloud ai models list`
# For this example, let's assume we know the exact Endpoint ID and Deployed Model ID.

# 1. Find the Endpoint ID
echo "Listing endpoints in $LOCATION_ID:"
gcloud ai endpoints list --project="$PROJECT_ID" --region="$LOCATION_ID"

# (Assuming you extracted ENDPOINT_ID from the above output)
ENDPOINT_ID="your_endpoint_id"

# 2. Find the Deployed Model ID
echo "Listing models in $LOCATION_ID to find model description:"
gcloud ai models list --project="$PROJECT_ID" --region="$LOCATION_ID"

# (Assuming you found the specific MODEL_ID)
MODEL_ID="your_model_id"
gcloud ai models describe "$MODEL_ID" --project="$PROJECT_ID" --region="$LOCATION_ID"

# (Extract the deployedModelId from the output)
DEPLOYED_MODEL_ID="your_deployed_model_id"

# 3. Undeploy
echo "Undeploying model $DEPLOYED_MODEL_ID from endpoint $ENDPOINT_ID..."
gcloud ai endpoints undeploy-model "$ENDPOINT_ID" \
    --project="$PROJECT_ID" \
    --region="$LOCATION_ID" \
    --deployed-model-id="$DEPLOYED_MODEL_ID"
echo "Model undeployed."

# 4. Delete Endpoint
# --quiet suppresses gcloud's own prompt, so obtain the Tier D typed
# confirmation from the user before this step.
echo "Deleting endpoint $ENDPOINT_ID..."
gcloud ai endpoints delete "$ENDPOINT_ID" \
    --project="$PROJECT_ID" \
    --region="$LOCATION_ID" \
    --quiet
echo "Endpoint deleted."

# 5. Delete Model
echo "Deleting model $MODEL_ID..."
gcloud ai models delete "$MODEL_ID" \
    --project="$PROJECT_ID" \
    --region="$LOCATION_ID" \
    --quiet
echo "Model deleted."
[!WARNING] Failing to undeploy a model will result in continuous charges for the allocated compute resources, even if you are not sending prediction requests. Always clean up after testing.
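The delete steps fall under Tier D, which asks for a typed confirmation phrase. A minimal sketch of such a gate is shown below; `confirm_destructive` is a hypothetical helper that reads one line from stdin and accepts only the two exact phrases named in the safety tiers.

```shell
# Hypothetical helper: require one of the exact Tier D phrases on stdin.
#   $1 = human-readable description of what is about to be deleted
confirm_destructive() {
  echo "Deleting $1 is irreversible. Type \"I confirm\" or \"Yes, delete it\" to proceed:" >&2
  IFS= read -r reply || return 1
  case "$reply" in
    "I confirm"|"Yes, delete it") return 0 ;;
    *)
      echo "Confirmation not received; nothing was deleted." >&2
      return 1
      ;;
  esac
}

# Usage: run the delete only when the phrase matches.
# confirm_destructive "endpoint $ENDPOINT_ID" && \
#   gcloud ai endpoints delete "$ENDPOINT_ID" \
#     --project="$PROJECT_ID" --region="$LOCATION_ID" --quiet
```

Matching the phrase exactly, rather than accepting a bare "yes", keeps the gate consistent with the typed-confirmation rule.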

6. Troubleshooting

Deployment Failure: Quota or Resource Exhausted

If your deployment fails (or stays in an error state) due to `QUOTA_EXCEEDED` or `RESOURCE_EXHAUSTED` errors, the specific hardware requested (e.g., `NVIDIA_L4` or `g2-standard-24`) is either not available in your chosen region or exceeds your project's quota limits.
Solution: Look closely at the error message returned. It will often recommend an alternative region or machine type that currently has availability. Ask the user for confirmation to retry the deployment using the suggested `--region` or `--machine-type` parameters.
[!WARNING] If the alternative suggestions involve changing the machine type or accelerator, you MUST recalculate the estimated cost using Agent Platform prediction pricing, warn the user about list prices versus actual billing, and get their explicit confirmation for the new cost before retrying the deployment.
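To decide whether a failed deployment falls into this troubleshooting case, the captured error text can be checked for the two error codes named above. The sketch below is illustrative: `is_capacity_error` is a hypothetical helper, and it matches only on the code strings, since the full wording of real error messages is not shown here.

```shell
# Hypothetical helper: succeed if the captured output mentions either of the
# quota/capacity error codes discussed above.
is_capacity_error() {
  case "$1" in
    *QUOTA_EXCEEDED*|*RESOURCE_EXHAUSTED*) return 0 ;;
    *)                                     return 1 ;;
  esac
}

# Usage: capture the command output, then branch on the error class.
# output=$(gcloud ai model-garden models deploy ... 2>&1) || {
#   if is_capacity_error "$output"; then
#     echo "Quota or capacity problem; review the suggested region or machine type."
#   fi
# }
```

Any retry with different hardware still needs a fresh cost estimate and the user's confirmation, as the warning above states.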