agent-platform-deploy
Agent Platform Model Garden Deploy Skill
This skill provides instructions for deploying Open Models from Agent Platform
Model Garden to endpoints, and subsequently undeploying them to clean up
resources.
1P Tuned Model Copy & Deployment
If you need to copy a 1P (First-Party) Tuned Model from a source project to a destination region or project and deploy it to a newly created endpoint, refer to the 1P Tuned Model Copy & Deployment Guide.
Safety & Confirmation Tiers (CRITICAL)
Before executing any commands on behalf of the user, you MUST adhere to the
following safety tiers based on the action requested:
- Tier R: Read-only (`list`, `describe`, `list-deployment-config`)
  - Rule: No confirmation needed. You may execute these commands immediately to gather information for the user.
- Tier M: Mutating & Reversible (`deploy`, `undeploy-model`)
  - Rule: This requires explicit user confirmation. You MUST present a
    clear confirmation prompt to the user explaining the proposed command.
    You MUST wait for their explicit confirmation before executing. For
    `undeploy-model`, you MUST first verify that the endpoint and deployed model exist; if
    `describe` or `list` returns a 404 or empty result, you MUST halt and inform the user rather than attempting undeployment.
- Tier D: Destructive & Irreversible (`delete`)
  - Rule: This requires explicit typed confirmation. You MUST output a text message explaining the irreversible nature of endpoint or model deletion and asking the user to type "I confirm" or "Yes, delete it" before executing the deletion command.
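One way to apply the tiers mechanically is to map each gcloud verb to its tier before running anything. The sketch below is illustrative only: `confirmation_tier` is a hypothetical helper name, and sending unknown verbs to the strictest tier is a conservative assumption rather than something the tiers above specify.

```bash
# Map a gcloud verb to its safety tier (R, M, or D).
# confirmation_tier is a hypothetical helper, not part of gcloud.
confirmation_tier() {
  case "$1" in
    list|describe|list-deployment-config) echo "R" ;;
    deploy|undeploy-model)                echo "M" ;;
    delete)                               echo "D" ;;
    *)                                    echo "D" ;;  # assumption: treat unknown verbs as destructive
  esac
}

confirmation_tier describe        # prints R
confirmation_tier undeploy-model  # prints M
```

A caller could branch on the printed letter: run immediately for R, show a confirmation prompt for M, and require the typed phrase for D.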
1. Prerequisites
Before deploying, ensure you have the correct project and region set. The
commands below use the placeholder variables `PROJECT_ID` and `LOCATION_ID`.
Ensure you are authenticated:
```bash
gcloud auth login
gcloud auth application-default login
gcloud config set project $PROJECT_ID
```
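Because every later command depends on these two placeholders, it can help to fail early when one of them is empty. The following is a minimal sketch; `require_var` is a hypothetical helper name and the values assigned here are examples only.

```bash
# Fail early when a required placeholder variable is unset or empty.
# require_var is a hypothetical helper name.
require_var() {
  # $1 is the variable NAME; eval reads its current value indirectly.
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "Error: $1 is not set" >&2
    return 1
  fi
}

PROJECT_ID="my-project"    # assumption: example value only
LOCATION_ID="us-central1"
require_var PROJECT_ID && require_var LOCATION_ID && echo "Environment looks complete."
```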
2. Discovering Deployable Models
You can list models available in Model Garden and check if they can be
self-deployed.
```bash
gcloud ai model-garden models list
```
To see what machine types and accelerators are supported for a specific model
(e.g., `google/gemma3@gemma-3-27b-it`):
```bash
gcloud ai model-garden models list-deployment-config \
  --model="google/gemma3@gemma-3-27b-it"
```
> [!NOTE] Some models, especially Hugging Face models, might require a Hugging Face Access Token for deployment.

> [!TIP] Model Recommendation Instructions: If a user asks to deploy a model but does not specify which one, you should recommend a model based on their use case (e.g., Llama 3.3 70B for general purpose or Gemma 3 for lightweight tasks).
> * You MUST ensure you are recommending the latest version or a popular version of the suggested model family.
> * You MUST verify the model is currently deployable using `gcloud ai model-garden models list` before suggesting it to the user.
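The verification step in the tip can be sketched as a small filter over the list output. This is an assumption-laden illustration: `model_is_listed` is a hypothetical helper, and it assumes only that the model ID appears as a whole whitespace-separated field somewhere in the output, not any particular column layout.

```bash
# model_is_listed MODEL_ID: succeeds when MODEL_ID appears as a whole
# whitespace-separated field in the text read from standard input.
# Hypothetical helper; the exact list output layout is not assumed.
model_is_listed() {
  awk -v id="$1" '
    { for (i = 1; i <= NF; i++) if ($i == id) found = 1 }
    END { exit !found }
  '
}

# Usage (not executed here):
# gcloud ai model-garden models list | model_is_listed "google/gemma3@gemma-3-27b-it"
```

Matching whole fields rather than substrings avoids treating a longer ID that merely starts with the requested one as a match.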
3. Deploying a Model
> [!WARNING] Deploying models, especially large ones, consumes significant compute resources and incurs costs.
> - You MUST refer to Agent Platform prediction pricing to calculate a rough cost estimate based on the requested `--machine-type` and `--accelerator-type` (and count).
> - You MUST present this cost estimate to the user and warn them that this is the list price, which may differ from their actual bill due to potential discounts or reservations.
> - You MUST ALWAYS request explicit confirmation from the user agreeing to the estimated cost before executing any `deploy` command.

To deploy a model, use the `deploy` command. It is highly recommended to use the
`--asynchronous` flag for long-running deployments, and then poll the status if
necessary.
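The rough cost estimate described in the warning is simple arithmetic once the hourly rates have been looked up. The sketch below assumes the machine and the accelerator are priced separately per hour (pass 0 as the accelerator rate if the price is bundled); `estimate_cost` is a hypothetical helper and the numbers are made-up rates, not real prices.

```bash
# estimate_cost MACHINE_RATE ACCELERATOR_RATE ACCELERATOR_COUNT HOURS
# Rates are hourly list prices that you look up yourself; none are real here.
estimate_cost() {
  awk -v m="$1" -v a="$2" -v n="$3" -v h="$4" \
    'BEGIN { printf "%.2f\n", (m + a * n) * h }'
}

# Hypothetical rates: 2.00/hour machine, 0.50/hour per accelerator,
# 4 accelerators, 24 hours of uptime.
estimate_cost 2.00 0.50 4 24   # prints 96.00
```

The result is a list-price figure for a single replica, so it should be presented to the user with the caveat about discounts and reservations.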
Example: Deploying Gemma 3
Here is a typical bash script to deploy a model. You can run this block
directly after replacing the placeholder values (such as `YOUR_HF_TOKEN` and the reservation name).
```bash
#!/bin/bash
# Example script to deploy a model from Model Garden
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"  # Recommended default region
MODEL_ID="google/gemma3@gemma-3-27b-it"  # Replace with your chosen model ID

echo "Deploying model $MODEL_ID to project $PROJECT_ID in $LOCATION_ID..."

# Model Garden can automatically select the required hardware based on the
# list-deployment-config if hardware params are omitted.
# Below is a comprehensive command with all supported parameters:
gcloud ai model-garden models deploy \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --model="$MODEL_ID" \
  --machine-type="g2-standard-48" \
  --accelerator-type="NVIDIA_L4" \
  --accelerator-count=4 \
  --endpoint-display-name="my-gemma-deployment" \
  --hugging-face-access-token="YOUR_HF_TOKEN" \
  --reservation-affinity="reservation-affinity-type=specific-reservation,key=compute.googleapis.com/reservation-name,values=my-reservation" \
  --asynchronous

echo "Deployment initiated asynchronously."
```
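Since `deploy` is a Tier M action, the exact command should be shown to the user before it runs. A minimal sketch of building that preview follows; `deploy_preview` is a hypothetical helper that only prints text, and it leaves out `--machine-type` when none is given so that Model Garden can pick the hardware.

```bash
# Build a printable deploy command for the confirmation prompt.
# Usage: deploy_preview MODEL_ID REGION [MACHINE_TYPE]
# deploy_preview is a hypothetical helper; it prints the command and runs nothing.
deploy_preview() {
  preview="gcloud ai model-garden models deploy --model=$1 --region=$2"
  if [ -n "${3:-}" ]; then
    preview="$preview --machine-type=$3"
  fi
  echo "$preview --asynchronous"
}

deploy_preview "google/gemma3@gemma-3-27b-it" "us-central1"
deploy_preview "google/gemma3@gemma-3-27b-it" "us-central1" "g2-standard-48"
```

Printing the command first keeps the confirmation prompt and the eventual invocation in sync.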
Example: Deploying Custom Weights
To deploy a model using custom weights, you can use the exact same `deploy`
command. Instead of providing the Model Garden model ID, provide the Google
Cloud Storage (GCS) URI of your custom weights folder in the `--model` flag.
```bash
#!/bin/bash
# Example script to deploy a model with custom weights from a GCS bucket
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"

# Replace with the gs:// URI pointing to your custom weights
MODEL_GCS_URI="gs://your-bucket-name/path/to/custom-weights"

echo "Deploying custom model from $MODEL_GCS_URI to project $PROJECT_ID in $LOCATION_ID..."

gcloud ai model-garden models deploy \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --model="$MODEL_GCS_URI" \
  --machine-type="g2-standard-12" \
  --accelerator-type="NVIDIA_L4" \
  --endpoint-display-name="my-custom-model" \
  --asynchronous

echo "Deployment initiated asynchronously."
```
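Because the same `--model` flag accepts either a Model Garden ID or a GCS URI, a quick shape check can catch a mistyped value before anything is deployed. This is a sketch only: `is_gcs_uri` is a hypothetical helper that checks the `gs://` prefix and does not verify that the bucket or path exists.

```bash
# Succeed when the argument looks like a GCS URI (gs:// followed by at least
# one character). Hypothetical helper; it does not contact Cloud Storage.
is_gcs_uri() {
  case "$1" in
    gs://?*) return 0 ;;
    *)       return 1 ;;
  esac
}

MODEL_GCS_URI="gs://your-bucket-name/path/to/custom-weights"
if is_gcs_uri "$MODEL_GCS_URI"; then
  echo "Treating $MODEL_GCS_URI as custom weights."
else
  echo "Treating $MODEL_GCS_URI as a Model Garden model ID."
fi
```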
4. Checking Deployment Status
When you deploy a model asynchronously using the `--asynchronous` flag, the `deploy`
command returns an operation ID. You can use this ID to check the
ongoing status of the deployment.
```bash
gcloud ai operations describe YOUR_OPERATION_ID \
  --region=$LOCATION_ID
```
> [!NOTE] As an agent, you can also offer to check the status of a deployment for the user if they provide an operation ID or if they just initiated the deployment with you.

Alternatively, you can list your endpoints to see whether the new endpoint shows up, or check the
Cloud Console under the "Online prediction" tab.
```bash
gcloud ai endpoints list \
  --region=$LOCATION_ID
```
Note: Large models (like Llama 3.1 8B or Gemma 27B) may take 15-20 minutes to
fully deploy and start serving.
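Given the 15-20 minute deployment time, polling on an interval is usually more practical than checking by hand. The loop below is a hedged sketch: `operation_done` and `wait_for_operation` are hypothetical helpers, and the check assumes the operation resource exposes a boolean `done` field that gcloud prints as `True` with `--format="value(done)"`.

```bash
# Report whether an operation has finished. Assumption: the operation resource
# has a boolean `done` field that gcloud prints as "True" once it completes.
operation_done() {
  [ "$(gcloud ai operations describe "$1" --region="$LOCATION_ID" --format="value(done)")" = "True" ]
}

# wait_for_operation OPERATION_ID INTERVAL_SECONDS MAX_ATTEMPTS
wait_for_operation() {
  attempt=1
  while [ "$attempt" -le "$3" ]; do
    if operation_done "$1"; then
      echo "Operation $1 finished."
      return 0
    fi
    sleep "$2"
    attempt=$((attempt + 1))
  done
  echo "Operation $1 still running after $3 checks." >&2
  return 1
}

# Example: check every 60 seconds, up to 30 times (about 30 minutes).
# wait_for_operation YOUR_OPERATION_ID 60 30
```

A finished operation is not necessarily a successful one, so the full `describe` output should still be inspected for an error before reporting success to the user.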
Verifying Deployment
If the model is successfully deployed, verify it by making a test prediction call.
Because Model Garden models are often deployed to Dedicated Endpoints, you
shouldn't use `gcloud ai endpoints predict`. Instead, you must fetch the
endpoint's dedicated DNS name and send a `curl` request.

> [!TIP] Ask the user to try using their own prompt to see the results. Otherwise use the default.

Use the following script:
```bash
#!/bin/bash
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"
ENDPOINT_ID="YOUR_ENDPOINT_ID"
# The first script argument overrides the default prompt.
PROMPT=${1:-"Explain quantum computing in simple terms."}

echo "Fetching dedicated Endpoint DNS..."
ENDPOINT_URL=$(gcloud ai endpoints describe "$ENDPOINT_ID" --project="$PROJECT_ID" --region="$LOCATION_ID" --format="value(dedicatedEndpointDns)")

if [ -z "$ENDPOINT_URL" ]; then
  echo "Error: Could not retrieve a dedicated endpoint URL. Verify your ENDPOINT_ID."
  exit 1
fi

echo "Sending prediction request to $ENDPOINT_URL..."
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/endpoints/${ENDPOINT_ID}/chat/completions" \
  -d '{
    "model": "'"$ENDPOINT_ID"'",
    "messages": [
      {
        "role": "user",
        "content": "'"$PROMPT"'"
      }
    ]
  }'
```
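The script splices `$PROMPT` directly into the JSON body, so a user prompt containing a double quote or a backslash would produce invalid JSON. A minimal sketch of escaping the prompt first is shown below; `json_escape` is a hypothetical helper that handles only those two characters and does not cover newlines or other control characters.

```bash
# Escape backslashes and double quotes so a string can sit inside a JSON
# string literal. Hypothetical helper; newlines and control characters are
# not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

PROMPT='Explain "quantum computing" in simple terms.'
SAFE_PROMPT=$(json_escape "$PROMPT")
echo "$SAFE_PROMPT"
```

`$SAFE_PROMPT` could then be used in place of `$PROMPT` inside the `-d` payload.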
5. Undeploying and Cleaning Up
To stop incurring charges, you must undeploy the model from the endpoint. This
is a multi-step process if you don't already have the exact endpoint and
deployed model IDs.
Example: Finding and Undeploying a Model
Here is a bash script demonstrating how to find the IDs and undeploy the model.
```bash
#!/bin/bash
# Example script to undeploy a model
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"

# The model ID used during deployment (without the provider prefix sometimes, or exactly as listed in describe)
# It's usually easier to find the specific ID via `gcloud ai models list`
# For this example, let's assume we know the exact Endpoint ID and Deployed Model ID.

# 1. Find the Endpoint ID
echo "Listing endpoints in $LOCATION_ID:"
gcloud ai endpoints list --project="$PROJECT_ID" --region="$LOCATION_ID"
# (Assuming you extracted ENDPOINT_ID from the above output)
ENDPOINT_ID="your_endpoint_id"

# 2. Find the Deployed Model ID
echo "Listing models in $LOCATION_ID to find model description:"
gcloud ai models list --project="$PROJECT_ID" --region="$LOCATION_ID"
# (Assuming you found the specific MODEL_ID)
MODEL_ID="your_model_id"
gcloud ai models describe "$MODEL_ID" --project="$PROJECT_ID" --region="$LOCATION_ID"
# (Extract the deployedModelId from the output)
DEPLOYED_MODEL_ID="your_deployed_model_id"

# 3. Undeploy
echo "Undeploying model $DEPLOYED_MODEL_ID from endpoint $ENDPOINT_ID..."
gcloud ai endpoints undeploy-model "$ENDPOINT_ID" \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --deployed-model-id="$DEPLOYED_MODEL_ID"
echo "Model undeployed."

# 4. Delete Endpoint
# --quiet suppresses gcloud's interactive prompt, so obtain the Tier D typed
# confirmation from the user before running the delete commands below.
echo "Deleting endpoint $ENDPOINT_ID..."
gcloud ai endpoints delete "$ENDPOINT_ID" \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --quiet
echo "Endpoint deleted."

# 5. Delete Model
echo "Deleting model $MODEL_ID..."
gcloud ai models delete "$MODEL_ID" \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --quiet
echo "Model deleted."
```
> [!WARNING] Failing to undeploy a model will result in continuous charges for
> the allocated compute resources, even if you are not sending prediction
> requests. Always clean up after testing.
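The delete steps fall under Tier D, which calls for a typed confirmation phrase. One way this gate might look in a script is sketched below; `typed_delete_confirmed` is a hypothetical helper, and it accepts only the two exact phrases named in the safety tiers.

```bash
# Return success only when the typed reply matches an accepted phrase exactly.
# typed_delete_confirmed is a hypothetical helper name.
typed_delete_confirmed() {
  echo "Deleting $1 is irreversible. Type \"I confirm\" or \"Yes, delete it\" to proceed:" >&2
  IFS= read -r reply || return 1
  case "$reply" in
    "I confirm"|"Yes, delete it") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (commented out so nothing is deleted by accident):
# typed_delete_confirmed "endpoint $ENDPOINT_ID" && \
#   gcloud ai endpoints delete "$ENDPOINT_ID" --region="$LOCATION_ID" --quiet
```

Anything other than an exact match, including a bare "yes", is treated as a refusal.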
6. Troubleshooting
Deployment Failure: Quota or Resource Exhausted
If your deployment fails (or stays in an error state) due to `QUOTA_EXCEEDED` or
`RESOURCE_EXHAUSTED` errors, the specific hardware requested (e.g., `NVIDIA_L4`
or `g2-standard-24`) is either not available in your chosen region or exceeds
your project's quota limits.

Solution: Look closely at the error message returned. It will often
recommend an alternative region or machine type that currently has availability.
Ask the user for confirmation to retry the deployment using the suggested
`--region` or `--machine-type` parameters.

> [!WARNING] If the alternative suggestions involve changing the machine type or accelerator, you MUST recalculate the estimated cost using Agent Platform prediction pricing, warn the user about list prices versus actual billing, and get their explicit confirmation for the new cost before retrying the deployment.
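Recognizing these two error codes in captured output can be sketched as a simple pattern match. `is_capacity_error` is a hypothetical helper, and the sample message below is made up for illustration rather than copied from real gcloud output.

```bash
# Succeed when the text mentions one of the two capacity-related error codes.
# Hypothetical helper; it inspects text only and never retries anything.
is_capacity_error() {
  case "$1" in
    *QUOTA_EXCEEDED*|*RESOURCE_EXHAUSTED*) return 0 ;;
    *) return 1 ;;
  esac
}

# Made-up sample message for illustration only.
SAMPLE_ERROR="ERROR: RESOURCE_EXHAUSTED for the requested accelerator"
if is_capacity_error "$SAMPLE_ERROR"; then
  echo "Capacity or quota problem: review the suggested region or machine type with the user."
fi
```

The helper only classifies the failure; any retry still needs a fresh cost estimate and explicit user confirmation.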