Deploy open models or custom weights from Model Garden to Agent Platform endpoints, check deployment status, verify serving endpoints, or clean up resources by undeploying models and deleting endpoints. Use when asked to deploy models on Agent Platform, list available Model Garden models, check if a model is deployable, query deployment cost, troubleshoot deployment errors (like quota limits), or undeploy/clean up endpoints. Also use when copying and deploying a 1P Tuned Model. Don't use for public Vertex AI deployments (use the `vertex-deploy` skill) or for running model evaluations (use the `agent-platform-eval` skill).
npx skill4agent add google/skills agent-platform-deploy
`list`, `describe`, `list-deployment-config`, `deploy`, `undeploy-model`, `undeploy-model`, `describe`, `list`, `delete`, `PROJECT_ID`, `LOCATION_ID`
gcloud auth login
gcloud auth application-default login
gcloud config set project $PROJECT_ID
gcloud ai model-garden models list
`google/gemma3@gemma-3-27b-it`
gcloud ai model-garden models list-deployment-config \
  --model="google/gemma3@gemma-3-27b-it"
[!NOTE] Some models, especially Hugging Face models, might require a Hugging Face Access Token for deployment.
[!TIP] Model Recommendation Instructions: If a user asks to deploy a model but does not specify which one, recommend a model based on their use case (e.g., Llama 3.3 70B for general purpose or Gemma 3 for lightweight tasks).
- You MUST ensure you are recommending the latest or a popular version of the suggested model family.
- You MUST verify the model is currently deployable using `gcloud ai model-garden models list` before suggesting it to the user.
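The deployability check described in the tip above could be scripted roughly as follows. This is a minimal sketch: the helper name `is_listed_model` is hypothetical, and it assumes the model ID appears as a whitespace-delimited field in the default output of `gcloud ai model-garden models list`, which may not hold for every gcloud version or output format.

```bash
#!/bin/bash
# Sketch: check whether a model ID appears in the Model Garden listing.
# Assumption: the ID is printed as a whitespace-delimited field in the default output.
is_listed_model() {
  local model_id=$1
  gcloud ai model-garden models list 2>/dev/null |
    awk -v m="$model_id" '
      { for (i = 1; i <= NF; i++) if ($i == m) found = 1 }
      END { exit !found }   # exit 0 only when an exact field match was seen
    '
}
```

A caller might run `is_listed_model "google/gemma3@gemma-3-27b-it" && echo "listed"` before recommending that model. An exact field match is used instead of a substring match so that a shorter ID does not accidentally match a longer one.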
[!WARNING] Deploying models, especially large ones, consumes significant compute resources and incurs costs.
- You MUST refer to Agent Platform prediction pricing to calculate a rough cost estimation based on the requested `--machine-type` and `--accelerator-type` (and count).
- You MUST present this cost estimation to the user and warn them that this is the list price, which may differ from their actual bill due to potential discounts or reservations.
- You MUST ALWAYS request explicit confirmation from the user agreeing to the estimated cost before executing any `deploy` command.
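The rough cost estimation required above is simple arithmetic, sketched here as a helper. The function name `estimate_cost` is hypothetical, every price is a caller-supplied placeholder and not a real rate, and the sketch assumes the machine type and each accelerator are billed as separate hourly list prices (pass `0` for the accelerator rate if the pricing page bundles it into the machine price).

```bash
#!/bin/bash
# Sketch: rough list-price estimate for one deployed replica.
# All rates are placeholders supplied by the caller; look up real values on the pricing page.
estimate_cost() {
  local hourly_machine_usd=$1   # hourly list price of the machine type (hypothetical input)
  local hourly_accel_usd=$2     # hourly list price of ONE accelerator (hypothetical input)
  local accel_count=$3          # number of accelerators attached
  local hours=$4                # how long the deployment is expected to run
  awk -v m="$hourly_machine_usd" -v a="$hourly_accel_usd" \
      -v n="$accel_count" -v h="$hours" \
      'BEGIN { printf "%.2f\n", (m + a * n) * h }'
}
```

For example, with made-up rates of 1.00 USD/hour for the machine and 0.50 USD/hour per accelerator, `estimate_cost 1.00 0.50 4 24` prints `72.00`. The result is a list-price figure only and still needs the user's explicit confirmation.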
`deploy`, `--asynchronous`
#!/bin/bash
# Example script to deploy a model from Model Garden
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1" # Recommended default region
MODEL_ID="google/gemma3@gemma-3-27b-it" # Replace with your chosen model ID
echo "Deploying model $MODEL_ID to project $PROJECT_ID in $LOCATION_ID..."
# Model Garden can automatically select the required hardware based on the list-deployment-config if hardware params are omitted.
# Below is a comprehensive command with all supported parameters:
gcloud ai model-garden models deploy \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --model="$MODEL_ID" \
  --machine-type="g2-standard-48" \
  --accelerator-type="NVIDIA_L4" \
  --accelerator-count=4 \
  --endpoint-display-name="my-gemma-deployment" \
  --hugging-face-access-token="YOUR_HF_TOKEN" \
  --reservation-affinity="reservation-affinity-type=specific-reservation,key=compute.googleapis.com/reservation-name,values=my-reservation" \
  --asynchronous
echo "Deployment initiated asynchronously."
`deploy`, `--model`
#!/bin/bash
# Example script to deploy a model with custom weights from a GCS bucket
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"
# Replace with the gs:// URI pointing to your custom weights
MODEL_GCS_URI="gs://your-bucket-name/path/to/custom-weights"
echo "Deploying custom model from $MODEL_GCS_URI to project $PROJECT_ID in $LOCATION_ID..."
gcloud ai model-garden models deploy \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --model="$MODEL_GCS_URI" \
  --machine-type="g2-standard-12" \
  --accelerator-type="NVIDIA_L4" \
  --endpoint-display-name="my-custom-model" \
  --asynchronous
echo "Deployment initiated asynchronously."
`--asynchronous`, `deploy`
gcloud ai operations describe YOUR_OPERATION_ID \
  --region=$LOCATION_ID
[!NOTE] As an agent, you can also offer to check the status of a deployment for the user if they provide an operation ID or if they just initiated the deployment with you.
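Checking the status repeatedly can be wrapped in a small polling loop. The sketch below is illustrative only: `wait_for_operation` is a hypothetical helper, and it assumes that `--format="value(done)"` on `gcloud ai operations describe` prints `True` once the operation has finished, which should be verified against real output.

```bash
#!/bin/bash
# Sketch: poll a long-running operation until it reports completion.
# Assumption: --format="value(done)" prints "True" (or "true") when the operation is finished.
wait_for_operation() {
  local operation_id=$1 location_id=$2
  local max_attempts=${3:-60} interval_seconds=${4:-30}
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=$(gcloud ai operations describe "$operation_id" \
      --region="$location_id" --format="value(done)") || return 1
    case "$status" in
      [Tt]rue)
        echo "Operation $operation_id finished after $attempt check(s)."
        return 0 ;;
    esac
    sleep "$interval_seconds"
  done
  echo "Operation $operation_id still running after $max_attempts checks." >&2
  return 2
}
```

A finished operation is not necessarily a successful one, so the full `describe` output should still be inspected for an error before treating the endpoint as ready.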
gcloud ai endpoints list \
  --region=$LOCATION_ID
`gcloud ai endpoints predict`, `curl`
[!TIP] Ask the user to try using their own prompt to see the results. Otherwise use the default.
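Because a user-supplied prompt may contain double quotes, backslashes, or newlines, it helps to escape it before embedding it in a JSON request body. The helpers below are a minimal sketch, not part of the skill: `json_escape` and `build_chat_payload` are hypothetical names, and only the most common characters are escaped (other control characters are left untouched).

```bash
#!/bin/bash
# Sketch: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, tab, and carriage return only.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# Sketch: build a chat-completions style request body from a model ID and a prompt.
build_chat_payload() {
  local model=$1 prompt=$2
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$(json_escape "$model")" "$(json_escape "$prompt")"
}
```

The resulting string could be passed to `curl` with `-d "$(build_chat_payload "$ENDPOINT_ID" "$PROMPT")"`. Where `jq` is available, `jq -n --arg` is a more complete alternative.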
#!/bin/bash
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"
ENDPOINT_ID="YOUR_ENDPOINT_ID"
PROMPT=${1:-"Explain quantum computing in simple terms."}
echo "Fetching dedicated Endpoint DNS..."
ENDPOINT_URL=$(gcloud ai endpoints describe "$ENDPOINT_ID" --project="$PROJECT_ID" --region="$LOCATION_ID" --format="value(dedicatedEndpointDns)")
if [ -z "$ENDPOINT_URL" ]; then
  echo "Error: Could not retrieve a dedicated endpoint URL. Verify your ENDPOINT_ID." >&2
  exit 1
fi
echo "Sending prediction request to $ENDPOINT_URL..."
# Note: $PROMPT is inserted into the JSON verbatim, so a prompt containing double quotes or backslashes produces an invalid request body.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/endpoints/${ENDPOINT_ID}/chat/completions" \
  -d '{
    "model": "'"$ENDPOINT_ID"'",
    "messages": [
      {
        "role": "user",
        "content": "'"$PROMPT"'"
      }
    ]
  }'
#!/bin/bash
# Example script to undeploy a model
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"
# This example assumes you already know the Endpoint ID, the Deployed Model ID, and the Model ID.
# The Model ID is sometimes the ID used during deployment without the provider prefix;
# it is usually easier to confirm it via `gcloud ai models list`.
# Replace the placeholders below with the values found in steps 1 and 2.
ENDPOINT_ID="YOUR_ENDPOINT_ID"
DEPLOYED_MODEL_ID="YOUR_DEPLOYED_MODEL_ID"
MODEL_ID="YOUR_MODEL_ID"
# 1. Find the Endpoint ID
echo "Listing endpoints in $LOCATION_ID:"
gcloud ai endpoints list --project="$PROJECT_ID" --region="$LOCATION_ID"
# 2. Find the Model ID and the Deployed Model ID
echo "Listing models in $LOCATION_ID to find model description:"
gcloud ai models list --project="$PROJECT_ID" --region="$LOCATION_ID"
# To extract the deployedModelId, inspect the description of the specific model:
# gcloud ai models describe "$MODEL_ID" --project="$PROJECT_ID" --region="$LOCATION_ID"
# 3. Undeploy
echo "Undeploying model $DEPLOYED_MODEL_ID from endpoint $ENDPOINT_ID..."
gcloud ai endpoints undeploy-model "$ENDPOINT_ID" \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --deployed-model-id="$DEPLOYED_MODEL_ID"
echo "Model undeployed."
# 4. Delete Endpoint
echo "Deleting endpoint $ENDPOINT_ID..."
gcloud ai endpoints delete "$ENDPOINT_ID" \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --quiet
echo "Endpoint deleted."
# 5. Delete Model
echo "Deleting model $MODEL_ID..."
gcloud ai models delete "$MODEL_ID" \
  --project="$PROJECT_ID" \
  --region="$LOCATION_ID" \
  --quiet
echo "Model deleted."
[!WARNING] Failing to undeploy a model will result in continuous charges for the allocated compute resources, even if you are not sending prediction requests. Always clean up after testing.
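To reduce the chance of leaving a deployed model behind, the undeploy and delete steps can be combined into one helper that discovers the deployed model IDs itself. This is a sketch under assumptions: `cleanup_endpoint` is a hypothetical name, and it assumes that `gcloud ai endpoints describe` with `--format="value(deployedModels[].id)"` returns the deployed model IDs, possibly separated by `;`, `,`, or whitespace.

```bash
#!/bin/bash
# Sketch: undeploy every model on an endpoint, then delete the endpoint.
# Assumption: value(deployedModels[].id) yields the IDs, separated by ';', ',' or whitespace.
cleanup_endpoint() {
  local endpoint_id=$1 project_id=$2 location_id=$3
  local ids id
  ids=$(gcloud ai endpoints describe "$endpoint_id" \
    --project="$project_id" --region="$location_id" \
    --format="value(deployedModels[].id)") || return 1
  # Normalize separators to spaces and rely on word splitting to iterate.
  for id in ${ids//[;,]/ }; do
    echo "Undeploying model $id from endpoint $endpoint_id..."
    gcloud ai endpoints undeploy-model "$endpoint_id" \
      --project="$project_id" --region="$location_id" \
      --deployed-model-id="$id" || return 1
  done
  echo "Deleting endpoint $endpoint_id..."
  gcloud ai endpoints delete "$endpoint_id" \
    --project="$project_id" --region="$location_id" --quiet
}
```

The helper stops at the first failed undeploy so the endpoint is never deleted while a model is still attached. It does not delete the uploaded model resource, which remains a separate step.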
`QUOTA_EXCEEDED`, `RESOURCE_EXHAUSTED`, `NVIDIA_L4`, `g2-standard-24`, `--region`, `--machine-type`
[!WARNING] If the alternative suggestions involve changing the machine type or accelerator, you MUST recalculate the estimated cost using Agent Platform prediction pricing, warn the user about list prices versus actual billing, and get their explicit confirmation for the new cost before retrying the deployment.
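Detecting a quota failure can be as simple as scanning the captured command output for the error codes named above. The sketch below assumes those codes appear verbatim in the output of a failed deploy; `is_quota_error` and `run_deploy` are hypothetical helper names, and the exit code `3` is an arbitrary choice for this example.

```bash
#!/bin/bash
# Sketch: classify captured deploy output as a quota problem.
# Assumption: the error text contains QUOTA_EXCEEDED or RESOURCE_EXHAUSTED verbatim.
is_quota_error() {
  case "$1" in
    *QUOTA_EXCEEDED*|*RESOURCE_EXHAUSTED*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sketch: run a deploy with caller-supplied flags and flag quota failures separately.
run_deploy() {
  local output
  if output=$(gcloud ai model-garden models deploy "$@" 2>&1); then
    printf '%s\n' "$output"
    return 0
  fi
  printf '%s\n' "$output" >&2
  if is_quota_error "$output"; then
    echo "Quota issue detected: consider another --region or --machine-type, and re-confirm the cost first." >&2
    return 3
  fi
  return 1
}
```

The wrapper only reports the situation. Any retry with a different machine type or accelerator still requires a recalculated cost estimate and fresh confirmation from the user.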