Manage Databricks Model Serving endpoints via CLI. Use when asked to create, configure, query, or manage model serving endpoints for LLM inference, custom models, or external models.
## Installation

```shell
npx skill4agent add databricks/databricks-agent-skills databricks-model-serving
```

See also the `databricks-core` skill.

## Endpoint Types

| Type | When to Use | Key Detail |
|---|---|---|
| Pay-per-token | Foundation Model APIs (Llama, DBRX, etc.) | Uses pre-provisioned endpoints; no creation step needed |
| Provisioned throughput | Dedicated GPU capacity | Guaranteed throughput, higher cost |
| Custom model | Your own MLflow models or containers | Deploy any model with an MLflow signature |
## Endpoint Anatomy

```
Serving Endpoint (top-level, identified by NAME)
├── Config
│   ├── Served Entities (model references + scaling config)
│   └── Traffic Config (routing percentages across entities)
├── AI Gateway (rate limits, usage tracking)
└── State (READY / NOT_READY, config_update status)
```

Key points:
- Traffic routes reference entities by `served_entities[].name`.
- Check state with `get`; fetch container build logs with `build-logs` and runtime logs with `logs`.
- A new endpoint starts `NOT_READY` and transitions to `READY`; poll `get` and inspect `state.ready`.

## CLI Discovery

```shell
# List all serving-endpoints subcommands
databricks serving-endpoints -h

# Get detailed usage for any subcommand (flags, args, JSON fields)
databricks serving-endpoints <subcommand> -h
```

Start with `databricks serving-endpoints -h`, then drill into `databricks serving-endpoints <subcommand> -h`. Do NOT list endpoints before creating.
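The `state.ready` field surfaced by `get` can be extracted in scripts. A minimal sketch with a canned response; the real CLI call (shown in the comment) would replace the hard-coded JSON:

```shell
# Hypothetical sample of the JSON returned by:
#   response=$(databricks serving-endpoints get <ENDPOINT_NAME> --profile <PROFILE>)
response='{"name": "my-endpoint", "state": {"ready": "READY", "config_update": "NOT_UPDATING"}}'

# Extract state.ready using python3 (avoids a jq dependency)
ready=$(printf '%s' "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["state"]["ready"])')
echo "state.ready=$ready"
```

The same pattern works for any field in the `get` response, e.g. `state.config_update`.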
## Creating an Endpoint

```shell
databricks serving-endpoints create <ENDPOINT_NAME> \
  --json '{
    "served_entities": [{
      "entity_name": "<MODEL_CATALOG_PATH>",
      "entity_version": "<VERSION>",
      "min_provisioned_throughput": 0,
      "max_provisioned_throughput": 0,
      "workload_size": "Small"
    }],
    "traffic_config": {
      "routes": [{
        "served_entity_name": "<ENTITY_NAME>",
        "traffic_percentage": 100
      }]
    }
  }' --profile <PROFILE>
```

Foundation models live in the `system.ai` catalog. Add `--no-wait` to return immediately instead of blocking until the endpoint is ready.

Check status:

```shell
databricks serving-endpoints get <ENDPOINT_NAME> --profile <PROFILE>
# Check: state.ready == "READY"
```

See `databricks serving-endpoints create -h` for the full JSON schema.

## Querying an Endpoint

```shell
databricks serving-endpoints query <ENDPOINT_NAME> \
  --json '{"messages": [{"role": "user", "content": "Hello, how are you?"}]}' \
  --profile <PROFILE>
```

Add `--stream` for streaming responses. Use `get-open-api <ENDPOINT_NAME>` to fetch the endpoint's request/response schema:

```shell
databricks serving-endpoints get-open-api <ENDPOINT_NAME> --profile <PROFILE>
```

Individual served models can be invoked at `/served-models/<model-name>/invocations`. For anything else, check `databricks serving-endpoints <subcommand> -h`.
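The `state.ready` check above can be looped until the endpoint comes up. A minimal sketch: `check_ready` wraps the real CLI call (shown in the comment) and is stubbed here so the loop logic runs without a workspace:

```shell
# Returns the endpoint's state.ready value; a real implementation would run
#   databricks serving-endpoints get "$1" --profile <PROFILE>
# and parse .state.ready from the JSON response.
check_ready() {
  echo "READY"  # stub so the sketch is self-contained
}

endpoint=my-endpoint
until [ "$(check_ready "$endpoint")" = "READY" ]; do
  sleep 30
done
echo "$endpoint is READY"
```

In production, add a timeout so a failed deployment does not loop forever.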
## Common Operations

| Task | Command | Notes |
|---|---|---|
| List all endpoints | `list` | |
| Get endpoint details | `get <NAME>` | Shows state, config, served entities |
| Delete endpoint | `delete <NAME>` | |
| Update served entities or traffic | `update-config <NAME> --json ...` | Zero-downtime: old config serves until new is ready |
| Rate limits & usage tracking | `put-ai-gateway <NAME> --json ...` | |
| Update tags | `patch <NAME> --json ...` | |
| Build logs | `build-logs <NAME> <SERVED_MODEL_NAME>` | Takes the served model name, not just the endpoint |
| Runtime logs | `logs <NAME> <SERVED_MODEL_NAME>` | |
| Metrics (Prometheus format) | `export-metrics <NAME>` | |
| Permissions | `get-permissions` / `set-permissions` | ⚠️ Uses endpoint ID (hex string), not name. Find ID via `get` |
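A hedged sketch of a zero-downtime traffic shift via `update-config`, assuming hypothetical entity names `model-v1`/`model-v2` and a `main.models.my_model` catalog path; the payload is validated locally and the actual CLI call is left commented:

```shell
# update-config payload: 90/10 split across two versions of the same model
payload='{
  "served_entities": [
    {"name": "model-v1", "entity_name": "main.models.my_model", "entity_version": "1", "workload_size": "Small"},
    {"name": "model-v2", "entity_name": "main.models.my_model", "entity_version": "2", "workload_size": "Small"}
  ],
  "traffic_config": {
    "routes": [
      {"served_entity_name": "model-v1", "traffic_percentage": 90},
      {"served_entity_name": "model-v2", "traffic_percentage": 10}
    ]
  }
}'

# Validate the JSON before sending it anywhere
printf '%s' "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"

# databricks serving-endpoints update-config <ENDPOINT_NAME> --json "$payload" --profile <PROFILE>
```

The old configuration keeps serving until the new one reports ready, so a bad payload fails safely.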
## Databricks Apps Integration

List available app resource features (including `serving`) with:

```shell
databricks apps manifest --profile <PROFILE>
```

Scaffold an app with the `serving` feature:

```shell
databricks apps init --name <APP_NAME> \
  --features serving \
  --set "serving.serving-endpoint.name=<ENDPOINT_NAME>" \
  --run none --profile <PROFILE>
```

The `serving` feature wires the endpoint into `databricks.yml`:

```yaml
resources:
  apps:
    my_app:
      resources:
        - name: my-model-endpoint
          serving_endpoint:
            name: <ENDPOINT_NAME>
            permission: CAN_QUERY
```

and exposes it to the app via `app.yaml`:

```yaml
env:
  - name: SERVING_ENDPOINT
    valueFrom: serving-endpoint
```

For app deployment details, see the `databricks-apps` skill.
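Inside the app, the endpoint name arrives through the `SERVING_ENDPOINT` environment variable defined in `app.yaml`. A minimal sketch; the fallback default is a stand-in so it runs outside a deployed app:

```shell
# In a deployed app, SERVING_ENDPOINT is injected from app.yaml;
# fall back to a placeholder for local runs.
endpoint="${SERVING_ENDPOINT:-my-endpoint}"
echo "querying serving endpoint: $endpoint"

# databricks serving-endpoints query "$endpoint" \
#   --json '{"messages": [{"role": "user", "content": "ping"}]}' --profile <PROFILE>
```

Reading the name from the environment keeps app code portable across workspaces.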
## Troubleshooting

| Error | Solution |
|---|---|
| Invalid entity name | Use the full Unity Catalog path (`<catalog>.<schema>.<model>`) |
| Permission denied | Check workspace permissions; for apps, ensure the app has `CAN_QUERY` on the endpoint |
| Endpoint stuck in `NOT_READY` | Check `build-logs` and `logs` for the served model |
| Endpoint not found | Verify endpoint name with `list` |
| Query returns 404 | Endpoint may still be provisioning; check `state.ready` via `get` |
| Rate limited (429) | AI Gateway rate limit; check limits via `put-ai-gateway` |