| Type | When to Use | Key Detail |
|---|---|---|
| Pay-per-token | Foundation Model APIs (Llama, DBRX, etc.) | Uses shared `system.ai` models; billed per token |
| Provisioned throughput | Dedicated GPU capacity | Guaranteed throughput, higher cost |
| Custom model | Your own MLflow models or containers | Deploy any model with an MLflow signature |
Serving Endpoint (top-level, identified by NAME)
├── Config
│ ├── Served Entities (model references + scaling config)
│ └── Traffic Config (routing percentages across entities)
├── AI Gateway (rate limits, usage tracking)
└── State (READY / NOT_READY, config_update status)

The served entity names (`served_entities[].name`) are what `build-logs` and `logs` expect. A new or updated endpoint starts in `NOT_READY` and moves to `READY`; confirm with `get` by checking `state.ready`.
Run `databricks serving-endpoints -h` before constructing any command. Run `databricks serving-endpoints <subcommand> -h` to discover exact flags, positional arguments, and JSON spec fields for that subcommand.
Do NOT list endpoints before creating.
databricks serving-endpoints create <ENDPOINT_NAME> \
--json '{
"served_entities": [{
"entity_name": "<MODEL_CATALOG_PATH>",
"entity_version": "<VERSION>",
"min_provisioned_throughput": 0,
"max_provisioned_throughput": 0,
"workload_size": "Small"
}],
"traffic_config": {
"routes": [{
"served_entity_name": "<ENTITY_NAME>",
"traffic_percentage": 100
}]
}
  }' --profile <PROFILE>

Foundation models live in the `system.ai` catalog. `create` blocks until the endpoint is ready; pass `--no-wait` to return immediately, then poll:

databricks serving-endpoints get <ENDPOINT_NAME> --profile <PROFILE>
# Check: state.ready == "READY"

Run `databricks serving-endpoints create -h` for the full spec.
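Because the `--json` spec is easy to get subtly wrong, it can help to assemble and sanity-check it in code before shelling out to the CLI. A minimal sketch; the validation rule and the default served-entity name here are assumptions for illustration, not CLI behavior:

```python
import json

def build_create_spec(entity_name: str, version: str,
                      workload_size: str = "Small") -> str:
    """Assemble the --json payload for `serving-endpoints create`."""
    spec = {
        "served_entities": [{
            "entity_name": entity_name,
            "entity_version": version,
            "workload_size": workload_size,
        }],
        "traffic_config": {
            # Hypothetical default: route all traffic to the single entity,
            # named after the last component of the catalog path.
            "routes": [{
                "served_entity_name": entity_name.split(".")[-1],
                "traffic_percentage": 100,
            }]
        },
    }
    # Sanity check: route percentages must sum to 100.
    total = sum(r["traffic_percentage"]
                for r in spec["traffic_config"]["routes"])
    assert total == 100, f"traffic percentages sum to {total}, expected 100"
    return json.dumps(spec)

spec = build_create_spec("system.ai.llama_v3_1_8b_instruct", "1")
```

The resulting string can be passed directly after `--json` in the `create` command above.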
databricks serving-endpoints query <ENDPOINT_NAME> \
--json '{"messages": [{"role": "user", "content": "Hello, how are you?"}]}' \
  --profile <PROFILE>

Add `--stream` to stream the response. To inspect an endpoint's request/response schema:

databricks serving-endpoints get-open-api <ENDPOINT_NAME> --profile <PROFILE>

The OpenAPI spec exposes per-model routes of the form /served-models/<model-name>/invocations. Run `databricks serving-endpoints <subcommand> -h` for exact flags.

| Task | Command | Notes |
|---|---|---|
| List all endpoints | `databricks serving-endpoints list` | |
| Get endpoint details | `databricks serving-endpoints get <NAME>` | Shows state, config, served entities |
| Delete endpoint | `databricks serving-endpoints delete <NAME>` | |
| Update served entities or traffic | `databricks serving-endpoints update-config <NAME> --json '{...}'` | Zero-downtime: old config serves until new is ready |
| Rate limits & usage tracking | `databricks serving-endpoints put-ai-gateway <NAME> --json '{...}'` | |
| Update tags | `databricks serving-endpoints patch <NAME> --json '{...}'` | |
| Build logs | `databricks serving-endpoints build-logs <NAME> <SERVED_MODEL_NAME>` | Get the served model name from `served_entities[].name` |
| Runtime logs | `databricks serving-endpoints logs <NAME> <SERVED_MODEL_NAME>` | |
| Metrics (Prometheus format) | `databricks serving-endpoints export-metrics <NAME>` | |
| Permissions | `databricks serving-endpoints get-permissions <ENDPOINT_ID>` | ⚠️ Uses endpoint ID (hex string), not name. Find the ID via `get` |
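The JSON that `get` returns can drive polling, or supply the served-entity names that `build-logs` and `logs` require. A sketch against an assumed response shape; the field names mirror the endpoint structure described above, but verify them against real `get` output:

```python
import json

def endpoint_status(raw: str) -> tuple[bool, list[str]]:
    """Return (is_ready, served entity names) parsed from `get` JSON output."""
    ep = json.loads(raw)
    ready = ep.get("state", {}).get("ready") == "READY"
    names = [e["name"]
             for e in ep.get("config", {}).get("served_entities", [])]
    return ready, names

# Example payload mirroring the endpoint structure sketched earlier.
sample = json.dumps({
    "name": "my-endpoint",
    "state": {"ready": "READY", "config_update": "NOT_UPDATING"},
    "config": {"served_entities": [{"name": "my-model-1"}]},
})
ready, entities = endpoint_status(sample)
```

Feeding this the stdout of `databricks serving-endpoints get <NAME>` gives both the readiness gate and the names to pass to the log commands.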
Check which app features your CLI supports with `databricks apps manifest --profile <PROFILE>`. To scaffold an app wired to a serving endpoint, use the `serving` feature:

databricks apps init --name <APP_NAME> \
  --features serving \
  --set "serving.serving-endpoint.name=<ENDPOINT_NAME>" \
  --run none --profile <PROFILE>

This generates a `serving_endpoint` resource binding in `databricks.yml`:

resources:
  apps:
    my_app:
      resources:
        - name: my-model-endpoint
          serving_endpoint:
            name: <ENDPOINT_NAME>
            permission: CAN_QUERY

and an environment variable in `app.yaml`:

env:
- name: SERVING_ENDPOINT
  valueFrom: serving-endpoint

| Error | Solution |
|---|---|
| Unknown flag or JSON field on `create` | Use `databricks serving-endpoints create -h` to verify the spec |
| Permission denied (403) | Check workspace permissions; for apps, ensure the resource grants `CAN_QUERY` |
| Endpoint stuck in `NOT_READY` | Check `build-logs` and `logs` for the served model |
| Endpoint not found | Verify endpoint name with `list` |
| Query returns 404 | Endpoint may still be provisioning; check `state.ready` via `get` |
| 429 Too Many Requests | AI Gateway rate limit; check limits via `put-ai-gateway` |
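Rate-limited queries are usually worth retrying with exponential backoff rather than failing outright. A generic sketch; `RateLimited` and the query callable are placeholders standing in for whatever client raises on a 429, not CLI APIs:

```python
import time

class RateLimited(Exception):
    """Placeholder for a 429-style rejection from the endpoint."""

def query_with_backoff(query_fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call query_fn, doubling the sleep after each rate-limit rejection."""
    for attempt in range(max_attempts):
        try:
            return query_fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Stub endpoint that is throttled twice, then succeeds.
calls = {"n": 0}
def fake_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return {"choices": [{"message": {"content": "ok"}}]}

result = query_with_backoff(fake_query, base_delay=0.01)
```

In practice `query_fn` would wrap the `serving-endpoints query` call (or the REST invocation) and raise on a 429 response.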