Loading...
Loading...
Found 27 Skills
LLM and ML model deployment for inference. Use when serving models in production, building AI APIs, or optimizing inference. Covers vLLM (LLM serving), TensorRT-LLM (GPU optimization), Ollama (local), BentoML (ML deployment), Triton (multi-model), LangChain (orchestration), LlamaIndex (RAG), and streaming patterns.
Migrate an MLflow ResponsesAgent from Databricks Model Serving to Databricks Apps. Use when: (1) User wants to migrate from Model Serving to Apps, (2) User has a ResponsesAgent with predict()/predict_stream() methods, (3) User wants to convert to @invoke/@stream decorators.
Manage Databricks Model Serving endpoints via CLI. Use when asked to create, configure, query, or manage model serving endpoints for LLM inference, custom models, or external models.
Set up AI Runway on AKS — from bare cluster to running model. Covers cluster verification, controller install, GPU assessment, provider setup, and first deployment. WHEN: "setup AI Runway", "onboard AKS cluster", "install AI Runway", "airunway setup", "deploy model to AKS", "GPU inference on AKS", "KAITO setup on AKS", "run LLM on AKS", "vLLM on AKS", "set up model serving on AKS", "AI Runway controller".
Deploy and serve TensorFlow models
Sagemaker Endpoint Deployer - Auto-activating skill for ML Deployment. Triggers on: sagemaker endpoint deployer, sagemaker endpoint deployer Part of the ML Deployment skill category.
Gpu Resource Optimizer - Auto-activating skill for ML Deployment. Triggers on: gpu resource optimizer, gpu resource optimizer Part of the ML Deployment skill category.
Fine-tune and serve Physical Intelligence OpenPI models (pi0, pi0-fast, pi0.5) using JAX or PyTorch backends for robot policy inference across ALOHA, DROID, and LIBERO environments. Use when adapting pi0 models to custom datasets, converting JAX checkpoints to PyTorch, running policy inference servers, or debugging norm stats and GPU memory issues.
Feature Store Connector - Auto-activating skill for ML Deployment. Triggers on: feature store connector, feature store connector Part of the ML Deployment skill category.
Guidance for setting up HuggingFace model inference services with Flask APIs. This skill applies when downloading HuggingFace models, creating inference endpoints, or building ML model serving APIs. Use for tasks involving transformers library, model caching, and REST API creation for ML models.
Model Registry Manager - Auto-activating skill for ML Deployment. Triggers on: model registry manager, model registry manager Part of the ML Deployment skill category.
Azure Ml Deployer - Auto-activating skill for ML Deployment. Triggers on: azure ml deployer, azure ml deployer Part of the ML Deployment skill category.