Found 20 Skills
LLM and ML model deployment for inference. Use when serving models in production, building AI APIs, or optimizing inference. Covers vLLM (LLM serving), TensorRT-LLM (GPU optimization), Ollama (local), BentoML (ML deployment), Triton (multi-model), LangChain (orchestration), LlamaIndex (RAG), and streaming patterns.
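For reference, a minimal offline-inference sketch with vLLM; the model ID and sampling settings below are illustrative, not part of the skill:

```python
# Offline batch inference with vLLM; model ID is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any HF causal LM you have locally
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Explain continuous batching in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```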
Migrate an MLflow ResponsesAgent from Databricks Model Serving to Databricks Apps. Use when: (1) User wants to migrate from Model Serving to Apps, (2) User has a ResponsesAgent with predict()/predict_stream() methods, (3) User wants to convert to @invoke/@stream decorators.
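A hedged before/after sketch of the migration this skill performs; the method and decorator names come from the description above, but the import path for the decorators is hypothetical and should be checked against current Databricks Apps docs:

```python
# BEFORE (Model Serving): an MLflow ResponsesAgent subclass.
# class MyAgent(ResponsesAgent):
#     def predict(self, request): ...
#     def predict_stream(self, request): ...

# AFTER (Databricks Apps), sketched with the decorators the skill names.
from databricks_app import invoke, stream  # hypothetical import path

@invoke
def handle(request):
    ...  # reuse the body of predict()

@stream
def handle_stream(request):
    ...  # reuse the body of predict_stream(), yielding chunks
```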
Running and fine-tuning LLMs on Apple Silicon with MLX. Use when working with models locally on Mac, converting Hugging Face models to MLX format, fine-tuning with LoRA/QLoRA on Apple Silicon, or serving models via HTTP API.
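A minimal local-generation sketch with mlx-lm; the model repo is illustrative:

```python
# Local generation on Apple Silicon with mlx-lm; model ID is illustrative.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
print(generate(model, tokenizer, prompt="What is LoRA?", max_tokens=100))
```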
ML API expert. Use for model serving, inference endpoints, FastAPI, and ML deployment.
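A minimal FastAPI serving sketch to anchor this entry; the schema and the stand-in "model" are placeholders:

```python
# Minimal inference endpoint with FastAPI; the model is a placeholder.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features) -> dict:
    score = sum(features.values)  # replace with a real model call
    return {"score": score}

# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
```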
Guidance for setting up HuggingFace model inference services with Flask APIs. This skill applies when downloading HuggingFace models, creating inference endpoints, or building ML model serving APIs. Use for tasks involving transformers library, model caching, and REST API creation for ML models.
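A minimal sketch of the pattern this skill covers, assuming a small public sentiment model; error handling and batching are omitted:

```python
# HuggingFace pipeline behind a Flask endpoint; model ID is illustrative.
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)
# Downloaded once, then served from the local HF cache (~/.cache/huggingface).
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

@app.post("/predict")
def predict():
    text = request.get_json()["text"]
    return jsonify(classifier(text)[0])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```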
Deploy and serve TensorFlow models.
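A sketch of the export step TensorFlow Serving expects; the paths and toy model are illustrative:

```python
# Export a Keras model as a SavedModel under a numeric version directory.
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

tf.saved_model.save(model, "serving/my_model/1")  # on Keras 3, use model.export(...)

# Serve with, e.g.:
#   docker run -p 8501:8501 \
#     -v $PWD/serving/my_model:/models/my_model \
#     -e MODEL_NAME=my_model tensorflow/serving
```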
Build production ML systems with PyTorch 2.x, TensorFlow, and modern ML frameworks. Implements model serving, feature engineering, A/B testing, and monitoring. Use PROACTIVELY for ML model deployment, inference optimization, or production ML infrastructure.
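For the inference-optimization angle, a small PyTorch 2.x torch.compile sketch; the model is a stand-in:

```python
# torch.compile JIT-compiles the model on first call; toy model below.
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(),
                            torch.nn.Linear(8, 1)).eval()
compiled = torch.compile(model)

with torch.inference_mode():
    y = compiled(torch.randn(32, 16))
print(y.shape)  # torch.Size([32, 1])
```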
Onnx Converter - Auto-activating skill for ML Deployment. Triggers on: "onnx converter". Part of the ML Deployment skill category.
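A minimal export-and-verify sketch for ONNX conversion; shapes and names are illustrative:

```python
# Convert a PyTorch model to ONNX, then sanity-check with onnxruntime.
import torch
import onnxruntime as ort

model = torch.nn.Linear(4, 2).eval()
dummy = torch.randn(1, 4)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch"}})

sess = ort.InferenceSession("model.onnx")
print(sess.run(None, {"input": dummy.numpy()})[0].shape)  # (1, 2)
```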
Dual skill for deploying scientific models. FastAPI provides a high-performance, asynchronous web framework for building APIs with automatic documentation. Streamlit enables rapid creation of interactive data applications and dashboards directly from Python scripts. Load when working with web APIs, model serving, REST endpoints, interactive dashboards, data visualization UIs, scientific app deployment, async web frameworks, Pydantic validation, uvicorn, or building production-ready scientific tools.
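The FastAPI half is sketched after the ML API entry above; here is a minimal Streamlit sketch for the dashboard half, with a placeholder "model":

```python
# Interactive model playground in Streamlit; predict() is a stand-in.
import streamlit as st

st.title("Model playground")
x = st.slider("Input value", 0.0, 10.0, 5.0)

def predict(v: float) -> float:
    return 2 * v + 1  # replace with a real model call

st.metric("Prediction", predict(x))
# Run with: streamlit run app.py
```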
Strategic guidance for operationalizing machine learning models from experimentation to production. Covers experiment tracking (MLflow, Weights & Biases), model registry and versioning, feature stores (Feast, Tecton), model serving patterns (Seldon, KServe, BentoML), ML pipeline orchestration (Kubeflow, Airflow), and model monitoring (drift detection, observability). Use when designing ML infrastructure, selecting MLOps platforms, implementing continuous training pipelines, or establishing model governance.
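A small experiment-tracking sketch with MLflow, one of the tools named above; the parameter and metric values are dummies:

```python
# Log a run to the local MLflow tracking store.
import mlflow

mlflow.set_experiment("demo")
with mlflow.start_run():
    mlflow.log_param("lr", 0.01)
    for step, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("loss", loss, step=step)
```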
Use when "LLM inference", "serving LLM", "vLLM", "llama.cpp", "GGUF", "text generation", "model serving", "inference optimization", "KV cache", "continuous batching", "speculative decoding", "local LLM", "CPU inference"
Triton Inference Config - Auto-activating skill for ML Deployment. Triggers on: "triton inference config". Part of the ML Deployment skill category.
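To anchor this entry, a sketch that writes a minimal Triton config.pbtxt from Python; model name, platform, dtypes, and dims are illustrative:

```python
# Lay out a minimal Triton model repository with a config.pbtxt.
from pathlib import Path

config = """\
name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [ { name: "input", data_type: TYPE_FP32, dims: [ 4 ] } ]
output [ { name: "output", data_type: TYPE_FP32, dims: [ 2 ] } ]
"""

repo = Path("model_repository/my_model")
(repo / "1").mkdir(parents=True, exist_ok=True)  # version dir holds model.onnx
(repo / "config.pbtxt").write_text(config)
```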