Found 20 Skills
LLM and ML model deployment for inference. Use when serving models in production, building AI APIs, or optimizing inference. Covers vLLM (LLM serving), TensorRT-LLM (GPU optimization), Ollama (local), BentoML (ML deployment), Triton (multi-model), LangChain (orchestration), LlamaIndex (RAG), and streaming patterns.
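For reference, a minimal offline-inference sketch with vLLM; the model ID and sampling settings below are illustrative, not part of the skill:

```python
# Offline batch inference with vLLM; model ID is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any HF causal LM you have locally
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Explain continuous batching in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```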
Migrate an MLflow ResponsesAgent from Databricks Model Serving to Databricks Apps. Use when: (1) User wants to migrate from Model Serving to Apps, (2) User has a ResponsesAgent with predict()/predict_stream() methods, (3) User wants to convert to @invoke/@stream decorators.
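A hedged before/after sketch of the migration this skill performs; the method and decorator names come from the description above, but the import path for the decorators is hypothetical and should be checked against current Databricks Apps docs:

```python
# BEFORE (Model Serving): an MLflow ResponsesAgent subclass.
# class MyAgent(ResponsesAgent):
#     def predict(self, request): ...
#     def predict_stream(self, request): ...

# AFTER (Databricks Apps), sketched with the decorators the skill names.
from databricks_app import invoke, stream  # hypothetical import path

@invoke
def handle(request):
    ...  # reuse the body of predict()

@stream
def handle_stream(request):
    ...  # reuse the body of predict_stream(), yielding chunks
```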
Running and fine-tuning LLMs on Apple Silicon with MLX. Use when working with models locally on Mac, converting Hugging Face models to MLX format, fine-tuning with LoRA/QLoRA on Apple Silicon, or serving models via HTTP API.
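A minimal local-generation sketch with mlx-lm; the model repo is illustrative:

```python
# Local generation on Apple Silicon with mlx-lm; model ID is illustrative.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
print(generate(model, tokenizer, prompt="What is LoRA?", max_tokens=100))
```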
ML API expert. Use for model serving, inference endpoints, FastAPI, and ML deployment.
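A minimal FastAPI serving sketch to anchor this entry; the schema and the stand-in "model" are placeholders:

```python
# Minimal inference endpoint with FastAPI; the model is a placeholder.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features) -> dict:
    score = sum(features.values)  # replace with a real model call
    return {"score": score}

# Run with: uvicorn main:app --host 0.0.0.0 --port 8000
```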
Guidance for setting up HuggingFace model inference services with Flask APIs. This skill applies when downloading HuggingFace models, creating inference endpoints, or building ML model serving APIs. Use for tasks involving transformers library, model caching, and REST API creation for ML models.
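A minimal sketch of the pattern this skill covers, assuming a small public sentiment model; error handling and batching are omitted:

```python
# HuggingFace pipeline behind a Flask endpoint; model ID is illustrative.
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)
# Downloaded once, then served from the local HF cache (~/.cache/huggingface).
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

@app.post("/predict")
def predict():
    text = request.get_json()["text"]
    return jsonify(classifier(text)[0])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```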
Deploy and serve TensorFlow models.
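A sketch of the export step TensorFlow Serving expects; the paths and toy model are illustrative:

```python
# Export a Keras model as a SavedModel under a numeric version directory.
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

tf.saved_model.save(model, "serving/my_model/1")  # on Keras 3, use model.export(...)

# Serve with, e.g.:
#   docker run -p 8501:8501 \
#     -v $PWD/serving/my_model:/models/my_model \
#     -e MODEL_NAME=my_model tensorflow/serving
```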
Build production ML systems with PyTorch 2.x, TensorFlow, and modern ML frameworks. Implements model serving, feature engineering, A/B testing, and monitoring. Use PROACTIVELY for ML model deployment, inference optimization, or production ML infrastructure.
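For the inference-optimization angle, a small PyTorch 2.x torch.compile sketch; the model is a stand-in:

```python
# torch.compile JIT-compiles the model on first call; toy model below.
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(),
                            torch.nn.Linear(8, 1)).eval()
compiled = torch.compile(model)

with torch.inference_mode():
    y = compiled(torch.randn(32, 16))
print(y.shape)  # torch.Size([32, 1])
```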
Onnx Converter - Auto-activating skill for ML Deployment. Triggers on: "onnx converter". Part of the ML Deployment skill category.
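A minimal export-and-verify sketch for ONNX conversion; shapes and names are illustrative:

```python
# Convert a PyTorch model to ONNX, then sanity-check with onnxruntime.
import torch
import onnxruntime as ort

model = torch.nn.Linear(4, 2).eval()
dummy = torch.randn(1, 4)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch"}})

sess = ort.InferenceSession("model.onnx")
print(sess.run(None, {"input": dummy.numpy()})[0].shape)  # (1, 2)
```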
Dual skill for deploying scientific models. FastAPI provides a high-performance, asynchronous web framework for building APIs with automatic documentation. Streamlit enables rapid creation of interactive data applications and dashboards directly from Python scripts. Load when working with web APIs, model serving, REST endpoints, interactive dashboards, data visualization UIs, scientific app deployment, async web frameworks, Pydantic validation, uvicorn, or building production-ready scientific tools.
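The FastAPI half is sketched after the ML API entry above; here is a minimal Streamlit sketch for the dashboard half, with a placeholder "model":

```python
# Interactive model playground in Streamlit; predict() is a stand-in.
import streamlit as st

st.title("Model playground")
x = st.slider("Input value", 0.0, 10.0, 5.0)

def predict(v: float) -> float:
    return 2 * v + 1  # replace with a real model call

st.metric("Prediction", predict(x))
# Run with: streamlit run app.py
```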
Strategic guidance for operationalizing machine learning models from experimentation to production. Covers experiment tracking (MLflow, Weights & Biases), model registry and versioning, feature stores (Feast, Tecton), model serving patterns (Seldon, KServe, BentoML), ML pipeline orchestration (Kubeflow, Airflow), and model monitoring (drift detection, observability). Use when designing ML infrastructure, selecting MLOps platforms, implementing continuous training pipelines, or establishing model governance.
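A small experiment-tracking sketch with MLflow, one of the tools named above; the parameter and metric values are dummies:

```python
# Log a run to the local MLflow tracking store.
import mlflow

mlflow.set_experiment("demo")
with mlflow.start_run():
    mlflow.log_param("lr", 0.01)
    for step, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("loss", loss, step=step)
```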
Use when "LLM inference", "serving LLM", "vLLM", "llama.cpp", "GGUF", "text generation", "model serving", "inference optimization", "KV cache", "continuous batching", "speculative decoding", "local LLM", "CPU inference"
Triton Inference Config - Auto-activating skill for ML Deployment. Triggers on: "triton inference config". Part of the ML Deployment skill category.
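To anchor this entry, a sketch that writes a minimal Triton config.pbtxt from Python; model name, platform, dtypes, and dims are illustrative:

```python
# Lay out a minimal Triton model repository with a config.pbtxt.
from pathlib import Path

config = """\
name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [ { name: "input", data_type: TYPE_FP32, dims: [ 4 ] } ]
output [ { name: "output", data_type: TYPE_FP32, dims: [ 2 ] } ]
"""

repo = Path("model_repository/my_model")
(repo / "1").mkdir(parents=True, exist_ok=True)  # version dir holds model.onnx
(repo / "config.pbtxt").write_text(config)
```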