Found 27 Skills
Use when creating or revising model PR optimization history documents for SGLang, vLLM, or another serving framework that cite GitHub PRs. Requires manual, per-PR source-diff review and documentation of motivation, key implementation approach, most important code excerpts, reviewed files, and validation implications instead of generated or one-line summaries.
Strategic guidance for operationalizing machine learning models from experimentation to production. Covers experiment tracking (MLflow, Weights & Biases), model registry and versioning, feature stores (Feast, Tecton), model serving patterns (Seldon, KServe, BentoML), ML pipeline orchestration (Kubeflow, Airflow), and model monitoring (drift detection, observability). Use when designing ML infrastructure, selecting MLOps platforms, implementing continuous training pipelines, or establishing model governance.
Running and fine-tuning LLMs on Apple Silicon with MLX. Use when working with models locally on Mac, converting Hugging Face models to MLX format, fine-tuning with LoRA/QLoRA on Apple Silicon, or serving models via HTTP API.
Build production ML systems with PyTorch 2.x, TensorFlow, and modern ML frameworks. Implements model serving, feature engineering, A/B testing, and monitoring. Use PROACTIVELY for ML model deployment, inference optimization, or production ML infrastructure.
Onnx Converter - Auto-activating skill for ML Deployment. Triggers on: onnx converter. Part of the ML Deployment skill category.
Dual skill for deploying scientific models. FastAPI provides a high-performance, asynchronous web framework for building APIs with automatic documentation. Streamlit enables rapid creation of interactive data applications and dashboards directly from Python scripts. Load when working with web APIs, model serving, REST endpoints, interactive dashboards, data visualization UIs, scientific app deployment, async web frameworks, Pydantic validation, uvicorn, or building production-ready scientific tools.
Vertex Ai Deployer - Auto-activating skill for ML Deployment. Triggers on: vertex ai deployer. Part of the ML Deployment skill category.
Use when the request mentions any of: "LLM inference", "serving LLM", "vLLM", "llama.cpp", "GGUF", "text generation", "model serving", "inference optimization", "KV cache", "continuous batching", "speculative decoding", "local LLM", "CPU inference".
Flask Ml Api Creator - Auto-activating skill for ML Deployment. Triggers on: flask ml api creator. Part of the ML Deployment skill category.
Model Export Helper - Auto-activating skill for ML Deployment. Triggers on: model export helper. Part of the ML Deployment skill category.
Builds production AI/ML systems — model training, fine-tuning, MLOps pipelines, model serving, evaluation frameworks, RAG optimization, and agent orchestration at scale. Use when the user asks to build, train, or deploy ML models, set up MLOps pipelines, optimize RAG systems, create inference endpoints, or design production AI agents.
Triton Inference Config - Auto-activating skill for ML Deployment. Triggers on: triton inference config. Part of the ML Deployment skill category.