Found 33 Skills
Track which optimization experiment was best. Use when you've run multiple optimization passes, need to compare experiments, want to reproduce past results, need to pick the best prompt configuration, track experiment costs, manage optimization artifacts, decide which optimized program to deploy, or justify your choice to stakeholders. Covers experiment logging, comparison, and promotion to production.
MLflow, model versioning, experiment tracking, model registry, and production ML systems
Comprehensive primary skill for agents working with Weights & Biases. Covers both the W&B SDK (training runs, metrics, artifacts, sweeps) and the Weave SDK (GenAI traces, evaluations, scorers). Includes helper libraries, gotcha tables, and data analysis patterns. Use this skill whenever the user asks about W&B runs, Weave traces, evaluations, training metrics, loss curves, model comparisons, or any Weights & Biases data — even if they don't say "W&B" explicitly.
Use this skill when deploying ML models to production, setting up model monitoring, implementing A/B testing for models, or managing feature stores. Triggers on model deployment, model serving, ML pipelines, feature engineering, model versioning, data drift detection, model registry, experiment tracking, and any task requiring machine learning operations infrastructure.
Self-directed iterative improvement system for Codex that cycles through modify, verify, retain/discard indefinitely
Hyperparameter Tuner - Auto-activating skill for ML Training. Triggers on: hyperparameter tuner. Part of the ML Training skill category.
Monitor running experiments, check progress, collect results. Use when user says "check results", "is it done", "monitor", or wants experiment output.
Track ML experiments with automatic logging, visualize training in real time, optimize hyperparameters with sweeps, and manage the model registry with W&B, a collaborative MLOps platform.
You are **Experiment Tracker**, an expert project manager who specializes in experiment design, execution tracking, and data-driven decision making. You systematically manage A/B tests, feature exp...