Loading...
Loading...
Found 32 Skills
Use when asked to compare multiple ML models, perform cross-validation, evaluate metrics, or select the best model for a classification/regression task.
Analyze datasets by running clustering algorithms (K-means, DBSCAN, hierarchical) to identify data groups. Use when requesting "run clustering", "cluster analysis", or "group data points". Trigger with relevant phrases based on skill purpose.
MLflow integration. Manage data, records, and automate workflows. Use when the user wants to interact with MLflow data.
Optimize machine learning model hyperparameters using grid search, random search, or Bayesian optimization. Finds best parameter configurations to maximize performance. Use when asked to "tune hyperparameters" or "optimize model". Trigger with relevant phrases based on skill purpose.
Roc Curve Plotter - Auto-activating skill for ML Training. Triggers on: roc curve plotter, roc curve plotter Part of the ML Training skill category.
Generates a Jupyter notebook that transforms datasets between ML schemas for model training or evaluation. Use when the user says "transform", "convert", "reformat", "change the format", or when a dataset's schema needs to change to match the target format — always use this skill for format changes rather than writing inline transformation code. Supports OpenAI chat, SageMaker SFT/DPO/RLVR, HuggingFace preference, Bedrock Nova, VERL, and custom JSONL formats from local files or S3.
Debug Scikit-learn issues systematically. Use when encountering model errors like NotFittedError, shape mismatches between train and test data, NaN/infinity value errors, pipeline configuration issues, convergence warnings from optimizers, cross-validation failures due to class imbalance, data leakage causing suspiciously high scores, or preprocessing errors with ColumnTransformer and feature alignment.
Refactor Scikit-learn and machine learning code to improve maintainability, reproducibility, and adherence to best practices. This skill transforms working ML code into production-ready pipelines that prevent data leakage and ensure reproducible results. It addresses preprocessing outside pipelines, missing random_state parameters, improper cross-validation, and custom transformers not following sklearn API conventions. Implements proper Pipeline and ColumnTransformer patterns, systematic hyperparameter tuning, and appropriate evaluation metrics.