Found 19 Skills
Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.
Expert data scientist for advanced analytics, machine learning, and statistical modeling. Handles complex data analysis, predictive modeling, and business intelligence. Use PROACTIVELY for data analysis tasks, ML modeling, statistical analysis, and data-driven insights.
Industry-standard gradient boosting libraries for tabular data and structured datasets. XGBoost and LightGBM excel at classification and regression tasks on tables, CSVs, and databases. Use when working with tabular machine learning, gradient boosting trees, Kaggle competitions, feature importance analysis, hyperparameter tuning, or when you need state-of-the-art performance on structured data.
Use ONLY when creating NEW registrable components in ML projects that require Factory/Registry patterns.
✅ USE when:
- Creating a new Dataset class (needs @register_dataset)
- Creating a new Model class (needs @register_model)
- Creating a new module directory with __init__.py factory
- Initializing a new ML project structure from scratch
- Adding new component types (Augmentation, CollateFunction, Metrics)
❌ DO NOT USE when:
- Modifying existing functions or methods
- Fixing bugs in existing code
- Adding helper functions or utilities
- Refactoring without adding new registrable components
- Simple code changes to a single file
- Modifying configuration files
- Reading or understanding existing code
Key indicator: Does the task require @register_* decorator or Factory pattern? If no, skip this skill.
Use Transformers.js to run state-of-the-art machine learning models directly in JavaScript/TypeScript. Supports NLP (text classification, translation, summarization), computer vision (image classification, object detection), audio (speech recognition, audio classification), and multimodal tasks. Works in Node.js and browsers (with WebGPU/WASM) using pre-trained models from Hugging Face Hub.
Analyze datasets by running clustering algorithms (K-means, DBSCAN, hierarchical) to identify groups in the data. Use when the request is to "run clustering", perform "cluster analysis", or "group data points". Trigger with relevant phrases based on skill purpose.
Expert in statistical analysis, predictive modeling, machine learning, and data storytelling to drive business insights.
Debug scikit-learn issues systematically. Use when encountering model errors like NotFittedError, shape mismatches between train and test data, NaN/infinity value errors, pipeline configuration issues, convergence warnings from optimizers, cross-validation failures due to class imbalance, data leakage causing suspiciously high scores, or preprocessing errors with ColumnTransformer and feature alignment.
Refactor scikit-learn and machine learning code to improve maintainability, reproducibility, and adherence to best practices. This skill transforms working ML code into production-ready pipelines that prevent data leakage and ensure reproducible results. It addresses preprocessing done outside pipelines, missing random_state parameters, improper cross-validation, and custom transformers that do not follow sklearn API conventions. It implements proper Pipeline and ColumnTransformer patterns, systematic hyperparameter tuning, and appropriate evaluation metrics.
Best practices for scikit-learn machine learning, model development, evaluation, and deployment in Python
Open-source cheminformatics and machine learning toolkit for drug discovery, molecular manipulation, and chemical property calculation. RDKit handles SMILES, molecular fingerprints, substructure searching, 3D conformer generation, pharmacophore modeling, and QSAR. Use when working with chemical structures, drug-like properties, molecular similarity, virtual screening, or computational chemistry workflows.
Composable transformations of Python+NumPy programs. Differentiate, vectorize, JIT-compile to GPU/TPU. Built for high-performance machine learning research and complex scientific simulations. Use for automatic differentiation, GPU/TPU acceleration, higher-order derivatives, physics-informed machine learning, differentiable simulations, and automatic vectorization.