Search Results: model-evaluation

Found 24 Skills

model-evaluation-benchmark

Automated reproduction of comprehensive model evaluation benchmarks following the Benchmark Suite V3. Auto-activates for model benchmarking, comparison evaluation, or performance testing between AI models.

🇺🇸|EnglishTranslated

AI & Machine Learningcosmix/claude-loom

loom-model-evaluation

Evaluates ML models for performance, fairness, and reliability. Use for metric selection, cross-validation strategies, overfitting/underfitting diagnosis, hyperparameter tuning, LLM evaluation, A/B testing, and production monitoring for model drift.

🇺🇸|EnglishTranslated

AI & Machine Learningawslabs/agent-plugins

model-evaluation

Generates a Jupyter notebook that evaluates a fine-tuned SageMaker model using LLM-as-a-Judge. Use when the user says "evaluate my model", "how did my model perform", "compare models", or after a training job completes. Supports built-in and custom evaluation metrics, evaluation dataset setup, and judge model selection.

🇺🇸|EnglishTranslated

2 scripts/Checked

AI & Machine Learningk-dense-ai/claude-scienti...

scikit-learn

Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.

🇺🇸|EnglishTranslated

110

2 scripts/Checked

Data Processingk-dense-ai/claude-scienti...

scikit-survival

Comprehensive toolkit for survival analysis and time-to-event modeling in Python using scikit-survival. Use this skill when working with censored survival data, performing time-to-event analysis, fitting Cox models, Random Survival Forests, Gradient Boosting models, or Survival SVMs, evaluating survival predictions with concordance index or Brier score, handling competing risks, or implementing any survival analysis workflow with the scikit-survival library.

🇺🇸|EnglishTranslated

AI & Machine Learningarize-ai/arize-skills

arize-experiment

INVOKE THIS SKILL when creating, running, or analyzing Arize experiments. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI.

🇺🇸|EnglishTranslated

AI & Machine Learningborghei/claude-skills

data-scientist

Expert data science covering machine learning, statistical modeling, experimentation, predictive analytics, and advanced analytics.

🇺🇸|EnglishTranslated

AI & Machine Learningdavila7/claude-code-templ...

evaluating-code-models

Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. Industry standard from BigCode Project used by HuggingFace leaderboards.

🇺🇸|EnglishTranslated

AI & Machine Learningkiterlin/intelligent-dete...

evaluating-llms-harness

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.

🇺🇸|EnglishTranslated

Data Processingterrylica/cc-skills

rangebar-eval-metrics

Range bar evaluation metrics for quant trading. TRIGGERS - range bar metrics, Sharpe ratio, WFO metrics, PSR DSR MinTRL.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learning89jobrien/steve

machine-learning

Machine learning development patterns, model training, evaluation, and deployment. Use when building ML pipelines, training models, feature engineering, model evaluation, or deploying ML systems to production.

🇺🇸|EnglishTranslated

AI & Machine Learninghuggingface/skills

hugging-face-evaluation

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.

🇺🇸|EnglishTranslated

8 scripts/Checked