Search Results: model-evaluation

Found 14 Skills

AI & Machine Learning | k-dense-ai/claude-scienti...

scikit-learn

Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.

2 scripts/Checked

Data Processing | k-dense-ai/claude-scienti...

scikit-survival

Comprehensive toolkit for survival analysis and time-to-event modeling in Python using scikit-survival. Use this skill when working with censored survival data, performing time-to-event analysis, fitting Cox models, Random Survival Forests, Gradient Boosting models, or Survival SVMs, evaluating survival predictions with concordance index or Brier score, handling competing risks, or implementing any survival analysis workflow with the scikit-survival library.

AI & Machine Learning | borghei/claude-skills

data-scientist

Expert data science covering machine learning, statistical modeling, experimentation, predictive analytics, and advanced analytics.

AI & Machine Learning | kiterlin/intelligent-dete...

evaluating-llms-harness

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.

AI & Machine Learning | tondevrel/scientific-agen...

scikit-learn

The industry standard library for machine learning in Python. Provides simple and efficient tools for predictive data analysis, covering classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

AI & Machine Learning | 89jobrien/steve

machine-learning

Machine learning development patterns, model training, evaluation, and deployment. Use when building ML pipelines, training models, feature engineering, model evaluation, or deploying ML systems to production.

AI & Machine Learning | mindrally/skills

scikit-learn-best-practices

Best practices for machine learning with scikit-learn, covering model development, evaluation, and deployment in Python.

AI & Machine Learning | qodex-ai/ai-agent-skills

llm-fine-tuning-guide

Master fine-tuning of large language models for specific domains and tasks. Covers data preparation, training techniques, optimization strategies, and evaluation methods. Use when adapting models for specialized applications, reducing inference costs, or improving domain-specific performance.

5 scripts/Attention

AI & Machine Learning | daemon-blockint-tech/agen...

ml-research-engineer-safeguards

Guides ML/research engineering for safeguards—safety classifier development, harm benchmarks and eval suites, labeled dataset design, fine-tuning and ablations, calibration and slice analysis, attack-surface research memos, and promotion criteria for new moderation models. Use when building or evaluating guardrail models, designing safety benchmarks, measuring precision/recall on policy categories, comparing mitigation techniques, or writing research reports on classifier improvements—not for production inference gateways (ml-infrastructure-engineer-safeguards), PII/leakage privacy research (privacy-research-engineer-safeguards), red-team attack campaigns (ai-redteam), AI governance policy (ai-risk-governance), general non-safety research (ai-researcher), or token-efficiency studies (research-engineer-scientist-tokens).

AI & Machine Learning | promptingcompany/nv-skill...

tao-analyze-gaps-visual-changenet

Performs gap analysis on NVIDIA TAO Visual ChangeNet (VCN) Classify experiments by invoking the data-services container (`tao_toolkit.data_services` from `versions.yaml`) directly via `docker run … gap_analysis vcn_aoi …` — picks the optimal decision threshold, ranks per-sample weakness, and emits a top-K weakest parquet expanded per-lighting for downstream augmentation. Use when analyzing VCN classification failures, picking SDA augmentation targets, auditing PASS/NO_PASS boundary cases, or running DEFT gap analysis on an AOI ChangeNet model.

6 scripts/Attention

AI & Machine Learning | nvidia/skills

accessing-mlflow

Query and browse evaluation results stored in MLflow. Use when the user wants to look up runs by invocation ID, compare metrics across models, fetch artifacts (configs, logs, results), or set up the MLflow MCP server. ALWAYS triggers on mentions of MLflow, experiment results, run comparison, invocation IDs in the context of results, or MLflow MCP setup.

AI & Machine Learning | gemini-cli-extensions/dat...

ml-best-practices

CRITICAL RULE: You MUST use this skill whenever the task involves any machine learning or data analysis. Use this skill if the user's prompt or requirements mention any of the following:

* Clustering
* Classification
* Regression
* Time series forecasting
* Statistical testing
* Model comparison
* ML
* Data analysis

SQL/BigQuery ML HANDOFF: If the user requires a SQL solution, use this skill to dictate the ANALYSIS STEPS (e.g., markdown analysis cells, visualization logic), but defer to `bigquery` for all SQL syntax.
