Search Results: ai-benchmark

Found 4 Skills

huggingface-best

Use when the user asks about finding the best, top, or recommended model for a task, wants to know what AI model to use, or wants to compare models by benchmark scores. Triggers on: "best model for X", "what model should I use for", "top models for [task]", "which model runs on my laptop/machine/device", "recommend a model for", "what LLM should I use for", "compare models for", "what's state of the art for", or any question about choosing an AI model for a specific use case. Always use this skill when the user wants model recommendations or comparisons, even if they don't explicitly mention HuggingFace or benchmarks.

AI & Machine Learning | davila7/claude-code-templ...

pytdc

Therapeutics Data Commons: AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, and molecular oracles for therapeutic ML and pharmacological prediction.

3 scripts | Checked

AI & Machine Learning | ruvnet/ruflo

gaia-debugging

Diagnose why a GAIA question failed: extract the trace, classify the failure mode, and propose a fix.

AI & Machine Learning | exploreomni/omni-agent-sk...

omni-ai-eval

Evaluate Omni AI query generation accuracy by running test prompts through the Omni CLI, comparing generated query JSON against expected results, and scoring accuracy. Use this skill whenever someone wants to evaluate Omni AI, benchmark Blobby, run regression tests, compare AI output across branches or configurations, test prompt variations, measure AI quality, run A/B tests on model changes, assess impact of context changes, or any variant of "run evals", "test Blobby", "benchmark query generation", "compare AI results", "regression test", "how accurate is the AI", or "measure the impact of my changes".
