Search Results: eval-framework

Found 5 Skills

Tools & Utilities | k-dense-ai/claude-scienti...

scholar-evaluation

Systematically evaluate scholarly work using the ScholarEval framework, providing structured assessment across research quality dimensions including problem formulation, methodology, analysis, and writing with quantitative scoring and actionable feedback.

131

1 script | Checked

Documentation & Writing | ovachiever/droid-tings

scholar-evaluation

Systematic framework for evaluating scholarly and research work based on the ScholarEval methodology. This skill should be used when assessing research papers, evaluating literature reviews, scoring research methodologies, analyzing scientific writing quality, or applying structured evaluation criteria to academic work. Provides comprehensive assessment across multiple dimensions including problem formulation, literature review, methodology, data collection, analysis, results interpretation, and scholarly writing quality.

1 script | Checked

AI & Machine Learning | huggingface/skills

hugging-face-evaluation

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.

8 scripts | Checked

AI & Machine Learning | 404kidwiz/claude-supercod...

performance-monitor

Expert in observing, benchmarking, and optimizing AI agents. Specializes in token usage tracking, latency analysis, and quality evaluation metrics. Use when optimizing agent costs, measuring performance, or implementing evals. Triggers include "agent performance", "token usage", "latency optimization", "eval", "agent metrics", "cost optimization", "agent benchmarking".

AI & Machine Learning | codagent-ai/agent-validat...

capture-eval-issues

Capture noteworthy review violations for the eval framework. Use when validator-run finds review failures: the skill judges the violations and saves the notable ones to evals/inventory.yml.

1 script | Checked