Search Results: uat

Found 1,942 Skills

AI & Machine Learning · supercent-io/skills-templ...

agent-evaluation

Design and implement comprehensive evaluation systems for AI agents. Use when building evals for coding agents, conversational agents, research agents, or computer-use agents. Covers grader types, benchmarks, 8-step roadmap, and production integration.

10.1k

Tools & Utilities · k-dense-ai/claude-scienti...

scholar-evaluation

Systematically evaluate scholarly work using the ScholarEval framework, providing structured assessment across research quality dimensions including problem formulation, methodology, analysis, and writing with quantitative scoring and actionable feedback.

130

1 script · Checked

AI & Machine Learning · microsoft/agent-skills

azure-ai-evaluation-py

Azure AI Evaluation SDK for Python. Use for evaluating generative AI applications with quality, safety, agent, and custom evaluators. Triggers: "azure-ai-evaluation", "evaluators", "GroundednessEvaluator", "evaluate", "AI quality metrics", "RedTeam", "agent evaluation".

1 script · Checked

AI & Machine Learning · wshobson/agents

llm-evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

Project Management · lyndonkl/claude

evaluation-rubrics

Use when you need explicit quality criteria and scoring scales to evaluate work consistently, compare alternatives objectively, set acceptance thresholds, or reduce subjective bias, or when the user mentions a rubric, scoring criteria, quality standards, an evaluation framework, inter-rater reliability, or grading/assessing work.

AI & Machine Learning · arize-ai/arize-skills

arize-evaluator

INVOKE THIS SKILL for LLM-as-judge evaluation workflows on Arize: creating/updating evaluators, running evaluations on spans or experiments, tasks, trigger-run, column mapping, and continuous monitoring. Use when the user says: create an evaluator, LLM judge, hallucination/faithfulness/correctness/relevance, run eval, score my spans or experiment, ax tasks, trigger-run, trigger eval, column mapping, continuous monitoring, query filter for evals, evaluator version, or improve an evaluator prompt.

Data Processing · claude-office-skills/skil...

dcf-valuation

Build Discounted Cash Flow (DCF) valuation models. Calculate intrinsic value with customizable assumptions. Generate professional valuation reports.

AI & Machine Learning · shipshitdev/library

evaluation

Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.

1 script · Checked

Backend Development · giuseppe-trisciuoglio/dev...

spring-boot-actuator

Configure Spring Boot Actuator for production-grade monitoring, health probes, secured management endpoints, and Micrometer metrics across JVM services.

AI & Machine Learning · flora131/atomic

evaluation

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", or "create evaluation rubrics", or mentions LLM-as-judge, multi-dimensional evaluation, agent testing, or quality gates for agent pipelines. Part of the context engineering skill suite; also activates when the user mentions "context engineering" or "context-engineering" while discussing how to measure agent effectiveness.

1 script · Checked

Documentation & Writing · ovachiever/droid-tings

scholar-evaluation

Systematic framework for evaluating scholarly and research work based on the ScholarEval methodology. This skill should be used when assessing research papers, evaluating literature reviews, scoring research methodologies, analyzing scientific writing quality, or applying structured evaluation criteria to academic work. Provides comprehensive assessment across multiple dimensions including problem formulation, literature review, methodology, data collection, analysis, results interpretation, and scholarly writing quality.

1 script · Checked

Project Management · oldwinter/skills

evaluating-trade-offs

Evaluate trade-offs and produce a Trade-off Evaluation Pack (trade-off brief, options and criteria matrix, all-in cost and opportunity cost table, impact ranges, recommendation, stop/continue triggers). Use for trade-off questions: pros and cons, cost-benefit, opportunity cost, build vs. buy, ship fast vs. ship better, continue vs. stop (sunk costs). Category: Leadership.
