Search Results: uat

Found 1,746 Skills

evaluation

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge, multi-dimensional evaluation, agent testing, or quality gates for agent pipelines. Part of the context engineering skill suite — also activates when the user mentions "context engineering" or "context-engineering" in the context of measuring agent effectiveness.

🇺🇸|EnglishTranslated

1 scripts/Checked

AI & Machine Learningdev-dennis-040/openclaw-a...

testing-tool-evaluator

You are **Tool Evaluator**, an expert technology assessment specialist who evaluates, tests, and recommends tools, software, and platforms for business use. You optimize team productivity and busin...

🇺🇸|EnglishTranslated

Tools & Utilitiessharadchaturveda-coder/ag...

agency-tool-evaluator

Expert technology assessment specialist focused on evaluating, testing, and recommending tools, software, and platforms for business use and productivity optimization

🇺🇸|EnglishTranslated

Documentation & Writingaffaan-m/everything-claud...

scholar-evaluation

Structured scholarly-work evaluation for papers, proposals, literature reviews, methods sections, evidence quality, citation support, and research-writing feedback.

🇺🇸|EnglishTranslated

Product & Designrefoundai/lenny-skills

evaluating-trade-offs

Help users make better decisions between competing options. Use when someone is weighing pros and cons, comparing alternatives, struggling with a difficult choice, deciding between speed and quality, or asking "should we do X or Y?"

🇺🇸|EnglishTranslated

AI & Machine Learningshipshitdev/library

advanced-evaluation

Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or establishing quality standards for AI-generated content.

🇺🇸|EnglishTranslated

1 scripts/Checked

Data Processingfatfingererr/macro-skills

evaluate-exponential-trend-deviation-regimes

Calculate the deviation of asset prices relative to the long-term exponential growth trend line, assess whether the current period falls within a historical extreme range, and optionally perform macro factor analysis to evaluate the market regime.

🇨🇳|ChineseTranslated

2 scripts/Checked

Tools & Utilitiesoldwinter/skills

evaluating-candidates

Make an evidence-based hiring decision and produce a Candidate Evaluation Decision Pack (criteria + scorecard, signal log, work sample/trial plan + rubric, reference check script + summary, decision memo). Use for candidate evaluation, hiring decisions, reference checks, work samples/take-homes, and hiring bar calibration. Category: Hiring & Teams.

🇺🇸|EnglishTranslated

Project Managementoldwinter/skills

evaluating-trade-offs

Evaluate trade-offs and produce a Trade-off Evaluation Pack (trade-off brief, options+criteria matrix, all-in cost/opportunity cost table, impact ranges, recommendation, stop/continue triggers). Use for tradeoff/trade-off, pros and cons, cost-benefit, opportunity cost, build vs buy, ship fast vs ship better, continue vs stop (sunk costs). Category: Leadership.

🇺🇸|EnglishTranslated

Tools & Utilitiesaleister1102/skills

tech-stack-evaluator

Use when comparing technology stacks, evaluating frameworks/providers, or assessing TCO, security, and ecosystem health for migration decisions.

🇺🇸|EnglishTranslated

7 scripts/Checked

AI & Machine Learninggotalab/skillport

skill-evaluator

Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, content organization, and identifies anti-patterns. Produces actionable improvement recommendations.

🇺🇸|EnglishTranslated

1 scripts/Attention

AI & Machine Learningoimiragieo/agent-studio

agent-evaluation

LLM-as-judge evaluation framework with 5-dimension rubric (accuracy, groundedness, coherence, completeness, helpfulness) for scoring AI-generated content quality with weighted composite scores and evidence citations

🇺🇸|EnglishTranslated

1 scripts/Checked