Loading...
Loading...
Found 1,746 Skills
This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge, multi-dimensional evaluation, agent testing, or quality gates for agent pipelines. Part of the context engineering skill suite — also activates when the user mentions "context engineering" or "context-engineering" in the context of measuring agent effectiveness.
You are **Tool Evaluator**, an expert technology assessment specialist who evaluates, tests, and recommends tools, software, and platforms for business use. You optimize team productivity and busin...
Expert technology assessment specialist focused on evaluating, testing, and recommending tools, software, and platforms for business use and productivity optimization
Structured scholarly-work evaluation for papers, proposals, literature reviews, methods sections, evidence quality, citation support, and research-writing feedback.
Help users make better decisions between competing options. Use when someone is weighing pros and cons, comparing alternatives, struggling with a difficult choice, deciding between speed and quality, or asking "should we do X or Y?"
Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or establishing quality standards for AI-generated content.
Calculate the deviation of asset prices relative to the long-term exponential growth trend line, assess whether the current period falls within a historical extreme range, and optionally perform macro factor analysis to evaluate the market regime.
Make an evidence-based hiring decision and produce a Candidate Evaluation Decision Pack (criteria + scorecard, signal log, work sample/trial plan + rubric, reference check script + summary, decision memo). Use for candidate evaluation, hiring decisions, reference checks, work samples/take-homes, and hiring bar calibration. Category: Hiring & Teams.
Evaluate trade-offs and produce a Trade-off Evaluation Pack (trade-off brief, options+criteria matrix, all-in cost/opportunity cost table, impact ranges, recommendation, stop/continue triggers). Use for tradeoff/trade-off, pros and cons, cost-benefit, opportunity cost, build vs buy, ship fast vs ship better, continue vs stop (sunk costs). Category: Leadership.
Use when comparing technology stacks, evaluating frameworks/providers, or assessing TCO, security, and ecosystem health for migration decisions.
Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, content organization, and identifies anti-patterns. Produces actionable improvement recommendations.
LLM-as-judge evaluation framework with 5-dimension rubric (accuracy, groundedness, coherence, completeness, helpfulness) for scoring AI-generated content quality with weighted composite scores and evidence citations