Search Results: evaluation-framework

Found 10 Skills

Code Quality | athola/claude-night-marke...

evaluation-framework

Consult this skill when building evaluation or scoring systems. Use when implementing evaluation systems, creating quality gates, designing scoring rubrics, or building decision frameworks. Do not use when a simple pass/fail check without scoring is all that is needed.

AI & Machine Learning | microsoft/eval-guide

eval-suite-planner

Produces a concrete eval suite plan grounded in Microsoft's Eval Scenario Library and MS Learn agent evaluation guidance — scenario types, evaluation methods, quality signals, thresholds, and priority order — before any test cases are generated or evals are run.

AI & Machine Learning | microsoft/eval-guide

eval-result-interpreter

Analyzes Copilot Studio evaluation CSV results using Microsoft's Triage & Improvement Playbook. Returns a SHIP / ITERATE / BLOCK verdict with root cause classification, diagnostic triage, prioritized remediation, and pattern analysis.

AI & Machine Learning | affaan-m/everything-claud...

eval-harness

Formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.

AI & Machine Learning | asllani94/skills

ing-skill-generator

Expert skill for generating GitHub Copilot skills from ING-internal documentation repositories. Use this skill when asked to create a skill from any ING documentation-as-code repo, generate a knowledge base skill for an ING framework, convert ING tool documentation into a Copilot skill, or turn any docs/ folder into an expert skill file. Also trigger when the user mentions "skill from docs", "generate skill", "create skill from repo", or references ING-internal frameworks like Baker, Merak, Kingsroad, or similar. Includes evaluation framework, grading agents, and benchmark tools for testing generated skills.

9 scripts/Attention

AI & Machine Learning | shipshitdev/library

evaluation

Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.

1 scripts/Checked

AI & Machine Learning | alexei-led/claude-code-co...

agent-workflow-builder_ai_toolkit

Expert in streamlining and enhancing the development of AI Agent Applications, including AI app / agent / workflow code generation, AI model comparison and recommendation, tracing setup, and evaluation planning / setup / execution.

Project Management | lyndonkl/claude

evaluation-rubrics

Use when you need explicit quality criteria and scoring scales to evaluate work consistently, compare alternatives objectively, set acceptance thresholds, or reduce subjective bias, or when the user mentions rubrics, scoring criteria, quality standards, evaluation frameworks, inter-rater reliability, or grading/assessing work.

AI & Machine Learning | witooh/skills

improve

Iteratively improve any output until measurable criteria are met. Use when the user wants to refine existing work against specific standards — whether it's code, prose, data, config, or any other artifact. Triggers on phrases like "improve this", "make it better", "iterate", "refine", "keep improving", "not good enough yet", "optimize this", "polish this", "tighten this up", or when the user provides criteria and wants repeated improvement until they're satisfied. Also use when the user gives feedback on output and expects you to keep refining, even if they don't say "improve" explicitly.

AI & Machine Learning | sickn33/antigravity-aweso...

mcp-builder-ms

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate exte...
