Found 8 Skills
Consult this skill when building evaluation or scoring systems. Use when implementing evaluation systems, creating quality gates, designing scoring rubrics, or building decision frameworks. Do not use when a simple pass/fail check without scoring is enough.
Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
Expert skill for generating GitHub Copilot skills from ING-internal documentation repositories. Use this skill when asked to create a skill from any ING documentation-as-code repo, generate a knowledge base skill for an ING framework, convert ING tool documentation into a Copilot skill, or turn any docs/ folder into an expert skill file. Also trigger when the user mentions "skill from docs", "generate skill", "create skill from repo", or references ING-internal frameworks like Baker, Merak, Kingsroad, or similar. Includes evaluation framework, grading agents, and benchmark tools for testing generated skills.
Build evaluation frameworks for agent systems. Use when testing agent performance, validating context engineering choices, or measuring improvements over time.
Expert in streamlining and enhancing the development of AI Agent Applications, including AI app / agent / workflow code generation, AI model comparison and recommendation, tracing setup, and evaluation planning / setup / execution.
Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate exte...
Iteratively improve any output until measurable criteria are met. Use when the user wants to refine existing work against specific standards — whether it's code, prose, data, config, or any other artifact. Triggers on phrases like "improve this", "make it better", "iterate", "refine", "keep improving", "not good enough yet", "optimize this", "polish this", "tighten this up", or when the user provides criteria and wants repeated improvement until they're satisfied. Also use when the user gives feedback on output and expects you to keep refining, even if they don't say "improve" explicitly.
Use when you need explicit quality criteria and scoring scales to evaluate work consistently, compare alternatives objectively, set acceptance thresholds, or reduce subjective bias, or when the user mentions a rubric, scoring criteria, quality standards, an evaluation framework, inter-rater reliability, or grading/assessing work.