Found 43 Skills
Execute complex tasks through sequential sub-agent orchestration with intelligent model selection and meta-judge → LLM-as-a-judge verification
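The shape of such a pipeline might look like the minimal Python sketch below; the `call_model()` client stub and the model names are hypothetical illustrations, not the skill's actual implementation.

```python
# Hypothetical sketch: route each subtask to a model tier by difficulty and
# chain the sub-agents sequentially. call_model() is a placeholder for a real
# LLM client; the model names are illustrative, not the skill's actual config.
from dataclasses import dataclass

@dataclass
class Subtask:
    prompt: str
    difficulty: str  # "simple" or "complex"

MODEL_BY_DIFFICULTY = {"simple": "small-fast-model", "complex": "large-capable-model"}

def call_model(model: str, prompt: str) -> str:
    # Placeholder: swap in a real LLM client call here.
    return f"[{model} response to: {prompt[:40]}...]"

def orchestrate(subtasks: list[Subtask]) -> str:
    context = ""
    for task in subtasks:
        model = MODEL_BY_DIFFICULTY[task.difficulty]
        # Each sub-agent sees the accumulated output of earlier steps.
        context = call_model(model, f"{context}\n\nNext step: {task.prompt}")
    return context
```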
Execute a task via sub-agent implementation and LLM-as-a-judge verification, with an automatic retry loop
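A minimal sketch of the implement → judge → retry loop, reusing the hypothetical `call_model()` stub from the sketch above; the PASS/FAIL reply protocol and the retry count are assumptions, not the skill's specification.

```python
# Hypothetical implement -> judge -> retry loop (reuses call_model() from the
# previous sketch). The PASS/FAIL reply format and max_retries are assumptions.
def run_with_verification(task: str, max_retries: int = 3) -> str:
    draft, feedback = "", ""
    for _ in range(max_retries):
        draft = call_model("worker-model", f"{task}\n\nPrior judge feedback: {feedback}")
        verdict = call_model(
            "judge-model",
            f"Task: {task}\nAnswer: {draft}\n"
            "Reply 'PASS' if the answer fully satisfies the task, else 'FAIL: <reason>'.",
        )
        if verdict.strip().startswith("PASS"):
            return draft
        feedback = verdict  # feed the judge's critique into the next attempt
    return draft  # best effort after exhausting retries
```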
Comprehensive multi-perspective review using specialized judges with debate and consensus building
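One way such a review could be wired up, sketched under the same assumptions (the hypothetical `call_model()` stub from above); a simple majority vote stands in for the full debate round.

```python
# Hypothetical multi-judge review: specialized personas score independently,
# then a majority vote stands in for the debate/consensus step.
JUDGE_PERSONAS = {
    "correctness": "You are a strict correctness reviewer.",
    "security": "You are a security-focused reviewer.",
    "style": "You are a readability and style reviewer.",
}

def consensus_review(artifact: str) -> bool:
    votes = []
    for persona in JUDGE_PERSONAS.values():
        verdict = call_model(
            "judge-model",
            f"{persona}\nArtifact:\n{artifact}\nReply 'APPROVE' or 'REJECT: <reason>'.",
        )
        votes.append(verdict.strip().startswith("APPROVE"))
    # A real debate round would let judges read each other's reasons and revise;
    # simple majority is the minimal consensus stand-in.
    return sum(votes) > len(votes) / 2
```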
Launch a meta-judge and then a judge sub-agent to evaluate results produced in the current conversation
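The two-stage flow might reduce to something like this sketch (same hypothetical `call_model()` stub): the meta-judge drafts a rubric, and the judge then scores against it.

```python
# Hypothetical two-stage evaluation: a meta-judge drafts criteria first, then
# a judge scores the conversation's results against that rubric.
def meta_judge_evaluate(conversation_results: str) -> str:
    rubric = call_model(
        "meta-judge-model",
        "Draft 3-5 concrete evaluation criteria for these results:\n"
        + conversation_results,
    )
    return call_model(
        "judge-model",
        f"Rubric:\n{rubric}\n\nResults:\n{conversation_results}\n"
        "Score each criterion 1-5 with a one-line justification.",
    )
```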
Evaluate and improve Claude Code commands, skills, and agents. Use when testing prompt effectiveness, validating context engineering choices, or measuring improvement quality.
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
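The automated-metrics leg of such a framework can be as small as the sketch below: exact-match accuracy over a toy benchmark whose rows are invented for illustration. Real setups layer fuzzier metrics and human-feedback scoring on top of this shape.

```python
# Hypothetical automated metric: exact-match accuracy over a toy benchmark.
# The benchmark rows are invented; real suites add fuzzier metrics (BLEU,
# embedding similarity) and human-feedback scoring on top of this shape.
BENCHMARK = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]

def exact_match_accuracy(predict) -> float:
    hits = sum(predict(case["input"]).strip() == case["expected"] for case in BENCHMARK)
    return hits / len(BENCHMARK)

# Usage (with the call_model() stub from earlier):
# score = exact_match_accuracy(lambda q: call_model("worker-model", q))
```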
Use this skill for ANY question about CREATING evaluators. Covers creating custom metrics, LLM-as-a-judge evaluators, code-based evaluators, and uploading evaluation logic to LangSmith. Includes basic usage of evaluators to run evaluations.
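A minimal code-based evaluator in the shape the LangSmith Python SDK documents: an evaluator receives a run and its reference example and returns a keyed score. The dataset name and target function here are placeholders, not part of the skill.

```python
# Minimal code-based evaluator in the shape LangSmith's Python SDK expects.
# "my-dataset" and target() are placeholders; the run/example signature and
# the {"key", "score"} return format follow the SDK's documented convention.
from langsmith.evaluation import evaluate

def exactness(run, example):
    # Score 1 when the run's output matches the reference answer exactly.
    predicted = (run.outputs or {}).get("output", "")
    reference = (example.outputs or {}).get("answer", "")
    return {"key": "exactness", "score": int(predicted == reference)}

def target(inputs: dict) -> dict:
    # Placeholder system under test; swap in your real application call.
    return {"output": inputs["question"].upper()}

# Requires LANGSMITH_API_KEY and an existing dataset named "my-dataset":
# results = evaluate(target, data="my-dataset", evaluators=[exactness])
```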