Found 6 Skills
Build a complete test suite, consisting of a test set and its test cases, for evaluating an AI agent. Guides the user through test set type selection, scenario design using vertical-specific templates, expected-behavior crafting, and bulk creation. Use when user says "create test cases", "build test suite", "add test scenarios", "set up evaluation tests", or "design test cases".
Design and create a simulation persona for testing an AI agent. Guides the user through use case selection, voice and language configuration, behavior prompt crafting, and interruption calibration. Use when user says "create a persona", "design a persona", "set up a test persona", "configure simulation persona", or "build a caller profile".
Select and configure evaluation metrics for an AI agent. Guides the user through metric selection using use-case recommendations, custom LLM-based metric creation with prompt engineering, and attaching metrics as agent defaults. Use when user says "set up metrics", "configure metrics", "create a metric", "what metrics should I use", "add evaluation criteria", or "customize scoring".
Interactively set up a first Coval AI evaluation. Guides users through installing the CLI, connecting an agent, creating personas, building test cases, selecting metrics, and launching their first eval run. Use when user says "onboard", "get started", "set up evaluation", "first eval", "new to coval", or wants help creating their first test run.
Calculate agreement between human ground-truth labels and machine labels for a text LLM-judge metric, then analyze transcripts and reviewer notes to propose an improved metric prompt. Works on one metric at a time.
Monitor a Coval run's progress with live updates. Use when user wants to check run status or wait for completion.
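The agreement skill above does not say which agreement statistic it computes. A common pairing is raw percent agreement plus Cohen's kappa, which corrects for the agreement two raters would reach by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is illustrative only and assumes each transcript carries one categorical label (for example "pass"/"fail") from the human reviewer and one from the LLM judge. The function names and label format are hypothetical, not part of Coval's CLI or API.

```python
from collections import Counter


def percent_agreement(human, machine):
    """Fraction of items where the human and machine labels match (p_o)."""
    if len(human) != len(machine) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(h == m for h, m in zip(human, machine))
    return matches / len(human)


def cohens_kappa(human, machine):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(human, machine)
    n = len(human)
    h_counts = Counter(human)
    m_counts = Counter(machine)
    labels = set(h_counts) | set(m_counts)
    # Expected agreement if both raters labeled independently at their observed rates.
    p_e = sum((h_counts[label] / n) * (m_counts[label] / n) for label in labels)
    if p_e == 1.0:
        # Both raters used the same single label everywhere; kappa is undefined,
        # so treat this as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six transcripts scored by one metric.
human = ["pass", "pass", "fail", "pass", "fail", "pass"]
machine = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(percent_agreement(human, machine), 3))  # 0.833
print(round(cohens_kappa(human, machine), 3))  # 0.667
```

Percent agreement alone can look high when one label dominates; kappa drops toward 0 in that case, which makes it a useful signal when deciding whether a judge prompt needs revision.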