Interactively set up a first Coval AI evaluation. Guides users through installing the CLI, connecting an agent, creating personas, building test cases, selecting metrics, and launching their first eval run. Use when the user says "onboard", "get started", "set up evaluation", "first eval", "new to coval", or wants help creating their first test run.
npx skill4agent add coval-ai/coval-external-skills onboard
coval
$ARGUMENTS
coval --version
brew install coval-ai/tap/coval
cargo install coval
# Download the latest Windows binary from GitHub releases
Invoke-WebRequest -Uri "https://github.com/coval-ai/cli/releases/latest/download/coval-x86_64-pc-windows-msvc.exe" -OutFile "coval.exe"
coval --version
coval whoami
coval login
coval agents list --format json
coval test-sets list --format json
coval metrics list --format json
coval personas list --format json
voice, outbound-voice, chat, sms, websocket
# For voice/sms:
coval agents create --name "<name>" --type <type> --phone-number "<number>" --format json
# For chat/outbound-voice/websocket:
coval agents create --name "<name>" --type <type> --endpoint "<url>" --format json
agent_id
references/persona-templates.md
coval personas create \
--name "<persona_name>" \
--voice "<voice_name>" \
--language "<language_code>" \
--prompt "<behavior_prompt>" \
--background "<background_sound>" \
--wait-seconds <wait> \
--format json
persona_id
--voice
--language
references/test-case-templates.md
Test Set: "<Use Case> Evaluation"
[happy_path] <test case name>
<scenario description>
[edge_case] <test case name>
<scenario description>
[compliance] <test case name>
<scenario description>
coval test-sets create --name "<Use Case> Evaluation" --description "<desc>" --format json
test_set_id
coval test-cases create \
--test-set-id <test_set_id> \
--input "<scenario text>" \
--expected "<expected behaviors joined with newlines>" \
--description "<test case name>" \
--format json
--expected
\n
references/metric-recommendations.md
coval metrics list
Based on your <use case> agent, I recommend these metrics:
[built-in] Composite Evaluation — Evaluates expected behaviors per test case
[custom] <Use Case Metric> — <description>
[custom] <Critical Requirement> — Based on your #1 priority
[audio] Professional Tone — Agent tone quality (voice only)
[audio] Pause Detection — Flags pauses > 3 seconds (voice only)
# LLM Binary metric
coval metrics create \
--name "<metric name>" \
--description "<description>" \
--type llm-binary \
--prompt "<evaluation prompt>" \
--format json
# Pause metric (voice only)
coval metrics create \
--name "Long Pause Detection" \
--description "Flags pauses longer than 3 seconds" \
--type pause \
--min-pause-duration 3.0 \
--format json
coval run-templates create \
--name "First Eval - <Use Case>" \
--agent-id <agent_id> \
--persona-id <persona_id> \
--test-set-id <test_set_id> \
--metric-ids <comma_separated_ids> \
--iteration-count <iterations> \
--concurrency <concurrency> \
--format json
coval runs launch \
--agent-id <agent_id> \
--persona-id <persona_id> \
--test-set-id <test_set_id> \
--metric-ids <comma_separated_ids> \
--iterations <iterations> \
--concurrency <concurrency> \
--name "First Eval - <Use Case>" \
--format json
run_id
coval runs watch <run_id>
coval runs get <run_id> --format json
coval simulations list --filter "run_id=\"<run_id>\"" --format json
coval simulations metrics <simulation_id> --format json
Evaluation Complete!
Run: First Eval - <Use Case>
Test Cases: <count>
Iterations: <count>
Status: COMPLETED
Results:
| Test Case | Score | Status |
|------------------------------|-------|--------|
| Happy Path — <name> | 0.85 | PASS |
| Edge Case — <name> | 0.60 | WARN |
| Compliance — <name> | 1.00 | PASS |
View full results: https://app.coval.dev/runs/<run_id>
Saved as template: "First Eval - <Use Case>"
Re-run: coval runs launch --agent-id <id> --persona-id <id> --test-set-id <id>
coval test-cases create --test-set-id <id> --input "..."
coval scheduled-runs create --template-id <id> --schedule "cron(0 9 * * MON)"
coval simulations audio <sim_id> -o recording.wav
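Two of the commands above take joined values: --metric-ids takes a comma-separated list, and --expected takes expected behaviors joined with newlines. The following is a minimal sketch of small shell helpers for building those values and for wrapping the launch command. The function names (join_ids, join_expected, launch_first_eval) are hypothetical and not part of the Coval CLI; only the flags are copied from the commands above, and the wrapper assumes the CLI is installed and authenticated.

```shell
# Hypothetical helpers; only the coval flags are taken from the commands above.

# Join arguments with commas, the form --metric-ids takes above.
# The function body runs in a subshell so the IFS change stays local.
join_ids() (
  IFS=,
  printf '%s\n' "$*"
)

# Join arguments with newlines, the form --expected takes above.
join_expected() {
  printf '%s\n' "$@"
}

# Wrap the launch command shown above. Defined here but never called,
# so loading this file does not contact Coval.
# Usage: launch_first_eval <agent_id> <persona_id> <test_set_id> \
#          <metric_ids> <iterations> <concurrency> <name>
launch_first_eval() {
  coval runs launch \
    --agent-id "$1" \
    --persona-id "$2" \
    --test-set-id "$3" \
    --metric-ids "$4" \
    --iterations "$5" \
    --concurrency "$6" \
    --name "$7" \
    --format json
}
```

For example, `join_ids m1 m2 m3` prints `m1,m2,m3`, which can be passed as the fourth argument to launch_first_eval. Keeping the joins in separate functions makes it possible to check the quoting before any run is launched.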