# Coval Onboarding
Guide the user through setting up a complete AI evaluation from scratch using the `coval` CLI. Follow the phases below in order, asking questions at each step.

If `$ARGUMENTS` contains a use case (e.g. "insurance_claims", "customer_support"), skip the use case question in Phase 2.
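The rule about skipping the use-case question when `$ARGUMENTS` names a use case can be sketched in bash as follows. It assumes `$ARGUMENTS` arrives as one whitespace-separated string and only matches whole words. `detect_use_case` and `known_use_cases` are hypothetical names. The keys are the use-case IDs offered in Phase 2, with `other` left out because it always needs a description.

```bash
# Use-case keys from the Phase 2 question ("other" excluded: it needs a description).
known_use_cases=(customer_support scheduling_booking sales insurance_claims
  healthcare_intake restaurant_orders debt_collection it_helpdesk)

# Print the first known use-case key found in the argument string.
# Returns 1 and prints nothing when no key matches, so Phase 2 asks as usual.
detect_use_case() {
  local -a words
  local word uc
  read -r -a words <<< "$1"   # split on whitespace without glob expansion
  for word in "${words[@]}"; do
    for uc in "${known_use_cases[@]}"; do
      if [[ "$word" == "$uc" ]]; then
        printf '%s\n' "$uc"
        return 0
      fi
    done
  done
  return 1
}
```

Usage: `use_case="$(detect_use_case "$ARGUMENTS")" && echo "Skipping the use-case question: $use_case"`.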
## Phase 0: Setup + Preflight

### Step 1: Check CLI installation
```bash
coval --version
```

If the command fails or is not found, guide the user to install it based on their OS:

**macOS (Homebrew, recommended):**

```bash
brew install coval-ai/tap/coval
```

**Linux / macOS (Cargo, requires Rust 1.75+):**

```bash
cargo install coval
```

**Windows (PowerShell, binary download):**

```powershell
# Download the latest Windows binary from GitHub releases
Invoke-WebRequest -Uri "https://github.com/coval-ai/cli/releases/latest/download/coval-x86_64-pc-windows-msvc.exe" -OutFile "coval.exe"
```

**All platforms (manual binary download):**
Download the latest release for your OS/architecture from https://github.com/coval-ai/cli/releases

After installation, verify: `coval --version`
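When scripting the CLI installation check, a minimal bash sketch could look for the binary on `PATH` before calling it. `check_coval_installed` is a hypothetical helper. Its optional argument exists only so the check can be exercised with another command name.

```bash
# Print the installed version if the binary is on PATH; otherwise print an
# install hint to stderr and return 1.
check_coval_installed() {
  local bin="${1:-coval}"   # defaults to the real CLI name
  if command -v "$bin" >/dev/null 2>&1; then
    "$bin" --version
  else
    echo "$bin not found on PATH; install it via Homebrew, Cargo, or a release binary" >&2
    return 1
  fi
}
```

Usage: `check_coval_installed || exit 1` at the top of a setup script.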
### Step 2: Check authentication
```bash
coval whoami
```

If not authenticated, guide the user:

```bash
coval login
```

This prompts for an API key. Get one at https://app.coval.dev/settings (Organization > Manage > API Keys).
If the user doesn't have a Coval account, direct them to https://coval.dev to sign up.
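The authentication check (`coval whoami`, then `coval login` if needed) can be scripted along these lines. This sketch assumes `coval whoami` exits with a non-zero status when no API key is configured, which is not confirmed here. `ensure_authenticated` is a hypothetical name. Because `coval login` prompts interactively, the helper only reports what to run.

```bash
# Return 0 if `<bin> whoami` succeeds; otherwise tell the user to run `<bin> login`.
# Assumption: whoami exits non-zero when the CLI is not authenticated.
ensure_authenticated() {
  local bin="${1:-coval}"
  if "$bin" whoami >/dev/null 2>&1; then
    return 0
  fi
  echo "Not authenticated. Run: $bin login (API key from https://app.coval.dev/settings)" >&2
  return 1
}
```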
Then run these in parallel to inventory existing resources:

```bash
coval agents list --format json
coval test-sets list --format json
coval metrics list --format json
coval personas list --format json
```

Decision matrix:
- No resources → full flow (Phases 1-6)
- Has agents but nothing else → ask which agent to use, skip Phase 1
- Has agents + test sets → ask which to reuse, skip Phases 1 & 3
- Has everything → ask "Re-launch existing eval or build new?"
Present existing resources as a numbered list and let the user pick or say "new".
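One way to script the resource inventory and the onboarding decision matrix is sketched below in bash. `inventory_resources` runs the four `list` commands as background jobs and writes each response to a file. `choose_onboarding_path` maps resource counts to a path. Both names are hypothetical. How to count items depends on the JSON shape of each response, which is not documented here, so the caller passes the counts in. Combinations the matrix does not cover (for example, test sets but no agents) fall back to the full flow; that fallback is an assumption.

```bash
# Run the four inventory commands in parallel, one JSON file per resource type.
inventory_resources() {
  local bin="${1:-coval}" outdir="${2:-.}" resource
  for resource in agents test-sets metrics personas; do
    "$bin" "$resource" list --format json > "$outdir/$resource.json" &
  done
  wait   # block until all four background jobs finish
}

# Map resource counts (agents, test sets, metrics, personas) to an onboarding path.
choose_onboarding_path() {
  local agents=$1 test_sets=$2 metrics=$3 personas=$4
  if (( agents > 0 && test_sets > 0 && metrics > 0 && personas > 0 )); then
    echo "relaunch-or-new"           # ask: re-launch existing eval or build new?
  elif (( agents > 0 && test_sets > 0 )); then
    echo "reuse-agent-and-test-set"  # skip Phases 1 and 3
  elif (( agents > 0 )); then
    echo "reuse-agent"               # skip Phase 1
  else
    echo "full-flow"                 # Phases 1-6
  fi
}
```

Usage: run `inventory_resources`, count the entries in each file, then call `choose_onboarding_path "$n_agents" "$n_test_sets" "$n_metrics" "$n_personas"`.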
## Phase 1: Connect Agent
Ask these questions:

1. "What type of AI agent do you have?"
   - `voice`: receives inbound phone calls
   - `outbound-voice`: your agent calls out
   - `chat`: text/API endpoint
   - `sms`: SMS-based agent
   - `websocket`: WebSocket connection
2. Based on type:
   - `voice` / `sms` → "What is your agent's phone number? (E.164 format, e.g. +12345678901)"
   - `outbound-voice` / `chat` / `websocket` → "What is your agent's endpoint URL?"
3. "What would you like to name this agent?"
4. (Optional) "Do you have the agent's system prompt? Pasting it helps generate better test cases."
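The type-dependent follow-up question can be encoded as a small lookup, with a basic E.164 check on phone numbers before `coval agents create` is called. This is a bash sketch. `agent_target_flag` and `is_e164` are hypothetical helpers. The pattern only checks the E.164 shape (a `+`, then at most 15 digits with no leading zero), not whether the number is reachable.

```bash
# Which flag carries the agent's address for each agent type.
agent_target_flag() {
  case "$1" in
    voice|sms)                     printf '%s\n' "--phone-number" ;;
    outbound-voice|chat|websocket) printf '%s\n' "--endpoint" ;;
    *) echo "unknown agent type: $1" >&2; return 1 ;;
  esac
}

# Shape check for E.164: "+", a non-zero first digit, 2 to 15 digits in total.
is_e164() {
  local re='^\+[1-9][0-9]{1,14}$'
  [[ "$1" =~ $re ]]
}
```

Usage: `flag="$(agent_target_flag "$agent_type")"`, then `coval agents create --name "<name>" --type "$agent_type" "$flag" "<address>" --format json`.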
Create the agent:
```bash
# For voice/sms:
coval agents create --name "<name>" --type <type> --phone-number "<number>" --format json

# For chat/outbound-voice/websocket:
coval agents create --name "<name>" --type <type> --endpoint "<url>" --format json
```

Capture `agent_id` from the JSON response.
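Several steps capture an ID (such as `agent_id`) from a `--format json` response, so a small extractor is handy. This is a sketch under assumptions. The exact response shape is not shown here, so the helper searches for the key at any depth with `jq` when it is installed. Otherwise it falls back to a crude `sed` match that only handles a `"key": "value"` pair on a single line. `json_field` is a hypothetical name.

```bash
# Print the first value found for KEY in the JSON read from stdin.
json_field() {
  local key=$1
  if command -v jq >/dev/null 2>&1; then
    # Recursive search, so it works whether or not the key is nested.
    jq -r --arg k "$key" '[.. | objects | select(has($k)) | .[$k]][0] // empty'
  else
    # Crude fallback: only matches "key": "value" on one line.
    sed -n "s/.*\"$key\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
  fi
}
```

Usage: `agent_id="$(coval agents create --name "<name>" --type <type> --endpoint "<url>" --format json | json_field agent_id)"`. The same helper applies to `persona_id`, `test_set_id` and `run_id`, assuming those keys appear in the responses.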
## Phase 2: Discover Use Case + Create Persona
Ask these questions:

1. "What does your agent do?"
   - `customer_support`: Customer Support
   - `scheduling_booking`: Scheduling & Booking
   - `sales`: Sales
   - `insurance_claims`: Insurance Claims
   - `healthcare_intake`: Healthcare Intake
   - `restaurant_orders`: Restaurant Orders
   - `debt_collection`: Debt Collection
   - `it_helpdesk`: IT Helpdesk
   - `other`: Other (describe it)
2. "What industry is this for?" (free text)
3. "What language does your agent speak?"
   - `en-US`, `es-ES`, `fr-FR`, `de-DE`, `pt-BR`, `ja-JP`
4. "What's the #1 thing your agent must get right?" (free text; this becomes a custom metric)
Load `references/persona-templates.md` and select the persona template matching the use case. Apply the user's language choice. Present the persona to the user for confirmation before creating it.

```bash
coval personas create \
  --name "<persona_name>" \
  --voice "<voice_name>" \
  --language "<language_code>" \
  --prompt "<behavior_prompt>" \
  --background "<background_sound>" \
  --wait-seconds <wait> \
  --format json
```

Capture `persona_id` from the JSON response.

For chat/sms/websocket agents, still pass `--voice` and `--language` with the defaults (`aria`, `en-US`); the simulation engine ignores these fields for non-voice agents.
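The rule for passing default `--voice`/`--language` values to non-voice agents can be captured in a helper that emits the two flag pairs. This is a bash sketch. `persona_voice_args` is hypothetical. The defaults `aria` and `en-US` come from the persona note; which types count as non-voice (chat, sms, websocket) follows that same note.

```bash
# Emit the --voice/--language flags for `coval personas create`, one token per line.
# Non-voice agent types (chat, sms, websocket) get the documented defaults.
persona_voice_args() {
  local agent_type=$1 voice=$2 language=$3
  case "$agent_type" in
    chat|sms|websocket) voice="aria"; language="en-US" ;;
  esac
  printf '%s\n' --voice "$voice" --language "$language"
}
```

Usage: read the output into an array with `mapfile -t voice_args < <(persona_voice_args "$agent_type" "<voice_name>" "<language_code>")` and pass `"${voice_args[@]}"` to `coval personas create`.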
## Phase 3: Create Test Set + Test Cases
Load `references/test-case-templates.md` and select the 3 test case templates (happy_path, edge_case, compliance) matching the use case. If the user provided a system prompt or critical requirement, customize the test cases to be more specific to their agent.

Present a summary before creating:

```
Test Set: "<Use Case> Evaluation"
  [happy_path]  <test case name>
                <scenario description>
  [edge_case]   <test case name>
                <scenario description>
  [compliance]  <test case name>
                <scenario description>
```

Ask: "Create these test cases? (yes / customize / add more)"

Create the test set and cases:

```bash
coval test-sets create --name "<Use Case> Evaluation" --description "<desc>" --format json
```

Capture `test_set_id`. Then for each test case:

```bash
coval test-cases create \
  --test-set-id <test_set_id> \
  --input "<scenario text>" \
  --expected "<expected behaviors joined with newlines>" \
  --description "<test case name>" \
  --format json
```

Note: The `--expected` flag accepts a single string. Join the expected behaviors array with newlines (`\n`).
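When building the `--expected` string, note that in bash `"\n"` inside ordinary double quotes is a literal backslash followed by `n`, not a newline, so the join needs a little care. A sketch: `join_with_newlines` is a hypothetical helper, and the behaviors are placeholders rather than template content.

```bash
# Placeholder expected behaviors for one test case (not from a real template).
expected_behaviors=(
  "Greets the caller"
  "Verifies the caller's identity before sharing account details"
  "Confirms next steps before ending the call"
)

# Join all arguments with real newline characters (first character of IFS).
join_with_newlines() {
  local IFS=$'\n'
  printf '%s' "$*"
}

expected="$(join_with_newlines "${expected_behaviors[@]}")"

# Pass it quoted so the newlines survive as one argument:
# coval test-cases create --test-set-id "$test_set_id" --input "<scenario text>" \
#   --expected "$expected" --description "<test case name>" --format json
```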
## Phase 4: Select + Create Metrics
Load `references/metric-recommendations.md` and build the metric list.

Always recommend:
- Composite Evaluation (built-in): find its ID in the `coval metrics list` output from Phase 0

Use-case specific (from recommendations):
- One custom `llm-binary` metric per vertical (e.g. "Identity Verification" for insurance)

Critical requirement:
- If the user provided one in Phase 2, create an additional `llm-binary` metric with that requirement as the prompt

Voice agents only:
- Professional Tone (`audio-binary`)
- Pause Detection (`pause`, minimum 3.0 s)

Default built-ins (reference by existing ID):
- Latency, Call Resolution, Sentiment
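The metric selection rules reduce to a short function, shown here as a bash sketch. `recommend_metrics` is hypothetical and returns metric names only; the IDs still come from `coval metrics list` (built-ins) or `coval metrics create` (custom metrics). Treating `voice` and `outbound-voice` as the "voice agents" is an assumption.

```bash
# Print recommended metric names, one per line.
# $1 agent type, $2 use-case metric name (may be empty), $3 critical requirement (may be empty)
recommend_metrics() {
  local agent_type=$1 use_case_metric=$2 critical_requirement=$3
  echo "Composite Evaluation"
  if [ -n "$use_case_metric" ]; then printf '%s\n' "$use_case_metric"; fi
  if [ -n "$critical_requirement" ]; then printf '%s\n' "$critical_requirement"; fi
  case "$agent_type" in
    voice|outbound-voice) printf '%s\n' "Professional Tone" "Pause Detection" ;;
  esac
  printf '%s\n' "Latency" "Call Resolution" "Sentiment"
}
```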
Present the recommendations:

```
Based on your <use case> agent, I recommend these metrics:

  [built-in] Composite Evaluation: evaluates expected behaviors per test case
  [custom]   <Use Case Metric>: <description>
  [custom]   <Critical Requirement>: based on your #1 priority
  [audio]    Professional Tone: agent tone quality (voice only)
  [audio]    Pause Detection: flags pauses > 3 seconds (voice only)
```

Ask: "Accept these metrics? (yes / add more / remove some)"

Create custom metrics:
```bash
# LLM Binary metric
coval metrics create \
  --name "<metric name>" \
  --description "<description>" \
  --type llm-binary \
  --prompt "<evaluation prompt>" \
  --format json

# Pause metric (voice only)
coval metrics create \
  --name "Long Pause Detection" \
  --description "Flags pauses longer than 3 seconds" \
  --type pause \
  --min-pause-duration 3.0 \
  --format json
```

Collect all metric IDs (built-in + newly created).
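The `--metric-ids` flag takes the collected IDs as one comma-separated value. Here is a bash sketch of building it from an array; `join_by_comma` is a hypothetical helper and the IDs are placeholders.

```bash
# Join all arguments with commas (uses the first character of IFS).
join_by_comma() {
  local IFS=,
  printf '%s' "$*"
}

# Placeholder IDs: built-ins from `coval metrics list`, custom ones from `coval metrics create`.
metric_ids=(composite_eval_id latency_id identity_verification_id)
metric_ids_csv="$(join_by_comma "${metric_ids[@]}")"
# Then: coval runs launch ... --metric-ids "$metric_ids_csv" ...
```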
## Phase 5: Create Template + Launch
Ask:
- "How many iterations per test case? (1 for a quick first look, 3 for statistical confidence)" — default: 1
- "How many parallel simulations? (1-5)" — default: 3
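The iteration and concurrency defaults (1 iteration, concurrency 3, concurrency limited to 1-5) can be applied and checked before launching. This is a bash sketch. `resolve_run_settings` is hypothetical, and the validation reflects only the choices offered to the user, not limits confirmed for the CLI itself.

```bash
# Apply defaults (iterations 1, concurrency 3) and validate the answers.
# Prints "<iterations> <concurrency>" or returns 1 with a message on stderr.
resolve_run_settings() {
  local iterations=${1:-1} concurrency=${2:-3}
  if ! [[ "$iterations" =~ ^[1-9][0-9]*$ ]]; then
    echo "iterations must be a positive integer" >&2
    return 1
  fi
  if ! [[ "$concurrency" =~ ^[1-5]$ ]]; then
    echo "concurrency must be between 1 and 5" >&2
    return 1
  fi
  echo "$iterations $concurrency"
}
```

Usage: `read -r iterations concurrency < <(resolve_run_settings "$answer_iterations" "$answer_concurrency")`.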
Create the run template for reuse:

```bash
coval run-templates create \
  --name "First Eval - <Use Case>" \
  --agent-id <agent_id> \
  --persona-id <persona_id> \
  --test-set-id <test_set_id> \
  --metric-ids <comma_separated_ids> \
  --iteration-count <iterations> \
  --concurrency <concurrency> \
  --format json
```

Launch the evaluation:

```bash
coval runs launch \
  --agent-id <agent_id> \
  --persona-id <persona_id> \
  --test-set-id <test_set_id> \
  --metric-ids <comma_separated_ids> \
  --iterations <iterations> \
  --concurrency <concurrency> \
  --name "First Eval - <Use Case>" \
  --format json
```

Capture `run_id` from the response.
## Phase 6: Watch + Results
Watch the run:

```bash
coval runs watch <run_id>
```

When complete, fetch results:

```bash
coval runs get <run_id> --format json
coval simulations list --filter "run_id=\"<run_id>\"" --format json
```

For each simulation, fetch metrics:

```bash
coval simulations metrics <simulation_id> --format json
```

Present a summary:

```
Evaluation Complete!

Run: First Eval - <Use Case>
Test Cases: <count>
Iterations: <count>
Status: COMPLETED

Results:
| Test Case                    | Score | Status |
|------------------------------|-------|--------|
| Happy Path: <name>           | 0.85  | PASS   |
| Edge Case: <name>            | 0.60  | WARN   |
| Compliance: <name>           | 1.00  | PASS   |

View full results: https://app.coval.dev/runs/<run_id>
Saved as template: "First Eval - <Use Case>"
Re-run: coval runs launch --agent-id <id> --persona-id <id> --test-set-id <id>
```

Suggest next steps:
- Add more test cases: `coval test-cases create --test-set-id <id> --input "..."`
- Schedule recurring runs: `coval scheduled-runs create --template-id <id> --schedule "cron(0 9 * * MON)"`
- Listen to recordings: `coval simulations audio <sim_id> -o recording.wav`
- Iterate on metrics based on results
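To fill the Status column of the results summary, each score needs a label. The guide does not state the cut-offs, so the thresholds below (PASS at 0.8 or above, WARN at 0.5 or above, FAIL otherwise) are illustrative assumptions chosen to agree with the sample rows. `score_status` and `summary_row` are hypothetical helpers. `awk` handles the decimal comparison, which bash integer arithmetic cannot.

```bash
# Label a 0-1 score. The thresholds are assumptions, not Coval defaults.
score_status() {
  awk -v s="$1" 'BEGIN {
    if (s + 0 >= 0.8)      print "PASS"
    else if (s + 0 >= 0.5) print "WARN"
    else                   print "FAIL"
  }'
}

# Format one row of the Results table: test case name, score, derived status.
summary_row() {
  printf '| %-28s | %-5s | %-6s |\n' "$1" "$2" "$(score_status "$2")"
}
```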