onboard


Coval Onboarding


Guide the user through setting up a complete AI evaluation from scratch using the `coval` CLI. Follow the phases below in order, asking questions at each step.

If `$ARGUMENTS` contains a use case (e.g. "insurance_claims", "customer_support"), skip the use case question in Phase 2.
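The `$ARGUMENTS` check could be sketched as below. `detect_use_case` is a hypothetical helper, not part of the `coval` CLI; the key list is copied from the Phase 2 question, and the plain substring match is an assumption about how the arguments are phrased.

```bash
# Known use case keys, copied from the Phase 2 question list.
KNOWN_USE_CASES="customer_support scheduling_booking sales insurance_claims healthcare_intake restaurant_orders debt_collection it_helpdesk"

# detect_use_case (hypothetical helper): print the first known use case key
# that appears as a substring of the argument string; return 1 if none does.
detect_use_case() {
  for _key in $KNOWN_USE_CASES; do
    case "$1" in
      *"$_key"*) printf '%s\n' "$_key"; return 0 ;;
    esac
  done
  return 1
}

# Usage (illustrative):
#   if use_case=$(detect_use_case "$ARGUMENTS"); then
#     echo "Skipping the use case question: $use_case"
#   fi
```

A substring match is deliberately loose; stricter word matching may be preferable if free-form arguments are common.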

Phase 0: Setup + Preflight


Step 1: Check CLI installation


```bash
coval --version
```

If the command fails or is not found, guide the user to install it based on their OS:

macOS (Homebrew, recommended):

```bash
brew install coval-ai/tap/coval
```

Linux / macOS (Cargo, requires Rust 1.75+):

```bash
cargo install coval
```

Windows (PowerShell, binary download): download the latest Windows binary from GitHub releases.

**All platforms (manual binary download):**
Download the latest release for your OS/architecture from https://github.com/coval-ai/cli/releases

After installation, verify: `coval --version`
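The OS-based guidance above can be sketched as a small lookup. Both function names are hypothetical helpers, and mapping `uname -s` output to an install method is an assumption; only the install commands and the releases URL come from this step.

```bash
# suggest_install_cmd (hypothetical helper): map a kernel name, as printed by
# `uname -s`, to the install guidance listed in this step.
suggest_install_cmd() {
  case "$1" in
    Darwin) echo "brew install coval-ai/tap/coval" ;;
    Linux)  echo "cargo install coval" ;;
    *)      echo "Download a release binary from https://github.com/coval-ai/cli/releases" ;;
  esac
}

# check_coval_cli (hypothetical helper): print the version when the CLI is on
# PATH, otherwise print the suggested install step for this machine.
check_coval_cli() {
  if command -v coval >/dev/null 2>&1; then
    coval --version
  else
    suggest_install_cmd "$(uname -s)"
  fi
}
```

Anything that is not macOS or Linux falls through to the manual binary download, which also covers Windows shells.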

Step 2: Check authentication


```bash
coval whoami
```

If not authenticated, guide the user:

```bash
coval login
```

This prompts for an API key. Get one at https://app.coval.dev/settings (Organization > Manage > API Keys).
If the user doesn't have a Coval account, direct them to https://coval.dev to sign up.

Then run these in parallel to inventory existing resources:

```bash
coval agents list --format json
coval test-sets list --format json
coval metrics list --format json
coval personas list --format json
```

Decision matrix:
  • No resources → full flow (Phases 1-6)
  • Has agents but nothing else → ask which agent to use, skip Phase 1
  • Has agents + test sets → ask which to reuse, skip Phases 1 & 3
  • Has everything → ask "Re-launch existing eval or build new?"

Present existing resources as a numbered list and let the user pick or say "new".
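The decision matrix can be sketched as a function of four resource counts. `choose_flow` is a hypothetical helper; how the counts are obtained is left open, since the JSON shape of the `list` commands is not specified here. Treating "no agents" as the full flow, even when other resources exist, is also an assumption.

```bash
# choose_flow AGENTS TEST_SETS METRICS PERSONAS (hypothetical helper)
# Each argument is the number of existing resources of that kind.
choose_flow() {
  agents=$1 test_sets=$2 metrics=$3 personas=$4
  if [ "$agents" -eq 0 ]; then
    # Assumption: without an agent nothing can be reused, so run everything.
    echo "full flow: Phases 1-6"
  elif [ "$test_sets" -eq 0 ]; then
    echo "pick an agent, skip Phase 1"
  elif [ "$metrics" -gt 0 ] && [ "$personas" -gt 0 ]; then
    echo "ask: re-launch existing eval or build new?"
  else
    echo "pick agent and test set, skip Phases 1 and 3"
  fi
}

# Usage (illustrative): choose_flow 2 1 0 0
```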

Phase 1: Connect Agent


Ask these questions:
  1. "What type of AI agent do you have?"
    • `voice`: Receives inbound phone calls
    • `outbound-voice`: Your agent calls out
    • `chat`: Text/API endpoint
    • `sms`: SMS-based agent
    • `websocket`: WebSocket connection
  2. Based on type:
    • voice / sms → "What is your agent's phone number? (E.164 format, e.g. +12345678901)"
    • outbound-voice / chat / websocket → "What is your agent's endpoint URL?"
  3. "What would you like to name this agent?"
  4. (Optional) "Do you have the agent's system prompt? Pasting it helps generate better test cases."

Create the agent:

```bash
# For voice/sms:
coval agents create --name "<name>" --type <type> --phone-number "<number>" --format json

# For chat/outbound-voice/websocket:
coval agents create --name "<name>" --type <type> --endpoint "<url>" --format json
```

Capture `agent_id` from the JSON response.
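Before calling `coval agents create`, the answers can be checked locally. This is a minimal sketch: `is_e164` and `agent_target_flag` are hypothetical helpers, and the pattern encodes the general E.164 shape (a `+`, then up to 15 digits with no leading zero), not any validation the CLI itself performs.

```bash
# is_e164 NUMBER (hypothetical helper): succeed when NUMBER looks like an
# E.164 phone number, e.g. +12345678901.
is_e164() {
  printf '%s\n' "$1" | grep -Eq '^\+[1-9][0-9]{1,14}$'
}

# agent_target_flag TYPE (hypothetical helper): print the flag that carries
# the agent's address for the given agent type, following the two commands above.
agent_target_flag() {
  case "$1" in
    voice|sms) echo "--phone-number" ;;
    outbound-voice|chat|websocket) echo "--endpoint" ;;
    *) echo "unknown agent type: $1" >&2; return 1 ;;
  esac
}

# Usage (illustrative):
#   is_e164 "$number" || echo "Please use E.164 format, e.g. +12345678901"
```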

Phase 2: Discover Use Case + Create Persona


Ask these questions:
  1. "What does your agent do?"
    • customer_support — Customer Support
    • scheduling_booking — Scheduling & Booking
    • sales — Sales
    • insurance_claims — Insurance Claims
    • healthcare_intake — Healthcare Intake
    • restaurant_orders — Restaurant Orders
    • debt_collection — Debt Collection
    • it_helpdesk — IT Helpdesk
    • other — Other (describe it)
  2. "What industry is this for?" (free text)
  3. "What language does your agent speak?"
    • en-US, es-ES, fr-FR, de-DE, pt-BR, ja-JP
  4. "What's the #1 thing your agent must get right?" (free text — this becomes a custom metric)
Load `references/persona-templates.md` and select the persona template matching the use case. Apply the user's language choice. Present the persona to the user for confirmation before creating.

```bash
coval personas create \
  --name "<persona_name>" \
  --voice "<voice_name>" \
  --language "<language_code>" \
  --prompt "<behavior_prompt>" \
  --background "<background_sound>" \
  --wait-seconds <wait> \
  --format json
```

Capture `persona_id` from the JSON response.

For chat/sms/websocket agents, still pass `--voice` and `--language` with defaults (aria, en-US); the simulation engine ignores these fields for non-voice agents.
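The default handling for non-voice agents might look like the sketch below. `persona_voice_args` is a hypothetical helper; keeping a value the user supplied and only filling in what is missing is an assumption about the intended behavior.

```bash
# persona_voice_args AGENT_TYPE VOICE LANGUAGE (hypothetical helper)
# Prints the --voice/--language arguments for `coval personas create`.
# For chat/sms/websocket agents, empty values fall back to aria and en-US.
persona_voice_args() {
  agent_type=$1 voice=$2 language=$3
  case "$agent_type" in
    chat|sms|websocket)
      voice=${voice:-aria}
      language=${language:-en-US}
      ;;
  esac
  printf '%s %s %s %s\n' --voice "$voice" --language "$language"
}

# Usage (illustrative): persona_voice_args chat "" ""
```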

Phase 3: Create Test Set + Test Cases


Load `references/test-case-templates.md` and select the 3 test case templates (happy_path, edge_case, compliance) matching the use case.
If the user provided a system prompt or critical requirement, customize the test cases to be more specific to their agent.

Present a summary table before creating:

```
Test Set: "<Use Case> Evaluation"

  [happy_path]   <test case name>
                 <scenario description>
  [edge_case]    <test case name>
                 <scenario description>
  [compliance]   <test case name>
                 <scenario description>
```

Ask: "Create these test cases? (yes / customize / add more)"

Create the test set and cases:

```bash
coval test-sets create --name "<Use Case> Evaluation" --description "<desc>" --format json
```

Capture `test_set_id`. Then for each test case:

```bash
coval test-cases create \
  --test-set-id <test_set_id> \
  --input "<scenario text>" \
  --expected "<expected behaviors joined with newlines>" \
  --description "<test case name>" \
  --format json
```

Note: The `--expected` flag accepts a single string. Join the expected behaviors array with newlines (`\n`).
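Joining the expected behaviors into one newline-separated string can be sketched as follows. `join_expected` is a hypothetical helper, and the behavior strings in the usage comment are made up for illustration.

```bash
# join_expected BEHAVIOR... (hypothetical helper): print each argument on its
# own line. Command substitution strips the trailing newline, so the captured
# value is the behaviors joined by single newlines.
join_expected() {
  printf '%s\n' "$@"
}

# Usage (illustrative):
#   expected=$(join_expected "Greets the caller" "Verifies identity")
#   coval test-cases create --test-set-id <test_set_id> \
#     --input "<scenario text>" --expected "$expected" --format json
```

Quoting `"$expected"` when passing it to the flag keeps the embedded newlines intact.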

Phase 4: Select + Create Metrics


Load `references/metric-recommendations.md` and build the metric list.

Always recommend:
  • Composite Evaluation (built-in): find its ID from the `coval metrics list` output in Phase 0
Use-case specific (from recommendations):
  • One custom llm-binary metric per vertical (e.g. "Identity Verification" for insurance)
Critical requirement:
  • If the user provided one in Phase 2, create an additional llm-binary metric with that requirement as the prompt
Voice agents only:
  • Professional Tone (audio-binary)
  • Pause Detection (pause, min 3.0s)
Default built-ins (reference by existing ID):
  • Latency, Call Resolution, Sentiment

Present the recommendations:

```
Based on your <use case> agent, I recommend these metrics:

  [built-in]  Composite Evaluation    - Evaluates expected behaviors per test case
  [custom]    <Use Case Metric>       - <description>
  [custom]    <Critical Requirement>  - Based on your #1 priority
  [audio]     Professional Tone       - Agent tone quality (voice only)
  [audio]     Pause Detection         - Flags pauses > 3 seconds (voice only)
```

Ask: "Accept these metrics? (yes / add more / remove some)"

Create custom metrics:

```bash
# LLM Binary metric
coval metrics create \
  --name "<metric name>" \
  --description "<description>" \
  --type llm-binary \
  --prompt "<evaluation prompt>" \
  --format json

# Pause metric (voice only)
coval metrics create \
  --name "Long Pause Detection" \
  --description "Flags pauses longer than 3 seconds" \
  --type pause \
  --min-pause-duration 3.0 \
  --format json
```

Collect all metric IDs (built-in + newly created).
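The collected IDs are later passed as one comma-separated value. A minimal sketch of that join follows; `join_metric_ids` is a hypothetical helper and the IDs in the usage comment are placeholders.

```bash
# join_metric_ids ID... (hypothetical helper): join all arguments with commas.
# The function body runs in a subshell so the IFS change does not leak out.
join_metric_ids() (
  IFS=,
  printf '%s\n' "$*"
)

# Usage (illustrative):
#   metric_ids=$(join_metric_ids "<composite_id>" "<custom_id>" "<pause_id>")
```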

Phase 5: Create Template + Launch


Ask:
  1. "How many iterations per test case? (1 for a quick first look, 3 for statistical confidence)" (default: 1)
  2. "How many parallel simulations? (1-5)" (default: 3)

Create the run template for reuse:

```bash
coval run-templates create \
  --name "First Eval - <Use Case>" \
  --agent-id <agent_id> \
  --persona-id <persona_id> \
  --test-set-id <test_set_id> \
  --metric-ids <comma_separated_ids> \
  --iteration-count <iterations> \
  --concurrency <concurrency> \
  --format json
```

Launch the evaluation:

```bash
coval runs launch \
  --agent-id <agent_id> \
  --persona-id <persona_id> \
  --test-set-id <test_set_id> \
  --metric-ids <comma_separated_ids> \
  --iterations <iterations> \
  --concurrency <concurrency> \
  --name "First Eval - <Use Case>" \
  --format json
```

Capture `run_id` from the response.
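Applying the defaults and the 1-5 concurrency range to the user's answers can be sketched as below. `resolve_setting` is a hypothetical helper, and the upper bound of 10 for iterations in the usage comment is an assumption; this guide does not state a maximum.

```bash
# resolve_setting VALUE DEFAULT MIN MAX (hypothetical helper)
# Empty or non-numeric input falls back to DEFAULT; the result is then
# clamped to the inclusive range MIN..MAX.
resolve_setting() {
  value=${1:-$2}
  case "$value" in
    ''|*[!0-9]*) value=$2 ;;
  esac
  if [ "$value" -lt "$3" ]; then value=$3; fi
  if [ "$value" -gt "$4" ]; then value=$4; fi
  echo "$value"
}

# Usage (illustrative):
#   iterations=$(resolve_setting "$iterations_answer" 1 1 10)
#   concurrency=$(resolve_setting "$concurrency_answer" 3 1 5)
```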

Phase 6: Watch + Results


Watch the run:

```bash
coval runs watch <run_id>
```

When complete, fetch results:

```bash
coval runs get <run_id> --format json
coval simulations list --filter "run_id=\"<run_id>\"" --format json
```

For each simulation, fetch metrics:

```bash
coval simulations metrics <simulation_id> --format json
```

Present a summary:

```
Evaluation Complete!

  Run:          First Eval - <Use Case>
  Test Cases:   <count>
  Iterations:   <count>
  Status:       COMPLETED

  Results:
  | Test Case                    | Score | Status |
  |------------------------------|-------|--------|
  | Happy Path - <name>          | 0.85  | PASS   |
  | Edge Case - <name>           | 0.60  | WARN   |
  | Compliance - <name>          | 1.00  | PASS   |

  View full results: https://app.coval.dev/runs/<run_id>

  Saved as template: "First Eval - <Use Case>"
  Re-run: coval runs launch --agent-id <id> --persona-id <id> --test-set-id <id>
```

Suggest next steps:
  • Add more test cases: `coval test-cases create --test-set-id <id> --input "..."`
  • Schedule recurring runs: `coval scheduled-runs create --template-id <id> --schedule "cron(0 9 * * MON)"`
  • Listen to recordings: `coval simulations audio <sim_id> -o recording.wav`
  • Iterate on metrics based on results
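The summary table shows PASS and WARN labels but does not state the score cutoffs behind them. The sketch below uses hypothetical thresholds (0.8 for PASS, 0.5 for WARN) that happen to match the sample rows; the FAIL label and both cutoffs are assumptions, not documented Coval behavior.

```bash
# score_status SCORE [PASS_MIN] [WARN_MIN] (hypothetical helper)
# Maps a numeric score to a status label. The defaults 0.8 and 0.5 are
# assumed thresholds chosen for illustration only.
score_status() {
  awk -v s="$1" -v p="${2:-0.8}" -v w="${3:-0.5}" 'BEGIN {
    if (s + 0 >= p + 0) print "PASS"
    else if (s + 0 >= w + 0) print "WARN"
    else print "FAIL"
  }'
}

# Usage (illustrative): score_status 0.60
```

Passing the thresholds as optional arguments keeps the mapping easy to adjust once the real cutoffs are known.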