Coval Onboarding


Guide the user through setting up a complete AI evaluation from scratch using the `coval` CLI. Follow the phases below in order, asking questions at each step.

If `$ARGUMENTS` contains a use case (e.g. "insurance_claims", "customer_support"), skip the use case question in Phase 2.
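One way to read the `$ARGUMENTS` check is as a lookup against the use case keys offered in Phase 2. The sketch below assumes `$ARGUMENTS` arrives as a plain whitespace-separated string. The function name `use_case_from_args` is hypothetical and not part of the `coval` CLI, and the `other` key is left out on purpose because it still needs a description from the user.

```bash
# Hypothetical helper: print the first known use case found in the arguments.
# Assumes the arguments are plain words (no glob characters).
use_case_from_args() {
  for word in $1; do
    case "$word" in
      customer_support|scheduling_booking|sales|insurance_claims|\
      healthcare_intake|restaurant_orders|debt_collection|it_helpdesk)
        printf '%s\n' "$word"
        return 0
        ;;
    esac
  done
  return 1  # no known use case found: ask the Phase 2 question
}
```

A non-zero return status signals that the use case question should still be asked.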


Phase 0: Setup + Preflight

Step 1: Check CLI installation

```bash
coval --version
```

If the command fails or is not found, guide the user to install it based on their OS:

**macOS (Homebrew, recommended):**

```bash
brew install coval-ai/tap/coval
```

**Linux / macOS (Cargo, requires Rust 1.75+):**

```bash
cargo install coval
```

**Windows (PowerShell, binary download):**

```powershell
# Download the latest Windows binary from GitHub releases
Invoke-WebRequest -Uri "https://github.com/coval-ai/cli/releases/latest/download/coval-x86_64-pc-windows-msvc.exe" -OutFile "coval.exe"
```

**All platforms (manual binary download):**
Download the latest release for your OS/architecture from https://github.com/coval-ai/cli/releases

After installation, verify: `coval --version`
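The preflight check can be sketched as two small shell helpers: one that tests whether a command is on `PATH`, and one that maps a `uname -s` value to the install route listed above. The helper names `have_cli` and `install_hint` are illustrative assumptions, not part of the `coval` CLI, and Windows is covered only by the fallback to the releases page.

```bash
# Return success if a command is available on PATH.
have_cli() {
  command -v "$1" >/dev/null 2>&1
}

# Print the install route suggested above for a given `uname -s` value.
install_hint() {
  case "$1" in
    Darwin) printf '%s\n' "brew install coval-ai/tap/coval" ;;
    Linux)  printf '%s\n' "cargo install coval" ;;
    *)      printf '%s\n' "https://github.com/coval-ai/cli/releases" ;;
  esac
}

# Example wiring (not run here):
#   have_cli coval || install_hint "$(uname -s)"
```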

Step 2: Check authentication


```bash
coval whoami
```

If not authenticated, guide the user:

```bash
coval login
```

This prompts for an API key. Get one at https://app.coval.dev/settings (Organization > Manage > API Keys).

If the user doesn't have a Coval account, direct them to https://coval.dev to sign up.

Then run these in parallel to inventory existing resources:

```bash
coval agents list --format json
coval test-sets list --format json
coval metrics list --format json
coval personas list --format json
```

Decision matrix:

- No resources → full flow (Phases 1-6)
- Has agents but nothing else → ask which agent to use, skip Phase 1
- Has agents + test sets → ask which to reuse, skip Phases 1 & 3
- Has everything → ask "Re-launch existing eval or build new?"

Present existing resources as a numbered list and let the user pick or say "new".
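The decision matrix can be expressed as a small function over resource counts. This is a sketch under two assumptions: the counts have already been extracted from the `list` commands (the JSON shape is not documented here, so parsing is left out), and any combination the matrix does not mention, such as test sets without agents, falls back to the full flow. The name `plan_flow` is hypothetical.

```bash
# Hypothetical helper: map resource counts to the decision matrix above.
# Arguments: <agents> <test_sets> <metrics> <personas> (non-negative integers).
plan_flow() {
  if [ "$1" -gt 0 ] && [ "$2" -gt 0 ] && [ "$3" -gt 0 ] && [ "$4" -gt 0 ]; then
    echo "ask: re-launch existing eval or build new"
  elif [ "$1" -gt 0 ] && [ "$2" -gt 0 ]; then
    echo "ask which to reuse; skip phases 1 and 3"
  elif [ "$1" -gt 0 ]; then
    echo "ask which agent to use; skip phase 1"
  else
    echo "full flow: phases 1-6"
  fi
}
```

Checking the most complete case first keeps the branches from shadowing one another.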


Phase 1: Connect Agent


Ask these questions:

"What type of AI agent do you have?"
- `voice`: Receives inbound phone calls
- `outbound-voice`: Your agent calls out
- `chat`: Text/API endpoint
- `sms`: SMS-based agent
- `websocket`: WebSocket connection

Based on type:
- voice / sms → "What is your agent's phone number? (E.164 format, e.g. +12345678901)"
- outbound-voice / chat / websocket → "What is your agent's endpoint URL?"

"What would you like to name this agent?"

(Optional) "Do you have the agent's system prompt? Pasting it helps generate better test cases."

Create the agent:

```bash
# For voice/sms:
coval agents create --name "<name>" --type <type> --phone-number "<number>" --format json

# For chat/outbound-voice/websocket:
coval agents create --name "<name>" --type <type> --endpoint "<url>" --format json
```

Capture `agent_id` from the JSON response.
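Before calling `coval agents create`, the phone number and the type-dependent flag can be checked locally. The sketch below is an assumption about how such a check might look, not something the CLI provides: `is_e164` and `contact_flag` are hypothetical names, and the pattern encodes the usual E.164 shape (a leading `+`, a non-zero first digit, at most 15 digits).

```bash
# Return success if the argument looks like an E.164 number:
# a leading "+", a non-zero first digit, and at most 15 digits in total.
is_e164() {
  printf '%s\n' "$1" | grep -Eq '^\+[1-9][0-9]{1,14}$'
}

# Print the contact flag that matches the agent type, per the commands above.
contact_flag() {
  case "$1" in
    voice|sms) printf '%s\n' "--phone-number" ;;
    outbound-voice|chat|websocket) printf '%s\n' "--endpoint" ;;
    *) return 1 ;;  # unknown agent type
  esac
}
```

A failed `is_e164` check is a cue to re-ask the question rather than send a malformed number to the API.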

Phase 2: Discover Use Case + Create Persona


Ask these questions:

"What does your agent do?"
- customer_support — Customer Support
- scheduling_booking — Scheduling & Booking
- sales — Sales
- insurance_claims — Insurance Claims
- healthcare_intake — Healthcare Intake
- restaurant_orders — Restaurant Orders
- debt_collection — Debt Collection
- it_helpdesk — IT Helpdesk
- other — Other (describe it)
"What industry is this for?" (free text)
"What language does your agent speak?"
- en-US, es-ES, fr-FR, de-DE, pt-BR, ja-JP
"What's the #1 thing your agent must get right?" (free text — this becomes a custom metric)

Load `references/persona-templates.md` and select the persona template matching the use case. Apply the user's language choice. Present the persona to the user for confirmation before creating.

```bash
coval personas create \
  --name "<persona_name>" \
  --voice "<voice_name>" \
  --language "<language_code>" \
  --prompt "<behavior_prompt>" \
  --background "<background_sound>" \
  --wait-seconds <wait> \
  --format json
```

Capture `persona_id` from the JSON response.

For chat/sms/websocket agents, still pass `--voice` and `--language` with defaults (`aria`, `en-US`). The simulation engine ignores these fields for non-voice agents.
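The default-substitution rule for non-voice agents can be sketched as a helper that returns the voice and language pair to pass. The function name `persona_voice_language` is hypothetical, and `somevoice` in the usage note is a placeholder rather than a real Coval voice name.

```bash
# Print the "<voice> <language>" pair to pass to `coval personas create`.
# $1 = agent type, $2 = chosen voice, $3 = chosen language code.
persona_voice_language() {
  case "$1" in
    chat|sms|websocket) printf '%s %s\n' "aria" "en-US" ;;  # defaults; ignored for non-voice agents
    *) printf '%s %s\n' "$2" "$3" ;;
  esac
}
```

For example, `persona_voice_language chat somevoice es-ES` prints the defaults, while `persona_voice_language voice somevoice es-ES` passes the chosen values through.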


Phase 3: Create Test Set + Test Cases


Load `references/test-case-templates.md` and select the 3 test case templates (happy_path, edge_case, compliance) matching the use case.

If the user provided a system prompt or critical requirement, customize the test cases to be more specific to their agent.

Present a summary table before creating:

Test Set: "<Use Case> Evaluation"

  [happy_path]   <test case name>
                 <scenario description>
  [edge_case]    <test case name>
                 <scenario description>
  [compliance]   <test case name>
                 <scenario description>

Ask: "Create these test cases? (yes / customize / add more)"

Create the test set and cases:

```bash
coval test-sets create --name "<Use Case> Evaluation" --description "<desc>" --format json
```

Capture `test_set_id`. Then for each test case:

```bash
coval test-cases create \
  --test-set-id <test_set_id> \
  --input "<scenario text>" \
  --expected "<expected behaviors joined with newlines>" \
  --description "<test case name>" \
  --format json
```

Note: The `--expected` flag accepts a single string. Join the expected behaviors array with newlines (`\n`).

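Joining the expected behaviors into one newline-separated string for `--expected` can be done with `printf`. This is a minimal sketch: the helper name `join_expected` and the three behaviors are illustrative, not taken from the templates.

```bash
# Join each argument with a newline to form one --expected string.
join_expected() {
  printf '%s\n' "$@"
}

# Command substitution strips the trailing newline, leaving one behavior per line.
expected=$(join_expected "Greets the caller" "Verifies identity" "Confirms next steps")
printf '%s\n' "$expected"
```

The resulting variable can then be passed as `--expected "$expected"`. The double quotes matter, since an unquoted expansion would split the string on whitespace.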

Phase 4: Select + Create Metrics


Load `references/metric-recommendations.md` and build the metric list.

Always recommend:

- Composite Evaluation (built-in): find its ID from the `coval metrics list` output in Phase 0

Use-case specific (from recommendations):

- One custom llm-binary metric per vertical (e.g. "Identity Verification" for insurance)

Critical requirement:

- If the user provided one in Phase 2, create an additional llm-binary metric with that requirement as the prompt

Voice agents only:

- Professional Tone (audio-binary)
- Pause Detection (pause, min 3.0s)

Default built-ins (reference by existing ID):

- Latency, Call Resolution, Sentiment

Present the recommendations:

Based on your <use case> agent, I recommend these metrics:

  [built-in]  Composite Evaluation    — Evaluates expected behaviors per test case
  [custom]    <Use Case Metric>       — <description>
  [custom]    <Critical Requirement>  — Based on your #1 priority
  [audio]     Professional Tone       — Agent tone quality (voice only)
  [audio]     Pause Detection         — Flags pauses > 3 seconds (voice only)

Ask: "Accept these metrics? (yes / add more / remove some)"

Create custom metrics:

```bash
# LLM Binary metric
coval metrics create \
  --name "<metric name>" \
  --description "<description>" \
  --type llm-binary \
  --prompt "<evaluation prompt>" \
  --format json

# Pause metric (voice only)
coval metrics create \
  --name "Long Pause Detection" \
  --description "Flags pauses longer than 3 seconds" \
  --type pause \
  --min-pause-duration 3.0 \
  --format json
```

Collect all metric IDs (built-in + newly created).
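The collected IDs are later passed as a single comma-separated value to `--metric-ids`. A minimal sketch of that join, assuming the IDs are already held as separate shell arguments (the helper name `join_ids` and the IDs in the usage note are placeholders):

```bash
# Join metric IDs with commas for --metric-ids.
# The subshell keeps the IFS change from leaking into the caller.
join_ids() {
  (IFS=,; printf '%s\n' "$*")
}
```

For example, `join_ids m1 m2 m3` prints `m1,m2,m3`.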

Phase 5: Create Template + Launch


Ask:

"How many iterations per test case? (1 for a quick first look, 3 for statistical confidence)" — default: 1
"How many parallel simulations? (1-5)" — default: 3

Create the run template for reuse:

```bash
coval run-templates create \
  --name "First Eval - <Use Case>" \
  --agent-id <agent_id> \
  --persona-id <persona_id> \
  --test-set-id <test_set_id> \
  --metric-ids <comma_separated_ids> \
  --iteration-count <iterations> \
  --concurrency <concurrency> \
  --format json
```

Launch the evaluation:

```bash
coval runs launch \
  --agent-id <agent_id> \
  --persona-id <persona_id> \
  --test-set-id <test_set_id> \
  --metric-ids <comma_separated_ids> \
  --iterations <iterations> \
  --concurrency <concurrency> \
  --name "First Eval - <Use Case>" \
  --format json
```

Capture `run_id` from the response.
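The `--concurrency` value above comes from a free-form answer, so it may help to normalize it before building either command. The sketch below applies the stated default of 3 and the 1-5 range. The helper name `resolve_concurrency` is hypothetical, and rejecting out-of-range input (rather than clamping it) is an assumption.

```bash
# Print a validated concurrency value; an empty answer falls back to the default of 3.
resolve_concurrency() {
  value=${1:-3}
  case "$value" in
    *[!0-9]*) return 1 ;;  # reject anything that is not a plain non-negative integer
  esac
  if [ "$value" -ge 1 ] && [ "$value" -le 5 ]; then
    printf '%s\n' "$value"
  else
    return 1
  fi
}
```

A non-zero return status is a cue to re-ask the question.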


Phase 6: Watch + Results


Watch the run:

```bash
coval runs watch <run_id>
```

When complete, fetch results:

```bash
coval runs get <run_id> --format json
coval simulations list --filter "run_id=\"<run_id>\"" --format json
```

For each simulation, fetch metrics:

```bash
coval simulations metrics <simulation_id> --format json
```
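The escaped quotes in the `--filter` argument are easy to get wrong when the run ID lives in a variable. One way to sidestep the escaping is to build the expression with `printf`. The helper name `run_filter` is an assumption for illustration.

```bash
# Build the filter expression used with `coval simulations list --filter`.
run_filter() {
  printf 'run_id="%s"\n' "$1"
}

# Example wiring (not run here):
#   coval simulations list --filter "$(run_filter "$run_id")" --format json
```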

Present a summary:

Evaluation Complete!

  Run:          First Eval - <Use Case>
  Test Cases:   <count>
  Iterations:   <count>
  Status:       COMPLETED

  Results:
  | Test Case                    | Score | Status |
  |------------------------------|-------|--------|
  | Happy Path — <name>          | 0.85  | PASS   |
  | Edge Case — <name>           | 0.60  | WARN   |
  | Compliance — <name>          | 1.00  | PASS   |

  View full results: https://app.coval.dev/runs/<run_id>

  Saved as template: "First Eval - <Use Case>"
  Re-run: coval runs launch --agent-id <id> --persona-id <id> --test-set-id <id>

Suggest next steps:

- Add more test cases: `coval test-cases create --test-set-id <id> --input "..."`
- Schedule recurring runs: `coval scheduled-runs create --template-id <id> --schedule "cron(0 9 * * MON)"`
- Listen to recordings: `coval simulations audio <sim_id> -o recording.wav`
- Iterate on metrics based on results
