configure-metrics

Configure Metrics

Guide the user through selecting, creating, and attaching evaluation metrics for their AI agent using the `coval` CLI. Follow the phases below in order.

If `$ARGUMENTS` contains an agent name or use case, use it to skip the relevant question in Phase 1.

Phase 0: Preflight + Inventory

Step 1: Check authentication

```bash
coval whoami
```

If not authenticated, guide the user:

```bash
coval login
```

This prompts for an API key. Get one at https://app.coval.dev/settings (Organization > Manage > API Keys).

If the user doesn't have a Coval account, direct them to https://coval.dev to sign up.
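A scripted version of this check might look like the sketch below. It assumes that `coval whoami` exits with a non-zero status when no valid API key is configured. This guide does not state that, so verify it against your CLI version. `ensure_coval_auth` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if the coval CLI is installed and authenticated.
# Assumption: `coval whoami` exits non-zero when the user is not logged in.
ensure_coval_auth() {
  # Fail fast if the CLI itself is missing from PATH.
  if ! command -v coval >/dev/null 2>&1; then
    echo "coval CLI not found on PATH" >&2
    return 127
  fi
  # Authenticated: nothing else to do.
  if coval whoami >/dev/null 2>&1; then
    return 0
  fi
  # Not authenticated: point the user at the interactive login flow.
  echo "Not authenticated. Run 'coval login' with an API key from https://app.coval.dev/settings" >&2
  return 1
}
```

A typical call is `ensure_coval_auth || coval login`. The check stays non-interactive, and the login prompt appears only when it is needed.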

Step 2: Inventory existing resources

Run these in parallel:

```bash
coval metrics list --format json
coval metrics list --include-builtin --format json
coval agents list --format json
```
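One way to run these concurrently from a script is to background each command and wait on its PID. In this sketch, `inventory_resources` and the output file names (`metrics.json`, `metrics_all.json`, `agents.json`) are illustrative choices, not anything the CLI requires.

```bash
# Hypothetical helper: run the three inventory commands in parallel.
# Each JSON response is written to its own file under DIR (default: current directory).
inventory_resources() {
  local dir="${1:-.}" rc=0 pid_metrics pid_all pid_agents
  coval metrics list --format json > "$dir/metrics.json" &
  pid_metrics=$!
  coval metrics list --include-builtin --format json > "$dir/metrics_all.json" &
  pid_all=$!
  coval agents list --format json > "$dir/agents.json" &
  pid_agents=$!
  # Wait on each PID separately so a failure in any one command is reported.
  wait "$pid_metrics" || rc=1
  wait "$pid_all" || rc=1
  wait "$pid_agents" || rc=1
  return "$rc"
}
```

Waiting on individual PIDs, rather than calling a bare `wait`, lets the function return non-zero if any listing fails.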

Categorize the metrics inventory:

- Built-in: metrics with `created_by: "Coval"` in the `--include-builtin` response. These are platform-provided and exist in every org (e.g., Latency, Turn Count, Audio Duration, Transcript Sentiment Analysis).
- Custom: user-created metrics (llm-binary, audio-binary, and pause types).

Note the IDs of relevant built-in metrics — you'll need them for Phase 5.

Phase 1: Agent + Use Case Context

Ask:

"Which agent are these metrics for?"
- Present existing agents from the inventory as a numbered list
- If `$ARGUMENTS` matches an agent name, select it automatically

"What does your agent do?" (ask only if it is not obvious from the agent name or prompt)
- customer_support — Customer Support
- scheduling_booking — Scheduling & Booking
- sales — Sales
- insurance_claims — Insurance Claims
- healthcare_intake — Healthcare Intake
- restaurant_orders — Restaurant Orders
- debt_collection — Debt Collection
- it_helpdesk — IT Helpdesk
- other — Other (describe it)

Capture the agent's `type` (voice, outbound-voice, chat, etc.) from the agent record; it determines whether audio metrics apply.
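This phase makes two decisions: matching `$ARGUMENTS` to an agent, and deciding whether audio metrics apply. Both can be sketched as small shell functions. The function names are hypothetical. The sketch assumes agent names have already been extracted from the `coval agents list --format json` output, and it treats a match as a case-insensitive exact comparison.

```bash
# Lower-case a string using POSIX tr character classes.
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# match_agent ARGUMENT NAME...
# Print the first agent name equal to ARGUMENT (ignoring case); return 1 if none match.
match_agent() {
  local wanted name
  wanted=$(to_lower "$1")
  shift
  for name in "$@"; do
    if [ "$(to_lower "$name")" = "$wanted" ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}

# audio_metrics_apply TYPE
# Succeed only for agent types that produce audio (voice or outbound-voice).
audio_metrics_apply() {
  case "$1" in
    voice|outbound-voice) return 0 ;;
    *) return 1 ;;
  esac
}
```

If `match_agent` fails, fall back to presenting the numbered list and asking the user.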

Phase 2: Metric Recommendations

Load `references/metric-recommendations.md` and build the recommendation list.

Built-in metrics: discover them dynamically from `coval metrics list --include-builtin --format json` by looking for entries with `created_by: "Coval"`.

Select relevant built-ins based on agent type:
- All agents: Latency, Turn Count
- Voice agents: Audio Duration, Transcript Sentiment Analysis, Audio Sentiment, Speech Tempo, Time To First Audio, Interruption Rate, Background Noise
- Chat agents: Words Per Message, Transcript Sentiment Analysis
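The selection rule above is a lookup on agent type. It is sketched here as a hypothetical `builtins_for_type` function. The function returns display names only and assumes they match the names in the `--include-builtin` listing, so you still need to look up each metric's ID there. Unknown types fall back to the metrics recommended for all agents.

```bash
# builtins_for_type TYPE
# Print the recommended built-in metric names for an agent type, one per line.
builtins_for_type() {
  # Recommended for every agent.
  printf '%s\n' "Latency" "Turn Count"
  case "$1" in
    voice|outbound-voice)
      printf '%s\n' "Audio Duration" "Transcript Sentiment Analysis" \
        "Audio Sentiment" "Speech Tempo" "Time To First Audio" \
        "Interruption Rate" "Background Noise"
      ;;
    chat)
      printf '%s\n' "Words Per Message" "Transcript Sentiment Analysis"
      ;;
  esac
}
```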

Use-case specific:
- One custom llm-binary metric per vertical (from the recommendations file)

Voice agents only (`type` is voice or outbound-voice):
- Professional Tone (audio-binary): custom, needs creation
- Pause Detection (pause, min 3.0s): custom, needs creation

Present the recommendations:

Based on your <use case> agent, I recommend these metrics:

  [built-in]  Latency                 — Response time measurement
  [built-in]  Turn Count              — Number of conversation turns
  [built-in]  <other relevant built-ins based on agent type>
  [custom]    <Use Case Metric>       — <description from recommendations>
  [audio]     Professional Tone       — Voice quality (voice agents only)
  [audio]     Pause Detection         — Flags pauses > 3s (voice agents only)

Tip: List all available built-ins with `coval metrics list --include-builtin --format json` and identify them by `created_by: "Coval"`. Recommend the ones most relevant to the user's agent type and use case.

Ask: "Accept these metrics? (yes / add more / remove some)"

- yes → proceed to Phase 3
- add more → ask what additional criteria they want to measure, then add them to the list
- remove some → present a numbered list and let them deselect

Phase 3: Custom Metric Creation

For each custom metric in the accepted list, guide the user through creation:

- Name and description: pre-filled from the recommendations; confirm them with the user.
- Type selection: load `references/metric-types.md` if the user wants to understand the options.
- Configuration:
  - For llm-binary: use the prompt template from the recommendations. Ask whether they want to customize it.
  - For audio-binary: use the prompt from the recommendations. Customize it if needed.
  - For pause: confirm the minimum duration threshold (default 3.0s).

Create each metric:

```bash
# LLM Binary metric
coval metrics create \
  --name "<name>" \
  --description "<description>" \
  --type llm-binary \
  --prompt "<evaluation prompt>" \
  --format json

# Audio Binary metric (voice only)
coval metrics create \
  --name "<name>" \
  --description "<description>" \
  --type audio-binary \
  --prompt "<prompt>" \
  --format json

# Pause metric (voice only)
coval metrics create \
  --name "<name>" \
  --description "<description>" \
  --type pause \
  --min-pause-duration 3.0 \
  --format json
```

Capture the `metric_id` from each JSON response.
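Capturing the ID can be scripted with a thin wrapper around the commands above. This sketch assumes the response is JSON containing a string field named `metric_id`. The field name comes from this guide; the full response shape is not documented here. `create_metric` and `extract_metric_id` are hypothetical helpers. The `sed` extraction is a dependency-free stand-in for a real JSON parser such as `jq`.

```bash
# extract_metric_id: read JSON on stdin and print the first "metric_id" string value.
# Assumption: the response contains a field shaped like "metric_id": "<id>".
extract_metric_id() {
  sed -n 's/.*"metric_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# create_metric NAME DESCRIPTION TYPE [EXTRA_FLAG...]
# Run `coval metrics create` with the flags shown above and print the new metric's ID.
create_metric() {
  local name="$1" description="$2" type="$3" response id
  shift 3
  response=$(coval metrics create \
    --name "$name" \
    --description "$description" \
    --type "$type" \
    "$@" \
    --format json) || return 1
  id=$(printf '%s\n' "$response" | extract_metric_id)
  if [ -z "$id" ]; then
    echo "no metric_id found in response" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

For example, `pause_id=$(create_metric "Pause Detection" "<description>" pause --min-pause-duration 3.0)` stores the new ID for Phase 5. Extra arguments are passed through unchanged, so type-specific flags such as `--prompt` or `--min-pause-duration` go after the type.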

Phase 4: Critical Requirement Metric

Ask: "What's the #1 thing your agent MUST get right?"

If the user provides a requirement:

- Create an additional llm-binary metric using the critical requirement template from `references/metric-recommendations.md`.
- Convert the user's requirement into a short Title Case metric name; do NOT use the raw requirement text as the name. Follow the built-in metric naming convention: short noun phrases like "Caller Identity Verification", "Issue Resolution", or "Order Accuracy". Examples:
  - "The agent must verify caller identity before sharing account details" → "Caller Identity Verification"
  - "The agent should never promise features that don't exist" → "Feature Claim Accuracy"
  - "Make sure the agent collects the policy number" → "Policy Number Collection"
- Put the user's full requirement text in the `--prompt` and `--description` fields; that's where the detail belongs.

```bash
coval metrics create \
  --name "<short Title Case name>" \
  --description "<user's full requirement text>" \
  --type llm-binary \
  --prompt "Given the transcript, did the agent satisfy this requirement: <user's requirement>? Return YES if the requirement was met. Return NO if the requirement was violated or not addressed." \
  --format json
```

Capture the `metric_id`.

If the user says "none" or "skip", proceed without creating this metric.
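Filling the template by hand is error-prone when the requirement text contains quotes or apostrophes. The sketch below builds the `--prompt` value from the template above and passes every field as a quoted shell variable, so the user's text reaches the CLI unchanged. The function names are hypothetical. Dropping one trailing period from the requirement is a cosmetic choice of this sketch, not part of the template.

```bash
# build_requirement_prompt REQUIREMENT
# Fill the critical-requirement template with the user's text.
build_requirement_prompt() {
  # Drop one trailing period so the question ends "...details?" rather than "...details.?".
  local req="${1%.}"
  printf 'Given the transcript, did the agent satisfy this requirement: %s? Return YES if the requirement was met. Return NO if the requirement was violated or not addressed.\n' "$req"
}

# create_requirement_metric NAME REQUIREMENT
# NAME is the short Title Case name; REQUIREMENT is the user's full text.
create_requirement_metric() {
  local name="$1" requirement="$2"
  coval metrics create \
    --name "$name" \
    --description "$requirement" \
    --type llm-binary \
    --prompt "$(build_requirement_prompt "$requirement")" \
    --format json
}
```

Usage: `create_requirement_metric "Feature Claim Accuracy" "The agent should never promise features that don't exist"`. The apostrophe in "don't" needs no escaping.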

Phase 5: Attach to Agent

Collect all metric IDs:

- Built-in metric IDs from the Phase 0 inventory
- Newly created custom metric IDs from Phases 3 and 4

Offer to attach as agent defaults:

```
I'll attach these metrics as defaults for <agent name>:

  <metric name 1>  (<metric_id>)
  <metric name 2>  (<metric_id>)
  ...

These will automatically apply to every evaluation run for this agent.
```

Ask: "Attach these as defaults? (yes / no)"

If yes:

```bash
coval agents update <agent_id> --metric-ids <comma_separated_ids>
```
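The `--metric-ids` value is a single comma-separated string, so built-in and custom IDs need to be joined first. This sketch skips empty entries (for example, when Phase 4 was skipped) and drops duplicates while keeping the first occurrence. The helper names are hypothetical. This guide does not say whether `coval agents update --metric-ids` replaces or adds to the agent's existing defaults, so check that before running it on an agent that already has metrics attached.

```bash
# join_ids ID...
# Join IDs with commas, skipping empty values and keeping only the first of any duplicates.
join_ids() {
  local out="" id
  for id in "$@"; do
    [ -n "$id" ] || continue
    case ",$out," in
      *",$id,"*) ;;                 # already present: skip
      *) out="${out:+$out,}$id" ;;  # append, adding a comma only after the first ID
    esac
  done
  printf '%s\n' "$out"
}

# attach_default_metrics AGENT_ID ID...
# Attach the joined IDs as the agent's default metrics.
attach_default_metrics() {
  local agent_id="$1" ids
  shift
  ids=$(join_ids "$@")
  if [ -z "$ids" ]; then
    echo "no metric IDs to attach" >&2
    return 1
  fi
  coval agents update "$agent_id" --metric-ids "$ids"
}
```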

Phase 6: Summary + Next Steps

Present all configured metrics:

```
Metrics configured for <agent name>:

  Type         Name                      ID
  ──────────   ────────────────────────  ──────────────────────
  built-in     Latency                   <id>
  built-in     Turn Count                <id>
  built-in     <other selected built-ins> <id>
  custom       <Use Case Metric>         <id>
  custom       <Critical Requirement>    <id>
  audio        Professional Tone         <id>
  audio        Pause Detection           <id>

  Attached to agent: <agent name> (<agent_id>)
```
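If you assemble this summary in a script, `printf` with fixed field widths keeps the columns aligned regardless of name length. In this sketch, `print_metrics_table` is a hypothetical helper and each row is passed as a `type|name|id` string, a format chosen only for illustration. Names that contain `|` would need a different delimiter.

```bash
# print_metrics_table AGENT_NAME ROW...
# Each ROW is "type|name|id"; columns are padded to fixed widths.
print_metrics_table() {
  local agent_name="$1" row type name id rest
  shift
  printf 'Metrics configured for %s:\n\n' "$agent_name"
  printf '  %-12s %-25s %s\n' "Type" "Name" "ID"
  for row in "$@"; do
    type=${row%%"|"*}   # text before the first "|"
    rest=${row#*"|"}    # text after the first "|"
    name=${rest%%"|"*}
    id=${rest#*"|"}
    printf '  %-12s %-25s %s\n' "$type" "$name" "$id"
  done
}
```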

Suggest next steps:

- Build test cases: "Use `/build-test-suite` to create test scenarios"
- Design persona: "Use `/design-persona` to create a simulated caller"
- Launch evaluation: "Use `/quick-eval` to run your first evaluation"
- If new metrics were created: "Use `/build-dashboard` to add your new metrics to a dashboard so you can track them visually"
