configure-metrics

Configure Metrics

Guide the user through selecting, creating, and attaching evaluation metrics for their AI agent using the `coval` CLI. Follow the phases below in order.
If `$ARGUMENTS` contains an agent name or use case, use it to skip the relevant question in Phase 1.

Phase 0: Preflight + Inventory

Step 1: Check authentication

```bash
coval whoami
```

If not authenticated, guide the user:

```bash
coval login
```

This prompts for an API key. Get one at https://app.coval.dev/settings (Organization > Manage > API Keys).

If the user doesn't have a Coval account, direct them to https://coval.dev to sign up.

Step 2: Inventory existing resources

Run these in parallel:

```bash
coval metrics list --format json
coval metrics list --include-builtin --format json
coval agents list --format json
```

Categorize the metrics inventory:
  • Built-in: Metrics with `created_by: "Coval"` in the `--include-builtin` response. These are platform-provided and exist in every org (e.g., Latency, Turn Count, Audio Duration, Transcript Sentiment Analysis).
  • Custom: User-created metrics (llm-binary, audio-binary, pause types)

Note the IDs of relevant built-in metrics; you'll need them for Phase 5.
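
The categorization step can be sketched in Python. This is a minimal illustration, not the CLI's documented schema: it assumes the `--include-builtin` response is a JSON array of objects, and the `id` and `name` field names (and the sample IDs) are hypothetical. Only `created_by: "Coval"` comes from the text above.

```python
import json


def split_inventory(metrics_json: str):
    """Split a metrics listing into (built-in, custom) records.

    Assumes a JSON array of objects; only the `created_by` field is
    taken from the guide, everything else is an assumed shape.
    """
    records = json.loads(metrics_json)
    builtin = [m for m in records if m.get("created_by") == "Coval"]
    custom = [m for m in records if m.get("created_by") != "Coval"]
    return builtin, custom


# Hypothetical sample standing in for the CLI output.
sample = json.dumps([
    {"id": "m-001", "name": "Latency", "created_by": "Coval"},
    {"id": "m-002", "name": "Turn Count", "created_by": "Coval"},
    {"id": "m-100", "name": "Order Accuracy", "created_by": "jane@example.com"},
])

builtin, custom = split_inventory(sample)

# Keep a name -> ID lookup so Phase 5 can attach built-ins by ID.
builtin_ids = {m["name"]: m["id"] for m in builtin}
```

Keeping the built-in lookup keyed by name makes it easy to resolve the recommended metric names to IDs later without re-running the listing.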

Phase 1: Agent + Use Case Context

Ask:
  1. "Which agent are these metrics for?"
    • Present existing agents as a numbered list from the inventory
    • If `$ARGUMENTS` matches an agent name, select it automatically
  2. "What does your agent do?" (if not obvious from agent name or prompt)
    • customer_support — Customer Support
    • scheduling_booking — Scheduling & Booking
    • sales — Sales
    • insurance_claims — Insurance Claims
    • healthcare_intake — Healthcare Intake
    • restaurant_orders — Restaurant Orders
    • debt_collection — Debt Collection
    • it_helpdesk — IT Helpdesk
    • other — Other (describe it)
Capture the agent's `type` (voice, outbound-voice, chat, etc.) from the agent record; this determines whether audio metrics apply.
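
A rough sketch of the two decisions in this phase follows. The matching rule (case-insensitive substring of the agent name inside `$ARGUMENTS`) and the `id`/`name` fields on the agent record are assumptions for illustration; the voice types are the ones this guide treats as audio-capable.

```python
# Agent types for which audio metrics apply, per this guide.
VOICE_TYPES = {"voice", "outbound-voice"}


def pick_agent(arguments: str, agents: list):
    """Return the first agent whose name appears in the arguments, else None.

    The substring match is an assumed interpretation of "matches an agent name".
    """
    text = arguments.lower()
    for agent in agents:
        name = agent.get("name", "").lower()
        if name and name in text:
            return agent
    return None


def audio_metrics_apply(agent: dict) -> bool:
    """True when the agent's `type` marks it as a voice agent."""
    return agent.get("type") in VOICE_TYPES


# Hypothetical agent records standing in for `coval agents list` output.
agents = [
    {"id": "a-1", "name": "Support Bot", "type": "voice"},
    {"id": "a-2", "name": "Sales Chat", "type": "chat"},
]
selected = pick_agent("sales chat for lead qualification", agents)
```

If `pick_agent` returns `None`, fall back to presenting the numbered list and asking the question.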

Phase 2: Metric Recommendations

Load `references/metric-recommendations.md` and build the recommendation list.
Built-in metrics (discover dynamically from `coval metrics list --include-builtin --format json`; look for `created_by: "Coval"`):
  • Select relevant built-ins based on agent type:
    • All agents: Latency, Turn Count
    • Voice agents: Audio Duration, Transcript Sentiment Analysis, Audio Sentiment, Speech Tempo, Time To First Audio, Interruption Rate, Background Noise
    • Chat agents: Words Per Message, Transcript Sentiment Analysis
Use-case specific:
  • One custom llm-binary metric per vertical (from recommendations file)
Voice agents only (type = voice or outbound-voice):
  • Professional Tone (audio-binary) — custom, needs creation
  • Pause Detection (pause, min 3.0s) — custom, needs creation
Present the recommendations:

```
Based on your <use case> agent, I recommend these metrics:

  [built-in]  Latency                 — Response time measurement
  [built-in]  Turn Count              — Number of conversation turns
  [built-in]  <other relevant built-ins based on agent type>
  [custom]    <Use Case Metric>       — <description from recommendations>
  [audio]     Professional Tone       — Voice quality (voice agents only)
  [audio]     Pause Detection         — Flags pauses > 3s (voice agents only)
```

Tip: List all available built-ins with `coval metrics list --include-builtin --format json` and identify them by `created_by: "Coval"`. Recommend the ones most relevant to the user's agent type and use case.
Ask: "Accept these metrics? (yes / add more / remove some)"
  • yes → proceed to Phase 3
  • add more → ask what additional criteria they want to measure, add to list
  • remove some → present numbered list, let them deselect
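
The selection rules above can be expressed as a small function. This is a sketch under stated assumptions: the metric names are the ones listed in this phase, `available` is the set of built-in names actually found in the org's inventory, and the function name and row format are hypothetical.

```python
ALL_AGENTS = ["Latency", "Turn Count"]
VOICE_BUILTINS = [
    "Audio Duration", "Transcript Sentiment Analysis", "Audio Sentiment",
    "Speech Tempo", "Time To First Audio", "Interruption Rate",
    "Background Noise",
]
CHAT_BUILTINS = ["Words Per Message", "Transcript Sentiment Analysis"]


def recommend(agent_type: str, available: set, use_case_metric: str):
    """Return (label, name) rows in the order they are presented to the user."""
    is_voice = agent_type in ("voice", "outbound-voice")
    wanted = list(ALL_AGENTS)
    if is_voice:
        wanted += VOICE_BUILTINS
    elif agent_type == "chat":
        wanted += CHAT_BUILTINS

    # Only recommend built-ins that the inventory actually returned.
    rows = [("built-in", name) for name in wanted if name in available]
    rows.append(("custom", use_case_metric))
    if is_voice:
        # Custom audio metrics; these still need to be created in Phase 3.
        rows += [("audio", "Professional Tone"), ("audio", "Pause Detection")]
    return rows


chat_rows = recommend(
    "chat", {"Latency", "Turn Count", "Words Per Message"}, "Order Accuracy"
)
```

Filtering against `available` keeps the list honest when an org's inventory lacks one of the named built-ins.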

Phase 3: Custom Metric Creation

For each custom metric in the accepted list, guide through creation:
  1. Name and description: pre-filled from recommendations; confirm with the user
  2. Type selection: load `references/metric-types.md` if the user wants to understand the options
  3. Configuration:
    • For llm-binary: Use the prompt template from recommendations. Ask if they want to customize it.
    • For audio-binary: Use the prompt from recommendations. Customize if needed.
    • For pause: Confirm min duration threshold (default 3.0s).
Create each metric:

```bash
# LLM Binary metric
coval metrics create \
  --name "<name>" \
  --description "<description>" \
  --type llm-binary \
  --prompt "<evaluation prompt>" \
  --format json

# Audio Binary metric (voice only)
coval metrics create \
  --name "<name>" \
  --description "<description>" \
  --type audio-binary \
  --prompt "<prompt>" \
  --format json

# Pause metric (voice only)
coval metrics create \
  --name "<name>" \
  --description "<description>" \
  --type pause \
  --min-pause-duration 3.0 \
  --format json
```

Capture the `metric_id` from each JSON response.
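
The three commands differ only in their type-specific flag, so the argument list can be assembled programmatically. The sketch below uses only the flags shown in this guide; the helper names are hypothetical, and reading `metric_id` from the top level of the JSON response is an assumption about the response shape.

```python
import json
import shlex


def create_command(name, description, metric_type, prompt=None,
                   min_pause_duration=3.0):
    """Build the argv for `coval metrics create` for one custom metric."""
    argv = ["coval", "metrics", "create",
            "--name", name,
            "--description", description,
            "--type", metric_type]
    if metric_type in ("llm-binary", "audio-binary"):
        if not prompt:
            raise ValueError(f"{metric_type} metrics need a prompt")
        argv += ["--prompt", prompt]
    elif metric_type == "pause":
        argv += ["--min-pause-duration", str(min_pause_duration)]
    else:
        raise ValueError(f"unsupported metric type: {metric_type}")
    return argv + ["--format", "json"]


def extract_metric_id(response_text: str) -> str:
    """Read `metric_id` from a create response (assumed top-level key)."""
    return json.loads(response_text)["metric_id"]


pause_argv = create_command("Pause Detection", "Flags long pauses", "pause")
# shlex.join quotes each argument so the line is safe to paste into a shell.
pause_line = shlex.join(pause_argv)
```

Passing the list form to a process runner avoids shell-quoting problems when a prompt contains quotes or newlines.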

Phase 4: Critical Requirement Metric

Ask: "What's the #1 thing your agent MUST get right?"
If the user provides a requirement:
  1. Create an additional llm-binary metric using the critical requirement template from `references/metric-recommendations.md`.
  2. Convert the user's requirement into a short Title Case metric name; do NOT use the raw requirement text as the name. Follow the built-in metric naming convention: short noun phrases like "Caller Identity Verification", "Issue Resolution", "Order Accuracy". Examples:
    • "The agent must verify caller identity before sharing account details" → "Caller Identity Verification"
    • "The agent should never promise features that don't exist" → "Feature Claim Accuracy"
    • "Make sure the agent collects the policy number" → "Policy Number Collection"
  3. Use the user's full requirement text in the `--prompt` and `--description` fields; that's where the detail belongs.

```bash
coval metrics create \
  --name "<short Title Case name>" \
  --description "<user's full requirement text>" \
  --type llm-binary \
  --prompt "Given the transcript, did the agent satisfy this requirement: <user's requirement>? Return YES if the requirement was met. Return NO if the requirement was violated or not addressed." \
  --format json
```

Capture the `metric_id`.
If the user says "none" or "skip", proceed without creating this metric.
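
As an illustration of how the pieces fit together, the sketch below fills the prompt template and handles the skip case. The function name is hypothetical, the short name is supplied by the caller because deriving it takes judgment, and stripping trailing punctuation before inserting the requirement into the prompt is an added assumption to avoid a ".?" sequence.

```python
PROMPT_TEMPLATE = (
    "Given the transcript, did the agent satisfy this requirement: "
    "{requirement}? Return YES if the requirement was met. "
    "Return NO if the requirement was violated or not addressed."
)


def critical_metric_args(answer: str, short_name: str):
    """Return argv for the critical-requirement metric, or None when skipped."""
    if answer.strip().lower() in {"none", "skip"}:
        return None
    requirement = answer.strip()
    # Assumed cleanup: drop a trailing period so the prompt reads naturally.
    prompt = PROMPT_TEMPLATE.format(requirement=requirement.rstrip(". "))
    return ["coval", "metrics", "create",
            "--name", short_name,            # short Title Case name
            "--description", requirement,    # full requirement text
            "--type", "llm-binary",
            "--prompt", prompt,
            "--format", "json"]


args = critical_metric_args(
    "The agent must verify caller identity before sharing account details.",
    "Caller Identity Verification",
)
```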

Phase 5: Attach to Agent

Collect all metric IDs:
  • Built-in metric IDs from Phase 0 inventory
  • Newly created custom metric IDs from Phases 3 and 4
Offer to attach as agent defaults:

```
I'll attach these metrics as defaults for <agent name>:

  <metric name 1>  (<metric_id>)
  <metric name 2>  (<metric_id>)
  ...

These will automatically apply to every evaluation run for this agent.
```

Ask: "Attach these as defaults? (yes / no)"

If yes:

```bash
coval agents update <agent_id> --metric-ids <comma_separated_ids>
```
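
Building the comma-separated ID list can be sketched as follows. The helper names and sample IDs are hypothetical, and removing duplicate IDs is an added safeguard rather than something the CLI is known to require.

```python
def metric_ids_arg(builtin_ids, custom_ids) -> str:
    """Join built-in and custom IDs into one comma-separated string.

    dict.fromkeys drops duplicates while preserving first-seen order.
    """
    ordered = dict.fromkeys([*builtin_ids, *custom_ids])
    return ",".join(ordered)


def attach_command(agent_id, builtin_ids, custom_ids):
    """Build the argv for `coval agents update` with the collected IDs."""
    return ["coval", "agents", "update", agent_id,
            "--metric-ids", metric_ids_arg(builtin_ids, custom_ids)]


attach_argv = attach_command("a-123", ["m-001", "m-002"], ["m-100", "m-001"])
```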

Phase 6: Summary + Next Steps

Present all configured metrics:

```
Metrics configured for <agent name>:

  Type         Name                      ID
  ──────────   ────────────────────────  ──────────────────────
  built-in     Latency                   <id>
  built-in     Turn Count                <id>
  built-in     <other selected built-ins> <id>
  custom       <Use Case Metric>         <id>
  custom       <Critical Requirement>    <id>
  audio        Professional Tone         <id>
  audio        Pause Detection           <id>

  Attached to agent: <agent name> (<agent_id>)
```

Suggest next steps:
  • Build test cases: "Use `/build-test-suite` to create test scenarios"
  • Design persona: "Use `/design-persona` to create a simulated caller"
  • Launch evaluation: "Use `/quick-eval` to run your first evaluation"
  • If new metrics were created: "Use `/build-dashboard` to add your new metrics to a dashboard so you can track them visually"
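
One way to render the summary table with consistent column alignment is sketched below. The column widths (13 and 26 characters) are read off the template above; the function name and sample row are hypothetical.

```python
def summary_table(agent_name: str, agent_id: str, rows) -> str:
    """Render the Phase 6 summary; rows are (type, name, metric_id) tuples."""
    lines = [
        f"Metrics configured for {agent_name}:",
        "",
        "  " + "Type".ljust(13) + "Name".ljust(26) + "ID",
        "  " + ("─" * 10).ljust(13) + ("─" * 24).ljust(26) + "─" * 22,
    ]
    for kind, name, metric_id in rows:
        # Names longer than the column simply push the ID to the right.
        lines.append("  " + kind.ljust(13) + name.ljust(26) + metric_id)
    lines += ["", f"  Attached to agent: {agent_name} ({agent_id})"]
    return "\n".join(lines)


table = summary_table("Support Bot", "a-123",
                      [("built-in", "Latency", "m-001")])
print(table)
```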