configure-metrics
Configure Metrics
Guide the user through selecting, creating, and attaching evaluation metrics for their AI agent using the `coval` CLI. Follow the phases below in order.
If `$ARGUMENTS` contains an agent name or use case, use it to skip the relevant question in Phase 1.
Phase 0: Preflight + Inventory
Step 1: Check authentication
bash
coval whoami
If not authenticated, guide the user:
bash
coval login
This prompts for an API key. Get one at https://app.coval.dev/settings (Organization > Manage > API Keys).
If the user doesn't have a Coval account, direct them to https://coval.dev to sign up.
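A minimal sketch of this preflight check as a shell function. It assumes that `coval whoami` exits with a non-zero status when no valid API key is stored, which this guide does not state, and `ensure_coval_auth` is a hypothetical helper name.

```bash
# Preflight sketch: verify the CLI session before running any other step.
# Assumption: `coval whoami` exits non-zero when the user is not authenticated.
ensure_coval_auth() {
  if coval whoami >/dev/null 2>&1; then
    echo "authenticated"
    return 0
  fi
  echo "Not authenticated; running 'coval login' (prompts for an API key)." >&2
  coval login || return 1
  # Re-check so later phases never run against an unauthenticated session.
  if coval whoami >/dev/null 2>&1; then
    echo "authenticated"
  else
    echo "Still not authenticated. Get a key at https://app.coval.dev/settings." >&2
    return 1
  fi
}
```

Calling `ensure_coval_auth || exit 1` at the top of a script would stop the workflow before any inventory command runs.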
Step 2: Inventory existing resources
Run these in parallel:
bash
coval metrics list --format json
coval metrics list --include-builtin --format json
coval agents list --format json
Categorize the metrics inventory:
- Built-in: Metrics with `created_by: "Coval"` in the `--include-builtin` response. These are platform-provided and exist in every org (e.g., Latency, Turn Count, Audio Duration, Transcript Sentiment Analysis)
- Custom: User-created metrics (llm-binary, audio-binary, pause types)
Note the IDs of relevant built-in metrics — you'll need them for Phase 5.
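One way to run the three inventory commands concurrently is with shell background jobs. In this sketch the function name `run_inventory` and the output file names are illustrative and not part of the CLI.

```bash
# Sketch: run the three inventory commands in parallel and save each
# JSON response to its own file. File names are illustrative.
run_inventory() {
  local out_dir="$1"
  mkdir -p "$out_dir" || return 1

  coval metrics list --format json > "$out_dir/metrics.json" &
  local pid_metrics=$!
  coval metrics list --include-builtin --format json > "$out_dir/metrics_with_builtin.json" &
  local pid_builtin=$!
  coval agents list --format json > "$out_dir/agents.json" &
  local pid_agents=$!

  # Wait on each job separately so a single failure is not masked.
  local status=0
  wait "$pid_metrics" || status=1
  wait "$pid_builtin" || status=1
  wait "$pid_agents" || status=1
  return "$status"
}
```

After `run_inventory ./inventory`, the entries in `metrics_with_builtin.json` that carry `created_by: "Coval"` are the built-ins whose IDs are needed in Phase 5.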
Phase 1: Agent + Use Case Context
Ask:
- "Which agent are these metrics for?"
  - Present existing agents as a numbered list from the inventory
  - If `$ARGUMENTS` matches an agent name, select it automatically
- "What does your agent do?" (if not obvious from agent name or prompt)
  - customer_support: Customer Support
  - scheduling_booking: Scheduling & Booking
  - sales: Sales
  - insurance_claims: Insurance Claims
  - healthcare_intake: Healthcare Intake
  - restaurant_orders: Restaurant Orders
  - debt_collection: Debt Collection
  - it_helpdesk: IT Helpdesk
  - other: Other (describe it)
Capture the agent's `type` (voice, outbound-voice, chat, etc.) from the agent record. This determines whether audio metrics apply.
Phase 2: Metric Recommendations
Load `references/metric-recommendations.md` and build the recommendation list.
Built-in metrics (discover dynamically from `coval metrics list --include-builtin --format json`, look for `created_by: "Coval"`):
- Select relevant built-ins based on agent type:
  - All agents: Latency, Turn Count
  - Voice agents: Audio Duration, Transcript Sentiment Analysis, Audio Sentiment, Speech Tempo, Time To First Audio, Interruption Rate, Background Noise
  - Chat agents: Words Per Message, Transcript Sentiment Analysis
Use-case specific:
- One custom llm-binary metric per vertical (from recommendations file)
Voice agents only (type = voice or outbound-voice):
- Professional Tone (audio-binary) — custom, needs creation
- Pause Detection (pause, min 3.0s) — custom, needs creation
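The selection rules above can be sketched as a small lookup keyed on the agent type. The helper name `recommend_metrics` is hypothetical, and treating `chat` as the exact type string for chat agents is an assumption. The use-case-specific metric is left out because it comes from the recommendations file.

```bash
# Sketch: print recommended metric names for an agent type, one per line.
# Assumption: chat agents report the type string "chat".
recommend_metrics() {
  local agent_type="$1"
  # Recommended for all agents.
  printf '%s\n' "Latency" "Turn Count"
  case "$agent_type" in
    voice|outbound-voice)
      # Built-in voice metrics.
      printf '%s\n' "Audio Duration" "Transcript Sentiment Analysis" \
        "Audio Sentiment" "Speech Tempo" "Time To First Audio" \
        "Interruption Rate" "Background Noise"
      # Custom metrics that still need to be created in Phase 3.
      printf '%s\n' "Professional Tone" "Pause Detection"
      ;;
    chat)
      printf '%s\n' "Words Per Message" "Transcript Sentiment Analysis"
      ;;
  esac
}
```

Any other type value falls through to the two common metrics only, which keeps audio metrics away from agents that produce no audio.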
Present the recommendations:
Based on your <use case> agent, I recommend these metrics:
[built-in] Latency — Response time measurement
[built-in] Turn Count — Number of conversation turns
[built-in] <other relevant built-ins based on agent type>
[custom] <Use Case Metric> — <description from recommendations>
[audio] Professional Tone — Voice quality (voice agents only)
[audio] Pause Detection — Flags pauses > 3s (voice agents only)
Tip: List all available built-ins with `coval metrics list --include-builtin --format json` and identify them by `created_by: "Coval"`. Recommend the ones most relevant to the user's agent type and use case.
Ask: "Accept these metrics? (yes / add more / remove some)"
- yes → proceed to Phase 3
- add more → ask what additional criteria they want to measure, add to list
- remove some → present numbered list, let them deselect
Phase 3: Custom Metric Creation
For each custom metric in the accepted list, guide the user through creation:
- Name and description: pre-filled from recommendations, confirm with user
- Type selection: load `references/metric-types.md` if the user wants to understand options
- Configuration:
  - For llm-binary: Use the prompt template from recommendations. Ask if they want to customize it.
  - For audio-binary: Use the prompt from recommendations. Customize if needed.
  - For pause: Confirm min duration threshold (default 3.0s).
Create each metric:
bash
# LLM Binary metric
coval metrics create \
  --name "<name>" \
  --description "<description>" \
  --type llm-binary \
  --prompt "<evaluation prompt>" \
  --format json
# Audio Binary metric (voice only)
coval metrics create \
  --name "<name>" \
  --description "<description>" \
  --type audio-binary \
  --prompt "<prompt>" \
  --format json
# Pause metric (voice only)
coval metrics create \
  --name "<name>" \
  --description "<description>" \
  --type pause \
  --min-pause-duration 3.0 \
  --format json
Capture the `metric_id` from each JSON response.
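A rough sketch of capturing the ID when creating a pause metric. It assumes the response contains a string field written as `"metric_id": "<value>"`; the helper names `extract_metric_id` and `create_pause_metric` are hypothetical, and the `grep`/`sed` pattern is only a stand-in for a real JSON parser such as `jq`.

```bash
# Sketch: print the first "metric_id" string value found in JSON on stdin.
# Assumption: the ID is a JSON string under a key named metric_id.
extract_metric_id() {
  grep -o '"metric_id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Create a pause metric and print its ID; the threshold defaults to 3.0 seconds.
create_pause_metric() {
  local name="$1" description="$2" min_pause="${3:-3.0}"
  local response metric_id
  response="$(coval metrics create \
    --name "$name" \
    --description "$description" \
    --type pause \
    --min-pause-duration "$min_pause" \
    --format json)" || return 1
  metric_id="$(printf '%s\n' "$response" | extract_metric_id)"
  [ -n "$metric_id" ] || { echo "No metric_id in response" >&2; return 1; }
  printf '%s\n' "$metric_id"
}
```

The same capture step applies to the llm-binary and audio-binary commands; only the `--type` and `--prompt` flags differ.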
Phase 4: Critical Requirement Metric
Ask: "What's the #1 thing your agent MUST get right?"
If the user provides a requirement:
- Create an additional llm-binary metric using the critical requirement template from `references/metric-recommendations.md`
- Convert the user's requirement into a short Title Case metric name. Do NOT use the raw requirement text as the name. Follow the built-in metric naming convention: short noun phrases like "Caller Identity Verification", "Issue Resolution", "Order Accuracy". Examples:
  - "The agent must verify caller identity before sharing account details" → "Caller Identity Verification"
  - "The agent should never promise features that don't exist" → "Feature Claim Accuracy"
  - "Make sure the agent collects the policy number" → "Policy Number Collection"
- Use the user's full requirement text in the `--prompt` and `--description` fields; that's where the detail belongs.
bash
coval metrics create \
  --name "<short Title Case name>" \
  --description "<user's full requirement text>" \
  --type llm-binary \
  --prompt "Given the transcript, did the agent satisfy this requirement: <user's requirement>? Return YES if the requirement was met. Return NO if the requirement was violated or not addressed." \
  --format json
Capture the `metric_id`.
If the user says "none" or "skip", proceed without creating this metric.
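The naming rule and the prompt template can be checked mechanically before calling the CLI. In this sketch both helper names are hypothetical, and the five-word limit is only one reading of "short", not a rule enforced by the CLI.

```bash
# Sketch: succeed only if every word starts with an uppercase letter
# and the name has at most five words (an illustrative limit).
is_short_title_case() {
  local name="$1" spaces
  [ -n "$name" ] || return 1
  # Reject any word that does not start with an uppercase letter.
  case " $name" in
    *" "[![:upper:]]*) return 1 ;;
  esac
  # Word count is the number of spaces plus one.
  spaces="$(printf '%s' "$name" | tr -cd ' ' | wc -c)"
  [ "$((spaces + 1))" -le 5 ]
}

# Build the evaluation prompt from the user's full requirement text.
critical_requirement_prompt() {
  local requirement="$1"
  printf 'Given the transcript, did the agent satisfy this requirement: %s? Return YES if the requirement was met. Return NO if the requirement was violated or not addressed.\n' "$requirement"
}
```

For example, `is_short_title_case "Caller Identity Verification"` succeeds, while passing the raw requirement sentence fails because its words are lowercase.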
Phase 5: Attach to Agent
Collect all metric IDs:
- Built-in metric IDs from Phase 0 inventory
- Newly created custom metric IDs from Phases 3 and 4
Offer to attach as agent defaults:
I'll attach these metrics as defaults for <agent name>:
<metric name 1> (<metric_id>)
<metric name 2> (<metric_id>)
...
These will automatically apply to every evaluation run for this agent.
Ask: "Attach these as defaults? (yes / no)"
If yes:
bash
coval agents update <agent_id> --metric-ids <comma_separated_ids>
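A small sketch of assembling the comma-separated list from the IDs collected in earlier phases. The helper names `join_metric_ids` and `attach_default_metrics` are hypothetical; only the `coval agents update` invocation comes from this guide.

```bash
# Sketch: join all arguments with commas for the --metric-ids flag.
join_metric_ids() {
  local IFS=','
  printf '%s\n' "$*"
}

# Attach the given metric IDs as defaults for an agent.
attach_default_metrics() {
  local agent_id="$1"
  shift
  [ "$#" -gt 0 ] || { echo "No metric IDs to attach" >&2; return 1; }
  coval agents update "$agent_id" --metric-ids "$(join_metric_ids "$@")"
}
```

Refusing to run with an empty ID list is a defensive choice made here, so that a failed earlier phase does not silently produce an empty `--metric-ids` value.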
Phase 6: Summary + Next Steps
Present all configured metrics:
Metrics configured for <agent name>:
Type        Name                        ID
──────────  ──────────────────────────  ──────────────────────
built-in    Latency                     <id>
built-in    Turn Count                  <id>
built-in    <other selected built-ins>  <id>
custom      <Use Case Metric>           <id>
custom      <Critical Requirement>      <id>
audio       Professional Tone           <id>
audio       Pause Detection             <id>
Attached to agent: <agent name> (<agent_id>)
Suggest next steps:
- Build test cases: "Use `/build-test-suite` to create test scenarios"
- Design persona: "Use `/design-persona` to create a simulated caller"
- Launch evaluation: "Use `/quick-eval` to run your first evaluation"
- If new metrics were created: "Use `/build-dashboard` to add your new metrics to a dashboard so you can track them visually"
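The summary table earlier in this phase could be rendered with `printf` so the columns stay aligned regardless of name length. This sketch assumes the collected metrics are available as tab-separated `type`, `name`, `id` lines on stdin, which is an illustrative format; `print_metric_summary` is a hypothetical helper, and plain hyphens stand in for the rule line.

```bash
# Sketch: print an aligned summary table from tab-separated input lines.
# Input format (illustrative): type<TAB>name<TAB>id
print_metric_summary() {
  local tab
  tab="$(printf '\t')"
  printf '%-10s  %-26s  %s\n' "Type" "Name" "ID"
  printf '%-10s  %-26s  %s\n' "----------" "--------------------------" "----------------------"
  while IFS="$tab" read -r type name id; do
    printf '%-10s  %-26s  %s\n' "$type" "$name" "$id"
  done
}
```

A call such as `printf 'built-in\tLatency\t<id>\n' | print_metric_summary` prints the header, the rule line, and one row.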