huggingface-import

HuggingFace to Coval Test Set Import

Import $ARGUMENTS from HuggingFace and convert it into Coval test sets with properly structured test cases.

Coval Context

Coval is an AI evaluation platform for testing voice and conversational AI agents. It runs simulations against AI agents and measures performance with configurable metrics.
| Concept | Description |
|---|---|
| Test Set | A collection of test cases, grouped by category or evaluation purpose |
| Test Case | A single evaluation scenario with an input (prompt) and optional metadata |
| Persona | High-level user character (system prompt), separate from test cases |
| Agent | The AI system being evaluated |
Key distinction:
  • Persona = WHO is asking (character, traits)
  • Test Case = WHAT they ask (prompts, scenarios)
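The Persona/Test Case split can be sketched as a small local data model. This is a minimal illustration that assumes hypothetical class and field names (TestCase, TestSet, Persona, system_prompt, to_csv_row). It is not Coval's API schema; only input and metadata come from the definitions above.

```python
import json
from dataclasses import dataclass, field

# Hypothetical local model for illustration only, not Coval's API schema.


@dataclass
class TestCase:
    """WHAT is asked: one scenario with an input prompt and optional metadata."""
    input: str
    metadata: dict = field(default_factory=dict)

    def to_csv_row(self) -> dict:
        # metadata is serialized to a JSON string, matching the CSV upload format
        return {"input": self.input, "metadata": json.dumps(self.metadata, ensure_ascii=False)}


@dataclass
class TestSet:
    """A named group of test cases, e.g. one per category."""
    name: str
    test_cases: list = field(default_factory=list)


@dataclass
class Persona:
    """WHO is asking: a character defined by a system prompt, kept apart from test cases."""
    name: str
    system_prompt: str
```

The same TestCase can then be run under different Personas without being duplicated, which is the point of keeping the two separate.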

Coval API

Base URL:
https://api.coval.dev/v1
Fetch the OpenAPI spec before making API calls:

  • List specs (no auth)
  • Fetch a specific spec
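A minimal Python sketch of the fetch step, using only the standard library. It assumes the spec is served as JSON and that you already have its URL from the list step. The URL, any auth headers, and the helper names fetch_openapi_spec and spec_overview are assumptions, not documented Coval values. spec_overview reads the standard OpenAPI 3.x info, servers, and paths fields, so you can confirm the base URL before making calls.

```python
import json
import urllib.request


def fetch_openapi_spec(url, headers=None, timeout=30.0):
    """Download an OpenAPI document served as JSON and parse it into a dict."""
    # The spec URL and any auth headers are caller-supplied assumptions;
    # take the URL from the list-specs step.
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def spec_overview(spec):
    """Summarize the OpenAPI 3.x fields worth checking before calling the API."""
    info = spec.get("info", {})
    return {
        "title": info.get("title"),
        "version": info.get("version"),
        "servers": [server.get("url") for server in spec.get("servers", [])],
        "path_count": len(spec.get("paths", {})),
    }
```

If spec_overview reports a server URL other than https://api.coval.dev/v1, trust the spec over any hard-coded value.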

Workflow

Step 1: Identify the HuggingFace Source

If $ARGUMENTS is provided, navigate to it. Otherwise, ask:
What is the HuggingFace repository, space, or dataset you want to import?
Then:
  1. Navigate to the HuggingFace source
  2. Find data files (CSV, JSON, Parquet)
  3. Examine structure and fields
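The file-discovery part of this step can be sketched with the huggingface_hub client, whose list_repo_files function lists the files in a repository. The extension filter is a hypothetical helper. It also accepts .jsonl (JSON Lines), which is an assumption beyond the formats listed above.

```python
from pathlib import PurePosixPath

# Extensions from Step 1; ".jsonl" (JSON Lines) is an added assumption.
DATA_EXTENSIONS = {".csv", ".json", ".jsonl", ".parquet"}


def find_data_files(filenames):
    """Keep only repository files whose extension marks them as data files."""
    return sorted(
        name for name in filenames
        if PurePosixPath(name).suffix.lower() in DATA_EXTENSIONS
    )


def list_dataset_data_files(repo_id):
    """List data files in a HuggingFace dataset repo (needs network and huggingface_hub)."""
    from huggingface_hub import list_repo_files  # imported lazily so find_data_files works offline

    return find_data_files(list_repo_files(repo_id, repo_type="dataset"))
```

For example, list_dataset_data_files("openai/gsm8k") lists that dataset's data files. The call needs network access and the huggingface_hub package installed.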

Step 2: Analyze Data Structure

步骤2:分析数据结构

Report to the user:
  • Total records
  • Available fields/columns
  • Existing categorization
  • 2-3 sample records
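The Step 2 report can be computed from the records as plain dicts. This sketch assumes the data is already loaded into a list of dicts and that the hypothetical category_field parameter names the existing categorization column, if there is one:

```python
from collections import Counter


def summarize_records(records, category_field=None, n_samples=3):
    """Collect the Step 2 report: record count, fields, category counts, samples."""
    fields = {}
    for record in records:
        fields.update(dict.fromkeys(record))  # a dict keeps first-seen key order
    categories = (
        Counter(record.get(category_field) for record in records)
        if category_field
        else Counter()
    )
    return {
        "total_records": len(records),
        "fields": list(fields),
        "categories": dict(categories),
        "samples": records[:n_samples],
    }
```

With the datasets library, list(load_dataset("openai/gsm8k", "main", split="test")) yields such a list, because iterating a Dataset produces one dict per row.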

Step 3: Interactive Field Mapping

Ask these questions to map HuggingFace data to Coval format:
Q1: Input Field
Which field contains the question/prompt for the test case input?
Q2: Categorization
How should test cases be organized into test sets?
  • By existing category field
  • Single test set
  • Custom logic
Q3: Metadata
Which fields should be preserved in the metadata JSON? (Recommended: preserve original IDs such as question_id.)
Q4: Multi-turn (if applicable)
How to handle multi-turn conversations?
  • First turn only
  • Concatenate turns
  • Separate test cases per turn
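Q1, Q3, and Q4 can be combined into one mapping function. This is a sketch under stated assumptions: build_test_cases, the turn_index metadata key, and the blank-line separator used when concatenating turns are hypothetical choices. The example field names (turns, question_id) only stand in for whatever the dataset actually uses.

```python
import json

MULTI_TURN_STRATEGIES = ("first", "concatenate", "separate")


def build_test_cases(record, input_field, metadata_fields, multi_turn="first", source=None):
    """Map one HuggingFace record to Coval-style rows: {"input": ..., "metadata": JSON string}."""
    value = record[input_field]  # Q1: the field holding the prompt (or list of turns)
    turns = [value] if isinstance(value, str) else [str(turn) for turn in value]
    if not turns:
        return []

    # Q4: how to handle multi-turn conversations.
    if multi_turn == "first":
        inputs = turns[:1]
    elif multi_turn == "concatenate":
        inputs = ["\n\n".join(turns)]  # blank line between turns is an arbitrary choice
    elif multi_turn == "separate":
        inputs = turns
    else:
        raise ValueError(f"multi_turn must be one of {MULTI_TURN_STRATEGIES}, got {multi_turn!r}")

    # Q3: copy the selected fields (e.g. original IDs) into metadata.
    base_metadata = {name: record[name] for name in metadata_fields if name in record}
    if source is not None:
        base_metadata["source"] = source

    rows = []
    for index, text in enumerate(inputs):
        metadata = dict(base_metadata)
        if multi_turn == "separate":
            metadata["turn_index"] = index  # hypothetical key: which turn this row came from
        rows.append({"input": text, "metadata": json.dumps(metadata, ensure_ascii=False)})
    return rows
```

With multi_turn="separate", a two-turn record becomes two rows that share the same question_id and differ only in input and turn_index.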

Step 4: Generate CSVs

Create Coval-compatible CSVs:
```csv
input,metadata
"Your question here","{""question_id"": ""123"", ""source"": ""mt-bench""}"
```
Requirements:
  • input column MUST be first
  • Proper quote escaping: wrap fields in double quotes and double any embedded quotes ("")
  • metadata as a valid JSON string
  • UTF-8 encoding
  • One CSV per category (recommended)
Naming:
{source}_{category}.csv
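Python's csv module already applies the quoting rules above: it quotes fields as needed and doubles embedded quotes. A writer therefore only needs to group rows and choose filenames. This sketch assumes each row is a dict with input and a metadata JSON string, and that the category comes from a hypothetical category_key in that metadata (None produces a single test set). The filename sanitizing is an assumption, not a Coval rule.

```python
import csv
import json
import re
from collections import defaultdict
from pathlib import Path


def _safe_name(text):
    # Replace characters that are awkward in filenames (an assumption; adjust as needed).
    return re.sub(r"[^A-Za-z0-9._-]+", "-", str(text)).strip("-") or "uncategorized"


def write_test_set_csvs(rows, source, out_dir, category_key=None):
    """Group rows by a metadata field and write one {source}_{category}.csv per group."""
    groups = defaultdict(list)
    for row in rows:
        if category_key is None:
            category = "all"  # single test set
        else:
            category = json.loads(row["metadata"]).get(category_key, "uncategorized")
        groups[_safe_name(category)].append(row)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for category, group in sorted(groups.items()):
        path = out_dir / f"{_safe_name(source)}_{category}.csv"
        # newline="" lets the csv module control line endings; UTF-8 per the requirements.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)  # quotes fields when needed, doubles embedded quotes
            writer.writerow(["input", "metadata"])  # input column first
            for row in group:
                writer.writerow([row["input"], row["metadata"]])
        written.append(path)
    return written
```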

Step 5: Upload to Coval

Manual: Upload CSVs via Coval dashboard test sets page.
API: Fetch OpenAPI spec and use test set endpoints programmatically.
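The test set endpoint paths are not listed here, so one way to find them is to search the fetched spec. This sketch assumes the relevant operations mention "test set" in their path, summary, operationId, or tags. The keyword list and the find_operations name are hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def find_operations(spec, keywords=("test-set", "test_set", "testset", "test set")):
    """Return (METHOD, path, summary) for OpenAPI operations that mention any keyword."""
    matches = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            summary = operation.get("summary") or ""
            searchable = " ".join(
                [path, summary, operation.get("operationId") or "", *operation.get("tags", [])]
            ).lower()
            if any(keyword in searchable for keyword in keywords):
                matches.append((method.upper(), path, summary))
    return matches
```

Pass it the parsed spec as a dict (for example, the result of json.loads on the spec body). Then read each matching operation's request schema in the spec before calling it.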

Common HuggingFace Sources

General Language Understanding

| Dataset | Description |
|---|---|
| cais/mmlu | 15k+ multiple-choice questions across 57 subjects (STEM, humanities, law) |
| nyu-mll/glue | Sentence-level tasks: sentiment, entailment, linguistic acceptability |
| tau/commonsense_qa | Reasoning tests for everyday world knowledge |
| Rowan/hellaswag | Common-sense inference and completion |
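Multiple-choice datasets such as cais/mmlu need their options folded into the single input prompt. This sketch assumes cais/mmlu rows carry question, subject, choices, and an integer answer index; verify the fields against the dataset card. The A./B. rendering and the metadata keys are hypothetical choices.

```python
import json
import string


def format_multiple_choice(question, choices):
    """Render a question and its options as one prompt: 'A. ...', 'B. ...', and so on."""
    lines = [question.strip(), ""]
    for letter, choice in zip(string.ascii_uppercase, choices):
        lines.append(f"{letter}. {choice}")
    return "\n".join(lines)


def mmlu_record_to_row(record, row_index):
    """Map an MMLU-style record to a Coval-style row (field names assumed from cais/mmlu)."""
    answer = record["answer"]  # assumed to be an index into choices
    metadata = {
        "subject": record["subject"],  # usable as the test set category
        "answer": string.ascii_uppercase[answer] if isinstance(answer, int) else answer,
        "row_index": row_index,  # assumes no ID column; keep the row position as a fallback
    }
    return {
        "input": format_multiple_choice(record["question"], record["choices"]),
        "metadata": json.dumps(metadata, ensure_ascii=False),
    }
```

To load the data, use something like load_dataset("cais/mmlu", "all", split="test") from the datasets library, then call mmlu_record_to_row(record, i) for each enumerated row. subject can then serve as the category in Q2.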

Reasoning & Problem-Solving

| Dataset | Description |
|---|---|
| openai/gsm8k | ~8k grade-school math word problems (multi-step arithmetic) |
| ucinlp/drop | Reading comprehension requiring discrete operations |
| lukaemon/bbh | BIG-Bench Hard: a challenging reasoning subset of BIG-Bench |
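Not every dataset ships an ID column to preserve. openai/gsm8k's main config, for example, has only question and answer fields. One fallback is a synthetic ID built from the row position, sketched below with a hypothetical helper and prefix:

```python
def add_row_ids(records, id_field="question_id", prefix=""):
    """Give each record a synthetic ID based on its position when it has none."""
    with_ids = []
    for index, record in enumerate(records):
        copy = dict(record)  # leave the caller's records unchanged
        copy.setdefault(id_field, f"{prefix}{index}")  # existing IDs are kept as-is
        with_ids.append(copy)
    return with_ids
```

Row positions are only stable while the split and dataset revision stay fixed, so pin them (load_dataset accepts a revision argument) when IDs must match across imports.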

Supporting Files

  • For a Python transformation example, see examples/huggingface-import.py

Checklist

  • Identified input field
  • Determined categorization
  • Preserved original IDs in metadata
  • Proper quote escaping
  • Valid JSON in metadata
  • Separate CSVs per category
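Several checklist items can be verified mechanically on each generated file before upload. This sketch assumes metadata should be a JSON object and that question_id is the ID key; both are adjustable. validate_test_set_csv is a hypothetical helper, and its row numbers count CSV records, not physical lines.

```python
import csv
import json


def validate_test_set_csv(path, id_key="question_id"):
    """Check a generated CSV against the checklist; returns a list of problems (empty = OK)."""
    try:
        with open(path, newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    if not rows or not rows[0] or rows[0][0] != "input":
        return ["first column must be 'input'"]
    header = rows[0]
    meta_col = header.index("metadata") if "metadata" in header else None

    problems, seen_ids = [], set()
    for row_no, row in enumerate(rows[1:], start=2):  # the header is record 1
        if len(row) != len(header):
            problems.append(f"row {row_no}: expected {len(header)} columns, got {len(row)} (check quote escaping)")
            continue
        if not row[0].strip():
            problems.append(f"row {row_no}: empty input")
        if meta_col is None or not row[meta_col]:
            continue  # metadata is optional
        try:
            metadata = json.loads(row[meta_col])
        except json.JSONDecodeError:
            problems.append(f"row {row_no}: metadata is not valid JSON")
            continue
        if not isinstance(metadata, dict):
            problems.append(f"row {row_no}: metadata is not a JSON object")
        elif id_key and id_key not in metadata:
            problems.append(f"row {row_no}: metadata has no {id_key!r}")
        elif id_key:
            marker = json.dumps(metadata[id_key], sort_keys=True)  # distinguishes 1 from "1"
            if marker in seen_ids:
                problems.append(f"row {row_no}: duplicate {id_key} {metadata[id_key]!r}")
            seen_ids.add(marker)
    return problems
```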