Import datasets from HuggingFace and convert them to Coval test sets. Use when the user wants to create test cases from a HuggingFace dataset or repository.
npx skill4agent add coval-ai/coval-external-skills huggingface-import

$ARGUMENTS

| Concept | Description |
|---|---|
| Test Set | A collection of test cases, grouped by category or evaluation purpose |
| Test Case | A single evaluation scenario with |
| Persona | High-level user character (system prompt) - separate from test cases |
| Agent | The AI system being evaluated |
https://api.coval.dev/v1

# List specs (no auth)
GET https://api.coval.dev/v1/openapi
# Fetch specific spec
GET https://api.coval.dev/v1/openapi/{spec_name}

$ARGUMENTS

What is the HuggingFace repository, space, or dataset you want to import?
Which field contains the question/prompt for the test case `input`?
How should test cases be organized into test sets?
- By existing category field
- Single test set
- Custom logic
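The three organization options above can be sketched as one grouping function that takes a different key function for each choice. This is a minimal illustration, not the skill's actual implementation: the helper name `group_into_test_sets`, the `"all"` label for the single-test-set case, and the sample rows are all hypothetical. The output keys follow the `{source}_{category}.csv` naming pattern used in this document.

```python
from collections import defaultdict


def group_into_test_sets(rows, source, category_of=lambda row: "all"):
    """Group dataset rows into test sets, keyed by output file name.

    `category_of` maps a row to its category label. The default puts every
    row into a single test set (the "all" label is an assumption).
    """
    test_sets = defaultdict(list)
    for row in rows:
        file_name = f"{source}_{category_of(row)}.csv"
        test_sets[file_name].append(row)
    return dict(test_sets)


# Hypothetical rows standing in for a loaded HuggingFace dataset.
rows = [
    {"question_id": "1", "category": "math", "prompt": "What is 2 + 2?"},
    {"question_id": "2", "category": "writing", "prompt": "Draft a haiku."},
    {"question_id": "3", "category": "math", "prompt": "Factor 12."},
]

# Option 1: by existing category field.
by_category = group_into_test_sets(rows, "mt-bench", lambda row: row["category"])

# Option 2: single test set.
single = group_into_test_sets(rows, "mt-bench")

# Option 3: custom logic, here bucketing by prompt length (purely illustrative).
by_length = group_into_test_sets(
    rows, "mt-bench", lambda row: "short" if len(row["prompt"]) < 14 else "long"
)
```

Passing the grouping rule as a function keeps the three options on one code path, so switching strategy is a one-argument change.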
Which fields should be preserved in `metadata` JSON? (Recommend: preserve original IDs like `question_id`)
How to handle multi-turn conversations?
- First turn only
- Concatenate turns
- Separate test cases per turn
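To make the multi-turn options concrete, the sketch below flattens a conversation under each strategy and writes the result as CSV with an `input` column and a JSON `metadata` column. It assumes each source record carries a `turns` list plus `question_id` and `source` fields. Those field names, the helper names, and the extra `turn` metadata key are assumptions for illustration, not part of the Coval API.

```python
import csv
import io
import json


def turns_to_inputs(turns, strategy="first"):
    """Flatten a list of conversation turns into one or more test case inputs."""
    if strategy == "first":
        return turns[:1]
    if strategy == "concatenate":
        return ["\n".join(turns)]
    if strategy == "separate":
        return list(turns)
    raise ValueError(f"unknown strategy: {strategy}")


def to_csv(records, strategy="first"):
    """Serialize records as CSV text with `input` and `metadata` columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["input", "metadata"])
    for record in records:
        for turn_index, text in enumerate(turns_to_inputs(record["turns"], strategy)):
            # Preserve the original ID so results can be traced to the dataset.
            metadata = {
                "question_id": record["question_id"],
                "source": record["source"],
            }
            if strategy == "separate":
                metadata["turn"] = turn_index  # assumed key, not a Coval field
            writer.writerow([text, json.dumps(metadata)])
    return buffer.getvalue()


# Hypothetical record shaped like a two-turn benchmark question.
sample = [
    {
        "question_id": "123",
        "source": "mt-bench",
        "turns": ["Your question here", "A follow-up question"],
    }
]
first_turn_csv = to_csv(sample, strategy="first")
per_turn_csv = to_csv(sample, strategy="separate")
```

The `csv` module doubles the quotes inside the JSON string automatically, so the metadata column does not need manual escaping.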
input,metadata
"Your question here","{""question_id"": ""123"", ""source"": ""mt-bench""}"

Columns: `input`, `metadata`. File naming: `{source}_{category}.csv`

| Dataset | Description |
|---|---|
| | 15k+ multiple-choice questions across 57 subjects (STEM, humanities, law) |
| | Sentence-level tasks: sentiment, entailment, linguistic acceptability |
| | Reasoning tests for everyday world knowledge |
| | Common-sense inference and completion |

| Dataset | Description |
|---|---|
| | ~8k grade-school math word problems (multi-step arithmetic) |
| | Reading comprehension with discrete operations |
| | BigBench Hard - challenging reasoning subset |