Arize Dataset Skill

Concepts

Dataset = a versioned collection of examples used for evaluation and experimentation
Dataset Version = a snapshot of a dataset at a point in time; updates can be in-place or create a new version
Example = a single record in a dataset with arbitrary user-defined fields (e.g., `question`, `answer`, `context`)
Space = an organizational container; datasets belong to a space

System-managed fields on examples (`id`, `created_at`, `updated_at`) are auto-generated by the server -- never include them in create or append payloads.

Prerequisites

Three things are needed: the `ax` CLI, an API key (env var or profile), and a space ID. A project name is also needed but usually comes from the user's message.

Install ax

Verify `ax` is installed and working before proceeding:

Check if `ax` is on PATH: `command -v ax` (Unix) or `where ax` (Windows)

If not found, check common install locations:
- macOS/Linux: `test -x ~/.local/bin/ax && export PATH="$HOME/.local/bin:$PATH"`
- Windows: check `%APPDATA%\Python\Scripts\ax.exe` and `%LOCALAPPDATA%\Programs\Python\Scripts\ax.exe`

If still not found, install it (requires shell access to install packages):
- Preferred: `uv tool install arize-ax-cli`
- Alternative: `pipx install arize-ax-cli`
- Fallback: `pip install arize-ax-cli`

After install, if `ax` is not on PATH:
- macOS/Linux: `export PATH="$HOME/.local/bin:$PATH"`
- Windows (PowerShell): `$env:PATH = "$env:APPDATA\Python\Scripts;$env:PATH"`

If `ax --version` fails with an SSL/certificate error:
- macOS: `export SSL_CERT_FILE=/etc/ssl/cert.pem`
- Linux: `export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt`
- Windows (PowerShell): `$env:SSL_CERT_FILE = "C:\Program Files\Common Files\SSL\cert.pem"` (or use `python -c "import certifi; print(certifi.where())"` to find the cert bundle)

`ax --version` must succeed before proceeding. If it doesn't, stop and ask the user for help.
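
The lookup order above (PATH first, then the common install locations) can be sketched in Python. This is a minimal illustration rather than part of the `ax` CLI: `find_executable` and `default_ax_dirs` are hypothetical helper names, and the fallback directories are only the ones listed in this section.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, List, Optional


def find_executable(name: str, extra_dirs: Iterable[Path] = ()) -> Optional[str]:
    """Return the path to `name`, checking PATH first and then extra_dirs."""
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        # Passing path= makes shutil.which apply the platform's own
        # executable rules (execute bit on Unix, PATHEXT on Windows).
        found = shutil.which(name, path=str(directory))
        if found:
            return found
    return None


def default_ax_dirs() -> List[Path]:
    """Fallback install locations from this section; unset variables are skipped."""
    dirs = [Path.home() / ".local" / "bin"]
    appdata = os.environ.get("APPDATA")
    if appdata:
        dirs.append(Path(appdata) / "Python" / "Scripts")
    local_appdata = os.environ.get("LOCALAPPDATA")
    if local_appdata:
        dirs.append(Path(local_appdata) / "Programs" / "Python" / "Scripts")
    return dirs


# Usage: find_executable("ax", default_ax_dirs()) returns a path or None.
```

A `None` result corresponds to the "still not found" case, where installing the package is the next step.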

Verify environment

Run a quick check for credentials:

macOS/Linux (bash):

```bash
ax --version && echo "--- env ---" && echo "ARIZE_API_KEY: ${ARIZE_API_KEY:-(not set)}" && echo "ARIZE_SPACE_ID: ${ARIZE_SPACE_ID:-(not set)}" && echo "--- profiles ---" && ax profiles show 2>&1
```

Windows (PowerShell):

```powershell
ax --version; Write-Host "--- env ---"; Write-Host "ARIZE_API_KEY: $env:ARIZE_API_KEY"; Write-Host "ARIZE_SPACE_ID: $env:ARIZE_SPACE_ID"; Write-Host "--- profiles ---"; ax profiles show 2>&1
```

Read the output and proceed immediately if either the env var or the profile has an API key. Only ask the user if both are missing. Resolve failures:

No API key in env and no profile → AskQuestion: "Arize API key (https://app.arize.com/admin > API Keys)"
Space ID unknown → AskQuestion, or run `ax projects list -o json --limit 100` and search for a match
Project unclear → ask, or run `ax projects list -o json --limit 100` and present as selectable options
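
The decision rule above (ask for the API key only when both the env var and the profile lack one) can be sketched as a small function. `missing_inputs` is a hypothetical helper; whether the profile holds a key has to be read from the `ax profiles show` output by the caller, because that output's format is not specified here.

```python
from typing import List, Mapping, Optional


def missing_inputs(
    env: Mapping[str, str],
    profile_has_api_key: bool,
    space_id_from_user: Optional[str] = None,
) -> List[str]:
    """List what still has to be asked for, in the order it should be asked."""
    missing = []
    # An API key from either source is enough; ask only if both are absent.
    if not env.get("ARIZE_API_KEY") and not profile_has_api_key:
        missing.append("api_key")
    # A space ID given in the conversation or set in the environment is enough.
    if not space_id_from_user and not env.get("ARIZE_SPACE_ID"):
        missing.append("space_id")
    return missing


# Usage: missing_inputs(os.environ, profile_has_api_key=True) after `import os`.
```

An empty list means the session can proceed without asking the user anything.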

Space ID and Project

Both are needed for most commands. Resolve each:

User provides it in the conversation -- use directly via `--space-id` / `--project` flags.
Env var is set (`ARIZE_SPACE_ID`, `ARIZE_DEFAULT_PROJECT`) -- use silently.
If missing, AskQuestion once. Tell the user:
- Space ID is in the Arize URL: `/spaces/{SPACE_ID}/...`
- Project is the project name as shown in the Arize UI.
- For convenience, recommend setting env vars so they don't get asked again: `export ARIZE_SPACE_ID="U3BhY2U6..."` and `export ARIZE_DEFAULT_PROJECT="my-project"`

Prefer asking the user over searching or iterating through projects and API keys. If you get a `401 Unauthorized`, tell the user their API key may not have access to that space and ask them to verify.
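
When the user pastes an Arize URL instead of a bare ID, the space ID can be pulled out of the `/spaces/{SPACE_ID}/...` path segment described above. The sketch below assumes nothing about the rest of the URL; `space_id_from_url` is a hypothetical helper, and the URL in the usage comment is an invented example.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlparse

# Matches the path segment that directly follows "/spaces/".
_SPACE_SEGMENT = re.compile(r"/spaces/([^/]+)")


def space_id_from_url(url: str) -> Optional[str]:
    """Return the segment after /spaces/ in the URL path, or None if absent."""
    match = _SPACE_SEGMENT.search(urlparse(url).path)
    # IDs may be percent-encoded in a copied URL, so decode before returning.
    return unquote(match.group(1)) if match else None


# Usage (hypothetical URL):
# space_id_from_url("https://app.arize.com/organizations/org1/spaces/U3BhY2U6MTIz/projects")
```

A `None` result means the URL does not contain a space segment, in which case asking the user remains the right move.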

List Datasets: `ax datasets list`

Browse datasets in a space. Output goes to stdout.

```bash
ax datasets list
ax datasets list --space-id SPACE_ID --limit 20
ax datasets list --cursor CURSOR_TOKEN
ax datasets list -o json
```

Flags

Flag	Type	Default	Description
`--space-id`	string	from profile	Filter by space
`--limit, -l`	int	15	Max results (1-100)
`--cursor`	string	none	Pagination cursor from previous response
`-o, --output`	string	table	Output format: table, json, csv, parquet, or file path
`-p, --profile`	string	default	Configuration profile

Get Dataset: `ax datasets get`

Quick metadata lookup -- returns dataset name, space, timestamps, and version list.

```bash
ax datasets get DATASET_ID
ax datasets get DATASET_ID -o json
```

Flags

Flag	Type	Default	Description
`DATASET_ID`	string	required	Positional argument
`-o, --output`	string	table	Output format
`-p, --profile`	string	default	Configuration profile

Response fields

Field	Type	Description
`id`	string	Dataset ID
`name`	string	Dataset name
`space_id`	string	Space this dataset belongs to
`created_at`	datetime	When the dataset was created
`updated_at`	datetime	Last modification time
`versions`	array	List of dataset versions (id, name, dataset_id, created_at, updated_at)

Export Dataset: `ax datasets export`

Download all examples to a file. By default uses the REST API; pass `--all` to use Arrow Flight for bulk transfer.

```bash
ax datasets export DATASET_ID
# -> dataset_abc123_20260305_141500/examples.json

ax datasets export DATASET_ID --all
ax datasets export DATASET_ID --version-id VERSION_ID
ax datasets export DATASET_ID --output-dir ./data
ax datasets export DATASET_ID --stdout
ax datasets export DATASET_ID --stdout | jq '.[0]'
```

Flags

Flag	Type	Default	Description
`DATASET_ID`	string	required	Positional argument
`--version-id`	string	latest	Export a specific dataset version
`--all`	bool	false	Use Arrow Flight for bulk export (see below)
`--output-dir`	string	`.`	Output directory
`--stdout`	bool	false	Print JSON to stdout instead of file
`-p, --profile`	string	default	Configuration profile

REST vs Flight (`--all`)

REST (default): Lower friction -- no Arrow/Flight dependency, standard HTTPS ports, works through any corporate proxy or firewall. Limited to 500 examples per page.
Flight (`--all`): Required for datasets with more than 500 examples. Uses gRPC+TLS on a separate host/port (`flight.arize.com:443`) which some corporate networks may block.

Agent auto-escalation rule: If a REST export returns exactly 500 examples, the result is likely truncated. Re-run with `--all` to get the full dataset.
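
The auto-escalation rule is a single comparison against the REST page cap. As a minimal sketch, assuming the export has already been parsed into a Python list (for example with `json.loads` on the `--stdout` output); `needs_flight_export` is a hypothetical name:

```python
from typing import Any, List

REST_PAGE_LIMIT = 500  # per-page cap for REST exports, as stated above


def needs_flight_export(examples: List[Any], page_limit: int = REST_PAGE_LIMIT) -> bool:
    """True when a REST export hit the page cap exactly, so it may be truncated."""
    return len(examples) == page_limit
```

A dataset that really holds exactly 500 examples also triggers the re-run; the second export is then redundant but harmless.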

Output is a JSON array of example objects. Each example has system fields (`id`, `created_at`, `updated_at`) plus all user-defined fields:

```json
[
  {
    "id": "ex_001",
    "created_at": "2026-01-15T10:00:00Z",
    "updated_at": "2026-01-15T10:00:00Z",
    "question": "What is 2+2?",
    "answer": "4",
    "topic": "math"
  }
]
```

Create Dataset: `ax datasets create`

Create a new dataset from a data file.

```bash
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.csv
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.json
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.jsonl
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.parquet
```

Flags

Flag	Type	Required	Description
`--name, -n`	string	yes (prompted)	Dataset name
`--space-id`	string	yes (prompted)	Space to create the dataset in
`--file, -f`	path	yes (prompted)	Data file: CSV, JSON, JSONL, or Parquet
`-o, --output`	string	no	Output format for the returned dataset metadata
`-p, --profile`	string	no	Configuration profile

Supported file formats

Format	Extension	Notes
CSV	`.csv`	Column headers become field names
JSON	`.json`	Array of objects
JSON Lines	`.jsonl`	One object per line
Parquet	`.parquet`	Column names become field names

Append Examples: `ax datasets append`

Add examples to an existing dataset. Two input modes -- use whichever fits.

Inline JSON (agent-friendly)

Generate the payload directly -- no temp files needed:

```bash
ax datasets append DATASET_ID --json '[{"question": "What is 2+2?", "answer": "4"}]'

ax datasets append DATASET_ID --json '[
  {"question": "What is gravity?", "answer": "A fundamental force..."},
  {"question": "What is light?", "answer": "Electromagnetic radiation..."}
]'
```

From a file

```bash
ax datasets append DATASET_ID --file new_examples.csv
ax datasets append DATASET_ID --file additions.json
```

To a specific version

```bash
ax datasets append DATASET_ID --json '[{"q": "..."}]' --version-id VERSION_ID
```

Flags

Flag	Type	Required	Description
`DATASET_ID`	string	yes	Positional argument
`--json`	string	mutex	JSON array of example objects
`--file, -f`	path	mutex	Data file (CSV, JSON, JSONL, Parquet)
`--version-id`	string	no	Append to a specific version (default: latest)
`-o, --output`	string	no	Output format for the returned dataset metadata
`-p, --profile`	string	no	Configuration profile

Exactly one of `--json` or `--file` is required.

Validation

Each example must be a JSON object with at least one user-defined field
Fields `id`, `created_at`, `updated_at` are auto-generated -- do not include them
Maximum 100,000 examples per request
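
These rules can be checked locally before calling `ax datasets append`, so a generated payload fails fast instead of round-tripping to the server. The sketch below is an assumption-laden illustration: `validate_examples` is a hypothetical helper, and its messages are written for this example and are not the CLI's own error text.

```python
from typing import Any, List

SYSTEM_FIELDS = {"id", "created_at", "updated_at"}
MAX_EXAMPLES = 100_000  # per-request cap stated above


def validate_examples(examples: Any) -> List[str]:
    """Return a list of problems; an empty list means the payload passes."""
    if not isinstance(examples, list):
        return ["payload must be a JSON array"]
    problems = []
    if not examples:
        problems.append("examples array is empty")
    if len(examples) > MAX_EXAMPLES:
        problems.append(f"too many examples: {len(examples)} > {MAX_EXAMPLES}")
    for index, example in enumerate(examples):
        if not isinstance(example, dict):
            problems.append(f"example {index} is not a JSON object")
            continue
        system = sorted(example.keys() & SYSTEM_FIELDS)
        if system:
            problems.append(f"example {index} has system fields: {', '.join(system)}")
        # At least one key outside the system set must remain.
        if not (example.keys() - SYSTEM_FIELDS):
            problems.append(f"example {index} has no user-defined field")
    return problems
```

For instance, `validate_examples([{"question": "What is 2+2?", "answer": "4"}])` returns an empty list, while a payload containing `id` is reported with the index of the offending example.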

Delete Dataset: `ax datasets delete`

```bash
ax datasets delete DATASET_ID
ax datasets delete DATASET_ID --force   # skip confirmation prompt
```

Flags

Flag	Type	Default	Description
`DATASET_ID`	string	required	Positional argument
`--force, -f`	bool	false	Skip confirmation prompt
`-p, --profile`	string	default	Configuration profile

Workflows

Create a dataset from file for evaluation

Prepare a CSV/JSON/Parquet file with your evaluation columns (e.g., `input`, `expected_output`)
`ax datasets create --name "eval-set-v1" --space-id SPACE_ID --file eval_data.csv`
Verify: `ax datasets get DATASET_ID`
Use the dataset ID to run experiments
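
For the first step, a CSV with the `input` and `expected_output` columns can be produced with Python's standard `csv` module, which takes care of quoting values that contain commas or newlines. This is a sketch only: `write_eval_csv` is a hypothetical helper and the rows in the usage comment are invented sample data.

```python
import csv
from pathlib import Path
from typing import Dict, List

FIELDNAMES = ["input", "expected_output"]  # header row becomes the field names


def write_eval_csv(path: Path, rows: List[Dict[str, str]]) -> None:
    """Write evaluation rows to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Usage (sample data):
# write_eval_csv(Path("eval_data.csv"), [{"input": "What is 2+2?", "expected_output": "4"}])
```

The resulting file can then be passed to `ax datasets create` with `--file eval_data.csv`.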

Add examples to an existing dataset

```bash
# Find the dataset
ax datasets list

# Append inline (e.g., from an LLM-generated payload)
ax datasets append DATASET_ID --json '[
  {"question": "What is gravity?", "answer": "A fundamental force..."},
  {"question": "What is light?", "answer": "Electromagnetic radiation..."}
]'

# Or append from a file
ax datasets append DATASET_ID --file additional_examples.csv
```

Download dataset for offline analysis

`ax datasets list` -- find the dataset
`ax datasets export DATASET_ID` -- download to file
Parse the JSON: `jq '.[] | .question' dataset_*/examples.json`
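
Where `jq` is not installed, the same parse step can be done with Python's standard library. The sketch assumes only what this document states about the export: a file named `examples.json` containing a JSON array of objects. `load_examples` and `field_values` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, List


def load_examples(path: Path) -> List[dict]:
    """Load an exported examples.json, which should hold a JSON array."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return data


def field_values(examples: List[dict], field: str) -> List[Any]:
    """Collect one field across examples; a missing field yields None."""
    return [example.get(field) for example in examples]


# Usage: field_values(load_examples(Path("examples.json")), "question")
```

Returning `None` for a missing field mirrors how `jq` prints `null` when an object lacks the key.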

Export a specific version

```bash
# List versions
ax datasets get DATASET_ID -o json | jq '.versions'

# Export that version
ax datasets export DATASET_ID --version-id VERSION_ID
```

Iterate on a dataset

Export current version: `ax datasets export DATASET_ID`
Modify the examples locally
Append new rows: `ax datasets append DATASET_ID --file new_rows.csv`
Or create a fresh dataset under a new name: `ax datasets create --name "eval-set-v2" --space-id SPACE_ID --file updated_data.json`
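
One detail matters in this loop: exported examples carry the server-managed `id`, `created_at`, and `updated_at` fields, and those are rejected in create and append payloads. A minimal sketch of removing them before re-uploading follows; `strip_system_fields` is a hypothetical helper, not an `ax` feature.

```python
from typing import Any, Dict, Iterable, List

SYSTEM_FIELDS = ("id", "created_at", "updated_at")


def strip_system_fields(examples: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the examples without server-managed fields."""
    return [
        {key: value for key, value in example.items() if key not in SYSTEM_FIELDS}
        for example in examples
    ]
```

The input objects are left untouched, so the original export stays available for comparison after the cleaned copy is written out.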

Pipe export to other tools

```bash
# Count examples
ax datasets export DATASET_ID --stdout | jq 'length'

# Extract a single field
ax datasets export DATASET_ID --stdout | jq '.[].question'

# Convert to CSV with jq
ax datasets export DATASET_ID --stdout | jq -r '.[] | [.question, .answer] | @csv'
```
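
The CSV conversion can also be done in Python when a header row or tolerance for missing fields is wanted. The sketch below is not equivalent to the `jq` one-liner: it writes a header and quotes only where needed. `examples_to_csv` is a hypothetical helper, and the input is assumed to be the parsed `--stdout` output (for example `json.load(sys.stdin)`).

```python
import csv
import io
from typing import Any, Dict, List, Sequence


def examples_to_csv(examples: List[Dict[str, Any]], fields: Sequence[str]) -> str:
    """Render the selected fields as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    for example in examples:
        # Missing fields become empty cells instead of raising KeyError.
        writer.writerow({field: example.get(field, "") for field in fields})
    return buffer.getvalue()
```

Fields that are not listed (including the system fields) are simply left out of the output.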

Dataset Example Schema

Examples are free-form JSON objects. There is no fixed schema -- columns are whatever fields you provide. System-managed fields are added by the server:

Field	Type	Managed by	Notes
`id`	string	server	Auto-generated UUID. Required on update, forbidden on create/append
`created_at`	datetime	server	Immutable creation timestamp
`updated_at`	datetime	server	Auto-updated on modification
(any user field)	any JSON type	user	String, number, boolean, null, nested object, array

Troubleshooting

Problem	Solution
`ax: command not found`	Check `~/.local/bin/ax`; if missing: `uv tool install arize-ax-cli` (requires shell access to install packages)
`401 Unauthorized`	API key may not have access to this space. Verify the key and space ID are correct. Keys are scoped per space -- get the right one from https://app.arize.com/admin > API Keys.
`No profile found`	Run `ax profiles show --expand` to check; set `ARIZE_API_KEY` env var or write `~/.arize/config.toml`
`Dataset not found`	Verify dataset ID with `ax datasets list`
`File format error`	Supported: CSV, JSON, JSONL, Parquet
`platform-managed column`	Remove `id`, `created_at`, `updated_at` from create/append payloads
`reserved column`	Remove `time`, `count`, or any `source_record_*` field
`Provide either --json or --file`	Append requires exactly one input source
`Examples array is empty`	Ensure your JSON array or file contains at least one example
`not a JSON object`	Each element in the `--json` array must be a `{...}` object, not a string or number

Save Credentials for Future Use

At the end of the session, if the user manually provided any of the following during this conversation (via AskQuestion response, pasted text, or inline values) and those values were NOT already loaded from a saved profile or environment variable, offer to save them for future use.

Credential	Where it gets saved
API key	`ax` profile at `~/.arize/config.toml`
Space ID	macOS/Linux: shell config (`~/.zshrc` or `~/.bashrc`) as `export ARIZE_SPACE_ID="..."`. Windows: user environment variable via `[System.Environment]::SetEnvironmentVariable('ARIZE_SPACE_ID', '...', 'User')`

Skip this entirely if:

The API key was already loaded from an existing profile or `ARIZE_API_KEY` env var
The space ID was already set via `ARIZE_SPACE_ID` env var
The user only used base64 project IDs (no space ID was needed)

How to offer: Use AskQuestion: "Would you like to save your Arize credentials so you don't have to enter them next time?" with options "Yes, save them" / "No thanks".

If the user says yes:

API key -- Check if `~/.arize/config.toml` exists. If it does, read it and update the `[auth]` section. If not, create it with this minimal content:

```toml
[profile]
name = "default"

[auth]
api_key = "THE_API_KEY"

[output]
format = "table"
```

Verify with: `ax profiles show`

Space ID -- Persist the space ID as an environment variable:
- macOS/Linux -- Detect the user's shell config file (`~/.zshrc` for zsh, `~/.bashrc` for bash). Append:
```bash
export ARIZE_SPACE_ID="THE_SPACE_ID"
```
  Tell the user to run `source` on that file (for example `source ~/.zshrc`) or restart their terminal for it to take effect.
- Windows (PowerShell) -- Set a persistent user environment variable:
```powershell
[System.Environment]::SetEnvironmentVariable('ARIZE_SPACE_ID', 'THE_SPACE_ID', 'User')
```
  Tell the user to restart their terminal for it to take effect.
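
On macOS/Linux, appending the export line is safer when it is idempotent, so repeated sessions do not stack duplicate lines in the shell config. The sketch below illustrates that idea under stated assumptions: `rc_file_for_shell` and `persist_export` are hypothetical helpers, only zsh and bash are recognised, and the value is assumed to contain no double quotes, `$`, backticks, or backslashes.

```python
from pathlib import Path
from typing import Optional


def rc_file_for_shell(shell: str, home: Path) -> Optional[Path]:
    """Map a shell path such as /bin/zsh to its config file, or None if unknown."""
    known = {"zsh": home / ".zshrc", "bash": home / ".bashrc"}
    return known.get(Path(shell).name)


def persist_export(rc_file: Path, name: str, value: str) -> bool:
    """Append `export NAME="value"` unless that exact line already exists.

    Returns True if the file was modified.
    """
    line = f'export {name}="{value}"'
    existing = rc_file.read_text(encoding="utf-8") if rc_file.exists() else ""
    if line in existing.splitlines():
        return False
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with rc_file.open("a", encoding="utf-8") as handle:
        handle.write(f"{separator}{line}\n")
    return True


# Usage: rc = rc_file_for_shell(os.environ.get("SHELL", ""), Path.home())
#        if rc: persist_export(rc, "ARIZE_SPACE_ID", "THE_SPACE_ID")
```

An older export of the same variable with a different value is left in place; because the new line is appended after it, the new value wins when the file is sourced.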
