Nemo Gym Pivot Datasets

Paper Reference

This skill operationalizes PivotRL: create local single-step pivot datasets from successful trajectories, prefer informative mixed-reward states, and train with verifier-based local rewards rather than exact trajectory imitation.

Invocation Check

Use this skill when the task is to turn existing agent trajectories or rollout artifacts into a Nemo Gym pivot dataset, or to validate whether a pivot JSONL/config pair can be used for single-step local RL or evaluation.

Before writing a converter, inspect representative source rows and the target resource server. Do not assume the source field names are the contract. Convert by reconstructing the semantic pieces needed by Gym's Responses-style row format.

Core Workflow

Inspect the source data shape and count the candidate assistant decision points.
Identify the semantic fields needed for each pivot:

model-call input context before the pivot action
available tools at that decision point
expected assistant action
reward/verifier target if it is separate from the demonstrated action
optional provenance such as task id, source trajectory id, rollout id, uuid, depth, and original metadata

Convert each accepted decision point into one pivot JSONL row.
Generate or update the matching Gym config so the pivot-format JSONL can be used directly.
Validate with the bundled validator and, when available, the target Gym resource-server models.
Write metrics that make skipped rows, action types, tool names, depth, and provenance coverage easy to inspect.

Row Shape

Read references/row-contract.md when implementing or reviewing a converter. For

single_step_tool_use_with_argument_comparison

, the essential row fields are:

```
responses_create_params
```
: Responses API-style input and tool specs for the model call.
```
expected_action
```
: one
```
function_call
```
or one
```
message
```
.
```
agent_ref
```
: row-level agent routing that matches the generated config.

Do not copy optional null fields into

responses_create_params

; omit them unless the target contract explicitly wants them.

expected_action

is singular. If a source assistant turn has more than one tool call, filter that turn out of the pivot dataset and keep it only in a skipped-row audit if it needs review.

Conversion Patterns

Read references/conversion-patterns.md when the source data is not already in pivot shape. The rule is to normalize by meaning, not by source container.

Useful reference scripts live under

scripts/reference/

. They are copied from real conversions and may contain dataset-specific paths, assumptions, or older branch behavior, so treat them as examples to borrow from rather than canonical commands to run unchanged:

```
generic_pivot_dataset_reference.py
```
: generic source rows to pivot rows.

chat_messages_to_pivot_dataset_reference.py

: chat-completion messages to pivot rows.

```
conversational_messages_to_pivot_dataset_reference.py
```
: conversational message trajectories to pivot rows with reasoning/provenance handling.
```
tool_messages_to_pivot_dataset_reference.py
```
: message/tool-use style rows to pivot rows.

Pivot Selection

Use clean, positive source trajectories for the demonstrated pivots. When multiple source trajectories exist for a task, prefer tasks whose source trajectory group has mixed rewards instead of all success or all failure; this avoids spending data on tasks that were trivial or impossible for the source model. Treat that source-task filter as preferred, not mandatory, because the source model and downstream policy may have different capabilities.

When possible, profile candidate pivots with local on-policy rollouts from the downstream or initial policy. Use at least 8 sampled local rollouts per candidate as the default. Keep candidates with mixed local rewards, discard all-1 and all-0 reward groups, and if data is abundant, drop the easiest/high-pass-rate pivots first so training concentrates on hard but learnable states.

Config And Training

Read references/config-training-and-agent-ref.md when creating the Gym YAML or explaining how to train/evaluate from the dataset.

Key points:

The pivot JSONL is the training/eval dataset; point the config's train dataset entry directly at it.
```
agent_ref.name
```
in each row must match the agent block used by the config unless the launcher overrides routing intentionally.
```
word_count_similarity_threshold
```
is the main string-argument matching knob for the single-step tool-use verifier.
Use
```
tool_choice: "auto"
```
for these rows;
```
tool_choice: "required"
```
can route some inference engines into structured decoding paths.
Validate configs and datasets together; a valid JSONL file can still be unusable if the agent/resource-server names do not line up.

Validation

Run the bundled validator before calling a pivot dataset done:

bash

python scripts/validate_pivot_dataset.py --path /path/to/pivot.jsonl --agent-ref expected_agent_name

When the Gym repo is available, also validate against the resource-server Pydantic models:

bash

python scripts/validate_pivot_dataset.py \
  --path /path/to/pivot.jsonl \
  --agent-ref expected_agent_name \
  --gym-repo /path/to/Gym-github

Use

--require-field

and

--require-any-field

only when a dataset-specific workflow needs extra provenance checks. Provenance is useful for debugging and filtering, but it is not required by the resource-server request model.

The validator accepts both supported expected-action types by default (

function_call

and

message

) and prints an end summary split between tool-call and message pivots.

nemo-gym-pivot-datasets

NPX Install

Tags

SKILL.md Content