nemo-gym-reward-profiling

Nemo Gym Reward Profiling

Invocation Check

Use this skill when the user wants to run, understand, or lightly modify Nemo Gym reward profiling. Keep the answer oriented around the normal workflow: `ng_run` starts the model/resource servers, `ng_collect_rollouts` writes rollout artifacts, and `ng_reward_profile` generates profiling output from those artifacts.

If the user is primarily debugging a failed job or stack trace, use the `nemo-gym-debugging` skill first.

Basic Workflow

1. Identify the environment config paths and the input JSONL.
2. Start the Gym servers with `ng_run`.
3. Collect rollouts with `ng_collect_rollouts`; this writes `rollouts.jsonl` and `*_materialized_inputs.jsonl`.
4. Run `ng_reward_profile` on the materialized inputs and the rollout JSONL to generate `*_reward_profiling.jsonl`.
5. Inspect line counts and profile rows.
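The count check in the last step can be scripted. The sketch below is illustrative only: it assumes each JSONL line is one JSON object and that every rollout row carries `_ng_task_index`. The helper names `read_jsonl` and `summarize_counts` are hypothetical and not part of Nemo Gym.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Load one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize_counts(
    materialized: list[dict], rollouts: list[dict], profile: list[dict]
) -> dict:
    """Compare row counts across the artifacts of a single run."""
    # Assumption: every rollout row carries its original task id.
    tasks = {row["_ng_task_index"] for row in rollouts}
    return {
        "materialized_rows": len(materialized),
        "rollout_rows": len(rollouts),
        "profile_rows": len(profile),
        "tasks_with_rollouts": len(tasks),
        # Expect one completed rollout per materialized input row.
        "collection_complete": len(rollouts) == len(materialized),
        # Expect one profile row per task with at least one completed rollout.
        "profile_matches_tasks": len(profile) == len(tasks),
    }
```

For a run whose files share a hypothetical `my_run` prefix, pass `read_jsonl(Path("my_run_materialized_inputs.jsonl"))`, `read_jsonl(Path("rollouts.jsonl"))`, and `read_jsonl(Path("my_run_reward_profiling.jsonl"))`. A false `collection_complete` means there are fewer rollouts than materialized rows, which is the situation partial profiling is meant for.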

Repeated rollouts are the main profiling lever. `num_repeats=1` is valid, but per-task averages and variance are only meaningful with multiple rollouts per task.
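To see why repeats matter, the sketch below groups rewards by original task and computes a per-task mean and population variance. It assumes each row of `rollouts.jsonl` exposes `_ng_task_index` and a numeric top-level `reward`; that field layout and the `per_task_reward_stats` helper are assumptions for illustration, and `ng_reward_profile` may compute its summaries differently.

```python
from collections import defaultdict
from statistics import mean, pvariance


def per_task_reward_stats(rollouts: list[dict]) -> dict[int, dict]:
    """Group rollout rewards by original task and summarize each group."""
    rewards_by_task: dict[int, list[float]] = defaultdict(list)
    for row in rollouts:
        # Assumption: each rollout row has `_ng_task_index` and a numeric `reward`.
        rewards_by_task[row["_ng_task_index"]].append(float(row["reward"]))
    stats = {}
    for task, rewards in rewards_by_task.items():
        stats[task] = {
            "n": len(rewards),
            "mean": mean(rewards),
            # Population variance; with a single rollout it is always 0.0,
            # which says nothing about how stable the task's reward is.
            "variance": pvariance(rewards),
        }
    return stats
```

With `num_repeats=1` every task ends up with `n == 1` and a variance of `0.0`, which reflects a single sample rather than a stable reward.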

Core Concepts

- `*_materialized_inputs.jsonl`: expanded collection inputs after repeat expansion, agent defaults, and task/rollout id assignment.
- `rollouts.jsonl`: one completed rollout/result per materialized input row.
- `*_reward_profiling.jsonl`: one summarized profile row per original task with at least one completed rollout.
- `_ng_task_index`: original task/sample id.
- `_ng_rollout_index`: repeated rollout id for that task.
- `rollout_infos`: compact per-rollout info inside each task profile row, including reward, token usage, and numeric rollout metrics when available.
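The materialization step can be pictured as follows. This is a hypothetical sketch of repeat expansion and id assignment, not Nemo Gym's implementation: the name `expand_repeats`, the rule that task fields override agent defaults, the use of the input row position as `_ng_task_index`, and the task-major row order are all assumptions.

```python
import copy
from typing import Optional


def expand_repeats(
    tasks: list[dict], num_repeats: int, agent_defaults: Optional[dict] = None
) -> list[dict]:
    """Hypothetical repeat expansion: one row per (task, repeat) pair."""
    if num_repeats < 1:
        raise ValueError("num_repeats must be at least 1")
    materialized = []
    for task_index, task in enumerate(tasks):
        for rollout_index in range(num_repeats):
            # Assumed merge rule: fields on the task override agent defaults.
            row = {**copy.deepcopy(agent_defaults or {}), **copy.deepcopy(task)}
            row["_ng_task_index"] = task_index
            row["_ng_rollout_index"] = rollout_index
            materialized.append(row)
    return materialized
```

Under these assumptions, two input tasks with `num_repeats=2` become four materialized rows with ids `(0, 0)`, `(0, 1)`, `(1, 0)`, `(1, 1)`.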

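Key any reward-to-length or reward-to-token analysis by both `_ng_task_index` and `_ng_rollout_index`, so that each reward is paired with the length of the same rollout rather than with another repeat of the same task.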
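A minimal sketch of a correctly keyed join, assuming rollout rows carry `_ng_task_index`, `_ng_rollout_index`, and a top-level `reward`. The per-rollout lengths are passed in as a dict keyed by `(task, rollout)` because where token counts live (for example inside `rollout_infos`) is not specified here; `reward_length_pairs` is a hypothetical helper.

```python
def reward_length_pairs(
    rollouts: list[dict], lengths: dict[tuple[int, int], int]
) -> list[tuple[int, int, float, int]]:
    """Join rewards with per-rollout lengths on (task index, rollout index)."""
    pairs = []
    for row in rollouts:
        # Both ids are needed: the task index alone matches every repeat.
        key = (row["_ng_task_index"], row["_ng_rollout_index"])
        if key not in lengths:
            raise KeyError(f"no length recorded for rollout {key}")
        pairs.append((key[0], key[1], float(row["reward"]), lengths[key]))
    # Sort by (task, rollout) so the output order is deterministic.
    return sorted(pairs)
```

Raising on a missing key, rather than silently skipping it, surfaces rollouts whose lengths were recorded under the wrong ids.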

Reference Loading

Load references only when the user needs that detail:

- Read `references/quick-start.md` for a generic command template and the minimal run sequence.
- Read `references/output-format.md` to explain materialized inputs, rollout JSONL, reward profile rows, `rollout_infos`, and partial profiling.

Practical Defaults

- Treat `ng_reward_profile` as the reward profiling step; rollout collection does not write reward profile files.
- Run strict profiling by default. If rollout collection stopped early, use `++allow_partial_rollouts=True` to profile the completed rollouts and drop original input rows that have no completed rollout.
- If flags differ, trust the target checkout's CLI help and `nemo_gym/reward_profile.py` over memory.
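The difference between strict and partial profiling can be modeled in a few lines. This hypothetical `profiled_task_ids` helper assumes that strict mode rejects any materialized row without a completed rollout, and that partial mode keeps every original task with at least one completed rollout; `nemo_gym/reward_profile.py` remains the authority on the actual behavior.

```python
def profiled_task_ids(
    materialized: list[dict], rollouts: list[dict], allow_partial: bool = False
) -> set[int]:
    """Hypothetical model of which original tasks end up in the profile."""
    expected = {(r["_ng_task_index"], r["_ng_rollout_index"]) for r in materialized}
    done = {(r["_ng_task_index"], r["_ng_rollout_index"]) for r in rollouts}
    missing = expected - done
    if missing and not allow_partial:
        # Assumed strict behavior: refuse to profile an incomplete collection.
        raise ValueError(f"{len(missing)} materialized rows have no completed rollout")
    # Partial mode: tasks with zero completed rollouts are dropped.
    return {task for task, _ in done & expected}
```

For example, if task 0 finished both repeats, task 1 finished one, and task 2 finished none, partial mode profiles tasks 0 and 1 while strict mode raises.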
