nemo-gym-reward-profiling

Nemo Gym Reward Profiling

Invocation Check

Use this skill when the user wants to run, understand, or lightly modify Nemo Gym reward profiling. Keep the answer oriented around the normal workflow: `ng_run` starts model/resource servers, `ng_collect_rollouts` writes rollout artifacts, and `ng_reward_profile` generates profiling output from those artifacts. If the user is primarily debugging a failed job or stack trace, use the `nemo-gym-debugging` skill first.

Basic Workflow

  1. Identify the environment config paths and the input JSONL.
  2. Start the Gym servers with `ng_run`.
  3. Collect rollouts with `ng_collect_rollouts`; this writes `rollouts.jsonl` and `*_materialized_inputs.jsonl`.
  4. Run `ng_reward_profile` on the materialized inputs and the rollout JSONL to generate `*_reward_profiling.jsonl`.
  5. Inspect line counts and profile rows: a complete collection has one rollout row per materialized input row, and the profile has one row per original task with at least one completed rollout.
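
Step 5 can be scripted. The sketch below is a minimal, hypothetical helper (not part of Nemo Gym) that compares the row counts of the two collection artifacts; it assumes each JSONL record sits on its own non-empty line.

```python
from pathlib import Path


def count_jsonl_rows(path: Path) -> int:
    """Count non-empty lines in a JSONL file (one record per line assumed)."""
    with path.open("r", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_collection_counts(materialized: Path, rollouts: Path) -> tuple[int, int, bool]:
    """Compare materialized input rows against completed rollout rows.

    A complete collection has one rollout row per materialized input row,
    so the counts should match; fewer rollouts suggests an early stop.
    """
    n_inputs = count_jsonl_rows(materialized)
    n_rollouts = count_jsonl_rows(rollouts)
    return n_inputs, n_rollouts, n_inputs == n_rollouts
```

Pass the actual `*_materialized_inputs.jsonl` and `rollouts.jsonl` paths your run wrote; a `False` match usually means collection stopped before every row finished.
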
Repeated rollouts are the main profiling lever. `num_repeats=1` is valid, but per-task averages and variance are only meaningful with multiple rollouts per task.
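
To see why repeats matter, the following sketch groups rollout rows by `_ng_task_index` and computes a per-task mean and sample variance. The helper name and the assumption that each row exposes a numeric `reward` field are hypothetical; check the real rollout schema before relying on it.

```python
from collections import defaultdict
from statistics import mean, variance


def per_task_reward_stats(rows):
    """Group rollout rows by task and summarize their rewards.

    Assumes (hypothetically) each row is a dict with `_ng_task_index`
    and a numeric `reward`. Sample variance needs at least two rollouts,
    so tasks with a single rollout report None.
    """
    rewards = defaultdict(list)
    for row in rows:
        rewards[row["_ng_task_index"]].append(float(row["reward"]))
    stats = {}
    for task, values in rewards.items():
        stats[task] = {
            "n": len(values),
            "mean": mean(values),
            "variance": variance(values) if len(values) > 1 else None,
        }
    return stats
```

A task collected with `num_repeats=1` ends up with `"variance": None`, which is exactly the case where per-task spread cannot be measured.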

Core Concepts

  • `*_materialized_inputs.jsonl`: the collection inputs after repeat expansion, agent defaults, and task/rollout id assignment have been applied.
  • `rollouts.jsonl`: one completed rollout/result per materialized input row.
  • `*_reward_profiling.jsonl`: one summarized profile row per original task with at least one completed rollout.
  • `_ng_task_index`: original task/sample id.
  • `_ng_rollout_index`: repeated rollout id for that task.
  • `rollout_infos`: compact per-rollout info inside each task profile row, including reward, token usage, and numeric rollout metrics when available.
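
As a mental model of materialization, the sketch below expands each original task into `num_repeats` rows and stamps them with `_ng_task_index` and `_ng_rollout_index`. It is illustrative only: `expand_inputs` and `agent_defaults` are hypothetical names, and Nemo Gym's actual materialization logic and row ordering may differ.

```python
def expand_inputs(tasks, num_repeats, agent_defaults=None):
    """Illustrative repeat expansion (hypothetical, not Nemo Gym's code).

    Each original task is copied num_repeats times. Every copy gets the
    task's position as _ng_task_index and its repeat number as
    _ng_rollout_index; agent_defaults only fill keys the task lacks.
    """
    if num_repeats < 1:
        raise ValueError("num_repeats must be >= 1")
    defaults = agent_defaults or {}
    rows = []
    for task_index, task in enumerate(tasks):
        for rollout_index in range(num_repeats):
            row = {**defaults, **task}  # task keys override defaults
            row["_ng_task_index"] = task_index
            row["_ng_rollout_index"] = rollout_index
            rows.append(row)
    return rows
```
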
Keep reward-to-length or reward-to-token analysis keyed by both `_ng_task_index` and `_ng_rollout_index`, so that repeated rollouts of the same task stay distinct.
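
A task-only key would collapse repeated rollouts into one entry. This minimal sketch indexes rows by `(_ng_task_index, _ng_rollout_index)` and builds reward/token pairs; the helper names, the `reward` field, and the default `output_tokens` field are assumptions, not confirmed Nemo Gym schema.

```python
def key_by_task_and_rollout(rows):
    """Index rollout rows by (_ng_task_index, _ng_rollout_index).

    Raises on duplicate keys instead of silently overwriting a rollout.
    """
    keyed = {}
    for row in rows:
        key = (row["_ng_task_index"], row["_ng_rollout_index"])
        if key in keyed:
            raise ValueError(f"duplicate rollout key: {key}")
        keyed[key] = row
    return keyed


def reward_token_pairs(rows, token_field="output_tokens"):
    """Return (task, rollout, reward, tokens) tuples sorted by key.

    `reward` and the default `output_tokens` field names are hypothetical;
    adjust them to whatever your rollout rows actually contain.
    """
    keyed = key_by_task_and_rollout(rows)
    return [
        (task, rollout, row["reward"], row[token_field])
        for (task, rollout), row in sorted(keyed.items())
    ]
```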

Reference Loading

Load references only when the user needs that detail:
  • Read `references/quick-start.md` for a generic command template and the minimal run sequence.
  • Read `references/output-format.md` to explain materialized inputs, rollout JSONL, reward profile rows, `rollout_infos`, and partial profiling.

Practical Defaults

  • Treat `ng_reward_profile` as the reward profiling step; rollout collection does not write reward profile files.
  • Run strict profiling by default. If rollout collection stopped early, use `++allow_partial_rollouts=True` to profile the completed rollouts and drop original input rows that have no completed rollout.
  • If flags differ from what you remember, trust the target checkout's CLI help and `nemo_gym/reward_profile.py` over memory.
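
The strict-versus-partial behavior can be pictured with the sketch below. It is a hypothetical illustration of the rules stated above, not the code in `nemo_gym/reward_profile.py`; the function name is invented, and it assumes rollouts are matched to materialized rows by the `(_ng_task_index, _ng_rollout_index)` pair.

```python
def select_profilable_tasks(materialized_rows, completed_rows, allow_partial=False):
    """Return the original task indices that would get a profile row.

    Strict mode (allow_partial=False) raises if any materialized row lacks
    a completed rollout. Partial mode keeps tasks with at least one
    completed rollout and drops tasks with none.
    """
    expected = {(r["_ng_task_index"], r["_ng_rollout_index"]) for r in materialized_rows}
    done = {(r["_ng_task_index"], r["_ng_rollout_index"]) for r in completed_rows}
    missing = expected - done
    if missing and not allow_partial:
        raise ValueError(f"{len(missing)} materialized rows have no completed rollout")
    # Only count completed rollouts that correspond to a materialized row.
    return sorted({task for task, _ in expected & done})
```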