nemo-gym-reward-profiling
Nemo Gym Reward Profiling
Invocation Check
Use this skill when the user wants to run, understand, or lightly modify Nemo Gym reward profiling. Keep the answer oriented around the normal workflow:

`ng_run` → `ng_collect_rollouts` → `ng_reward_profile`

If the user is primarily debugging a failed job or stack trace, use the `nemo-gym-debugging` skill first.

Basic Workflow
- Identify the environment config paths and input JSONL.
- Start Gym servers with `ng_run`.
- Collect rollouts with `ng_collect_rollouts`; this writes `rollouts.jsonl` and `*_materialized_inputs.jsonl`.
- Run `ng_reward_profile` on the materialized inputs and rollout JSONL to generate `*_reward_profiling.jsonl`.
- Inspect line counts and profile rows.
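The last step can be scripted. The sketch below assumes only that each file is JSONL with one JSON object per line; `count_jsonl_rows` and `summarize_run` are hypothetical helpers, not Nemo Gym APIs. It counts the rows in the three output files and reports how many materialized inputs have no rollout, which should be zero after a complete collection.

```python
import json
from pathlib import Path


def count_jsonl_rows(path):
    """Count non-blank lines in a JSONL file, checking that each one parses."""
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines such as a trailing newline
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err
            count += 1
    return count


def summarize_run(materialized_path, rollouts_path, profile_path):
    """Return row counts for the three files plus the number of missing rollouts."""
    counts = {
        "materialized_inputs": count_jsonl_rows(materialized_path),
        "rollouts": count_jsonl_rows(rollouts_path),
        "reward_profile": count_jsonl_rows(profile_path),
    }
    # A complete collection has one rollout per materialized input row.
    counts["missing_rollouts"] = counts["materialized_inputs"] - counts["rollouts"]
    return counts
```

Pass the actual paths your run produced. The profile count is per original task, so it is expected to be smaller than the rollout count whenever `num_repeats` is greater than 1.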
Repeated rollouts are the main profiling lever. `num_repeats=1` is valid, but with a single rollout per task the per-task average is just that one reward and the variance is undefined, so per-task averages and variance become meaningful only with multiple rollouts per task.
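The effect of repeats on per-task statistics can be seen in a small sketch. It groups rollout rows by `_ng_task_index` and assumes each row carries a numeric `reward` field (the field name is an assumption); `per_task_reward_stats` is a hypothetical helper, not part of Nemo Gym.

```python
import statistics
from collections import defaultdict


def per_task_reward_stats(rollouts):
    """Group rewards by task index and return count, mean, and sample variance.

    Each row is a dict with `_ng_task_index` and an assumed numeric `reward`.
    """
    rewards_by_task = defaultdict(list)
    for row in rollouts:
        rewards_by_task[row["_ng_task_index"]].append(float(row["reward"]))

    stats = {}
    for task, rewards in sorted(rewards_by_task.items()):
        stats[task] = {
            "n": len(rewards),
            "mean": statistics.fmean(rewards),
            # Sample variance needs at least two rollouts; with num_repeats=1
            # there is no spread to measure, so report None rather than 0.
            "variance": statistics.variance(rewards) if len(rewards) >= 2 else None,
        }
    return stats
```

Reporting `None` instead of `0.0` keeps a single-rollout task from looking perfectly stable when it simply was not measured more than once.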
Core Concepts
- `*_materialized_inputs.jsonl`: expanded collection inputs after repeat expansion, agent defaults, and task/rollout id assignment.
- `rollouts.jsonl`: one completed rollout/result per materialized input row.
- `*_reward_profiling.jsonl`: one summarized profile row per original task with at least one completed rollout.
- `_ng_task_index`: original task/sample id.
- `_ng_rollout_index`: repeated rollout id for that task.
- `rollout_infos`: compact per-rollout info inside each task profile row, including reward, token usage, and numeric rollout metrics when available.
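To make the two index fields concrete, here is a toy version of repeat expansion and id assignment. It is a hypothetical helper, not Nemo Gym's implementation: it assumes `_ng_task_index` is the task's 0-based position in the input JSONL and `_ng_rollout_index` runs from 0 to `num_repeats - 1`, and it omits agent defaults entirely.

```python
import copy


def materialize_inputs(tasks, num_repeats):
    """Expand each task into `num_repeats` rows tagged with task and rollout ids.

    Illustrative only: 0-based indices are an assumption, and agent defaults
    are not applied.
    """
    if num_repeats < 1:
        raise ValueError("num_repeats must be >= 1")
    rows = []
    for task_index, task in enumerate(tasks):
        for rollout_index in range(num_repeats):
            row = copy.deepcopy(task)  # each repeat gets an independent copy
            row["_ng_task_index"] = task_index
            row["_ng_rollout_index"] = rollout_index
            rows.append(row)
    return rows
```

With two tasks and `num_repeats=3`, this yields six rows, which is why `rollouts.jsonl` has one line per materialized row while the reward profile has only one line per original task.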
Key reward-to-length and reward-to-token analysis on both `_ng_task_index` and `_ng_rollout_index`: the task index alone merges repeated rollouts, and the rollout index alone is not unique across tasks.
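A minimal sketch of that keying, assuming rollout rows are dicts carrying both index fields plus a reward and a token count. The field names `reward` and `num_tokens` are hypothetical placeholders; adapt them to the fields your rollouts or `rollout_infos` entries actually carry.

```python
def index_by_rollout_key(rows):
    """Map (task index, rollout index) -> row, rejecting duplicate keys."""
    indexed = {}
    for row in rows:
        key = (row["_ng_task_index"], row["_ng_rollout_index"])
        if key in indexed:
            raise ValueError(f"duplicate rollout key {key}")
        indexed[key] = row
    return indexed


def reward_vs_tokens(rollouts, reward_field="reward", token_field="num_tokens"):
    """Return (task, rollout, reward, tokens) tuples sorted by the composite key.

    Both field names are assumptions, not a documented Nemo Gym schema.
    """
    indexed = index_by_rollout_key(rollouts)
    return [
        (task, rollout, row[reward_field], row[token_field])
        for (task, rollout), row in sorted(indexed.items())
    ]
```

Raising on duplicate keys surfaces the exact mistake the composite key guards against: if two rows share a key, an analysis keyed on one index alone would silently overwrite or merge them.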
*_materialized_inputs.jsonl - :每个实例化输入行对应一个已完成的rollout/结果。
rollouts.jsonl - :每个至少完成一次rollout的原始任务对应一条汇总分析行。
*_reward_profiling.jsonl - :原始任务/样本ID。
_ng_task_index - :该任务的重复rollout ID。
_ng_rollout_index - :每个任务分析行内的紧凑rollout信息,包括奖励、令牌使用情况,以及可用时的数值rollout指标。
rollout_infos
奖励与长度或奖励与令牌的分析需同时关联和。
Reference Loading
Load references only when the user needs that detail:
- Read `references/quick-start.md` for a generic command template and the minimal run sequence.
- Read `references/output-format.md` to explain materialized inputs, rollout JSONL, reward profile rows, `rollout_infos`, and partial profiling.
Practical Defaults
- Treat `ng_reward_profile` as the reward profiling step; rollout collection does not write reward profile files.
- Run strict profiling by default. If rollout collection stopped early, use `++allow_partial_rollouts=True` to profile the completed rollouts and drop original input rows that have no completed rollout.
- If flags differ from what you remember, trust the target checkout's CLI help and `nemo_gym/reward_profile.py` over memory.
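The difference between strict and partial profiling can be illustrated with a small set comparison. This is a hypothetical sketch of the behavior described above, not the code in `nemo_gym/reward_profile.py`; the `allow_partial` parameter merely stands in for the `++allow_partial_rollouts=True` override, and rows are assumed to carry both index fields.

```python
def select_profiled_tasks(materialized, rollouts, allow_partial=False):
    """Decide which original tasks get a profile row.

    Strict mode (the default) fails if any materialized row lacks a completed
    rollout. Partial mode keeps every task with at least one completed rollout
    and drops tasks with none.
    """
    expected = {(r["_ng_task_index"], r["_ng_rollout_index"]) for r in materialized}
    completed = {(r["_ng_task_index"], r["_ng_rollout_index"]) for r in rollouts}

    missing = expected - completed
    if missing and not allow_partial:
        raise ValueError(f"{len(missing)} materialized rows have no completed rollout")

    all_tasks = {task for task, _ in expected}
    kept = {task for task, _ in completed & expected}
    dropped = all_tasks - kept
    return sorted(kept), sorted(dropped)
```

Note that a task with only some of its repeats completed is still kept in partial mode; its statistics are simply computed over fewer rollouts.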