launching-evals

NeMo Evaluator Skill

Quick Reference

nemo-evaluator-launcher CLI

Run evaluation

uv run nemo-evaluator-launcher run --config <path.yaml>
uv run nemo-evaluator-launcher run --config <path.yaml> -t <a_single_task_to_be_run_by_name>
uv run nemo-evaluator-launcher run --config <path.yaml> -t <task_name_1> -t <task_name_2> ...
uv run nemo-evaluator-launcher run --config <path.yaml> -o evaluation.nemo_evaluator_config.config.params.limit_samples=10 ...

Preview the resolved config and the sbatch script without running the evaluation

uv run nemo-evaluator-launcher run --config <path.yaml> --dry-run

Check status (--json for machine-readable output)

uv run nemo-evaluator-launcher status <invocation_id> --json

Get evaluation run info (output paths, slurm job IDs, cluster hostname, etc.)

uv run nemo-evaluator-launcher info <invocation_id>

Copy just the logs (quick — good for debugging)

uv run nemo-evaluator-launcher info <invocation_id> --copy-logs ./evaluation-results/

For artifacts: use `nel info` to discover paths. If local, read directly from the paths it shows. If remote, SSH to explore and rsync only what you need.

ssh <user>@<hostname> "ls <artifacts_path>/"

rsync -avzP <user>@<hostname>:<artifacts_path>/{results.yml,eval_factory_metrics.json,config.yml} ./evaluation-results/<invocation_id>.<job_index>/artifacts/
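The two commands above can be wrapped in a small helper. The following is a minimal sketch, not part of the launcher: the function names `artifact_dest` and `fetch_artifacts` and the `DRY_RUN` variable are hypothetical, and the user, hostname, and artifacts path are assumed to come from `nel info` output.

```shell
# Hypothetical helpers (not part of nemo-evaluator-launcher).

# Build the local destination directory used in the rsync example above.
artifact_dest() {
  printf './evaluation-results/%s.%s/artifacts/\n' "$1" "$2"
}

# Usage: fetch_artifacts <user@hostname> <artifacts_path> <invocation_id> <job_index>
# Copies only the three standard files. With DRY_RUN=1 the rsync command is
# printed instead of executed.
fetch_artifacts() {
  fa_remote="$1"
  fa_path="$2"
  fa_dest="$(artifact_dest "$3" "$4")"
  # Rebuild the positional parameters as the rsync argument list.
  set -- rsync -avzP
  for fa_file in results.yml eval_factory_metrics.json config.yml; do
    set -- "$@" "${fa_remote}:${fa_path}/${fa_file}"
  done
  set -- "$@" "$fa_dest"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    mkdir -p "$fa_dest" && "$@"
  fi
}
```

Running with `DRY_RUN=1` first shows the exact command before anything is copied. Benchmark-specific files in subdirectories are not covered by this list and would need to be added by hand.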

Resume a failed/interrupted run (re-sbatches existing run.sub in the original run directory)

uv run nemo-evaluator-launcher resume <invocation_id>

List past runs

uv run nemo-evaluator-launcher ls runs --since 1d

List available evaluation tasks (by default, only shows tasks from the latest released containers)

uv run nemo-evaluator-launcher ls tasks
uv run nemo-evaluator-launcher ls tasks --from_container nvcr.io/nvidia/eval-factory/simple-evals:26.03

Workflow

The complete evaluation workflow is divided into the following steps, which you should follow IN ORDER.

1. Create or modify a config using the `nel-assistant` skill. If the user provides a past run, use its `config.yml` artifact as a starting point.
2. Run the evaluation. See `references/run-evaluation.md` when executing this step.
3. Monitor progress (MANDATORY after every `nel run`): poll status repeatedly until SUCCESS/FAILED. See `references/check-progress.md`.
4. Post-run actions (when a terminal state is reached):
   1. When the evaluation status is `SUCCESS`, analyze the results. See `references/analyze-results.md` when executing this step.
   2. When the evaluation status is `FAILED`, debug the failed run. See `references/debug-failed-runs.md` when executing this step.
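The monitoring step can be sketched as a polling loop. This is a minimal sketch under stated assumptions: `is_terminal` and `poll_until_terminal` are hypothetical helper names, and the caller supplies a command that prints the current status as a single word, because the exact shape of the `status --json` output is not described here. The terminal states are the SUCCESS and FAILED values named in the workflow.

```shell
# Hypothetical helpers (not part of nemo-evaluator-launcher).

# Return 0 if the status is one of the terminal states named in the workflow.
is_terminal() {
  case "$1" in
    SUCCESS|FAILED) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: poll_until_terminal <interval_seconds> <command that prints the status>
# Re-runs the command until it prints a terminal status, then prints that
# status. Exit code: 0 for SUCCESS, non-zero for FAILED.
# Note: if the command never prints a terminal status, this loops forever.
poll_until_terminal() {
  poll_interval="$1"
  shift
  while :; do
    poll_status="$("$@")"
    if is_terminal "$poll_status"; then
      printf '%s\n' "$poll_status"
      [ "$poll_status" = "SUCCESS" ]
      return
    fi
    sleep "$poll_interval"
  done
}
```

A call might look like `poll_until_terminal 60 my_status_cmd <invocation_id>`, where `my_status_cmd` is a wrapper you write around `uv run nemo-evaluator-launcher status <invocation_id> --json` that extracts the status value. The exit code then selects the post-run branch: analyze on success, debug on failure.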

Key Facts

Benchmark-specific info learned during launching/analyzing evals should be added to `references/benchmarks/`.

PPP = Slurm account (the `account` field in cluster_config.yaml). When the user says "change PPP to X", update the account value (e.g., `coreai_dlalgo_compeval` → `coreai_dlalgo_llm`).

Slurm job pairs: NEL (nemo-evaluator-launcher) submits paired Slurm jobs: a RUNNING job plus a PENDING restart job (for when the 4h walltime expires). Never cancel the pending restart jobs; they are expected and necessary.

HF cache requirement: For configs with `HF_HUB_OFFLINE=1`, models must be pre-downloaded to the HF cache on each cluster before launching. Before running a model on a new cluster, always ask the user if the model is already cached there. If not, run `python3 -m venv hf_cli && source hf_cli/bin/activate && pip install huggingface_hub` on the cluster login node, then `HF_HOME=/lustre/fsw/portfolios/coreai/users/<username>/cache/huggingface hf download <model>`. Without this, vLLM will fail with `LocalEntryNotFoundError`.
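A quick look in the cache can complement asking the user. The sketch below is a convenience built on assumptions, not part of the launcher: the function names are hypothetical, and it relies on the Hugging Face Hub cache layout, where a model `org/name` is stored under `$HF_HOME/hub/models--org--name`. A directory that exists does not prove the download completed, so treat a hit as a hint only.

```shell
# Hypothetical helpers (not part of nemo-evaluator-launcher or huggingface_hub).

# Map a model ID such as "org/name" to its Hub cache folder name.
hf_cache_folder() {
  printf 'models--%s\n' "$(printf '%s' "$1" | sed 's|/|--|g')"
}

# Usage: hf_model_cached <HF_HOME directory> <model ID>
# Return 0 if a cache folder for the model exists under the given HF_HOME.
hf_model_cached() {
  [ -d "$1/hub/$(hf_cache_folder "$2")" ]
}
```

On the cluster login node this could be called as `hf_model_cached /lustre/fsw/portfolios/coreai/users/<username>/cache/huggingface <model>`, using the same `HF_HOME` as the download command.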
`data_parallel_size` is per node: `dp_size=1` with `num_nodes=8` means 8 model instances total (one per node), load-balanced by haproxy. Do NOT interpret `dp_size` as the global replica count.
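The total is a simple product of the two settings. As a minimal illustration (the function name `total_model_instances` is hypothetical, and the second call extrapolates the per-node rule to `dp_size=2`):

```shell
# Total model instances = data_parallel_size (per node) x number of nodes.
total_model_instances() {
  dp_size="$1"
  num_nodes="$2"
  echo $(( dp_size * num_nodes ))
}

total_model_instances 1 8   # prints 8
total_model_instances 2 8   # prints 16
```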
`payload_modifier` interceptor: The `params_to_remove` list (e.g. `[max_tokens, max_completion_tokens]`) strips those fields from the outgoing payload, intentionally lifting output length limits so reasoning models can think as long as they need.

Auto-export git workaround: The export container (`python:3.12-slim`) lacks `git`. When installing the launcher from a git URL, set `auto_export.launcher_install_cmd` to install git first (e.g., `apt-get update -qq && apt-get install -qq -y git && pip install "nemo-evaluator-launcher[all] @ git+...#subdirectory=packages/nemo-evaluator-launcher"`).

Do NOT use `nemo-evaluator-launcher export --dest local`: it only writes a summary JSON (`processed_results.json`) and does NOT copy actual logs or artifacts, despite accepting the `--copy_logs` and `--copy-artifacts` flags. `nel info --copy-artifacts` works but copies everything (very slow for large benchmarks). Preferred approach: use `nel info` to discover paths. If local, read directly; if remote, SSH to explore and rsync only what you need. Note that `nel info` prints the standard artifacts, but benchmarks produce additional artifacts in subdirectories, so explore to find them.
