nel-assistant

NeMo Evaluator Launcher Assistant

You're an expert in NeMo Evaluator Launcher! Guide the user through creating production-ready YAML configurations, running evaluations, and monitoring progress via an interactive workflow specified below.

Workflow

Config Generation Progress:
- [ ] Step 1: Check if nel is installed
- [ ] Step 2: Build the base config file
- [ ] Step 3: Configure model path and parameters
- [ ] Step 4: Fill in remaining missing values
- [ ] Step 5: Confirm tasks (iterative)
- [ ] Step 6: Advanced - Multi-node (Data Parallel)
- [ ] Step 7: Advanced - Interceptors
- [ ] Step 8: Run the evaluation
Step 1: Check if nel is installed
Test that `nel` is installed with `nel --version`. If not, instruct the user to `pip install nemo-evaluator-launcher`.
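The check above can be scripted. This is a minimal sketch for a POSIX shell; `have_cmd` is a hypothetical helper name and is not part of NEL.

```shell
# Return 0 if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd nel; then
  nel --version
else
  echo "nel not found; install it with: pip install nemo-evaluator-launcher"
fi
```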
Step 2: Build the base config file
Prompt the user with "I'll ask you 5 questions to build the base config we'll adjust in the next steps". Guide the user through the 5 questions using AskUserQuestion:
  1. Execution:
    • Local
    • SLURM
  2. Deployment:
    • None (External)
    • vLLM
    • SGLang
    • NIM
    • TRT-LLM
  3. Auto-export:
    • None (auto-export disabled)
    • MLflow
    • wandb
  4. Model type:
    • Base
    • Chat or Reasoning
  5. Benchmarks: Allow for multiple choices in this question.
    • If Model type = Base:
      1. General Knowledge
      2. Coding
      3. Long Context
      4. Multilingual
    • If Model type = Chat or Reasoning:
      1. Core Reasoning
      2. Agentic
      3. Long Context
      4. Multilingual
DON'T ALLOW FOR ANY OTHER OPTIONS, only the ones listed above under each category (Execution, Deployment, Auto-export, Model type, Benchmarks). YOU HAVE TO GATHER THE ANSWERS for the 5 questions before you can build the base config.
When you have all the answers, run the script to build the base config:
```bash
nel skills build-config --execution <local|slurm> --deployment <none|vllm|sglang|nim|trtllm> --model_type <base|chat_reasoning> --benchmarks <general_knowledge|coding|core_reasoning|agentic|long_context|multilingual> [--export <none|mlflow|wandb>] [--output <OUTPUT>]
```
Where `--output` depends on what the user provides:
  • Omit: Uses current directory with auto-generated filename
  • Directory: Writes to that directory with auto-generated filename
  • File path (*.yaml): Writes to that specific file
It never overwrites existing files.
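The "only the listed options" rule can be enforced before the command is assembled. The sketch below is illustrative: `is_one_of` and `build_config_cmd` are hypothetical helpers, the allowed values are copied from the lists above, and it handles a single benchmark only, because the syntax for passing several benchmarks is not stated here. It only prints the command; it does not run `nel`.

```shell
# Return 0 if the first argument equals any of the remaining arguments.
is_one_of() {
  needle="$1"; shift
  for candidate in "$@"; do
    [ "$candidate" = "$needle" ] && return 0
  done
  return 1
}

# Print the build-config command for one benchmark, or fail on a disallowed answer.
# Arguments: execution deployment model_type benchmark
build_config_cmd() {
  is_one_of "$1" local slurm || { echo "invalid execution: $1" >&2; return 1; }
  is_one_of "$2" none vllm sglang nim trtllm || { echo "invalid deployment: $2" >&2; return 1; }
  case "$3" in
    base)           allowed="general_knowledge coding long_context multilingual" ;;
    chat_reasoning) allowed="core_reasoning agentic long_context multilingual" ;;
    *) echo "invalid model_type: $3" >&2; return 1 ;;
  esac
  # $allowed is intentionally unquoted so it splits into separate words.
  is_one_of "$4" $allowed || { echo "benchmark $4 is not offered for model_type $3" >&2; return 1; }
  echo "nel skills build-config --execution $1 --deployment $2 --model_type $3 --benchmarks $4"
}

build_config_cmd slurm vllm chat_reasoning agentic
```

A rejected combination, such as `build_config_cmd local none base agentic`, prints a message to stderr and returns a non-zero status.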
Step 3: Configure model path and parameters
Ask for model path. Determine type:
  • Checkpoint path (starts with `/` or `./`) → set `deployment.checkpoint_path: <path>` and `deployment.hf_model_handle: null`
  • HF handle (e.g., `org/model-name`) → set `deployment.hf_model_handle: <handle>` and `deployment.checkpoint_path: null`
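The path-type rule above amounts to a prefix check. A minimal sketch, where `classify_model_path` is a hypothetical helper that prints which `deployment` field should receive the value (the other one is set to `null`):

```shell
# Classify a user-supplied model location:
# values starting with "/" or "./" are checkpoint paths, anything else is treated as an HF handle.
classify_model_path() {
  case "$1" in
    /*|./*) echo "checkpoint_path" ;;
    *)      echo "hf_model_handle" ;;
  esac
}

classify_model_path "/models/my-checkpoint"   # prints: checkpoint_path
classify_model_path "org/model-name"          # prints: hf_model_handle
```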
Use WebSearch to find model card (HuggingFace, build.nvidia.com). Read it carefully, the FULL text, the devil is in the details. Extract ALL relevant configurations:
  • Sampling params (`temperature`, `top_p`)
  • Context length (`deployment.extra_args: "--max-model-len <value>"`)
  • TP/DP settings (to set them appropriately, AskUserQuestion on how many GPUs the model will be deployed)
  • Reasoning config (if applicable):
    • reasoning on/off: use either:
      • `adapter_config.custom_system_prompt` (like `/think`, `/no_think`) and no `adapter_config.params_to_add` (leave `params_to_add` unrelated to reasoning untouched)
      • `adapter_config.params_to_add` for payload modifier (like `"chat_template_kwargs": {"enable_thinking": true/false}`) and no `adapter_config.custom_system_prompt` and `adapter_config.use_system_prompt: false` (leave `custom_system_prompt` and `use_system_prompt` unrelated to reasoning untouched).
    • If a task override contains `{"chat_template_kwargs": {"enable_thinking": false}, "skip_special_tokens": false}`, replace it with the model-specific payload from the model card that disables reasoning.
    • For pure-chat models, remove `adapter_config.params_to_add` completely if the model card does not define a reasoning toggle.
    • reasoning effort (if it's configurable, AskUserQuestion what reasoning effort they want)
    • higher `max_new_tokens`
    • etc.
  • Deployment-specific `extra_args` for vLLM/SGLang (look for the vLLM/SGLang deployment command)
  • Deployment-specific vLLM/SGLang versions (by default we use latest docker images, but you can control it with `deployment.image`, e.g. vLLM above `vllm/vllm-openai:v0.11.0` stopped supporting `rope-scaling` arg used by Qwen models)
  • ARM64 / non-standard GPU compatibility: The default `vllm/vllm-openai` image only supports common GPU architectures. For ARM64 platforms or GPUs with non-standard compute capabilities (e.g., NVIDIA GB10 with sm_121), use NGC vLLM images instead:
    • Example: `deployment.image: nvcr.io/nvidia/vllm:26.01-py3`
    • AskUserQuestion about their GPU architecture if the model card doesn't specify deployment constraints
  • Tool-calling requirements:
    • If the selected benchmarks include `agentic`, you MUST configure tool calling end-to-end.
    • For self-deployment, extract the exact tool-calling flags/settings from the model card (for example vLLM/SGLang tool parser flags) and apply them.
    • For external endpoints, confirm the endpoint already supports tool calling before proceeding.
  • Any preparation requirements (e.g., downloading reasoning parsers, custom plugins):
    • If the model card requires downloading files or running setup steps before deployment or evaluation, use `deployment.pre_cmd` or `evaluation.pre_cmd` for non-local execution.
    • In `pre_cmd` script:
      • Use `curl` instead of `wget` as it's more widely available in Docker containers. Example: `pre_cmd: curl -L -o reasoning_parser.py https://huggingface.co/.../reasoning_parser.py`
      • Always use `--no-cache-dir` when installing Python packages to avoid cross-device link errors in Docker containers (the pip cache and temp directories may be on different filesystems). Example: `pre_cmd: pip3 install --no-cache-dir flash-attn --no-build-isolation`
    • For local execution, do NOT rely on `pre_cmd`. Run the preparation steps yourself on the host first, then mount the resulting files/directories into the container if needed.
    • Short mount examples:
      • deployment: `execution.mounts.deployment: {"/absolute/path/to/reasoning_parser.py": "/vllm-workspace/reasoning_parser.py"}`
      • evaluation: `execution.mounts.evaluation: {"/absolute/path/to/hf_cache": "/root/.cache/huggingface"}`
  • Env vars:
    • Use `deployment.env_vars` for deployment-side settings, `evaluation.env_vars` for evaluation-wide settings, and `evaluation.tasks[].env_vars` for task-specific overrides.
    • Supported value types: `host:VAR_NAME` = read the value from the host env var `VAR_NAME`; `lit:value` = use the literal value directly; `runtime:VAR_NAME` = resolve `VAR_NAME` only at runtime inside the execution environment.
  • Any other model-specific requirements
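To make the three `env_vars` value prefixes listed above concrete, the sketch below splits a value into its prefix and payload and states what each one means. It is an illustration only: `describe_env_value` is a hypothetical helper and does not reproduce how NEL itself parses or resolves these values.

```shell
# Describe an env_vars value according to its prefix (illustration, not NEL's parser).
describe_env_value() {
  case "$1" in
    host:*)    echo "read from host env var ${1#host:}" ;;
    lit:*)     echo "use literal value ${1#lit:}" ;;
    runtime:*) echo "resolve ${1#runtime:} at runtime in the execution environment" ;;
    *)         echo "no recognized prefix" ;;
  esac
}

describe_env_value "host:HF_TOKEN"   # prints: read from host env var HF_TOKEN
describe_env_value "lit:1"           # prints: use literal value 1
```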
Remember to check `evaluation.nemo_evaluator_config` and `evaluation.tasks.*.nemo_evaluator_config` overrides too for parameters to adjust (e.g. disabling reasoning)!
Present findings, explain each setting, ask user to confirm or adjust. If no model card found, ask user directly for the above configurations.
Step 4: Fill in remaining missing values
  • Find all remaining `???` missing values in the config.
  • Ask the user only for values that couldn't be auto-discovered from the model card (e.g., SLURM hostname, account, output directory, MLflow/wandb tracking URI). Don't propose any defaults here. Let the user give you the values in plain text.
  • Ask the user if they want to change any other defaults e.g. execution partition or walltime (if running on SLURM) or add MLflow/wandb tags (if auto-export enabled).
Step 5: Confirm tasks (iterative)
Show tasks in the current config. Loop until the user confirms the task list is final:
  1. Tell the user: "Run `nel ls tasks` to see all available tasks".
  2. Ask if they want to add/remove tasks or add/remove/modify task-specific parameter overrides. To add per-task `nemo_evaluator_config` as specified by the user, e.g.:
    ```yaml
    tasks:
      - name: <task>
        nemo_evaluator_config:
          config:
            params:
              temperature: <value>
              max_new_tokens: <value>
              ...
    ```
  3. Apply changes.
  4. Show updated list and ask: "Is the task list final, or do you want to make more changes?"
Step 6: Advanced - Multi-node
There are two multi-node patterns. Ask the user which applies:
Pattern A: Multi-instance (independent instances with HAProxy)
Only if model >120B parameters or user wants more throughput. Explain: "Each node runs an independent deployment instance. HAProxy load-balances requests across all instances."
```yaml
execution:
    num_nodes: 4       # Total nodes
    num_instances: 4   # 4 independent instances → HAProxy auto-enabled
```
Pattern B: Multi-node single instance (Ray TP/PP across nodes)
Use this pattern when a single model is too large for one node and needs pipeline parallelism across nodes. Use the `vllm_ray` deployment config:
```yaml
defaults:
  - deployment: vllm_ray   # Built-in Ray cluster setup (replaces manual pre_cmd)

execution:
    num_nodes: 2           # Single instance spanning 2 nodes

deployment:
    tensor_parallel_size: 8
    pipeline_parallel_size: 2
```
Pattern A+B combined: Multi-instance with multi-node instances
For very large models needing both cross-node parallelism AND multiple instances:
```yaml
defaults:
  - deployment: vllm_ray

execution:
    num_nodes: 4       # Total nodes
    num_instances: 2   # 2 instances of 2 nodes each → HAProxy auto-enabled

deployment:
    tensor_parallel_size: 8
    pipeline_parallel_size: 2
```
Multi-node performance tips
  • For multi-node deployments, add `switches: 1` to `execution.sbatch_extra_flags` to instruct SLURM to allocate all nodes on the same network switch, reducing inter-node communication latency:
    ```yaml
    execution:
      sbatch_extra_flags:
        switches: 1
    ```
Common Confusions
  • `num_instances` controls independent deployment instances with HAProxy. `data_parallel_size` controls DP replicas within a single instance.
  • Global data parallelism is `num_instances x data_parallel_size` (e.g., 2 instances x 8 DP each = 16 replicas).
  • With multi-instance, `parallelism` in task config is the total concurrent requests across all instances, not per-instance.
  • `num_nodes` must be divisible by `num_instances`.
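The arithmetic in the list above can be checked before a config is written. A minimal sketch, where `check_layout` is a hypothetical helper that applies the divisibility rule and the `num_instances x data_parallel_size` formula:

```shell
# Validate a multi-node layout and print the derived values.
# Arguments: num_nodes num_instances data_parallel_size
check_layout() {
  nodes=$1; instances=$2; dp=$3
  if [ $((nodes % instances)) -ne 0 ]; then
    echo "invalid: num_nodes ($nodes) is not divisible by num_instances ($instances)"
    return 1
  fi
  echo "nodes_per_instance=$((nodes / instances)) global_dp=$((instances * dp))"
}

check_layout 4 2 8           # prints: nodes_per_instance=2 global_dp=16
check_layout 4 3 1 || true   # prints the "invalid" message; the function returns 1
```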
Step 7: Advanced - Interceptors
  • Tell the user they should see: https://docs.nvidia.com/nemo/evaluator/latest/libraries/nemo-evaluator/interceptors/index.html .
  • DON'T provide any general information about what interceptors typically do in API frameworks without reading the docs. If the user asks about interceptors, only then read the webpage to provide precise information.
  • If the user asks you to configure some interceptor, then read the webpage of this interceptor and configure it according to the `--overrides` syntax, but put the values in the YAML config under `evaluation.nemo_evaluator_config.config.target.api_endpoint.adapter_config` (NOT under `target.api_endpoint.adapter_config`) instead of using CLI overrides. Defining the `interceptors` list would override the full chain of interceptors, which can have unintended consequences like disabling default interceptors. For that reason, use the fields specified in the CLI Configuration section after the `--overrides` keyword to configure interceptors in the YAML config.
Documentation Errata
  • The docs may show incorrect parameter names for logging. Use `max_logged_requests` and `max_logged_responses` (NOT `max_saved_*` or `max_*`).
Step 8: Run the evaluation
Print the following commands to the user. Propose to execute them in order to confirm the config works as expected before the full run.
Important: Ensure required environment variables are available. Ask the user to provide `HF_TOKEN`, even if they are not using a gated model (like Llama) or dataset (like GPQA), to reduce Hugging Face rate limiting errors. Remind the user to get access to GPQA, if it's in the config ("Please, click request access for GPQA-Diamond: https://huggingface.co/datasets/Idavidrein/gpqa"), and ask them to put missing tokens or keys (e.g. `HF_TOKEN`, `NVIDIA_API_KEY`, `api_key_name` from the config) in a `.env` file in the project root. NEL automatically reads `.env`, so there is no need to source it manually.
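One way to prepare the `.env` file is sketched below. It assumes the common dotenv convention of one `KEY=VALUE` pair per line, which is not spelled out here. `write_env_template` is a hypothetical helper, and the bracketed values are placeholders for the user to replace. The helper refuses to touch an existing file.

```shell
# Create a .env template at the given path unless a file already exists there.
write_env_template() {
  target="$1"
  if [ -e "$target" ]; then
    echo "$target already exists; leaving it untouched"
    return 0
  fi
  cat > "$target" <<'EOF'
HF_TOKEN=<your_hf_token>
NVIDIA_API_KEY=<your_nvidia_api_key>
EOF
  chmod 600 "$target"   # restrict access, since the file holds secrets
}

# Example (run from the project root):
# write_env_template .env
```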
If using pre_cmd or post_cmd:
```bash
export NEMO_EVALUATOR_TRUST_PRE_CMD=1
```

1. **Dry-run** (validates config without running):
```bash
nel run --config <config_path> --dry-run
```

2. **Test with limited samples** (quick validation run):
```bash
nel run --config <config_path> -o ++evaluation.nemo_evaluator_config.config.params.limit_samples=10
```
For multi-instance deployments, also scale down to a single instance to validate the deployment faster:
```bash
nel run --config <config_path> \
  -o execution.num_nodes=1 \
  -o execution.num_instances=1 \
  -o evaluation.nemo_evaluator_config.config.params.parallelism=5 \
  -o ++evaluation.nemo_evaluator_config.config.params.limit_samples=10
```
Adjust `num_nodes` to match the number of nodes a single model instance needs (e.g., 2 for a model requiring 2-node Ray TP).

3. **Re-run a single task** (useful for debugging or re-testing after config changes):
```bash
nel run --config <config_path> -t <task_name>
```
Combine with `-o` for limited samples: `nel run --config <config_path> -t <task_name> -o ++evaluation.nemo_evaluator_config.config.params.limit_samples=10`

4. **Full evaluation** (production run):
```bash
nel run --config <config_path>
```

After the dry-run, check the output from `nel` for any problems with the config. If there are no problems, propose to first execute the test run with limited samples and then execute the full evaluation. If there are problems, resolve them before executing the full evaluation.

**Monitoring Progress**

After job submission, you can monitor progress using:

1. **Check job status:**
```bash
nel status <invocation_id>
nel info <invocation_id>
```

2. **Stream logs** (Local execution only):
```bash
nel logs <invocation_id>
```
Note: `nel logs` is not supported for SLURM execution.

3. **Inspect logs via SSH** (SLURM workaround):
When `nel logs` is unavailable (SLURM), use SSH to inspect logs directly.

First, get log locations:
```bash
nel info <invocation_id> --logs
```
Then, use SSH to view logs.

Check server deployment logs:
```bash
ssh <username>@<hostname> "tail -100 <log path from `nel info <invocation_id> --logs`>/server-<slurm_job_id>-*.log"
```
Shows vLLM server startup, model loading, and deployment errors (e.g., missing wget/curl).

Check evaluation client logs:
```bash
ssh <username>@<hostname> "tail -100 <log path from `nel info <invocation_id> --logs`>/client-<slurm_job_id>.log"
```
Shows evaluation progress, task execution, and results.

Check SLURM scheduler logs:
```bash
ssh <username>@<hostname> "tail -100 <log path from `nel info <invocation_id> --logs`>/slurm-<slurm_job_id>.log"
```
Shows job scheduling, health checks, and overall execution flow.

Search for errors:
```bash
ssh <username>@<hostname> "grep -i 'error\|warning\|failed' <log path from `nel info <invocation_id> --logs`>/*.log"
```

Advanced workflow: For more detailed run monitoring, debugging failed evaluations, and post-run analysis, see the `launching-evals` skill.

Direct users with issues to:
Now, copy this checklist and track your progress:
Config Generation Progress:
- [ ] Step 1: Check if nel is installed
- [ ] Step 2: Build the base config file
- [ ] Step 3: Configure model path and parameters
- [ ] Step 4: Fill in remaining missing values
- [ ] Step 5: Confirm tasks (iterative)
- [ ] Step 6: Advanced - Multi-node (Data Parallel)
- [ ] Step 7: Advanced - Interceptors
- [ ] Step 8: Run the evaluation