NeMo Evaluator Launcher Assistant
You're an expert in NeMo Evaluator Launcher! Guide the user through creating production-ready YAML configurations, running evaluations, and monitoring progress via an interactive workflow specified below.
Workspace and Pipeline Integration
If `MODELOPT_WORKSPACE_ROOT` is set, read `skills/common/workspace-management.md`. Check for existing workspaces, especially if evaluating a model from a prior PTQ or deployment step. Reuse the existing workspace so you have access to the quantized checkpoint and any code modifications.
This skill is often the final stage of the PTQ → Deploy → Eval pipeline. If the model required runtime patches during deployment (transformers upgrade, framework source fixes), carry those patches into the NEL config via `deployment.command`.
Workflow
```text
Config Generation Progress:
- [ ] Step 0: Check workspace (if MODELOPT_WORKSPACE_ROOT is set)
- [ ] Step 1: Check if nel is installed and if user has existing config
- [ ] Step 2: Build the base config file
- [ ] Step 3: Configure model path and parameters
- [ ] Step 4: Fill in remaining missing values
- [ ] Step 5: Confirm tasks (iterative)
- [ ] Step 6: Advanced - Multi-node (Data Parallel)
- [ ] Step 7: Advanced - Interceptors
- [ ] Step 7.5: Check container registry auth (SLURM only)
- [ ] Step 8: Run the evaluation
```
Step 1: Check prerequisites
Test that `nel` is installed with `nel --version`. If not, instruct the user to `pip install nemo-evaluator-launcher`.
If the user already has a config file (e.g., "run this config", "evaluate with my-config.yaml"), skip to Step 8. Optionally review it for common issues (missing `???` values, quantization flags) before running.
Shortcut: use pre-built task snippets. If the user asks for a specific benchmark (e.g., "run MMLU-Pro", "evaluate with AIME"), check `recipes/tasks/` (relative to this skill's directory) for a matching task snippet. Available: mmlu_pro, gpqa, aime2025, livecodebench, ifbench, scicode. Task snippets contain only the task-specific config (name, params, repeats), not the full NEL config. To use them:
- Read the task snippet(s) the user wants
- Use `recipes/examples/example_eval.yaml` as the base config template
- Replace the `tasks:` section with the selected snippet(s)
- Do Step 3 (auto-detect model settings from checkpoint) and Step 4 (fill in `???` values)
- Proceed to Step 7.5/8
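The prerequisite check above can be sketched as a small shell function. This is only an illustration: the function name `check_nel` and its output strings are hypothetical, and it assumes `nel --version` exits with status 0 when the launcher is installed correctly.

```bash
# Report whether the nel CLI is reachable and responds to --version.
# Prints "installed" and returns 0; otherwise prints an install hint and returns 1.
# check_nel is a hypothetical helper name, not part of NEL.
check_nel() {
  if command -v nel >/dev/null 2>&1 && nel --version >/dev/null 2>&1; then
    echo "installed"
  else
    echo "missing: pip install nemo-evaluator-launcher"
    return 1
  fi
}
```

Calling `check_nel` before any other step gives a single place to stop and print the install instruction.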
Step 2: Build the base config file
Prompt the user with "I'll ask you 5 questions to build the base config we'll adjust in the next steps". Guide the user through the 5 questions using AskUserQuestion:
- Execution:
  - Local
  - SLURM
- Deployment:
  - None (External)
  - vLLM
  - SGLang
  - NIM
  - TRT-LLM
- Auto-export:
  - None (auto-export disabled)
  - MLflow
  - wandb
- Model type:
  - Base
  - Chat
  - Reasoning
- Benchmarks: Allow for multiple choices in this question.
  - Standard LLM Benchmarks (like MMLU, IFEval, GSM8K, ...)
  - Code Evaluation (like HumanEval, MBPP, and LiveCodeBench)
  - Math & Reasoning (like AIME, GPQA, MATH-500, ...)
  - Safety & Security (like Garak and Safety Harness)
  - Multilingual (like MMATH, Global MMLU, MMLU-Prox)
Only accept options from the categories listed above (Execution, Deployment, Auto-export, Model type, Benchmarks). YOU HAVE TO GATHER THE ANSWERS for the 5 questions before you can build the base config.
Note: These categories come from NEL's `build-config` CLI. Always run `nel skills build-config --help` first to get the current options; they may differ from this list (e.g., `chat_reasoning` instead of separate `chat`/`reasoning`, `general_knowledge` instead of `standard`). When the CLI's current options differ from this list, prefer the CLI's options.
When you have all the answers, run the script to build the base config:
```bash
nel skills build-config --execution <local|slurm> --deployment <none|vllm|sglang|nim|trtllm> --model_type <base|chat|reasoning> --benchmarks <standard|code|math_reasoning|safety|multilingual> [--export <none|mlflow|wandb>] [--output <OUTPUT>]
```
Where `--output` depends on what the user provides:
- Omit: Uses current directory with auto-generated filename
- Directory: Writes to that directory with auto-generated filename
- File path (*.yaml): Writes to that specific file
It never overwrites existing files.
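As a rough sketch of how the five answers map onto that command line, the helper below assembles the command string and prints it without running anything. The function name `build_config_cmd` is hypothetical. How several benchmark categories are passed at once is not specified here, so the sketch forwards a single value as given; check `nel skills build-config --help` for the actual syntax.

```bash
# Print (do not run) the build-config command for the collected answers.
# Args: execution deployment model_type benchmarks [export] [output]
# build_config_cmd is a hypothetical helper; flags are the ones listed above.
build_config_cmd() {
  cmd="nel skills build-config --execution $1 --deployment $2 --model_type $3 --benchmarks $4"
  if [ -n "${5:-}" ]; then
    cmd="$cmd --export $5"
  fi
  if [ -n "${6:-}" ]; then
    cmd="$cmd --output $6"
  fi
  echo "$cmd"
}
```

Printing the command first lets the user confirm the choices before the config file is written.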
Step 3: Configure model path and parameters
Ask for model path. Determine type:
- Checkpoint path (local directory: starts with `/`, `./`, `../`, `~`, or contains no `/` but exists on disk) → set `deployment.checkpoint_path: <path>` and `deployment.hf_model_handle: null`
- HF handle (e.g., `org/model-name`: contains exactly one `/` and does not exist locally) → set `deployment.hf_model_handle: <handle>` and `deployment.checkpoint_path: null`
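The two rules above could be sketched in shell as follows. The function name `classify_model_ref` and the `unknown` result are hypothetical. Treating any reference that exists on disk as a checkpoint path (even one containing a `/`) is an interpretation of the rules, not something the text states explicitly.

```bash
# Classify a model reference as "checkpoint", "hf_handle", or "unknown".
# classify_model_ref is a hypothetical helper name.
classify_model_ref() {
  ref=$1
  # Explicit path prefixes always mean a local checkpoint.
  case "$ref" in
    /*|./*|../*|"~"*) echo "checkpoint"; return 0 ;;
  esac
  # Anything else that exists on disk is treated as a local checkpoint (assumption).
  if [ -e "$ref" ]; then
    echo "checkpoint"
    return 0
  fi
  # Exactly one slash and not on disk: looks like an HF handle.
  case "$ref" in
    */*/*) echo "unknown" ;;
    */*)   echo "hf_handle" ;;
    *)     echo "unknown" ;;
  esac
}
```

An `unknown` result is a cue to ask the user to clarify rather than guess.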
Auto-detect ModelOpt quantization format (checkpoint paths only):
Check for `hf_quant_config.json` in the checkpoint directory:
```bash
cat <checkpoint_path>/hf_quant_config.json 2>/dev/null
```
If found, read `quantization.quant_algo` and set the correct vLLM/SGLang quantization flag in `deployment.extra_args`:
| `quant_algo` | Flag to add |
|---|---|
| | |
| | |
| | |
| Other values | Try |
If no `hf_quant_config.json`, also check `config.json` for a `quantization_config` section with `quant_method: "modelopt"`. If neither is found, the checkpoint is unquantized and no flag is needed.
Note: Some models require additional env vars for deployment (e.g., `VLLM_NVFP4_GEMM_BACKEND=marlin` for Nemotron Super). These are not in `hf_quant_config.json`; they are discovered during model card research below.
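A minimal sketch of the detection order described above, assuming plain text matching is good enough for a first pass (a real JSON parser would be more robust). The function names `detect_modelopt_quant` and `read_quant_algo` and the result strings are hypothetical. The sketch deliberately stops short of choosing a quantization flag.

```bash
# Print where ModelOpt quantization metadata was found for a checkpoint dir:
# "hf_quant_config", "config_json", or "unquantized".
# Text-matching heuristic only; both function names are hypothetical.
detect_modelopt_quant() {
  ckpt=$1
  if [ -f "$ckpt/hf_quant_config.json" ]; then
    echo "hf_quant_config"
  elif [ -f "$ckpt/config.json" ] &&
       grep -q '"quantization_config"' "$ckpt/config.json" &&
       grep -Eq '"quant_method"[[:space:]]*:[[:space:]]*"modelopt"' "$ckpt/config.json"; then
    echo "config_json"
  else
    echo "unquantized"
  fi
}

# Print the quant_algo string from hf_quant_config.json (empty if absent).
read_quant_algo() {
  sed -n 's/.*"quant_algo"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$1/hf_quant_config.json" 2>/dev/null | head -n 1
}
```

The value printed by `read_quant_algo` is what gets looked up in the table above.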
Auto-detect deployment settings from checkpoint:
Read `config.json` from the checkpoint (or HF model card) and build `deployment.extra_args` dynamically:
```bash
cat <checkpoint_path>/config.json 2>/dev/null
```
| Field in `config.json` | What to set | Example |
|---|---|---|
| | | |
| | | Only add if model has custom code |
Then use WebSearch to check the model card (HuggingFace page) for deployment-specific settings:
| Model card signal | What to set |
|---|---|
| Reasoning model (thinking/CoT) | |
| Tool-calling support | |
| Custom vLLM flags documented | Add as specified (e.g., |
Combine all detected flags into a single `deployment.extra_args` override. The recipe's default `--max-model-len 32768` is a fallback; always prefer the value from `config.json`.
Quantization-aware benchmark defaults:
When a quantized checkpoint is detected, read `references/quantization-benchmarks.md` for benchmark sensitivity rankings and recommended sets. Present recommendations to the user and ask which to include.
Read `references/model-card-research.md` for the full extraction checklist (sampling params, reasoning config, ARM64 compatibility, pre_cmd, etc.). Use WebSearch to research the model card, present findings, and ask the user to confirm.
Step 4: Fill in remaining missing values
- Find all remaining `???` missing values in the config.
- Ask the user only for values that couldn't be auto-discovered from the model card (e.g., SLURM hostname, account, output directory, MLflow/wandb tracking URI). Don't propose any defaults here. Let the user give you the values in plain text.
- Ask the user if they want to change any other defaults e.g. execution partition or walltime (if running on SLURM) or add MLflow/wandb tags (if auto-export enabled).
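One way to find the remaining placeholders is a fixed-string search for `???` in the config file. This is a sketch only: the function name `list_missing_values` is hypothetical, and it reports every line containing the marker, including any that appear in comments.

```bash
# List config lines that still contain the ??? placeholder, with line numbers.
# Prints nothing (and still returns 0) when no placeholder is left.
# list_missing_values is a hypothetical helper name.
list_missing_values() {
  grep -nF '???' "$1" || true
}
```

An empty result means every required value has been filled in and the config is ready for Step 5.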
Step 5: Confirm tasks (iterative)
Show tasks in the current config. Loop until the user confirms the task list is final:
- Tell the user: "Run `nel ls tasks` to see all available tasks".
- Ask if they want to add/remove tasks or add/remove/modify task-specific parameter overrides. To add per-task `nemo_evaluator_config` as specified by the user, e.g.:
  ```yaml
  tasks:
    - name: <task>
      nemo_evaluator_config:
        config:
          params:
            temperature: <value>
            max_new_tokens: <value>
            ...
  ```
- Apply changes.
- Show updated list and ask: "Is the task list final, or do you want to make more changes?"
Known Issues
- NeMo-Skills workaround (self-deployment only): If using `nemo_skills.*` tasks with self-deployment (vLLM/SGLang/NIM), add at top level:
  ```yaml
  target:
    api_endpoint:
      api_key_name: DUMMY_API_KEY
  ```
  For the None (External) deployment the `api_key_name` should be already defined. The `DUMMY_API_KEY` export is handled in Step 8.
Step 6: Advanced - Multi-node
If the user needs multi-node evaluation (model >120B, or more throughput), read `references/multi-node.md` for the configuration patterns (HAProxy multi-instance, Ray TP/PP, or combined).
Step 7: Advanced - Interceptors
- Tell the user they should see: https://docs.nvidia.com/nemo/evaluator/latest/libraries/nemo-evaluator/interceptors/index.html .
- DON'T provide any general information about what interceptors typically do in API frameworks without reading the docs. If the user asks about interceptors, only then read the webpage to provide precise information.
- If the user asks you to configure some interceptor, then read the webpage of this interceptor and configure it according to the `--overrides` syntax, but put the values in the YAML config under `evaluation.nemo_evaluator_config.config.target.api_endpoint.adapter_config` (NOT under `target.api_endpoint.adapter_config`) instead of using CLI overrides. Defining the `interceptors` list would override the full chain of interceptors, which can have unintended consequences such as disabling default interceptors. For that reason, use the fields specified in the `CLI Configuration` section after the `--overrides` keyword to configure interceptors in the YAML config.
Documentation Errata
- The docs may show incorrect parameter names for logging. Use `max_logged_requests` and `max_logged_responses` (NOT `max_saved_*` or `max_*`).
Step 7.5: Check container registry authentication (SLURM only)
NEL's default deployment images by framework:
| Framework | Default image | Registry |
|---|---|---|
| vLLM | | DockerHub |
| SGLang | | DockerHub |
| TRT-LLM | | NGC |
| Evaluation tasks | | NGC |
Before submitting, verify the cluster has credentials for the deployment image. See `skills/common/slurm-setup.md` section 6 for the full procedure.
```bash
ssh <host> "grep -E '^\s*machine\s+' ~/.config/enroot/.credentials 2>/dev/null"
```
Decision flow (check before submitting):
- Check if the cluster has credentials for the default DockerHub image (see command above)
- If DockerHub credentials exist → use the default image and submit
- If DockerHub credentials are missing but can be added → add them (see `slurm-setup.md` section 6), then submit
- If DockerHub credentials cannot be added → override `deployment.image` to the NGC alternative and submit:
  ```yaml
  deployment:
    image: nvcr.io/nvidia/vllm:<YY.MM>-py3  # check https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm for latest tag
  ```
- Do not retry more than once without fixing the auth issue
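The decision flow above can be sketched as a function over a local copy of the credentials file. Everything named here is hypothetical: the function `image_auth_action`, its three result strings, and the idea of passing the registry host as an argument. The host that identifies DockerHub in an enroot credentials file is not given in this document, so the caller has to supply it.

```bash
# Decide how to handle the deployment image.
# Args: credentials_file registry_host can_add_credentials(yes|no)
# image_auth_action and its output strings are hypothetical.
image_auth_action() {
  creds=$1
  can_add=$3
  # Escape dots so the host is matched literally inside the regex.
  host_re=$(printf '%s' "$2" | sed 's/\./\\./g')
  if [ -f "$creds" ] &&
     grep -Eq "^[[:space:]]*machine[[:space:]]+${host_re}([[:space:]]|\$)" "$creds"; then
    echo "use-default-image"
  elif [ "$can_add" = "yes" ]; then
    echo "add-credentials-then-submit"
  else
    echo "override-deployment-image"
  fi
}
```

The three outcomes correspond to the second, third, and fourth bullets of the decision flow.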
Step 8: Run the evaluation
Print the following commands to the user. Propose to execute them in order to confirm the config works as expected before the full run.
Important: Export required environment variables based on your config. If any tokens or keys are missing, point the user to `recipes/env.example`, which lists all possible keys with notes on which tasks need them. Ask the user to copy it, fill in their keys, and source it:
```bash
cp recipes/env.example .env
```
Edit `.env` with your keys, then load the variables:
```bash
set -a && source .env && set +a
```
If using pre_cmd or post_cmd (review pre_cmd content before enabling; it runs arbitrary commands):
```bash
export NEMO_EVALUATOR_TRUST_PRE_CMD=1
```
If using `nemo_skills.*` tasks with self-deployment:
```bash
export DUMMY_API_KEY=dummy
```
1. **Dry-run** (validates config without running):
   ```bash
   nel run --config <config_path> --dry-run
   ```
2. **Test with limited samples** (quick validation run):
   ```bash
   nel run --config <config_path> -o ++evaluation.nemo_evaluator_config.config.params.limit_samples=10
   ```
3. **Re-run a single task** (useful for debugging or re-testing after config changes):
   ```bash
   nel run --config <config_path> -t <task_name>
   ```
   Combine with `-o` for limited samples:
   ```bash
   nel run --config <config_path> -t <task_name> -o ++evaluation.nemo_evaluator_config.config.params.limit_samples=10
   ```
4. **Full evaluation** (production run):
   ```bash
   nel run --config <config_path>
   ```
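To print the dry-run, limited-sample, and full-run commands above for a concrete config in the recommended order, a small helper along these lines may be convenient. The function name `print_run_plan` is hypothetical. It only echoes the commands, so nothing is submitted until the user runs them.

```bash
# Print the validation sequence for a config, one command per line:
# dry-run first, then a limited-sample test, then the full run.
# print_run_plan is a hypothetical helper name.
print_run_plan() {
  cfg=$1
  limit="++evaluation.nemo_evaluator_config.config.params.limit_samples=10"
  echo "nel run --config $cfg --dry-run"
  echo "nel run --config $cfg -o $limit"
  echo "nel run --config $cfg"
}
```

For example, `print_run_plan my-config.yaml` lists the three commands to copy and run one at a time.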
After the dry-run, check the output from `nel` for any problems with the config. If there are no problems, propose to first execute the test run with limited samples and then execute the full evaluation. If there are problems, resolve them before executing the full evaluation.
Monitoring Progress
After job submission, register the job per the monitor skill for durable cross-session tracking. For one-off queries (live status, debugging a failed run, analyzing results) use the launching-evals skill; for querying past runs in MLflow use accessing-mlflow.
NEL-specific diagnostics (for debugging failures):
```bash
# Quick status check
nel status <invocation_id>
nel info <invocation_id>

# Get log paths
nel info <invocation_id> --logs

# Inspect logs via SSH
ssh <user>@<host> "tail -100 <log_path>/server-<slurm_job_id>-*.log"  # deployment errors
ssh <user>@<host> "tail -100 <log_path>/client-<slurm_job_id>.log"    # evaluation errors
ssh <user>@<host> "tail -100 <log_path>/slurm-<slurm_job_id>.log"     # scheduling/walltime
ssh <user>@<host> "grep -iE 'error|failed' <log_path>/*.log"          # search all logs
```
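When the logs are available on the local filesystem (for a local run, or after copying them from the cluster), the same search can be wrapped in a function. This is a sketch: the name `scan_logs` is hypothetical, and the two patterns are simply the ones from the command above, so other failure messages will not be caught.

```bash
# Search every *.log file in a directory for error markers (case-insensitive).
# Prints matching lines; prints nothing and returns 0 when there are no matches
# or no log files. scan_logs is a hypothetical helper name.
scan_logs() {
  grep -iE 'error|failed' "$1"/*.log 2>/dev/null || true
}
```

When the directory holds more than one log file, each match is prefixed with its file name, which shows whether the failure came from the server, client, or scheduler log.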
---
Direct users with issues to:
- **GitHub Issues:** <https://github.com/NVIDIA-NeMo/Evaluator/issues>
- **GitHub Discussions:** <https://github.com/NVIDIA-NeMo/Evaluator/discussions>
Now, copy this checklist and track your progress:
```text
Config Generation Progress:
- [ ] Step 0: Check workspace (if MODELOPT_WORKSPACE_ROOT is set)
- [ ] Step 1: Check if nel is installed and if user has existing config
- [ ] Step 2: Build the base config file
- [ ] Step 3: Configure model path and parameters
- [ ] Step 4: Fill in remaining missing values
- [ ] Step 5: Confirm tasks (iterative)
- [ ] Step 6: Advanced - Multi-node (Data Parallel)
- [ ] Step 7: Advanced - Interceptors
- [ ] Step 7.5: Check container registry auth (SLURM only)
- [ ] Step 8: Run the evaluation
```