nemotron-retrieval-recipes

Nemotron Retrieval Recipes

Invocation: $nemotron-retrieval-recipes

Purpose

Use this skill to work with public Nemotron embedding and reranking retrieval recipes in a source checkout or installed package. Prefer the current checkout over memory, because the recipe CLI, configs, containers, and output paths are actively changing. Treat each recipe family as available only after its recipe directory and matching CLI files are present.
This is a public product skill, not contributor-only guidance. Its value over static docs is to make an agent route the user's retrieval failure to the right recipe family, reconcile docs with the current checkout, avoid accidental long-running launches, preserve secrets, and return concrete preview/execution/run-report commands.
Use it only for tasks tied to the public Nemotron embed or rerank recipe flow. If the request concerns unrelated retrieval theory, generic vector database selection, generic benchmark advice, or non-recipe Docker/Slurm/NIM troubleshooting, stop with a short scope note and do not inspect recipe files in that turn.

Security Notes

Use Bash for repo-scoped inspection, help, dry-run, and user-approved execution commands. Do not run API, GPU, Docker, Slurm, NIM, or other long-running work unless the user explicitly asks for it. Never run broad environment dumps or commands that expose secret values. Prefer dotlist overrides and config review over editing recipe defaults.

Source Priority

Resolve conflicts in this order:
  1. Current checkout recipe, CLI, config, and source files.
  2. Bundled references in this skill.
  3. User-provided docs or saved snippets.
  4. Memory.
For runnable commands, treat the current checkout as authoritative. If a required recipe directory, CLI command, config, or env profile is missing, report the blocker instead of guessing.
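The availability check described above can be sketched as a small shell function. This is a minimal illustration, not part of the recipe CLI: the function name `family_available` is hypothetical, and it only tests the two directory layouts this skill names under `src/nemotron/`.

```shell
# Sketch: report whether a recipe family looks available in a checkout.
# family_available is a hypothetical helper; it checks only the recipe and
# CLI directories named in this skill, nothing else.
family_available() {
  local repo_root="$1"
  local family="$2"
  local recipe_dir="$repo_root/src/nemotron/recipes/$family"
  local cli_dir="$repo_root/src/nemotron/cli/commands/$family"
  if [ -d "$recipe_dir" ] && [ -d "$cli_dir" ]; then
    echo "available: $family"
  else
    # Report the blocker instead of guessing a fallback path.
    [ -d "$recipe_dir" ] || echo "missing: $recipe_dir"
    [ -d "$cli_dir" ] || echo "missing: $cli_dir"
    return 1
  fi
}
```

Calling `family_available . rerank` from the repository root prints either `available: rerank` or one `missing:` line per absent directory, which can be relayed to the user as the blocker.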

Prerequisites

  • Repo environment: uv sync --all-extras, or the smallest relevant extra documented by the checkout.
  • Stage 0 SDG: NVIDIA_API_KEY; never ask users to paste secret values.
  • Stage 1-4 GPU work: CUDA/NVIDIA driver availability and enough VRAM.
  • Stage 4 export: NeMo Export-Deploy container when using TensorRT.
  • Stage 5 deploy: Docker, NGC access, and NGC_API_KEY.
  • Remote execution: root env.toml profile for --run or --batch; load references/remote.md when remote scheduling, logs, or GPU placement matter.
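One way to confirm that the required keys are configured without exposing them is to test each variable and print only a set/unset status. The sketch below assumes a hypothetical helper named `secret_status`; it is not a recipe command, and it never echoes the value.

```shell
# Sketch: report whether a variable is set, without printing its value.
# secret_status is a hypothetical helper, not part of the nemotron CLI.
secret_status() {
  local name="$1"
  # Accept only plain shell identifiers so the eval below stays safe.
  case "$name" in
    ''|[0-9]*|*[!A-Za-z0-9_]*)
      echo "invalid variable name" >&2
      return 2
      ;;
  esac
  # The value is tested for non-emptiness and never written to output.
  if eval "[ -n \"\${$name:-}\" ]"; then
    echo "$name: set"
  else
    echo "$name: unset"
  fi
}
```

For example, `secret_status NVIDIA_API_KEY` and `secret_status NGC_API_KEY` each print a single status line, which is enough to decide whether Stage 0 or Stage 5 is blocked.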

Instructions

  1. Identify the recipe family.
    • Use references/embed.md for embedding, embed, bi-encoder, vector search, first-stage retrieval, low Recall@k, missing relevant documents, NIM embeddings, or nemotron embed.
    • Use references/rerank.md for rerank, reranker, cross-encoder, second-stage retrieval, acceptable recall but poor top-rank ordering, low nDCG with good Recall, or nemotron rerank.
    • Use both references only when the user asks about both families or asks which family to choose.
  2. Choose the model to tune from the retrieval failure mode.
    • Prefer embedding fine-tuning when relevant documents are absent from the candidate set.
    • Prefer reranker fine-tuning when relevant documents are retrieved but ordered poorly near the top.
    • For production retrieval stacks, remember that these are complementary: embed first, rerank candidates second.
  3. Identify the intent: plan a run, execute a stage, debug a failure, tune hyperparameters, interpret metrics, export/deploy a model, inspect configs, or propose dotlist overrides.
  4. Inspect the current public surface before acting:
    • Recipe files: src/nemotron/recipes/<embed|rerank>/
    • CLI files: src/nemotron/cli/commands/<embed|rerank>/
    • Default configs: src/nemotron/recipes/<family>/stage*/config/default.yaml
    • Help and dry runs: uv run nemotron <family> --help, uv run nemotron <family> <stage> -c default -d
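The routing rule in step 2 can be sketched as a lookup from failure mode to recipe family. The labels `missing`, `ordering`, and `both` are hypothetical shorthand introduced for this illustration, as is the function name `choose_family`; the real decision should still come from the user's metrics.

```shell
# Sketch: map a coarse failure-mode label to a recipe family.
# The labels and the function name are hypothetical shorthand.
choose_family() {
  case "$1" in
    missing)  echo "embed" ;;         # relevant documents absent from candidates
    ordering) echo "rerank" ;;        # retrieved, but ranked poorly near the top
    both)     echo "embed rerank" ;;  # embed first, rerank candidates second
    *)
      echo "unknown failure mode: $1" >&2
      return 1
      ;;
  esac
}

# Print (do not run) the cheapest follow-up command for the chosen family.
family="$(choose_family ordering)"
echo "uv run nemotron $family --help"
```

Keeping the final step as an `echo` means the sketch only proposes a help command and launches nothing.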

Safe Workflow

  1. Gather only context relevant to the task: corpus path, existing SDG/training/eval data, target stage range, output directory, checkpoint path, execution mode, GPU IDs, and whether required secrets are configured. Never ask users to paste secret values.
  2. Start with cheap checks before expensive work:
    • uv run nemotron <family> --help
    • uv run nemotron <family> <stage> --help
    • uv run nemotron <family> <stage> -c default -d
    • uv run nemotron <family> run -c default -d --from <stage> --to <stage>
    • run --help may omit the inherited -c and -d options even though run -c default -d ... works; validate by running the dry-run when unsure.
    • In an already prepared checkout, uv run --no-sync ... --help or uv run --no-sync ... -d can avoid unexpected dependency sync during read-only checks.
  3. Check prerequisites for the requested stage:
    • Repo environment: uv sync --all-extras, or the smallest relevant extra if documented by the repo.
    • Stage 0 SDG: NVIDIA_API_KEY.
    • Stage 1-4 GPU work: CUDA/NVIDIA driver availability and enough VRAM.
    • Stage 4 export: the NeMo Export-Deploy container when using TensorRT.
    • Stage 5 deploy: Docker, NGC access, and NGC_API_KEY.
    • Remote execution: root env.toml profile for --run or --batch; load references/remote.md when remote scheduling, logs, or GPU placement matter.
  4. Use dotlist overrides instead of editing defaults unless the user asks for reusable config changes. Keep sequence length, prefixes, pooling/normalization, prompt templates, and hard-negative counts consistent across stages.
  5. Avoid launching API, GPU, Docker, Slurm, NIM, or long-running jobs unless the user explicitly asked to run them. Offer or run dry-runs, config review, and small pilots first.
  6. If the user specifies GPU IDs, scope every stage command with CUDA_VISIBLE_DEVICES=<ids>.
  7. For multi-stage local runs, prefer uv run nemotron <family> run -c default --from <stage> --to <stage>. The default run target stops at eval; export and deploy are opt-in.
  8. When evaluating quality, compare against the base model on a fixed held-out evaluation set before recommending deployment. Do not substitute a standalone public-benchmark eval for the recipe's own Stage 3 evaluation.
  9. For long-running SDG, prep, finetune, or eval work, start the process in a session-safe way and poll at human-scale intervals: roughly 60 seconds for small pilots and 120-300 seconds for larger runs.
  10. For failures, load PITFALLS.md, localize the failing stage, then inspect the stage config, expected inputs, output directory, and corresponding CLI wrapper or run_uv.py.
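Steps 2 and 6 can be combined into a helper that assembles a dry-run command and prints it rather than executing it. This is a sketch under stated assumptions: `preview_cmd` is a hypothetical name, and the stage names used below (`prep`, `eval`) are taken from the example in this skill and should be checked against the current checkout.

```shell
# Sketch: print (never execute) a dry-run command for a stage range.
# preview_cmd is a hypothetical helper; verify stage names in the checkout.
preview_cmd() {
  local family="$1"
  local from="$2"
  local to="$3"
  local gpus="${4:-}"
  local cmd="uv run nemotron $family run -c default -d --from $from --to $to"
  # Scope the command to specific GPUs only when the user supplied IDs.
  if [ -n "$gpus" ]; then
    cmd="CUDA_VISIBLE_DEVICES=$gpus $cmd"
  fi
  echo "$cmd"
}
```

For instance, `preview_cmd rerank prep eval 0,1` prints the command with a `CUDA_VISIBLE_DEVICES=0,1` prefix, so the user can review it before deciding to run anything.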

References

  • references/embed.md: embedding recipe stages, commands, defaults, output paths, and operating patterns.
  • references/rerank.md: rerank recipe stages, commands, defaults, output paths, and operating patterns.
  • references/evaluation.md: metric interpretation, comparison hygiene, and deployment readiness checks.
  • references/remote.md: remote execution profiles, batch/run mode, GPU scoping, logs, and polling.
  • PITFALLS.md: common failures and recovery moves for SDG, prep, training, eval, export, deploy, and CLI setup.

Examples

User asks: "Recall is decent, but nDCG is poor and the right passage is around rank 40. Should I tune embed or rerank?"
Load references/rerank.md and references/evaluation.md, explain that acceptable recall with poor top-rank ordering points to reranker tuning, then offer a cheap preview before training:
    uv run nemotron rerank run -c default -d --from prep --to eval

Troubleshooting

For failures, load PITFALLS.md first. Localize the failing stage, then inspect the stage config, expected inputs, output directory, and corresponding CLI wrapper or run_uv.py.

Limitations

  • Bundled references are condensed snapshots; verify commands, flags, defaults, and output paths against the active checkout before execution.
  • This skill does not provide datasets, checkpoints, credentials, GPU capacity, Docker images, or NIM services.

Output Style

For planning or debugging recommendations, use this shape when it helps: Decision, Why, Required inputs, Preview command, Execution command, Avoid, and Next step. Omit fields that are irrelevant to a short answer.
Give concrete commands and file paths. State assumptions, expected inputs, expected outputs, and the cheapest validation step that proves the next action is ready. For long-running stages, separate preview commands from execution commands so the user can choose deliberately.
When reporting a dry-run or real run, include a compact run report: command, mode, config, dotlist overrides, input paths, output paths, validation signal or metric file, and next cheapest check. Include the checkout commit when it is available.
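The compact run report could be emitted by a small formatter such as the one below. It is a minimal sketch: `run_report` and its positional argument order are hypothetical, and the commit line appears only when `git` can resolve `HEAD` in the current directory.

```shell
# Sketch: print a compact run report from positional arguments.
# run_report and its argument order are hypothetical, not a recipe command.
# Args: command, mode, config, overrides, inputs, outputs, signal, next check.
run_report() {
  printf 'Command: %s\n' "$1"
  printf 'Mode: %s\n' "$2"
  printf 'Config: %s\n' "$3"
  printf 'Dotlist overrides: %s\n' "${4:-none}"
  printf 'Input paths: %s\n' "$5"
  printf 'Output paths: %s\n' "$6"
  printf 'Validation signal: %s\n' "$7"
  printf 'Next cheapest check: %s\n' "$8"
  # Include the checkout commit only when git can resolve one.
  local commit
  if commit="$(git rev-parse --short HEAD 2>/dev/null)"; then
    printf 'Checkout commit: %s\n' "$commit"
  fi
}
```

An empty fourth argument is reported as `none`, which keeps the absence of dotlist overrides explicit in the report.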