nemotron-retrieval-recipes

Nemotron Retrieval Recipes

Invocation: `$nemotron-retrieval-recipes`

Purpose


Use this skill to work with public Nemotron embedding and reranking retrieval recipes in a source checkout or installed package. Prefer the current checkout over memory, because the recipe CLI, configs, containers, and output paths are actively changing. Treat each recipe family as available only after its recipe directory and matching CLI files are present.

This is a public product skill, not contributor-only guidance. Its value over static docs is to make an agent route the user's retrieval failure to the right recipe family, reconcile docs with the current checkout, avoid accidental long-running launches, preserve secrets, and return concrete preview/execution/run-report commands.

Use it only for tasks tied to the public Nemotron

embed

rerank

recipe flow. If the request is unrelated retrieval theory, generic vector database selection, generic benchmark advice, or non-recipe Docker/Slurm/NIM troubleshooting, stop with a short scope note and do not inspect recipe files in that turn.


Security Notes


Use `Bash` for repo-scoped inspection, help, dry-run, and user-approved execution commands. Do not run API, GPU, Docker, Slurm, NIM, or other long-running work unless the user explicitly asks for it. Never run broad environment dumps or commands that expose secret values. Prefer dotlist overrides and config review over editing recipe defaults.


Source Priority


Resolve conflicts in this order:

1. Current checkout recipe, CLI, config, and source files.
2. Bundled references in this skill.
3. User-provided docs or saved snippets.
4. Memory.

For runnable commands, treat the current checkout as authoritative. If a required recipe directory, CLI command, config, or env profile is missing, report the blocker instead of guessing.


Prerequisites


- Repo environment: `uv sync --all-extras`, or the smallest relevant extra documented by the checkout.
- Stage 0 SDG: `NVIDIA_API_KEY`; never ask users to paste secret values.
- Stage 1-4 GPU work: CUDA/NVIDIA driver availability and enough VRAM.
- Stage 4 export: NeMo Export-Deploy container when using TensorRT.
- Stage 5 deploy: Docker, NGC access, and `NGC_API_KEY`.
- Remote execution: root `env.toml` profile for `--run` or `--batch`; load `references/remote.md` when remote scheduling, logs, or GPU placement matter.
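The secret-related prerequisites above can be checked without ever reading a value into output. The following is a minimal Python sketch, not part of the recipe CLI: the function name `missing_secrets` and the stage-to-variable table are illustrative assumptions built only from the two variables named in the list.

```python
import os
from typing import Dict, List, Mapping

# Stage-to-variable mapping taken from the prerequisites list (illustrative table).
REQUIRED_SECRETS: Dict[int, List[str]] = {0: ["NVIDIA_API_KEY"], 5: ["NGC_API_KEY"]}


def missing_secrets(stage: int, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty.

    Only variable names are returned; values are never printed or copied.
    """
    return [name for name in REQUIRED_SECRETS.get(stage, []) if not env.get(name)]


for stage in (0, 5):
    missing = missing_secrets(stage)
    status = "configured" if not missing else "missing: " + ", ".join(missing)
    print(f"stage {stage}: {status}")
```

Reporting names only keeps the check compatible with the rule against environment dumps: the user learns which variable to set without any value being echoed.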


Instructions


Identify the recipe family.
- Use `references/embed.md` for embedding, embed, bi-encoder, vector search, first-stage retrieval, low Recall@k, missing relevant documents, NIM embeddings, or `nemotron embed`.
- Use `references/rerank.md` for rerank, reranker, cross-encoder, second-stage retrieval, acceptable recall but poor top-rank ordering, low nDCG with good Recall, or `nemotron rerank`.
- Use both references only when the user asks about both families or asks which family to choose.

Choose the model to tune from the retrieval failure mode.
- Prefer embedding fine-tuning when relevant documents are absent from the candidate set.
- Prefer reranker fine-tuning when relevant documents are retrieved but ordered poorly near the top.
- For production retrieval stacks, remember that these are complementary: embed first, rerank candidates second.

Identify the intent: plan a run, execute a stage, debug a failure, tune hyperparameters, interpret metrics, export/deploy a model, inspect configs, or propose dotlist overrides.

Inspect the current public surface before acting:
- Recipe files: `src/nemotron/recipes/<embed|rerank>/`
- CLI files: `src/nemotron/cli/commands/<embed|rerank>/`
- Default configs: `src/nemotron/recipes/<family>/stage*/config/default.yaml`
- Help and dry runs: `uv run nemotron <family> --help` and `uv run nemotron <family> <stage> -c default -d`
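The surface inspection above can be sketched as a small read-only check. This Python example is an illustration under stated assumptions: `family_blockers` is a hypothetical helper, and it relies only on the three path patterns quoted in the list, which should themselves be confirmed against the checkout.

```python
from pathlib import Path
from typing import List


def family_blockers(checkout: Path, family: str) -> List[str]:
    """List missing paths that block a recipe family; an empty list means available."""
    recipe_dir = checkout / "src" / "nemotron" / "recipes" / family
    cli_dir = checkout / "src" / "nemotron" / "cli" / "commands" / family
    blockers = [f"missing directory: {p}" for p in (recipe_dir, cli_dir) if not p.is_dir()]
    # Default configs follow the stage*/config/default.yaml pattern quoted in the list.
    if recipe_dir.is_dir() and not list(recipe_dir.glob("stage*/config/default.yaml")):
        blockers.append(f"no stage*/config/default.yaml under {recipe_dir}")
    return blockers


for family in ("embed", "rerank"):
    problems = family_blockers(Path.cwd(), family)
    print(family, "available" if not problems else problems)
```

Returning the blockers as text, instead of a bare boolean, matches the guidance to report what is missing and not guess.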


Safe Workflow


Gather only context relevant to the task: corpus path, existing SDG/training/eval data, target stage range, output directory, checkpoint path, execution mode, GPU IDs, and whether required secrets are configured. Never ask users to paste secret values.

Start with cheap checks before expensive work:

```
uv run nemotron <family> --help
uv run nemotron <family> <stage> --help
uv run nemotron <family> <stage> -c default -d
uv run nemotron <family> run -c default -d --from <stage> --to <stage>
```

`run --help` may omit inherited `-c` and `-d` options even though `run -c default -d ...` works; validate by running the dry-run when unsure.
In an already prepared checkout, `uv run --no-sync ... --help` or `uv run --no-sync ... -d` can avoid unexpected dependency sync during read-only checks.
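The preview commands above share one shape, so they can be assembled programmatically before anything is launched. The sketch below is a hypothetical Python helper (`preview_command` is not part of the CLI) that only builds an argument list from the flags quoted above; it does not run anything, and the stage names passed in are assumed to exist in the checkout.

```python
import shlex
from typing import List, Optional


def preview_command(family: str, stage: Optional[str] = None, *,
                    from_stage: Optional[str] = None, to_stage: Optional[str] = None,
                    no_sync: bool = False) -> List[str]:
    """Build a dry-run argv for one stage, or for `run` when stage is None."""
    argv = ["uv", "run"]
    if no_sync:
        argv.append("--no-sync")  # skip dependency sync in a prepared checkout
    argv += ["nemotron", family, stage or "run", "-c", "default", "-d"]
    if stage is None:
        # --from/--to only apply to the multi-stage `run` target.
        if from_stage:
            argv += ["--from", from_stage]
        if to_stage:
            argv += ["--to", to_stage]
    return argv


print(shlex.join(preview_command("rerank", from_stage="prep", to_stage="eval")))
# prints: uv run nemotron rerank run -c default -d --from prep --to eval
```

Keeping the command as a list until the last moment avoids shell-quoting mistakes and makes it easy to show the user the exact preview before any execution.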

Check prerequisites for the requested stage:
- Repo environment: `uv sync --all-extras`, or the smallest relevant extra if documented by the repo.
- Stage 0 SDG: `NVIDIA_API_KEY`.
- Stage 1-4 GPU work: CUDA/NVIDIA driver availability and enough VRAM.
- Stage 4 export: the NeMo Export-Deploy container when using TensorRT.
- Stage 5 deploy: Docker, NGC access, and `NGC_API_KEY`.
- Remote execution: root `env.toml` profile for `--run` or `--batch`; load `references/remote.md` when remote scheduling, logs, or GPU placement matter.
Use dotlist overrides instead of editing defaults unless the user asks for reusable config changes. Keep sequence length, prefixes, pooling/normalization, prompt templates, and hard-negative counts consistent across stages.
Avoid launching API, GPU, Docker, Slurm, NIM, or long-running jobs unless the user explicitly asked to run them. Offer or run dry-runs, config review, and small pilots first.
If the user specifies GPU IDs, scope every stage command with `CUDA_VISIBLE_DEVICES=<ids>`.
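When stage commands are launched from a script instead of a shell prompt, the same GPU scoping can be applied through the child process environment. This is a minimal Python sketch; `gpu_scoped_env` is a hypothetical helper, and it assumes the stage honors `CUDA_VISIBLE_DEVICES` in the usual CUDA way.

```python
import os
from typing import Dict, Iterable, Mapping


def gpu_scoped_env(gpu_ids: Iterable[int],
                   base: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy the environment and pin CUDA_VISIBLE_DEVICES to the given GPU IDs."""
    env = dict(base)  # copy, so the parent process environment is left untouched
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


# Print only the scoped variable, never the whole environment.
print(gpu_scoped_env([0, 1])["CUDA_VISIBLE_DEVICES"])  # prints: 0,1
```

The resulting mapping can be passed as `env=` to `subprocess.run` for each stage, so every command in a multi-stage sequence sees the same device set.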

For multi-stage local runs, prefer `uv run nemotron <family> run -c default --from <stage> --to <stage>`. The default `run` target stops at `eval`; `export` and `deploy` are opt-in.
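To make the effect of `--from` and `--to` concrete, the sketch below models the stage range in Python. The stage names and their order in `STAGES` are assumptions inferred from the stage numbers mentioned in this document (SDG, prep, finetune, eval, export, deploy), and the default starting stage is likewise assumed; the checkout's `--help` output is the authority.

```python
from typing import List

# Assumed stage order and names; confirm them against the checkout's CLI help.
STAGES = ["sdg", "prep", "finetune", "eval", "export", "deploy"]


def stages_in_range(from_stage: str = "sdg", to_stage: str = "eval") -> List[str]:
    """Return the stages a `run --from A --to B` would cover, inclusive."""
    start, end = STAGES.index(from_stage), STAGES.index(to_stage)
    if start > end:
        raise ValueError(f"--from {from_stage} comes after --to {to_stage}")
    return STAGES[start:end + 1]


print(stages_in_range("prep"))            # prints: ['prep', 'finetune', 'eval']
print(stages_in_range("eval", "deploy"))  # prints: ['eval', 'export', 'deploy']
```

The default `to_stage` of `eval` mirrors the statement that export and deploy run only when requested explicitly.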

When evaluating quality, compare against the base model on a fixed held-out evaluation set before recommending deployment. Do not substitute a standalone public-benchmark eval for the recipe's own Stage 3 evaluation.
For long-running SDG, prep, finetune, or eval work, start the process in a session-safe way and poll at human-scale intervals: roughly 60 seconds for small pilots and 120-300 seconds for larger runs.
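The polling cadence described above can be sketched as a bounded loop. This Python example is illustrative only: `poll_until_done` and its `is_done` callback are hypothetical, and how completion is detected (a metrics file, a log marker, a process exit) depends on the stage and checkout.

```python
import time
from typing import Callable


def poll_until_done(is_done: Callable[[], bool], interval_s: float = 60.0,
                    max_polls: int = 120,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll is_done every interval_s seconds; return the number of polls used.

    Raises TimeoutError if the job is still running after max_polls checks.
    """
    for attempt in range(1, max_polls + 1):
        if is_done():
            return attempt
        sleep(interval_s)
    raise TimeoutError(f"job still running after {max_polls} polls")
```

A small pilot might use `interval_s=60`, while a larger run might use a value between 120 and 300. The injectable `sleep` argument lets the loop be exercised in tests without waiting.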
For failures, load `PITFALLS.md`, localize the failing stage, then inspect the stage config, expected inputs, output directory, and corresponding CLI wrapper or `run_uv.py`.


References


- `references/embed.md`: embedding recipe stages, commands, defaults, output paths, and operating patterns.
- `references/rerank.md`: rerank recipe stages, commands, defaults, output paths, and operating patterns.
- `references/evaluation.md`: metric interpretation, comparison hygiene, and deployment readiness checks.
- `references/remote.md`: remote execution profiles, batch/run mode, GPU scoping, logs, and polling.
- `PITFALLS.md`: common failures and recovery moves for SDG, prep, training, eval, export, deploy, and CLI setup.


Examples


User asks: "Recall is decent, but nDCG is poor and the right passage is around rank 40. Should I tune embed or rerank?"

Load `references/rerank.md` and `references/evaluation.md`, explain that acceptable recall with poor top-rank ordering points to reranker tuning, then offer a cheap preview before training: `uv run nemotron rerank run -c default -d --from prep --to eval`
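The routing decision in this example can be written down as a two-question rule. The Python sketch below is a simplification for illustration: `choose_family` and its boolean inputs are hypothetical, and deciding what counts as "in the candidate set" or "ordered well" still requires the metric interpretation in `references/evaluation.md`.

```python
def choose_family(relevant_in_candidates: bool, top_ranks_ordered_well: bool) -> str:
    """Map a retrieval failure mode to the recipe family to tune first."""
    if not relevant_in_candidates:
        return "embed"   # relevant documents never reach the candidate set
    if not top_ranks_ordered_well:
        return "rerank"  # documents are retrieved but ranked too low
    return "none"        # no failure mode that either recipe targets


# The user's case: recall is decent, but the right passage sits near rank 40.
print(choose_family(relevant_in_candidates=True, top_ranks_ordered_well=False))
# prints: rerank
```

Checking the candidate set first reflects the pipeline order: a reranker cannot promote a document that the embedding stage never retrieved.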


Troubleshooting


For failures, load `PITFALLS.md` first. Localize the failing stage, then inspect the stage config, expected inputs, output directory, and corresponding CLI wrapper or `run_uv.py`.


Limitations


Bundled references are condensed snapshots; verify commands, flags, defaults, and output paths against the active checkout before execution.
This skill does not provide datasets, checkpoints, credentials, GPU capacity, Docker images, or NIM services.


Output Style


For planning or debugging recommendations, use this shape when it helps: `Decision`, `Why`, `Required inputs`, `Preview command`, `Execution command`, `Avoid`, and `Next step`. Omit fields that are irrelevant to a short answer.

Give concrete commands and file paths. State assumptions, expected inputs, expected outputs, and the cheapest validation step that proves the next action is ready. For long-running stages, separate preview commands from execution commands so the user can choose deliberately.

When reporting a dry-run or real run, include a compact run report: command, mode, config, dotlist overrides, input paths, output paths, validation signal or metric file, and next cheapest check. Include the checkout commit when it is available.
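One way to keep run reports consistent is to treat the fields listed above as a small record. The following Python sketch is an assumption about presentation, not a format the recipes define: the `RunReport` class, its field names, and the `render` layout are all illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunReport:
    command: str
    mode: str                     # e.g. "dry-run" or "local"
    config: str
    overrides: List[str] = field(default_factory=list)
    input_paths: List[str] = field(default_factory=list)
    output_paths: List[str] = field(default_factory=list)
    validation: str = ""          # validation signal or metric file
    next_check: str = ""          # next cheapest check
    commit: Optional[str] = None  # include only when the checkout exposes it

    def render(self) -> str:
        rows = [
            ("Command", self.command), ("Mode", self.mode), ("Config", self.config),
            ("Overrides", ", ".join(self.overrides)),
            ("Inputs", ", ".join(self.input_paths)),
            ("Outputs", ", ".join(self.output_paths)),
            ("Validation", self.validation), ("Next check", self.next_check),
            ("Commit", self.commit or ""),
        ]
        # Skip empty fields so short reports stay compact.
        return "\n".join(f"{key}: {value}" for key, value in rows if value)


report = RunReport(
    command="uv run nemotron rerank run -c default -d --from prep --to eval",
    mode="dry-run",
    config="default",
)
print(report.render())
```

Empty fields are dropped when rendering, which keeps a dry-run report to a few lines while a real run can fill in paths, metrics, and the commit.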
