NVIDIA RAG Blueprint
Autonomy Principles
- Auto-detect everything: GPU, VRAM, drivers, Docker, CUDA, disk, OS, ports, existing services, NGC key, repo state.
- If it can be checked with a command, check it — don't ask the user.
- Ask only when user action is required: providing an API key, confirming data deletion, or choosing between equally valid options.
- Once analysis is done, route to the correct workflow and execute.
Intent Detection
Determine what the user wants and route immediately:
| User Intent | Action |
|---|---|
| Deploy, install, set up, start RAG | Read and follow `references/deploy.md` |
| Configure, enable, change, toggle a feature | Use the Configure section below |
| Troubleshoot, debug, fix, error, unhealthy | Read and follow `references/troubleshoot.md` |
| Stop, shutdown, tear down, clean up | Read and follow |
If the intent is ambiguous, infer from context (e.g., "RAG isn't working" → troubleshoot; "get RAG running" → deploy). Only ask if genuinely unclear.
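The routing rule above can be approximated with a keyword match before falling back to context. This is a minimal sketch assuming bash; the function name `route_intent` and its keyword lists are hypothetical illustrations drawn from the table, not part of the blueprint. Troubleshooting keywords are checked first so that "isn't working" is not misread as a deploy request.

```bash
# Hypothetical helper: map a free-text request to a workflow name.
# Keyword lists are illustrative assumptions taken from the Intent Detection table.
route_intent() {
  local text
  # Lowercase the request so matching is case-insensitive.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$text" in
    # Troubleshooting first: "RAG isn't working" must not route to deploy.
    *troubleshoot*|*debug*|*fix*|*error*|*unhealthy*|*"not working"*|*"isn't working"*)
      echo troubleshoot ;;
    *stop*|*shutdown*|*"shut down"*|*"tear down"*|*"clean up"*)
      echo shutdown ;;
    *configure*|*enable*|*change*|*toggle*)
      echo configure ;;
    *deploy*|*install*|*"set up"*|*start*|*running*)
      echo deploy ;;
    *)
      # Genuinely unclear: infer from context or ask the user.
      echo ask ;;
  esac
}
```

For example, `route_intent "get RAG running"` prints `deploy`. Keyword matching is deliberately crude; when it returns `ask`, the agent should still try to infer intent from the surrounding conversation first.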
Configure
Requires a running RAG deployment. If services are not running, deploy first via `references/deploy.md`.

Match the user's request to a reference file, then read and follow it:
| Feature Keywords | Reference |
|---|---|
| VLM, VLM embeddings, image captioning | |
| NeMo Guardrails | |
| Query rewriting, decomposition, multi-turn | |
| Ingestion (text-only, audio, Nemotron Parse, OCR, batch CLI, NV-Ingest, volume mount, performance) | |
| Search, retrieval, hybrid search, multi-collection, metadata, filters, reranker, topK, accuracy/performance | |
| LLM/embedding/ranking model changes, vector DB, Milvus/Elasticsearch auth, service keys, model profiles, ports/GPU | |
| Reasoning, self-reflection, prompts, generation params (tokens, temperature, citations), per-request LLM params | |
| Summarization | |
| Observability (tracing, Zipkin, Grafana, Prometheus) | |
| Multimodal query (image + text) | |
| Data catalog (collection/document metadata) | |
| User interface (UI settings) | |
| API reference (endpoints, schemas) | |
| Evaluation (RAGAS metrics) | |
| MCP server & client, agent toolkit | |
| Migration (version upgrades) | |
| Notebooks (setup and catalog) | |
Configure Flow
1. Match the user's request to a reference file from the table above.
2. Detect what's running:
   ```bash
   echo "=== NIM ===" && docker ps --format '{{.Names}}' 2>/dev/null | grep -iE '(nim-llm|nemoretriever-embedding|nemoretriever-ranking|nemo-vlm|nemotron-vlm)' || echo "NO_LOCAL_NIMS"
   echo "=== RAG ===" && docker ps --format '{{.Names}}' 2>/dev/null | grep -iE '(rag-server|ingestor-server|milvus)' || echo "NO_DOCKER_RAG"
   # head always exits 0, so "grep ." is what lets NO_K8S print when kubectl returns nothing
   echo "=== K8S ===" && kubectl get pods -n rag 2>/dev/null | head -5 | grep . || echo "NO_K8S"
   echo "=== LIBRARY ===" && ps aux 2>/dev/null | grep -E '(nvidia_rag|uvicorn.*rag)' | grep -v grep || echo "NO_LIBRARY"
   ```
3. Use this table to determine the platform, deployment type, and where the config lives:

   | Local NIMs running? | RAG services running? | Deployment type | Config location |
   |---|---|---|---|
   | Yes (Docker) | Any | Self-hosted | `deploy/compose/.env` |
   | No | Yes (Docker) | NVIDIA-hosted | `deploy/compose/nvdev.env` |
   | Yes (K8s pods) | Any | Self-hosted | `values.yaml` (NIM sections) |
   | No | Yes (K8s pods) | NVIDIA-hosted | `values.yaml` (`envVars`) |
   | N/A | Library processes | Library mode | `notebooks/config.yaml` |
   | No | No | Not running | Deploy first via `references/deploy.md` |

   Tell the user what you detected and ask them to confirm. Example: "I see local NIM containers running (nim-llm-ms, nemoretriever-embedding-ms), so this is a self-hosted deployment. The config file is `deploy/compose/.env`. Correct?"
4. Check the current feature state before changing anything: read the config location from step 3, then cross-check the live service.
   - Docker:
     ```bash
     docker exec rag-server env 2>/dev/null | grep -E "<VAR_NAME>"
     ```
   - Helm:
     ```bash
     kubectl get pod -n rag -l app=rag-server -o jsonpath='{.items[0].spec.containers[0].env}' 2>/dev/null
     ```

   If the config file and the live service disagree, tell the user that the service is running with stale config and needs a restart.
5. If the feature needs extra GPUs, check availability against the hardware restrictions (see below):
   ```bash
   nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader 2>/dev/null || echo "NO_GPU"
   ```
6. Read the reference file and apply the changes:
   - Docker: edit the env file (uncomment a setting to enable it, re-comment it to disable it; the env file is the source of truth). Then restart the affected service:
     ```bash
     source <env-file> && docker compose -f deploy/compose/<compose-file> up -d
     ```

     | Service | Compose file |
     |---|---|
     | rag-server | `docker-compose-rag-server.yaml` |
     | ingestor-server | `docker-compose-ingestor-server.yaml` |
     | milvus, etcd, minio | `vectordb.yaml` |
     | NIM containers (LLM, embedding, ranking, VLM, OCR) | `nims.yaml` |
     | guardrails | `docker-compose-nemo-guardrails.yaml` |
     | observability (Grafana, Prometheus, Zipkin) | `observability.yaml` |
   - Helm: edit `values.yaml`, then upgrade:
     ```bash
     helm upgrade rag <chart> -n rag -f values.yaml
     ```
   - Library: edit `notebooks/config.yaml`, then restart the Python process.
7. Verify:
   - Docker (the URL is quoted so the shell does not treat `?` as a glob):
     ```bash
     docker ps --format "table {{.Names}}\t{{.Status}}" | head -20
     curl -s "http://localhost:8081/v1/health?check_dependencies=true" 2>/dev/null | head -1
     ```
   - Helm:
     ```bash
     kubectl get pods -n rag
     kubectl rollout status deployment/rag-server -n rag --timeout=120s
     ```
   - Library:
     ```bash
     curl -s http://localhost:8081/v1/health 2>/dev/null | head -1
     ```
8. If the restart fails, read `references/troubleshoot.md`. If multiple features were requested, repeat from step 1 for each.
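The step 3 decision table can be encoded as a small lookup, which makes the precedence between rows explicit. This is a minimal sketch assuming bash; `config_location` and its argument values (`docker`, `k8s`, `library`, `none`) are hypothetical and would be filled in from the step 2 output. Checking library mode first is also an assumption: the table does not say which row wins when library processes and Docker NIMs are both present.

```bash
# Hypothetical helper encoding the step 3 table.
#   $1 = where local NIMs run:     docker | k8s | none
#   $2 = where RAG services run:   docker | k8s | library | none
# Prints "<deployment type>|<config location>"; returns 1 for unknown combinations.
config_location() {
  local nims="$1" rag="$2"
  case "$nims/$rag" in
    # Assumption: library processes read their own config regardless of NIM state.
    */library)   echo "library|notebooks/config.yaml" ;;
    docker/*)    echo "self-hosted|deploy/compose/.env" ;;
    none/docker) echo "nvidia-hosted|deploy/compose/nvdev.env" ;;
    k8s/*)       echo "self-hosted|values.yaml (NIM sections)" ;;
    none/k8s)    echo "nvidia-hosted|values.yaml (envVars)" ;;
    none/none)   echo "not-running|deploy first via references/deploy.md" ;;
    *)           echo "unknown|ask the user"; return 1 ;;
  esac
}
```

For example, `config_location docker docker` prints `self-hosted|deploy/compose/.env`. The result is only a proposal; the confirmation question in step 3 still applies.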
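Step 6 of the Configure Flow restarts a service through the compose file that owns it. The sketch below, assuming bash, turns the service table into a lookup and prints the restart command for review instead of running it. `compose_file_for` and `restart_cmd` are hypothetical helpers, and the NIM container-name patterns (especially `*ocr*`) are guesses based on the names used in the step 2 detection command.

```bash
# Hypothetical helper: map a service or container name to its compose file.
# NIM name patterns are assumptions; returns 1 if the service is not recognized.
compose_file_for() {
  case "$1" in
    rag-server)                            echo docker-compose-rag-server.yaml ;;
    ingestor-server)                       echo docker-compose-ingestor-server.yaml ;;
    milvus|etcd|minio)                     echo vectordb.yaml ;;
    nim-llm*|nemoretriever-*|*vlm*|*ocr*)  echo nims.yaml ;;
    guardrails|nemo-guardrails*)           echo docker-compose-nemo-guardrails.yaml ;;
    grafana|prometheus|zipkin)             echo observability.yaml ;;
    *)                                     return 1 ;;
  esac
}

# Print (do not execute) the restart command for one service.
#   $1 = env file path, $2 = service name
restart_cmd() {
  local env_file="$1" service="$2" file
  file=$(compose_file_for "$service") || return 1
  printf 'source %s && docker compose -f deploy/compose/%s up -d\n' "$env_file" "$file"
}
```

For example, `restart_cmd deploy/compose/.env rag-server` prints `source deploy/compose/.env && docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d`. Printing rather than executing keeps the user in the loop before containers are recreated.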
When User Says "Configure" Without Specifics
Run steps 2–3 above, then read the identified config file to list what's currently enabled:

```bash
grep -E "^(export )?(ENABLE_|APP_)" <config-file> 2>/dev/null | sort
```

Summarize what's running and enabled, then ask which feature to change.
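The raw `grep` output above still carries `export` prefixes and quotes. A slightly tidier summary, as a minimal bash sketch: `summarize_flags` is a hypothetical helper, and it assumes one `NAME=value` setting per line with bare or double-quoted values. It targets env files; it will not match the indented entries of a Helm `values.yaml`.

```bash
# Hypothetical helper: list uncommented ENABLE_* / APP_* settings as NAME=value.
#   $1 = env file path
summarize_flags() {
  # Anchoring at ^ skips commented-out lines, i.e. settings that are disabled.
  grep -E '^(export )?(ENABLE_|APP_)[A-Za-z0-9_]*=' "$1" 2>/dev/null \
    | sed -E 's/^export //; s/"//g' \
    | sort
}
```

For example, `summarize_flags deploy/compose/.env` lists the active flags for a self-hosted Docker deployment, one per line, in sorted order.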
Hardware Restrictions
Read `docs/support-matrix.md` for current GPU requirements per deployment mode.
Read `docs/service-port-gpu-reference.md` for port mappings and GPU assignments.

| GPU | Feature Restrictions |
|---|---|
| B200 | No VLM, No Guardrails, No Nemotron Parse. May need multi-GPU LLM ( |
| RTX PRO 6000 | No Nemotron Parse. No Audio on Helm. |
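The restrictions table and the step 5 GPU query can be combined into a pre-flight check. This is a minimal bash sketch: `check_feature`, `gpu_free_mib`, and the feature keywords (`vlm`, `guardrails`, `nemotron-parse`, `audio`) are hypothetical, and the rules mirror only the rows listed above, so `docs/support-matrix.md` remains the authority. `gpu_free_mib` assumes the default `MiB` units that the step 5 `nvidia-smi` command prints when `nounits` is not requested.

```bash
# Hypothetical helper encoding the Hardware Restrictions table.
#   $1 = GPU name as reported by nvidia-smi
#   $2 = feature keyword: vlm | guardrails | nemotron-parse | audio
#   $3 = platform: docker | helm (defaults to docker)
# Prints the reason and returns 1 if the feature is blocked; returns 0 otherwise.
check_feature() {
  local gpu="$1" feature="$2" platform="${3:-docker}"
  case "$gpu" in
    *B200*)
      case "$feature" in
        vlm|guardrails|nemotron-parse)
          echo "blocked: no $feature on B200"; return 1 ;;
      esac
      ;;
    *"RTX PRO 6000"*)
      if [ "$feature" = nemotron-parse ]; then
        echo "blocked: no nemotron-parse on RTX PRO 6000"; return 1
      fi
      if [ "$feature" = audio ] && [ "$platform" = helm ]; then
        echo "blocked: no audio on Helm with RTX PRO 6000"; return 1
      fi
      ;;
  esac
  return 0
}

# Read step 5 CSV rows (index, name, memory.total, memory.used) on stdin
# and print "<index> <free MiB>" for each GPU.
gpu_free_mib() {
  awk -F', ' '{ total = $3; used = $4; sub(/ MiB/, "", total); sub(/ MiB/, "", used); print $1, total - used }'
}
```

Usage: pipe the step 5 command into `gpu_free_mib`, and pass the first GPU name from `nvidia-smi --query-gpu=name --format=csv,noheader` to `check_feature` along with the requested feature and platform.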