NVIDIA RAG Blueprint — deploy, configure, troubleshoot, and manage. Handles any RAG action: deploy, install, start, enable, disable, toggle, change, configure, troubleshoot, debug, fix, shutdown, stop, or tear down any RAG feature or service (VLM, guardrails, query rewriting, models, search, ingestion, observability, summarization, and more).

`npx skill4agent add nvidia/skills rag-blueprint`

| User Intent | Action |
|---|---|
| Deploy, install, set up, start RAG | Read and follow |
| Configure, enable, change, toggle a feature | Use the Configure section below |
| Troubleshoot, debug, fix, error, unhealthy | Read and follow |
| Stop, shutdown, tear down, clean up | Read and follow |

`references/deploy.md`

| Feature Keywords | Reference |
|---|---|
| VLM, VLM embeddings, image captioning | |
| NeMo Guardrails | |
| Query rewriting, decomposition, multi-turn | |
| Ingestion (text-only, audio, Nemotron Parse, OCR, batch CLI, NV-Ingest, volume mount, performance) | |
| Search, retrieval, hybrid search, multi-collection, metadata, filters, reranker, topK, accuracy/performance | |
| LLM/embedding/ranking model changes, vector DB, Milvus/Elasticsearch auth, service keys, model profiles, ports/GPU | |
| Reasoning, self-reflection, prompts, generation params (tokens, temperature, citations), per-request LLM params | |
| Summarization | |
| Observability (tracing, Zipkin, Grafana, Prometheus) | |
| Multimodal query (image + text) | |
| Data catalog (collection/document metadata) | |
| User interface (UI settings) | |
| API reference (endpoints, schemas) | |
| Evaluation (RAGAS metrics) | |
| MCP server & client, agent toolkit | |
| Migration (version upgrades) | |
| Notebooks (setup and catalog) | |

`echo "=== NIM ===" && docker ps --format '{{.Names}}' 2>/dev/null | grep -iE '(nim-llm|nemoretriever-embedding|nemoretriever-ranking|nemo-vlm|nemotron-vlm)' || echo "NO_LOCAL_NIMS"`

`echo "=== RAG ===" && docker ps --format '{{.Names}}' 2>/dev/null | grep -iE '(rag-server|ingestor-server|milvus)' || echo "NO_DOCKER_RAG"`

`echo "=== K8S ===" && kubectl get pods -n rag 2>/dev/null | head -5 | grep . || echo "NO_K8S"`

`echo "=== LIBRARY ===" && ps aux 2>/dev/null | grep -E '(nvidia_rag|uvicorn.*rag)' | grep -v grep || echo "NO_LIBRARY"`

| Local NIMs running? | RAG services running? | Deployment Type | Config Location |
|---|---|---|---|
| Yes (Docker) | Any | Self-hosted | |
| No | Yes (Docker) | NVIDIA-hosted | |
| Yes (K8s pods) | Any | Self-hosted | |
| No | Yes (K8s pods) | NVIDIA-hosted | |
| — | Library processes | Library mode | |
| No | No | Not running | Deploy first via |
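
The decision table above can be sketched as a small shell helper. This is a minimal illustration, not part of the blueprint: `section_present` and `classify_deployment` are hypothetical names, and the order in which the checks run (NIMs, then RAG services, then library processes) is an assumption taken from the row order of the table.

```shell
# Hypothetical helper: turn one section of the detection output into yes/no.
# An empty section or a NO_* sentinel (e.g. NO_LOCAL_NIMS) counts as "no".
section_present() {
  case "$1" in
    ""|NO_*) echo "no" ;;
    *)       echo "yes" ;;
  esac
}

# Hypothetical helper: map yes/no flags to the deployment types in the table.
# Arguments: <local NIMs> <RAG services> <library processes>
classify_deployment() {
  nims="$1"; rag="$2"; library="$3"
  if [ "$nims" = "yes" ]; then
    echo "Self-hosted"      # local NIMs found, RAG services may be anything
  elif [ "$rag" = "yes" ]; then
    echo "NVIDIA-hosted"    # RAG services without local NIMs
  elif [ "$library" = "yes" ]; then
    echo "Library mode"     # only library processes found
  else
    echo "Not running"      # nothing detected, deploy first
  fi
}
```

For example, `classify_deployment "$(section_present "NO_LOCAL_NIMS")" "$(section_present "rag-server")" no` prints `NVIDIA-hosted`.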

`deploy/compose/.env`

`docker exec rag-server env 2>/dev/null | grep -E "<VAR_NAME>"`

`kubectl get pod -n rag -l app=rag-server -o jsonpath='{.items[0].spec.containers[0].env}' 2>/dev/null`

`nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader 2>/dev/null || echo "NO_GPU"`

`source <env-file> && docker compose -f deploy/compose/<compose-file> up -d`

| Service | Compose File |
|---|---|
| rag-server | |
| ingestor-server | |
| milvus, etcd, minio | |
| NIM containers (LLM, embedding, ranking, VLM, OCR) | |
| guardrails | |
| observability (Grafana, Prometheus, Zipkin) | |

`values.yaml`

`helm upgrade rag <chart> -n rag -f values.yaml`

`notebooks/config.yaml`

`docker ps --format "table {{.Names}}\t{{.Status}}" | head -20; curl -s "http://localhost:8081/v1/health?check_dependencies=true" 2>/dev/null | head -1`

`kubectl get pods -n rag; kubectl rollout status deployment/rag-server -n rag --timeout=120s`

`curl -s http://localhost:8081/v1/health 2>/dev/null | head -1`

`references/troubleshoot.md`

`grep -E "^(export )?(ENABLE_|APP_)" <config-file> 2>/dev/null | sort`

`docs/support-matrix.md`

`docs/service-port-gpu-reference.md`

| GPU | Feature Restrictions |
|---|---|
| B200 | No VLM, No Guardrails, No Nemotron Parse. May need multi-GPU LLM ( |
| RTX PRO 6000 | No Nemotron Parse. No Audio on Helm. |
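
The restrictions in this table could be looked up from a GPU name with a helper like the one below. It is only a sketch: `gpu_restrictions` is a hypothetical name, the substring patterns assume the name reported by `nvidia-smi` contains `B200` or `RTX PRO 6000`, and only the restrictions that are fully legible in the table are included. GPUs not listed here fall through to a neutral message; that is not a claim that they have no restrictions.

```shell
# Hypothetical helper: print the restrictions from the table for a GPU name.
# Matching is by substring, so a leading "NVIDIA " or trailing edition is fine.
gpu_restrictions() {
  case "$1" in
    *B200*)           echo "No VLM, No Guardrails, No Nemotron Parse" ;;
    *"RTX PRO 6000"*) echo "No Nemotron Parse, No Audio on Helm" ;;
    *)                echo "No restrictions listed" ;;
  esac
}

# Usage sketch (assumes nvidia-smi is installed; name is the second CSV field):
# nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader |
#   while IFS=, read -r idx name rest; do gpu_restrictions "$name"; done
```

For GPUs that fall through to the default branch, consult `docs/support-matrix.md` before enabling a feature.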