Deploy, debug, or tear down any VSS profile using a compose-centric workflow — config (dry-run) with env overrides, review resolved compose, then compose up. Use this skill when the user says "deploy vss", "deploy `profile`", "debug deploy", "verify deployment", or "why is my vss deploy broken".
npx skill4agent add nvidia/skills deploy

`dev-profile.sh`

| User says | Profile | Reference |
|---|---|---|
| "deploy vss" / "deploy base" | | |
| "deploy alerts" / "alert verification" / "real-time alerts" | | |
| "deploy for incident report" | | |
| "deploy lvs" / "video summarization" | | |
| "deploy search" / "video search" | | |
`references/edge.md`
`config_edge.yml`

# 1. Apply env overrides to the profile .env file
# 2. docker compose --env-file .env config > resolved.yml (dry-run)
# 3. Review resolved.yml
# 4. docker compose -f resolved.yml up -d

`video-search-and-summarization/TOOLS.md`
`references/ngc.md`
`$NGC_CLI_API_KEY`

# 1. GPU visible
nvidia-smi --query-gpu=index,name --format=csv,noheader
# 2. NVIDIA runtime in Docker
docker info 2>/dev/null | grep -i "runtimes"
# 3. NVIDIA runtime works end-to-end
docker run --rm --gpus all ubuntu:22.04 nvidia-smi 2>&1 | head -5

`references/prerequisites.md`
`references/teardown.md`

If this is the host's first deploy, the `docker compose down`
line is a no-op (exit 0 with no containers to stop) — safe to run
unconditionally.
### Step 1 — Gather context
Discover what's available on the host and cross-reference with the
[VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html)
to choose a deployment shape that fits.
| Value | How to determine |
|---|---|
| **Profile** | Match user intent to routing table above. Default: `base` |
| **Repo path** | Find `video-search-and-summarization/` on disk |
| **Hardware** | `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` → look up per-GPU VRAM against the prerequisites page |
| **LLM/VLM placement** | Pick `local_shared`, `local`, or `remote` per LLM/VLM based on available GPUs + `$LLM_REMOTE_URL` / `$VLM_REMOTE_URL` / `$NGC_CLI_API_KEY`. If no combination on this host satisfies the prerequisites, stop and report the blocker instead of silently picking another shape. |
| **API keys** | `NGC_CLI_API_KEY` for local NIMs, `NVIDIA_API_KEY` for remote |
| **Host IP** | `hostname -I \| awk '{print $1}'` |
**Hardware profile mapping:**
| GPU name contains | HARDWARE_PROFILE | Recommended LLM path |
|---|---|---|
| H100 | `H100` | Nano 9B v2 (NIM) |
| L40S | `L40S` | Nano 9B v2 (NIM) |
| RTX 6000 Ada, RTX PRO 6000 | `RTXPRO6000BW` | Nano 9B v2 (NIM) |
| GB10 (DGX Spark) | `DGX-SPARK` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| IGX | `IGX-THOR` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| AGX | `AGX-THOR` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| Other | `OTHER` | — |
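
The mapping above can be expressed as a small shell helper. This is a minimal sketch, assuming the GPU name string comes from `nvidia-smi --query-gpu=name` and that a plain substring match is sufficient; the function name `map_hardware_profile` is hypothetical and not part of the VSS tooling.

```shell
# Map a GPU name (as printed by nvidia-smi) to a HARDWARE_PROFILE value.
# Hypothetical helper: patterns are copied from the table above and are
# checked top to bottom, first match wins.
map_hardware_profile() {
  case "$1" in
    *H100*)                             echo "H100" ;;
    *L40S*)                             echo "L40S" ;;
    *"RTX 6000 Ada"*|*"RTX PRO 6000"*)  echo "RTXPRO6000BW" ;;
    *GB10*)                             echo "DGX-SPARK" ;;
    *IGX*)                              echo "IGX-THOR" ;;
    *AGX*)                              echo "AGX-THOR" ;;
    *)                                  echo "OTHER" ;;
  esac
}
```

A possible call site: `map_hardware_profile "$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"`. Only the first GPU is inspected here, which assumes a homogeneous host.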
**Minimum GPU count per (profile × mode × platform).** Canonical source
is the [VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html);
reproduced here so the skill can fail fast when the host is too small:
| Profile | Mode | H100 / RTX PRO 6000 (Blackwell) | L40S | DGX-Spark / IGX-Thor / AGX-Thor |
|---|---|---|---|---|
| `base` | shared (`local_shared` LLM + VLM) | **1** | — (48 GB/GPU too small) | **1** (Edge 4B + VLM, unified memory) |
| `base` | dedicated (`local` LLM + VLM) | **2** | **2** | — |
| `base` | `remote-llm` | **1** (VLM local) | **1** (VLM local) | **1** (remote LLM only) |
| `base` | `remote-vlm` | **1** (LLM local) | **1** (LLM local) | — |
| `base` | `remote-all` | **0** | **0** | **0** |
| `lvs` | shared | **1** | — | - |
| `lvs` | dedicated | **2** | **2** | — |
| `lvs` | `remote-llm/vlm` | 1 | 1 | - |
| `lvs` | `remote-all` | 0 | 0 | - |
| `alerts` (verification / CV) | shared | **2** | — | — |
| `alerts` (verification / CV) | dedicated | **3** | **3** | — |
| `alerts` (verification / CV) | `remote-all` | 1 | 1 | 1 |
| `alerts` (verification / CV) | `remote-llm/vlm` | 2 | 2 | 1 |
| `alerts` (real-time / VLM) | shared | **2** | — | — |
| `alerts` (real-time / VLM) | dedicated | **3** | **3** | — |
| `alerts` (real-time / VLM) | `remote-llm` | 2 | 2 | 1 |
| `search` | shared | **2** | — | - |
| `search` | dedicated | **3** | **3** | — |
| `search` | `remote-*` | **2** | **2** | - |
A few hard rules encoded in the table:
- **L40S can't do `shared`.** 48 GB is not enough VRAM for LLM + VLM
on a single GPU. Fall back to `dedicated` or a `remote-*` mode.
- **L40S needs +1 GPU for alerts / search vs H100** because the
shared-on-one-GPU trick doesn't work — RT-CV / Embed1 must take
their own GPU, and LLM+VLM still need a second.
- **DGX-Spark / Thor are early-access for most profiles.** Only
`base` + `lvs` are expected to fully land locally; `alerts` /
`search` currently require a remote LLM. See
[`references/edge.md`](references/edge.md).
If the host's (GPU count × VRAM) combination doesn't appear above,
**stop and report the blocker** — don't silently pick a different
mode.
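
The fail-fast check described above can be sketched as a table lookup. The rows below are copied from the minimum-GPU table; the platform labels (`big`, `l40s`, `edge`), the profile keys `alerts-cv` and `alerts-rt`, and both function names are illustrative assumptions, not names used by VSS.

```shell
# Look up the minimum GPU count for (profile, mode, platform).
# Columns: profile|mode|H100 or RTX PRO 6000|L40S|DGX-Spark or Thor
# A "-" means the combination is not supported per the table above.
min_gpus() {
  awk -F'|' -v p="$1" -v m="$2" -v c="$3" '
    BEGIN { col["big"] = 3; col["l40s"] = 4; col["edge"] = 5
            if (!(c in col)) { bad = 1; exit } }
    $1 == p && $2 == m { print $(col[c]); found = 1; exit }
    END { exit (bad || !found) }
  ' <<'EOF'
base|shared|1|-|1
base|dedicated|2|2|-
base|remote-llm|1|1|1
base|remote-vlm|1|1|-
base|remote-all|0|0|0
lvs|shared|1|-|-
lvs|dedicated|2|2|-
lvs|remote-llm/vlm|1|1|-
lvs|remote-all|0|0|-
alerts-cv|shared|2|-|-
alerts-cv|dedicated|3|3|-
alerts-cv|remote-all|1|1|1
alerts-cv|remote-llm/vlm|2|2|1
alerts-rt|shared|2|-|-
alerts-rt|dedicated|3|3|-
alerts-rt|remote-llm|2|2|1
search|shared|2|-|-
search|dedicated|3|3|-
search|remote-*|2|2|-
EOF
}

# Report a blocker (non-zero exit) instead of silently picking another mode.
check_gpus() {
  local profile=$1 mode=$2 platform=$3 have=$4 need
  if ! need=$(min_gpus "$profile" "$mode" "$platform"); then
    echo "BLOCKER: no table row for $profile/$mode/$platform" >&2
    return 1
  fi
  if [ "$need" = "-" ]; then
    echo "BLOCKER: $mode is not supported for $profile on $platform" >&2
    return 1
  fi
  if [ "$have" -lt "$need" ]; then
    echo "BLOCKER: $profile/$mode needs $need GPU(s), host has $have" >&2
    return 1
  fi
  echo "ok: $need GPU(s) required, $have available"
}
```

For example, `check_gpus base shared l40s 4` exits non-zero because the table marks shared mode as unsupported on L40S regardless of GPU count. The sketch checks GPU count only; per-GPU VRAM still has to be compared against the prerequisites page.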
> **Edge shared mode requires Edge 4B + `HF_TOKEN`.** On DGX Spark and AGX/IGX
> Thor, both LLM and VLM must fit in unified memory, AND the standard
> `nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2:1` image has a broken arm64
> manifest. You must run `NVIDIA-Nemotron-Edge-4B-v2.1-EA-020126_FP8` as a
> standalone vLLM container on port 30081 with the agent pointed at it via
> `--use-remote-llm`. Full recipe and the mandatory `HF_TOKEN` verification
> step are in [`references/edge.md`](references/edge.md).
### Step 1b — Prepare the data directory
The data directory layout (asset paths, ownership, mount points, profile-specific subdirs) is documented in [`references/data-directory.md`](references/data-directory.md). Read that file before deploying for the first time on a host or when changing profiles.
# Profile-specific subdirs:
# alerts → mkdir -p "$DATA/data_log/vss_video_analytics_api" "$DATA/videos/dev-profile-alerts" "$DATA/models/rtdetr-its" "$DATA/models/gdino"
# search → mkdir -p "$DATA/models"
chmod -R 777 "$DATA/data_log" "$DATA/agent_eval"
# If you created $DATA/models above, also: chmod -R 777 "$DATA/models"

**FORBIDDEN:** `chown -R ubuntu:ubuntu $MDX_DATA_DIR` (or any recursive chown). This is "good housekeeping" to a shell-admin instinct but is the deploy-breaking command in this stack. You will observe a "healthy" deploy (containers Up, endpoints 200) while the video pipeline is silently broken. Use `chmod -R 777` on the specific subdirs above and nothing else.

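
The profile-specific directory setup can be wrapped in one function. This is a sketch under assumptions: the function name `prepare_data_dir` is hypothetical, and creating `data_log` and `agent_eval` with `mkdir -p` is assumed to be harmless when they already exist. The authoritative layout remains `references/data-directory.md`.

```shell
# Create profile-specific subdirectories and apply chmod -R 777 only to the
# subdirectories named in this section. No chown is performed anywhere.
prepare_data_dir() {
  local data=$1 profile=$2 created_models=0

  # Assumption: these two directories are needed by every profile.
  mkdir -p "$data/data_log" "$data/agent_eval"

  case "$profile" in
    alerts)
      mkdir -p "$data/data_log/vss_video_analytics_api" \
               "$data/videos/dev-profile-alerts" \
               "$data/models/rtdetr-its" \
               "$data/models/gdino"
      created_models=1
      ;;
    search)
      mkdir -p "$data/models"
      created_models=1
      ;;
  esac

  chmod -R 777 "$data/data_log" "$data/agent_eval"
  # Only touch models/ when this call was responsible for creating it.
  if [ "$created_models" -eq 1 ]; then
    chmod -R 777 "$data/models"
  fi
}
```

Usage would look like `prepare_data_dir "$DATA" alerts`, where `$DATA` is the data directory chosen for the deployment.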
`$DATA`

| Container | Image | Runs as | Mount path | Symptom if permissions wrong |
|---|---|---|---|---|
| | postgres:17.6-alpine | uid 70 | | Can't read own PGDATA → VST |
| | redis:8.2.2-alpine | uid 999 | | "Can't open the log file: Permission denied" → redis dies → |
| | elasticsearch | uid 1000 | | "AccessDeniedException" on startup → ES refuses to start |
| | vst | uid 1000 | | 403 on ingest or stream write |
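
Because the containers above run under different uids, a quick way to spot a permissions problem is to list anything under the data directory that is not world-writable. The helper below is an illustrative sketch, not part of the VSS tooling; the function name is hypothetical and it only inspects mode bits, not ownership.

```shell
# Print every path under the given directory that "other" users cannot write.
# Returns 0 and prints nothing when the whole tree is world-writable.
list_not_world_writable() {
  local dir=$1 out
  out=$(find "$dir" ! -perm -o=w -print 2>/dev/null)
  if [ -z "$out" ]; then
    return 0
  fi
  printf '%s\n' "$out"
  return 1
}
```

A typical call would be `list_not_world_writable "$DATA/data_log"`; any path it prints is a candidate for the permission symptoms in the table.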
`chmod -R 777 $DATA/data_log`
`data-dir`

sudo rm -rf "$DATA/data_log/vst/postgres" # postgres re-initializes on next start
docker restart centralizedb-dev

`BREV_ENV_ID`
`references/brev.md`
`env_overrides`
`references/env-overrides.md`

`<repo>/deployments/developer-workflow/dev-profile-<profile>/.env`

This is the authoritative `.env`. Every verifier, healthcheck, and post-deploy tool reads from this path. When you apply env overrides (from Step 2 or from the user's prompt), write them directly to this file, not to `generated.env`. `generated.env` is a scratchpad that `dev-profile.sh` produces during its own internal flow; it is NOT read by the verifier and is wiped on the next invocation. An agent that uses `dev-profile.sh` as a one-shot deploy but leaves the base `.env` untouched will silently fail env checks even when the stack comes up cleanly. If you used `dev-profile.sh` and see `generated.env` on disk, copy its key/value lines back into the base `.env`, or re-apply your `sed` commands against the base `.env` after the fact. The base `.env` is the source of truth.

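
Writing an override into the base `.env` amounts to "update the key if present, append it otherwise". The function below is a minimal sketch of that step, assuming simple `KEY=value` lines without `export` prefixes or quoting rules; the name `set_env_key` is hypothetical.

```shell
# Set KEY=value in an env file: replace the line if the key already exists,
# append it otherwise. All other lines (comments, blanks) pass through as-is.
set_env_key() {
  local file=$1 key=$2 value=$3 tmp rc
  tmp=$(mktemp)
  # Key and value travel through the environment so awk does not
  # interpret backslashes in them.
  if KEY="$key" VAL="$value" awk '
       BEGIN { FS = "="; k = ENVIRON["KEY"]; v = ENVIRON["VAL"] }
       $1 == k { print k "=" v; seen = 1; next }
       { print }
       END { if (!seen) print k "=" v }
     ' "$file" > "$tmp"; then
    # cat into the original file keeps its owner and mode.
    cat "$tmp" > "$file"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

For example, `set_env_key "$ENV_FILE" HARDWARE_PROFILE L40S` edits the base `.env` in place, so the verifier and `docker compose --env-file` both see the same value.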
REPO=/path/to/video-search-and-summarization
PROFILE=base
ENV_FILE=$REPO/deployments/developer-workflow/dev-profile-$PROFILE/.env
# Read current .env, apply overrides, write back
# (read lines, update matching keys, append new keys, write)
# Resolve compose
cd $REPO/deployments
docker compose --env-file $ENV_FILE config > resolved.yml

`<repo>/deployments/resolved.yml`
`${VAR}`
`resolved.yml`
`references/troubleshooting.md`

cd $REPO/deployments
docker compose -f resolved.yml up -d

Do NOT use `--force-recreate` on retries. It destroys already-warm NIM containers, forcing another 3–5 min torch.compile + CUDA-graph capture per NIM. If the previous `up -d` partially failed, fix the root cause (usually perms or an env typo) and just re-run `up -d`; Docker will re-create only the containers whose config changed or that are down.

# Container status
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
# Logs for a specific service
docker compose -f $REPO/deployments/resolved.yml logs --tail 50 <service>

`mdx-*`
`Up`

| Profile | Agent UI | REST API | Other |
|---|---|---|---|
| base | | | — |
| alerts | | | VIOS dashboard |
| lvs | | | — |
| search | | | — |
cd $REPO/deployments
docker compose -f resolved.yml down

`references/base.md`

# 1. All expected containers Up
docker ps --format 'table {{.Names}}\t{{.Status}}'
# 2. Agent API + UI responding
curl -sf http://localhost:8000/docs >/dev/null && echo "agent OK"
curl -sf http://localhost:3000/ >/dev/null && echo "ui OK"
# 3. VLM NIM responding (base/lvs profiles)
curl -sf http://localhost:30082/v1/models | python3 -m json.tool
# 4. LLM NIM responding
curl -sf http://localhost:30081/v1/models | python3 -m json.tool

`docker logs vss-agent`
`unknown or invalid runtime name: nvidia`
`references/prerequisites.md`
`NGC_CLI_API_KEY`
`references/ngc.md`
`sudo modprobe nvidia && sudo modprobe nvidia_uvm`
`docker compose up`
`docker compose config > resolved.yml`
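
The `curl -sf` endpoint checks in the verification steps can fail simply because a NIM is still warming up, so polling is usually more informative than a single probe. The helper below is a generic retry sketch; the name `retry_until` and the timeout and interval values in the usage note are assumptions, not values taken from the VSS documentation.

```shell
# Re-run a command every <interval> seconds until it succeeds or
# <timeout> seconds have elapsed. Returns 0 on success, 1 on timeout.
retry_until() {
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

One way to use it: `retry_until 300 10 curl -sf -o /dev/null http://localhost:30081/v1/models && echo "LLM OK"`. If the call times out, check the service logs before re-running `up -d`.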