# VSS Deploy

Deploy any VSS profile using a compose-centric workflow: build env overrides, generate resolved compose (dry-run), review, then deploy. Replaces direct `dev-profile.sh` execution with validated, auditable steps.

## Profile Routing

| User says | Profile | Reference |
|---|---|---|
| "deploy vss" / "deploy base" | `base` | `references/base.md` |
| "deploy alerts" / "alert verification" / "real-time alerts" | `alerts` | `references/alerts.md` |
| "deploy for incident report" | `alerts` | `references/alerts.md` |
| "deploy lvs" / "video summarization" | `lvs` | `references/lvs.md` |
| "deploy search" / "video search" | `search` | `references/search.md` |

Edge hardware routing (DGX Spark, AGX/IGX Thor): see `references/edge.md` for the 4B-LLM recipe (`config_edge.yml` + standalone vLLM on port 30081). Edge platforms share a single unified-memory GPU between LLM and VLM, so the Nemotron Edge 4B is the default and the Nemotron Nano 9B v2 FP8 is an option when memory allows.
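The routing table above amounts to a phrase match on the user's request. A minimal sketch, assuming a hypothetical helper named `route_profile` (not part of the VSS repo) that lowercases the request and falls back to `base` when nothing matches:

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a user request to a VSS profile name.
# Phrase patterns mirror the routing table; unmatched requests fall back to base.
route_profile() {
  local request
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *alert*|*"incident report"*)   echo alerts ;;
    *lvs*|*"video summarization"*) echo lvs ;;
    *search*)                      echo search ;;
    *)                             echo base ;;
  esac
}

route_profile "Deploy for incident report"   # prints: alerts
```

The printed profile name also selects the reference file to read next, `references/<profile>.md`.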

## When to Use

- Deploy VSS / start VSS / bring up a profile
- Deploy a specific profile (base, alerts, lvs, search)
- Do a dry-run / preview what will be deployed
- Change deployment config (hardware, LLM mode, GPU assignment)
- Tear down a running deployment
- Debug or verify an existing deployment (see Debugging a Deployment)

## How it works

Run docker compose commands directly on the host:

```bash
# 1. Apply env overrides to the profile .env file
# 2. docker compose --env-file .env config > resolved.yml   (dry-run)
# 3. Review resolved.yml
# 4. docker compose -f resolved.yml up -d
```

## Before Deploying

- **Repo path**: find `video-search-and-summarization/` on disk. Check `TOOLS.md` if available.
- **NGC CLI & API key**: see `references/ngc.md`. Check that `$NGC_CLI_API_KEY` is set.
- **System prerequisites** (GPU VRAM, driver, Docker, NVIDIA Container Toolkit): the canonical reference is the [VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html). That page lists supported hardware, per-profile GPU requirements, and the minimum driver/CUDA version per NIM. Read it and pick the LLM/VLM placement that fits the host. Don't guess thresholds from this skill.

## Pre-flight Check

Run before every deploy. Do not proceed if any check fails.

```bash
# 1. GPU visible
nvidia-smi --query-gpu=index,name --format=csv,noheader

# 2. NVIDIA runtime in Docker
docker info 2>/dev/null | grep -i "runtimes"

# 3. NVIDIA runtime works end-to-end
docker run --rm --gpus all ubuntu:22.04 nvidia-smi 2>&1 | head -5
```

If check 2 or 3 fails, see `references/prerequisites.md`.
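The three checks can be wrapped so that the first failure stops the run. This is a sketch under stated assumptions: `run_check` and `preflight` are hypothetical names, and the second check is tightened to require `nvidia` on the `Runtimes` line of `docker info`, which is an assumption about that output rather than something this skill specifies.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one named check and report the result.
run_check() {
  local name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $name"
  else
    echo "FAIL: $name" >&2
    return 1
  fi
}

# Run the three pre-flight checks in order; stop at the first failure.
preflight() {
  run_check "GPU visible" \
    nvidia-smi --query-gpu=index,name --format=csv,noheader || return 1
  run_check "NVIDIA runtime registered in Docker" \
    sh -c 'docker info 2>/dev/null | grep -qi "runtimes.*nvidia"' || return 1
  run_check "NVIDIA runtime works end-to-end" \
    docker run --rm --gpus all ubuntu:22.04 nvidia-smi || return 1
}
```

A deploy script could then gate on it with `preflight || exit 1`.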

## Deployment Flow

Always follow this sequence. Never skip the dry-run.

### Step 0 — Tear down any existing deployment

If a deployment already exists, tear it down first. The full procedure (the resolved.yml-driven path, container-name catch-all patterns covering the dev-profile compose files, and why leftovers cause `/sensor/list` 502s) lives in `references/teardown.md`.

```bash
# If a resolved.yml from a prior deploy exists, prefer it: it
# knows about all compose-profile services that were brought up.
if [ -f "$REPO/deployments/resolved.yml" ]; then
  docker compose -f "$REPO/deployments/resolved.yml" down --remove-orphans
fi

# Catch-all: remove every VSS-stack container the dev-profile compose
# files bring up. Without this, leftovers from a prior deploy linger
# (especially the *-smc set, which the alerts compose profile shares
# with the *-dev set on host networking and port 30000) and either:
#   - bind ports the new deploy needs → second sensor-ms fails to bind
#     → /sensor/list returns 502 (issue #151), or
#   - pass the new deploy's container-name health checks while serving
#     stale data from the prior deploy's DB.
# The patterns below cover everything declared in
# deployments/vst/{2d,3d,smc,developer,ps}/, deployments/foundational/,
# deployments/agents/, deployments/proxy/, and the dev-profile-*
# compose files.
docker ps -a --format '{{.Names}}' \
  | grep -E '^(vss-|mdx-|perception-|rtvi-|alert-|nvstreamer-|sensor-ms-|vst-ingress-|vst-mcp-|vst-file-proxy|centralizedb-|storage-ms-|streamprocessing-ms-|sdr-(http|streamprocessing)-|envoy-(http|streamprocessing)-|rtspserver-ms-|recorder-ms-|replaystream-ms-|livestream-ms-|metropolis-vss-ui|phoenix)' \
  | xargs -r docker rm -f
```


If this is the host's first deploy, the `docker compose down`
line is a no-op (exit 0 with no containers to stop) — safe to run
unconditionally.
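Because the catch-all ends in `docker rm -f`, it can help to preview which containers the pattern would hit before removing anything. A minimal sketch, assuming a hypothetical `filter_vss_containers` helper that reuses the same name pattern and only prints matches:

```bash
#!/usr/bin/env bash
# Same name pattern as the catch-all, kept in one variable so a preview
# and the actual removal cannot drift apart.
VSS_NAME_RE='^(vss-|mdx-|perception-|rtvi-|alert-|nvstreamer-|sensor-ms-|vst-ingress-|vst-mcp-|vst-file-proxy|centralizedb-|storage-ms-|streamprocessing-ms-|sdr-(http|streamprocessing)-|envoy-(http|streamprocessing)-|rtspserver-ms-|recorder-ms-|replaystream-ms-|livestream-ms-|metropolis-vss-ui|phoenix)'

# Read container names on stdin; print only those the catch-all would remove.
filter_vss_containers() {
  grep -E "$VSS_NAME_RE" || true   # "no match" is not an error here
}

# Demo with literal names. On a real host the input would be:
#   docker ps -a --format '{{.Names}}' | filter_vss_containers
printf '%s\n' vss-agent my-postgres mdx-redis sensor-ms-dev | filter_vss_containers
```

Unrelated containers (here the made-up `my-postgres`) are left out of the output, which is the property worth confirming on a shared host.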

### Step 1 — Gather context

Discover what's available on the host and cross-reference with the
[VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html)
to choose a deployment shape that fits.

| Value | How to determine |
|---|---|
| **Profile** | Match user intent to routing table above. Default: `base` |
| **Repo path** | Find `video-search-and-summarization/` on disk |
| **Hardware** | `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` → look up per-GPU VRAM against the prerequisites page |
| **LLM/VLM placement** | Pick `local_shared`, `local`, or `remote` per LLM/VLM based on available GPUs + `$LLM_REMOTE_URL` / `$VLM_REMOTE_URL` / `$NGC_CLI_API_KEY`. If no combination on this host satisfies the prerequisites, stop and report the blocker instead of silently picking another shape. |
| **API keys** | `NGC_CLI_API_KEY` for local NIMs, `NVIDIA_API_KEY` for remote |
| **Host IP** | `hostname -I \| awk '{print $1}'` |

**Hardware profile mapping:**

| GPU name contains | HARDWARE_PROFILE | Recommended LLM path |
|---|---|---|
| H100 | `H100` | Nano 9B v2 (NIM) |
| L40S | `L40S` | Nano 9B v2 (NIM) |
| RTX 6000 Ada, RTX PRO 6000 | `RTXPRO6000BW` | Nano 9B v2 (NIM) |
| GB10 (DGX Spark) | `DGX-SPARK` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| IGX | `IGX-THOR` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| AGX | `AGX-THOR` | **Edge 4B** (vLLM) — see [`references/edge.md`](references/edge.md) |
| Other | `OTHER` | — |
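The mapping can be applied mechanically to the name string that `nvidia-smi` reports. A sketch, assuming a hypothetical `hardware_profile` helper that follows the "GPU name contains" column literally; the sample GPU name is illustrative input only:

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive HARDWARE_PROFILE from a GPU name string,
# following the "GPU name contains" column of the mapping table.
hardware_profile() {
  case "$1" in
    *H100*)                            echo H100 ;;
    *L40S*)                            echo L40S ;;
    *"RTX 6000 Ada"*|*"RTX PRO 6000"*) echo RTXPRO6000BW ;;
    *GB10*)                            echo DGX-SPARK ;;
    *IGX*)                             echo IGX-THOR ;;
    *AGX*)                             echo AGX-THOR ;;
    *)                                 echo OTHER ;;
  esac
}

# On a real host the name would come from:
#   nvidia-smi --query-gpu=name --format=csv,noheader | head -n1
hardware_profile "NVIDIA H100 80GB HBM3"   # prints: H100
```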

**Minimum GPU count per (profile × mode × platform).** Canonical source
is the [VSS prerequisites page](https://docs.nvidia.com/vss/3.1.0/prerequisites.html);
reproduced here so the skill can fail fast when the host is too small:

| Profile | Mode | H100 / RTX PRO 6000 (Blackwell) | L40S | DGX-Spark / IGX-Thor / AGX-Thor |
|---|---|---|---|---|
| `base` | shared (`local_shared` LLM + VLM) | **1** | — (48 GB/GPU too small) | **1** (Edge 4B + VLM, unified memory) |
| `base` | dedicated (`local` LLM + VLM) | **2** | **2** | — |
| `base` | `remote-llm` | **1** (VLM local) | **1** (VLM local) | **1** (remote LLM only) |
| `base` | `remote-vlm` | **1** (LLM local) | **1** (LLM local) | — |
| `base` | `remote-all` | **0** | **0** | **0** |
| `lvs` | shared | **1** | — | — |
| `lvs` | dedicated | **2** | **2** | — |
| `lvs` | `remote-llm/vlm` | 1 | 1 | — |
| `lvs` | `remote-all` | 0 | 0 | — |
| `alerts` (verification / CV) | shared | **2** | — | — |
| `alerts` (verification / CV) | dedicated | **3** | **3** | — |
| `alerts` (verification / CV) | `remote-all` | 1 | 1 | 1 |
| `alerts` (verification / CV) | `remote-llm/vlm` | 2 | 2 | 1 |
| `alerts` (real-time / VLM) | shared | **2** | — | — |
| `alerts` (real-time / VLM) | dedicated | **3** | **3** | — |
| `alerts` (real-time / VLM) | `remote-llm` | 2 | 2 | 1 |
| `search` | shared | **2** | — | — |
| `search` | dedicated | **3** | **3** | — |
| `search` | `remote-*` | **2** | **2** | — |

A few hard rules encoded in the table:

- **L40S can't do `shared`.** 48 GB is not enough VRAM for LLM + VLM
  on a single GPU. Fall back to `dedicated` or a `remote-*` mode.
- **L40S needs +1 GPU for alerts / search vs H100** because the
  shared-on-one-GPU trick doesn't work — RT-CV / Embed1 must take
  their own GPU, and LLM+VLM still need a second.
- **DGX-Spark / Thor are early-access for most profiles.** Only
  `base` + `lvs` are expected to fully land locally; `alerts` /
  `search` currently require a remote LLM. See
  [`references/edge.md`](references/edge.md).

If the host's (GPU count × VRAM) combination doesn't appear above,
**stop and report the blocker** — don't silently pick a different
mode.
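The fail-fast behaviour can be sketched as a lookup plus a comparison. The example below encodes only the `base` rows of the table; the function names and the platform labels `big` (H100 / RTX PRO 6000), `l40s`, and `edge` (DGX-Spark / Thor) are hypothetical, chosen for this illustration.

```bash
#!/usr/bin/env bash
# Hypothetical lookup covering only the base rows of the GPU-count table.
# Prints the minimum GPU count, or "unsupported" where the table shows a dash.
min_gpus_base() {
  case "$1:$2" in
    shared:big|shared:edge)                         echo 1 ;;
    dedicated:big|dedicated:l40s)                   echo 2 ;;
    remote-llm:big|remote-llm:l40s|remote-llm:edge) echo 1 ;;
    remote-vlm:big|remote-vlm:l40s)                 echo 1 ;;
    remote-all:big|remote-all:l40s|remote-all:edge) echo 0 ;;
    *)                                              echo unsupported ;;
  esac
}

# Fail fast: report a blocker instead of silently picking another mode.
# Arguments: <gpus on host> <mode> <platform>
check_gpu_count() {
  local have=$1 need
  need=$(min_gpus_base "$2" "$3")
  if [ "$need" = unsupported ]; then
    echo "blocker: base/$2 is not listed for platform $3" >&2
    return 1
  fi
  if [ "$have" -lt "$need" ]; then
    echo "blocker: base/$2 on $3 needs $need GPU(s), found $have" >&2
    return 1
  fi
}
```

For example, `check_gpu_count 1 shared l40s` fails with a blocker message, matching the rule that L40S cannot run `shared`.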

> **Edge shared mode requires Edge 4B + `HF_TOKEN`.** On DGX Spark and AGX/IGX
> Thor, both LLM and VLM must fit in unified memory, AND the standard
> `nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2:1` image has a broken arm64
> manifest. You must run `NVIDIA-Nemotron-Edge-4B-v2.1-EA-020126_FP8` as a
> standalone vLLM container on port 30081 with the agent pointed at it via
> `--use-remote-llm`. Full recipe and the mandatory `HF_TOKEN` verification
> step are in [`references/edge.md`](references/edge.md).

### Step 1b — Prepare the data directory

The data directory layout (asset paths, ownership, mount points, profile-specific subdirs) is documented in [`references/data-directory.md`](references/data-directory.md). Read that file before deploying for the first time on a host or when changing profiles.

```bash
# Profile-specific subdirs:
#   alerts → mkdir -p "$DATA/data_log/vss_video_analytics_api" "$DATA/videos/dev-profile-alerts" "$DATA/models/rtdetr-its" "$DATA/models/gdino"
#   search → mkdir -p "$DATA/models"
chmod -R 777 "$DATA/data_log" "$DATA/agent_eval"
# If you created $DATA/models above, also: chmod -R 777 "$DATA/models"
```

> **FORBIDDEN:** `chown -R ubuntu:ubuntu $MDX_DATA_DIR` (or any recursive chown).
> This looks like "good housekeeping" to a shell-admin instinct, but it is the deploy-breaking command in this stack. You will observe a "healthy" deploy (containers Up, endpoints 200) while the video pipeline is silently broken. Use `chmod -R 777` on the specific subdirs above and nothing else.

Known per-container uid gotchas (each uses a bind mount under `$DATA`):

| Container | Image | Runs as | Mount path | Symptom if permissions wrong |
|---|---|---|---|---|
| `centralizedb-dev` | postgres:17.6-alpine | uid 70 | `$DATA/data_log/vst/postgres/db` | Can't read own PGDATA → VST `sensor_details` query fails → uploaded videos never appear in `/vst/api/v1/sensor/streams` → warehouse E2E check returns empty |
| `mdx-redis` | redis:8.2.2-alpine | uid 999 | `$DATA/data_log/redis/log`, `/redis/data` | "Can't open the log file: Permission denied" → redis dies → `envoy-streamprocessing` dies (needs Redis Lua script) → stream pipeline broken |
| `elasticsearch` | elasticsearch | uid 1000 | `$DATA/data_log/elastic/{data,logs}` | "AccessDeniedException" on startup → ES refuses to start |
| `vst` / `sensor-ms-dev` | vst | uid 1000 | `$DATA/data_log/vst/*` (videos, clips) | 403 on ingest or stream write |

`chmod -R 777 $DATA/data_log` covers all of these. Do NOT chown them to individual uids: containers that init their own dirs on first start (like postgres) will then re-chown to their uid, and a later chown back to ubuntu breaks them.
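Since a permissions problem here shows up as a "healthy" deploy with a broken pipeline, it may be worth confirming the modes on a fresh data directory before the first `up -d`. A minimal sketch, assuming a hypothetical `not_world_writable` helper; containers may tighten modes on directories they initialize themselves after first start, so hits on an already-used data directory are not necessarily errors.

```bash
#!/usr/bin/env bash
# Hypothetical check: print every directory under the given roots that is
# missing at least one of the rwxrwxrwx bits. No output means everything
# readable under the roots already has mode 777 (or a superset of it).
not_world_writable() {
  find "$@" -type d ! -perm -777 2>/dev/null
}

# Usage (assumes $DATA points at the data directory):
#   not_world_writable "$DATA/data_log" "$DATA/agent_eval"
```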

If postgres is already broken (common when redeploying without a clean `data-dir`):

```bash
sudo rm -rf "$DATA/data_log/vst/postgres"  # postgres re-initializes on next start
docker restart centralizedb-dev
```

### Step 1c — If deploying on Brev, set up secure-link env vars

Brev-specific env vars (`BREV_ENV_ID`, secure-link patterns) are documented in `references/brev.md`.

### Step 2 — Build env_overrides

Produce an `env_overrides` dict from the user request and the gathered context: choose remote/local LLM/VLM, set credentials, point at endpoints, set platform-specific flags. The full mapping (every override key, when it applies, defaults, profile-specific differences) lives in `references/env-overrides.md`.

### Step 3 — Config / dry-run

Env file location: `<repo>/deployments/developer-workflow/dev-profile-<profile>/.env`

This is the authoritative `.env`. Every verifier, healthcheck, and post-deploy tool reads from this path. When you apply env overrides (from Step 2 or from the user's prompt), write them directly to this file, not to `generated.env`. `generated.env` is a scratchpad that `dev-profile.sh` produces during its own internal flow; it is NOT read by the verifier and is wiped on the next invocation. An agent that uses `dev-profile.sh` as a one-shot deploy but leaves the base `.env` untouched will silently fail env checks even when the stack comes up cleanly. If you used `dev-profile.sh` and see `generated.env` on disk, copy its key/value lines back into the base `.env`, or re-apply your `sed` commands against the base `.env` after the fact. The base `.env` is the source of truth.

```bash
REPO=/path/to/video-search-and-summarization
PROFILE=base
ENV_FILE="$REPO/deployments/developer-workflow/dev-profile-$PROFILE/.env"

# Read current .env, apply overrides, write back
# (read lines, update matching keys, append new keys, write)

# Resolve compose (dry-run: nothing is started)
cd "$REPO/deployments"
docker compose --env-file "$ENV_FILE" config > resolved.yml
```
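The "update matching keys, append new keys" step is described above only in comments. One way it could look is sketched below; `set_env_var` is a hypothetical helper, not something shipped in the repo, and it assumes plain `KEY=VALUE` lines with no `export` prefix or quoting rules to preserve.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file.
# Every existing "KEY=..." line is rewritten in place; if the key is
# absent, the assignment is appended. Other lines are left untouched.
# Arguments: <env file> <key> <value>
set_env_var() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  # Key and value travel through the environment so awk does not
  # interpret backslash sequences in them.
  KEY=$2 VALUE=$3 awk -F= '
    $1 == ENVIRON["KEY"] { print ENVIRON["KEY"] "=" ENVIRON["VALUE"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["KEY"] "=" ENVIRON["VALUE"] }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage (key name taken from the hardware mapping table; the full key
# list is in references/env-overrides.md):
#   set_env_var "$ENV_FILE" HARDWARE_PROFILE H100
```

Writing back with `cat "$tmp" > "$file"` rather than `mv` keeps the original file's owner and mode, which matters given the permission notes in Step 1b.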

The resolved YAML is saved to `<repo>/deployments/resolved.yml`.

### Step 3b — Verify `resolved.yml` has no unexpanded `${...}` tokens

Unexpanded `${VAR}` tokens in `resolved.yml` mean compose did not see those env values. Diagnostic procedure and common culprits live in `references/troubleshooting.md`.
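A quick scan for leftover tokens can be done with `grep`. This is a sketch, not the diagnostic procedure from the reference file: `find_unexpanded` is a hypothetical name, and the sketch assumes that a doubled dollar sign (`$${VAR}`) is compose's escape for a literal `$` and therefore not a failed expansion.

```bash
#!/usr/bin/env bash
# Print, with line numbers, every line that still contains a ${VAR}-style
# token. Tokens written as $${VAR} are skipped (assumed to be escapes).
# Exit status follows grep: 0 if a token was found, 1 if the file is clean.
find_unexpanded() {
  grep -nE '(^|[^$])\$\{[A-Za-z_][A-Za-z0-9_]*' "$1"
}

# Usage:
#   if find_unexpanded resolved.yml; then
#     echo "unexpanded tokens found; fix the .env and re-run the dry-run" >&2
#   fi
```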

### Step 4 — Review

Show the user a summary of what will be deployed:

- Profile name and hardware
- LLM/VLM models and mode (local/remote/local_shared)
- Services that will start
- GPU device assignment
- Key endpoints (UI port, agent port)

Ask: "Looks good — deploy now?" and wait for confirmation before Step 5.

Exception — autonomous mode. If the user's request already asks you to run autonomously (e.g. "deploy X autonomously", "run without confirmation", "non-interactive"), skip the confirmation prompt and proceed straight to Step 5. This path exists so automated eval / CI invocations don't hang waiting for a human reply they'll never get. In all other cases, a human must approve.
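For scripted wrappers, the confirmation rule and its autonomous exception could be expressed as a small gate. A sketch only: `confirm_deploy` and the `AUTONOMOUS` variable are hypothetical names standing in for "the user asked for a non-interactive run".

```bash
#!/usr/bin/env bash
# Hypothetical gate: return 0 if the deploy may proceed.
# AUTONOMOUS=1 skips the prompt; otherwise a reply is read from stdin and
# anything other than an explicit yes blocks the deploy.
confirm_deploy() {
  if [ "${AUTONOMOUS:-0}" = 1 ]; then
    return 0
  fi
  local reply
  printf 'Looks good. Deploy now? [y/N] '
  read -r reply || return 1
  case "$reply" in
    y|Y|yes|YES) return 0 ;;
    *)           return 1 ;;
  esac
}

# Usage:
#   confirm_deploy && docker compose -f resolved.yml up -d
```

Defaulting to "no" on empty input or end-of-file keeps an unattended run from deploying unless autonomy was requested explicitly.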

### Step 5 — Deploy

```bash
cd "$REPO/deployments"
docker compose -f resolved.yml up -d
```

Do NOT use `--force-recreate` on retries. It destroys already-warm NIM containers, forcing another 3–5 min torch.compile + CUDA-graph capture per NIM. If the previous `up -d` partially failed, fix the root cause (usually perms or an env typo) and just re-run `up -d`; Docker will re-create only the containers whose config changed or that are down.

Deploy takes ~10-20 min on first run (image pulls + model downloads). Monitor:

```bash
# Container status
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

# Logs for a specific service
docker compose -f "$REPO/deployments/resolved.yml" logs --tail 50 <service>
```

Deploy is complete when all `mdx-*` containers show `Up` status.
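The completion criterion can be checked mechanically instead of by eye. A minimal sketch, assuming a hypothetical `all_mdx_up` helper that reads tab-separated name/status pairs, as produced by `docker ps -a --format '{{.Names}}\t{{.Status}}'`:

```bash
#!/usr/bin/env bash
# Read "name<TAB>status" lines on stdin. Succeed only if at least one
# mdx-* container is listed and every mdx-* status starts with "Up".
# Containers that are not up are named on stdout.
all_mdx_up() {
  awk -F'\t' '
    $1 ~ /^mdx-/ {
      seen = 1
      if ($2 !~ /^Up/) { print "not up: " $1; bad = 1 }
    }
    END { exit !(seen && !bad) }
  '
}

# Usage:
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | all_mdx_up && echo "deploy complete"
```

Requiring at least one `mdx-*` container guards against reporting success on a host where nothing started at all.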

### Step 6 — Report endpoints

| Profile | Agent UI | REST API | Other |
|---|---|---|---|
| base | `:3000` | `:8000` (Swagger at `/docs`) | — |
| alerts | `:3000` | `:8000` | VIOS dashboard `:30888/vst/` |
| lvs | `:3000` | `:8000` | — |
| search | `:3000` | `:8000` | — |

Use workflow skills after deployment:

- alerts / incident-report → alert management and incident queries
- video-search → semantic video search
- video-summarization → long video summarization
- vios → camera/stream management via VIOS
- video-analytics → Elasticsearch queries
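The endpoint report can be generated from the profile name and the host IP gathered in Step 1. A sketch, assuming a hypothetical `report_endpoints` helper and plain `http://` URLs, as used by the quick checks in the debugging section:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print endpoint URLs for a profile on a given host,
# using the ports from the endpoint table.
# Arguments: <profile> <host or IP>
report_endpoints() {
  local profile=$1 host=$2
  echo "Agent UI: http://$host:3000"
  echo "REST API: http://$host:8000"
  case "$profile" in
    base)   echo "Swagger: http://$host:8000/docs" ;;
    alerts) echo "VIOS dashboard: http://$host:30888/vst/" ;;
  esac
}

# Usage:
#   report_endpoints "$PROFILE" "$(hostname -I | awk '{print $1}')"
```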

## Tear Down

```bash
cd "$REPO/deployments"
docker compose -f resolved.yml down
```

## Debugging a Deployment

Use this workflow when the user asks to "debug the deploy", "verify it's working", "why is the agent not responding", or similar. The goal is to confirm the full video-ingestion-to-agent-answer path, not just that containers are "Up".

Each profile reference doc (e.g. `references/base.md`) has a Debugging section listing the exact commands to run for that profile.

### Quick checks (all profiles)

```bash
# 1. All expected containers Up
docker ps --format 'table {{.Names}}\t{{.Status}}'

# 2. Agent API + UI responding
curl -sf http://localhost:8000/docs >/dev/null && echo "agent OK"
curl -sf http://localhost:3000/ >/dev/null && echo "ui OK"

# 3. VLM NIM responding (base/lvs profiles)
curl -sf http://localhost:30082/v1/models | python3 -m json.tool

# 4. LLM NIM responding
curl -sf http://localhost:30081/v1/models | python3 -m json.tool
```

### End-to-end video sanity check

After the quick checks above pass, drive a real query through the agent, e.g. ask it over the REST API or UI to describe a video you've uploaded to VST. If the agent returns a non-empty answer, the upload → ingest → inference → reply path is healthy. If it fails, `docker logs vss-agent` shows which stage tripped.

## Troubleshooting

- `unknown or invalid runtime name: nvidia` → NVIDIA Container Toolkit not installed or Docker not restarted. See `references/prerequisites.md`.
- NGC auth error → re-export `NGC_CLI_API_KEY` or follow `references/ngc.md`.
- GPU not detected → run `sudo modprobe nvidia && sudo modprobe nvidia_uvm`, then retry.
- `docker compose up` fails with "no resolved.yml" → run the dry-run (`docker compose config > resolved.yml`, Step 3) first.
- cosmos-reason2-8b crash → must redeploy the full stack (known issue: NIM cannot restart alone).
