deploy
VSS Deploy
Deploy any VSS profile using a compose-centric workflow: build env overrides, generate resolved compose (dry-run), review, then deploy. Replaces direct `dev-profile.sh` execution with validated, auditable steps.
Profile Routing
| User says | Profile | Reference |
|---|---|---|
| "deploy vss" / "deploy base" | | |
| "deploy alerts" / "alert verification" / "real-time alerts" | | |
| "deploy for incident report" | | |
| "deploy lvs" / "video summarization" | | |
| "deploy search" / "video search" | | |
Edge hardware routing (DGX Spark, AGX/IGX Thor): see `references/edge.md`
for the 4B-LLM recipe (`config_edge.yml` + standalone vLLM on port 30081). Edge
platforms share a single unified-memory GPU between LLM and VLM, so the
Nemotron Edge 4B is the default and the Nemotron Nano 9B v2 FP8 is an option
when memory allows.
When to Use
- Deploy VSS / start VSS / bring up a profile
- Deploy a specific profile (base, alerts, lvs, search)
- Do a dry-run / preview what will be deployed
- Change deployment config (hardware, LLM mode, GPU assignment)
- Tear down a running deployment
- Debug or verify an existing deployment (see Debugging a Deployment)
How it works
Run docker compose commands directly on the host:
```bash
# 1. Apply env overrides to the profile .env file
# 2. docker compose --env-file .env config > resolved.yml   (dry-run)
# 3. Review resolved.yml
# 4. docker compose -f resolved.yml up -d
```
Before Deploying
- Repo path: find `video-search-and-summarization/` on disk. Check `TOOLS.md` if available.
- NGC CLI & API key: see `references/ngc.md`. Check `$NGC_CLI_API_KEY` is set.
- System prerequisites (GPU VRAM, driver, Docker, NVIDIA Container Toolkit): the canonical reference is the VSS prerequisites page. That page lists supported hardware, per-profile GPU requirements, and the minimum driver/CUDA version per NIM. Read it and pick the LLM/VLM placement that fits the host; don't guess thresholds from this skill.
Pre-flight Check
Run before every deploy. Do not proceed if any check fails.
```bash
# 1. GPU visible
nvidia-smi --query-gpu=index,name --format=csv,noheader

# 2. NVIDIA runtime in Docker
docker info 2>/dev/null | grep -i "runtimes"

# 3. NVIDIA runtime works end-to-end
docker run --rm --gpus all ubuntu:22.04 nvidia-smi 2>&1 | head -5
```
If check 2 or 3 fails, see [`references/prerequisites.md`](references/prerequisites.md).
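The three checks can be wrapped so that the first failure stops the run. This is a minimal sketch, not part of the skill: `run_check` and `preflight` are hypothetical helper names, and the `runtimes.*nvidia` grep assumes that `docker info` lists `nvidia` on its Runtimes line when the toolkit is installed.

```shell
#!/usr/bin/env bash
# run_check LABEL CMD...: run one check quietly and print PASS or FAIL.
# (Hypothetical helper for illustration.)
run_check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

# preflight: run the three checks in order; return non-zero at the first failure.
preflight() {
  run_check "GPU visible" \
    nvidia-smi --query-gpu=index,name --format=csv,noheader || return 1
  # Assumption: the Runtimes line of `docker info` names the nvidia runtime.
  run_check "NVIDIA runtime in Docker" \
    sh -c 'docker info 2>/dev/null | grep -qi "runtimes.*nvidia"' || return 1
  run_check "NVIDIA runtime works end-to-end" \
    docker run --rm --gpus all ubuntu:22.04 nvidia-smi || return 1
}
```

Calling `preflight || exit 1` at the top of a deploy script would enforce the "do not proceed" rule.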
Deployment Flow
Always follow this sequence. Never skip the dry-run.
Step 0 — Tear down any existing deployment
If a deployment already exists, tear it down first. Full procedure (resolved.yml-driven path, container-name catch-all patterns covering dev-profile compose files, why leftovers cause /sensor/list 502s) lives in `references/teardown.md`.
```bash
# If a resolved.yml from a prior deploy exists, prefer it: it
# knows about all compose-profile services that were brought up.
if [ -f "$REPO/deployments/resolved.yml" ]; then
  docker compose -f "$REPO/deployments/resolved.yml" down --remove-orphans
fi

# Catch-all: remove every VSS-stack container the dev-profile compose
# files bring up. Without this, leftovers from a prior deploy linger
# (especially the *-smc set, which the alerts compose profile shares
# with the *-dev set on host networking and port 30000) and either:
#   - bind ports the new deploy needs → second sensor-ms fails to bind
#     → /sensor/list returns 502 (issue #151), or
#   - pass the new deploy's container-name health checks while serving
#     stale data from the prior deploy's DB.
# The patterns below cover everything declared in
# deployments/vst/{2d,3d,smc,developer,ps}/, deployments/foundational/,
# deployments/agents/, deployments/proxy/, and the dev-profile-*
# compose files.
docker ps -a --format '{{.Names}}' \
  | grep -E '^(vss-|mdx-|perception-|rtvi-|alert-|nvstreamer-|sensor-ms-|vst-ingress-|vst-mcp-|vst-file-proxy|centralizedb-|storage-ms-|streamprocessing-ms-|sdr-(http|streamprocessing)-|envoy-(http|streamprocessing)-|rtspserver-ms-|recorder-ms-|replaystream-ms-|livestream-ms-|metropolis-vss-ui|phoenix)' \
  | xargs -r docker rm -f
```
If this is the host's first deploy, the `docker compose down`
line is a no-op (exit 0 with no containers to stop), so it is safe to run
unconditionally.
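Before the catch-all removes anything, it can help to list the containers it would match. The sketch below reuses the same name pattern; `VSS_NAME_PATTERN` and `stale_vss_names` are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Same container-name pattern as the catch-all, kept in one variable.
VSS_NAME_PATTERN='^(vss-|mdx-|perception-|rtvi-|alert-|nvstreamer-|sensor-ms-|vst-ingress-|vst-mcp-|vst-file-proxy|centralizedb-|storage-ms-|streamprocessing-ms-|sdr-(http|streamprocessing)-|envoy-(http|streamprocessing)-|rtspserver-ms-|recorder-ms-|replaystream-ms-|livestream-ms-|metropolis-vss-ui|phoenix)'

# stale_vss_names: read container names on stdin and print the ones the
# catch-all would remove. Prints nothing (and still succeeds) on no match.
stale_vss_names() {
  grep -E "$VSS_NAME_PATTERN" || true
}
```

A preview run would look like `docker ps -a --format '{{.Names}}' | stale_vss_names`; nothing is deleted until the output is piped into `xargs -r docker rm -f`.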
Step 1 — Gather context
Discover what's available on the host and cross-reference with the
VSS prerequisites page
to choose a deployment shape that fits.
| Value | How to determine |
|---|---|
| Profile | Match user intent to routing table above. Default: |
| Repo path | Find |
| Hardware | |
| LLM/VLM placement | Pick |
| API keys | |
| Host IP | |
Hardware profile mapping:
| GPU name contains | HARDWARE_PROFILE | Recommended LLM path |
|---|---|---|
| H100 | | Nano 9B v2 (NIM) |
| L40S | | Nano 9B v2 (NIM) |
| RTX 6000 Ada, RTX PRO 6000 | | Nano 9B v2 (NIM) |
| GB10 (DGX Spark) | | Edge 4B (vLLM); see `references/edge.md` |
| IGX | | Edge 4B (vLLM); see `references/edge.md` |
| AGX | | Edge 4B (vLLM); see `references/edge.md` |
| Other | | — |
Minimum GPU count per (profile × mode × platform). Canonical source
is the VSS prerequisites page;
reproduced here so the skill can fail fast when the host is too small:
| Profile | Mode | H100 / RTX PRO 6000 (Blackwell) | L40S | DGX-Spark / IGX-Thor / AGX-Thor |
|---|---|---|---|---|
| shared ( | 1 | — (48 GB/GPU too small) | 1 (Edge 4B + VLM, unified memory) |
| dedicated ( | 2 | 2 | — |
| | 1 (VLM local) | 1 (VLM local) | 1 (remote LLM only) |
| | 1 (LLM local) | 1 (LLM local) | — |
| | 0 | 0 | 0 |
| shared | 1 | — | - |
| dedicated | 2 | 2 | — |
| | 1 | 1 | - |
| | 0 | 0 | - |
| shared | 2 | — | — |
| dedicated | 3 | 3 | — |
| | 1 | 1 | 1 |
| | 2 | 2 | 1 |
| shared | 2 | — | — |
| dedicated | 3 | 3 | — |
| | 2 | 2 | 1 |
| shared | 2 | — | - |
| dedicated | 3 | 3 | — |
| | 2 | 2 | - |
A few hard rules encoded in the table:
- L40S can't do `shared`. 48 GB is not enough VRAM for LLM + VLM on a single GPU. Fall back to `dedicated` or a `remote-*` mode.
- L40S needs +1 GPU for alerts / search vs H100 because the shared-on-one-GPU trick doesn't work: RT-CV / Embed1 must take their own GPU, and LLM+VLM still need a second.
- DGX-Spark / Thor are early-access for most profiles. Only `base` + `lvs` are expected to fully land locally; `alerts` / `search` currently require a remote LLM. See `references/edge.md`.
If the host's (GPU count × VRAM) combination doesn't appear above,
stop and report the blocker — don't silently pick a different
mode.
Edge shared mode requires Edge 4B + `HF_TOKEN`. On DGX Spark and AGX/IGX Thor, both LLM and VLM must fit in unified memory, AND the standard `nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2:1` image has a broken arm64 manifest. You must run `NVIDIA-Nemotron-Edge-4B-v2.1-EA-020126_FP8` as a standalone vLLM container on port 30081 with the agent pointed at it via `--use-remote-llm`. Full recipe and the mandatory `HF_TOKEN` verification step are in `references/edge.md`.
Step 1b — Prepare the data directory
The data directory layout (asset paths, ownership, mount points, profile-specific subdirs) is documented in `references/data-directory.md`. Read that file before deploying for the first time on a host or when changing profiles.
```bash
# Profile-specific subdirs:
#   alerts → mkdir -p "$DATA/data_log/vss_video_analytics_api" "$DATA/videos/dev-profile-alerts" "$DATA/models/rtdetr-its" "$DATA/models/gdino"
#   search → mkdir -p "$DATA/models"
chmod -R 777 "$DATA/data_log" "$DATA/agent_eval"
# If you created $DATA/models above, also: chmod -R 777 "$DATA/models"
```
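The per-profile directory rules can be collected into one function. This is a sketch under assumptions: `prepare_data_dir` is a hypothetical helper, and creating `data_log` and `agent_eval` when they are missing is an assumption made here so that `chmod` has something to act on. The authoritative layout is in `references/data-directory.md`.

```shell
#!/usr/bin/env bash
# prepare_data_dir DATA PROFILE
# Creates the profile-specific subdirs and opens permissions with chmod only.
# (Hypothetical helper for illustration.)
prepare_data_dir() {
  local data=$1 profile=$2 made_models=0
  # Assumption: on a fresh host these two directories may not exist yet.
  mkdir -p "$data/data_log" "$data/agent_eval" || return 1
  case $profile in
    alerts)
      mkdir -p "$data/data_log/vss_video_analytics_api" \
               "$data/videos/dev-profile-alerts" \
               "$data/models/rtdetr-its" "$data/models/gdino" || return 1
      made_models=1
      ;;
    search)
      mkdir -p "$data/models" || return 1
      made_models=1
      ;;
  esac
  chmod -R 777 "$data/data_log" "$data/agent_eval" || return 1
  # Only touch models/ when this call created (or ensured) it.
  if [ "$made_models" -eq 1 ]; then
    chmod -R 777 "$data/models" || return 1
  fi
}
```

For example, `prepare_data_dir "$DATA" alerts` before the first alerts deploy. The function deliberately contains no `chown`.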
> **FORBIDDEN: `chown -R ubuntu:ubuntu $MDX_DATA_DIR` (or any recursive chown).**
>
> This is "good housekeeping" to a shell-admin instinct but is **the** deploy-
> breaking command in this stack. You will observe a "healthy" deploy
> (containers Up, endpoints 200) while the video pipeline is silently broken.
> Use `chmod -R 777` on the specific subdirs above — nothing else.
**Known per-container uid gotchas** (each uses a bind mount under `$DATA`):
| Container | Image | Runs as | Mount path | Symptom if permissions wrong |
|---|---|---|---|---|
| `centralizedb-dev` | postgres:17.6-alpine | uid **70** | `$DATA/data_log/vst/postgres/db` | Can't read own PGDATA → VST `sensor_details` query fails → uploaded videos never appear in `/vst/api/v1/sensor/streams` → warehouse E2E check returns empty |
| `mdx-redis` | redis:8.2.2-alpine | uid **999** | `$DATA/data_log/redis/log`, `/redis/data` | "Can't open the log file: Permission denied" → redis dies → `envoy-streamprocessing` dies (needs Redis Lua script) → stream pipeline broken |
| `elasticsearch` | elasticsearch | uid **1000** | `$DATA/data_log/elastic/{data,logs}` | "AccessDeniedException" on startup → ES refuses to start |
| `vst` / `sensor-ms-dev` | vst | uid **1000** | `$DATA/data_log/vst/*` (videos, clips) | 403 on ingest or stream write |
`chmod -R 777 $DATA/data_log` covers all of these. Do NOT chown them to
individual uids — containers that init their own dirs on first start (like
postgres) will then re-chown to their uid and a later chown back to ubuntu
breaks them.
**If postgres is already broken** (common when redeploying without a clean
`data-dir`):
```bash
sudo rm -rf "$DATA/data_log/vst/postgres"   # postgres re-initializes on next start
docker restart centralizedb-dev
```
Step 1c — If deploying on Brev, set up secure-link env vars
Brev-specific env vars (`BREV_ENV_ID`, secure-link patterns) are documented in `references/brev.md`.
Step 2 — Build env_overrides
Produce an `env_overrides` dict from the user request and the gathered context: choose remote/local LLM/VLM, set credentials, point at endpoints, set platform-specific flags. The full mapping (every override key, when it applies, defaults, profile-specific differences) lives in `references/env-overrides.md`.
Step 3 — Config / dry-run
Env file location: `<repo>/deployments/developer-workflow/dev-profile-<profile>/.env`
This is the authoritative `.env`. Every verifier, healthcheck, and post-deploy tool reads from this path. When you apply env overrides (from Step 2 or from the user's prompt), write them directly to this file, not to `generated.env`. `generated.env` is a scratchpad that `dev-profile.sh` produces during its own internal flow; it is NOT read by the verifier and is wiped on the next invocation. An agent that uses `dev-profile.sh` as a one-shot deploy but leaves the base `.env` untouched will silently fail env checks even when the stack comes up cleanly. If you used `dev-profile.sh` and see `generated.env` on disk, copy its key/value lines back into the base `.env`, or re-apply your `sed` commands against the base `.env` after the fact. The base `.env` is the source of truth.
```bash
REPO=/path/to/video-search-and-summarization
PROFILE=base
ENV_FILE=$REPO/deployments/developer-workflow/dev-profile-$PROFILE/.env

# Read current .env, apply overrides, write back
# (read lines, update matching keys, append new keys, write)

# Resolve compose
cd "$REPO/deployments"
docker compose --env-file "$ENV_FILE" config > resolved.yml
```
The resolved YAML is saved to `<repo>/deployments/resolved.yml`.
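The "read lines, update matching keys, append new keys, write" step could be sketched as a small function. `set_env_kv` is a hypothetical name, and the sketch assumes plain `KEY=value` lines with no quoting or `export` prefixes; the real override keys are listed in `references/env-overrides.md`.

```shell
#!/usr/bin/env bash
# set_env_kv FILE KEY VALUE
# Replace every existing KEY=... line in FILE, or append KEY=VALUE if absent.
# (Hypothetical helper; assumes simple KEY=value lines.)
set_env_kv() {
  local file=$1 key=$2 value=$3 tmp line found=0
  tmp=$(mktemp) || return 1
  if [ -f "$file" ]; then
    # The `|| [ -n "$line" ]` keeps a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
      case $line in
        "$key="*) printf '%s=%s\n' "$key" "$value"; found=1 ;;
        *)        printf '%s\n' "$line" ;;
      esac
    done < "$file" > "$tmp"
  fi
  if [ "$found" -eq 0 ]; then
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
  fi
  # Write back through redirection so FILE keeps its own mode and owner.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Each override then becomes one call against the base file, for example `set_env_kv "$ENV_FILE" SOME_KEY some-value`, where `SOME_KEY` is a placeholder rather than a real override name.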
Step 3b — Verify resolved.yml has no unexpanded ${...} tokens
Unexpanded `${VAR}` tokens in `resolved.yml` mean compose did not see those env values. Diagnostic procedure and common culprits live in `references/troubleshooting.md`.
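One way to perform this check is a plain `grep` for `${NAME` patterns. The helper names below are hypothetical, and the pattern is an assumption: it will also flag intentionally escaped `$${NAME}` entries, so treat a hit as a prompt to inspect the line rather than as proof of a problem.

```shell
#!/usr/bin/env bash
# find_unexpanded FILE: print numbered lines that still contain a ${NAME token.
# (Hypothetical helper; grep exits 0 when at least one line matches.)
find_unexpanded() {
  grep -nE '\$\{[A-Za-z_][A-Za-z0-9_]*' "$1"
}

# check_resolved FILE: fail when any unexpanded token is present.
check_resolved() {
  if find_unexpanded "$1"; then
    echo "unexpanded tokens found in $1" >&2
    return 1
  fi
}
```

Running `check_resolved "$REPO/deployments/resolved.yml" || exit 1` before the review step stops the flow early.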
Step 4 — Review
Show the user a summary of what will be deployed:
- Profile name and hardware
- LLM/VLM models and mode (local/remote/local_shared)
- Services that will start
- GPU device assignment
- Key endpoints (UI port, agent port)
Ask: "Looks good — deploy now?" and wait for confirmation before Step 5.
Exception — autonomous mode. If the user's request already asks
you to run autonomously (e.g. "deploy X autonomously", "run without
confirmation", "non-interactive"), skip the confirmation prompt and
proceed straight to Step 5. This path exists so automated eval /
CI invocations don't hang waiting for a human reply they'll never
get. In all other cases, a human must approve.
Step 5 — Deploy
```bash
cd "$REPO/deployments"
docker compose -f resolved.yml up -d
```
Do NOT use `--force-recreate` on retries. It destroys already-warm NIM containers, forcing another 3–5 min torch.compile + CUDA-graph capture per NIM. If the previous `up -d` partially failed, fix the root cause (usually perms or an env typo) and just re-run `up -d`: Docker will re-create only the containers whose config changed or that are down.
Deploy takes ~10-20 min on first run (image pulls + model downloads). Monitor:
```bash
# Container status
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

# Logs for a specific service
docker compose -f $REPO/deployments/resolved.yml logs --tail 50 <service>
```
Deploy is complete when all `mdx-*` containers show `Up` status.
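The completion rule can be turned into a small predicate for polling. This is a sketch: `all_mdx_up` is a hypothetical name, it expects tab-separated `name<TAB>status` lines on stdin, and it treats "no `mdx-*` containers at all" as not complete, which is an assumption made here.

```shell
#!/usr/bin/env bash
# all_mdx_up: read "name<TAB>status" lines on stdin.
# Succeeds only if at least one mdx-* container is listed and every mdx-*
# container has a status beginning with "Up".
all_mdx_up() {
  awk -F'\t' '
    $1 ~ /^mdx-/ { seen = 1; if ($2 !~ /^Up/) bad = 1 }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}
```

A polling loop might look like `until docker ps -a --format '{{.Names}}\t{{.Status}}' | all_mdx_up; do sleep 15; done`. The `-a` flag is included so that exited containers count as not ready.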
Step 6 — Report endpoints
| Profile | Agent UI | REST API | Other |
|---|---|---|---|
| base | | | — |
| alerts | | | VIOS dashboard |
| lvs | | | — |
| search | | | — |
Use workflow skills after deployment:
- alerts / incident-report → alert management and incident queries
- video-search → semantic video search
- video-summarization → long video summarization
- vios → camera/stream management via VIOS
- video-analytics → Elasticsearch queries
Tear Down
```bash
cd "$REPO/deployments"
docker compose -f resolved.yml down
```
Debugging a Deployment
Use this workflow when the user asks to "debug the deploy", "verify it's working",
"why is the agent not responding", or similar. The goal is to confirm the full
video-ingestion-to-agent-answer path, not just that containers are "Up".
Each profile reference doc (e.g. `references/base.md`) has a
Debugging section listing the exact commands to run for that profile.
Quick checks (all profiles)
```bash
# 1. All expected containers Up
docker ps --format 'table {{.Names}}\t{{.Status}}'

# 2. Agent API + UI responding
curl -sf http://localhost:8000/docs >/dev/null && echo "agent OK"
curl -sf http://localhost:3000/ >/dev/null && echo "ui OK"

# 3. VLM NIM responding (base/lvs profiles)
curl -sf http://localhost:30082/v1/models | python3 -m json.tool

# 4. LLM NIM responding
curl -sf http://localhost:30081/v1/models | python3 -m json.tool
```
End-to-end video sanity check
After the quick checks above pass, drive a real query through the agent — e.g.
ask it over the REST API or UI to describe a video you've uploaded to VST.
If the agent returns a non-empty answer, the upload → ingest → inference →
reply path is healthy. If it fails, `docker logs vss-agent` shows which stage
tripped.
Troubleshooting
- `unknown or invalid runtime name: nvidia` → NVIDIA Container Toolkit not installed or Docker not restarted. See `references/prerequisites.md`.
- NGC auth error → re-export `NGC_CLI_API_KEY` or follow `references/ngc.md`.
- GPU not detected → run `sudo modprobe nvidia && sudo modprobe nvidia_uvm`, then retry.
- `docker compose up` fails with "no resolved.yml" → run the dry-run (`docker compose config > resolved.yml`, Step 3) first.
- cosmos-reason2-8b crash → must redeploy the full stack (known issue: NIM cannot restart alone).