VSS Deploy

Deploy any VSS profile using a compose-centric workflow: build env overrides, generate resolved compose (dry-run), review, then deploy. Replaces direct `dev-profile.sh` execution with validated, auditable steps.

Profile Routing

| User says | Profile | Reference |
|---|---|---|
| "deploy vss" / "deploy base" | `base` | `references/base.md` |
| "deploy alerts" / "alert verification" / "real-time alerts" | `alerts` | `references/alerts.md` |
| "deploy for incident report" | `alerts` | `references/alerts.md` |
| "deploy lvs" / "video summarization" | `lvs` | `references/lvs.md` |
| "deploy search" / "video search" | `search` | `references/search.md` |

Edge hardware routing (DGX Spark, AGX/IGX Thor): see `references/edge.md` for the 4B-LLM recipe (`config_edge.yml` + standalone vLLM on port 30081). Edge platforms share a single unified-memory GPU between LLM and VLM, so the Nemotron Edge 4B is the default and the Nemotron Nano 9B v2 FP8 is an option when memory allows.
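
The routing table can be approximated with a keyword match. The following is a minimal sketch, not part of the VSS repo: `route_profile` is a hypothetical helper, and the keyword patterns are an assumption derived only from the example phrases in the table.

```bash
# Hypothetical helper: map a free-form user request to a VSS profile name.
# The patterns mirror the example phrases in the routing table; anything
# that does not match falls back to the default profile, base.
route_profile() {
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *alert*|*"incident report"*) echo alerts ;;
    *lvs*|*"video summarization"*) echo lvs ;;
    *search*) echo search ;;
    *) echo base ;;
  esac
}
```

For example, `route_profile "Deploy for incident report"` prints `alerts`, and the matching reference doc is then `references/alerts.md`.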

When to Use

  • Deploy VSS / start VSS / bring up a profile
  • Deploy a specific profile (base, alerts, lvs, search)
  • Do a dry-run / preview what will be deployed
  • Change deployment config (hardware, LLM mode, GPU assignment)
  • Tear down a running deployment
  • Debug or verify an existing deployment (see Debugging a Deployment)

How it works

Run docker compose commands directly on the host:

```bash
# 1. Apply env overrides to the profile .env file
# 2. docker compose --env-file .env config > resolved.yml   (dry-run)
# 3. Review resolved.yml
# 4. docker compose -f resolved.yml up -d
```

Before Deploying

  1. Repo path: find `video-search-and-summarization/` on disk. Check `TOOLS.md` if available.
  2. NGC CLI & API key: see `references/ngc.md`. Check that `$NGC_CLI_API_KEY` is set.
  3. System prerequisites (GPU VRAM, driver, Docker, NVIDIA Container Toolkit): the canonical reference is the VSS prerequisites page. That page lists supported hardware, per-profile GPU requirements, and the minimum driver/CUDA version per NIM. Read it and pick the LLM/VLM placement that fits the host; don't guess thresholds from this skill.

Pre-flight Check

Run before every deploy. Do not proceed if any check fails.

```bash
# 1. GPU visible
nvidia-smi --query-gpu=index,name --format=csv,noheader

# 2. NVIDIA runtime in Docker
docker info 2>/dev/null | grep -i "runtimes"

# 3. NVIDIA runtime works end-to-end
docker run --rm --gpus all ubuntu:22.04 nvidia-smi 2>&1 | head -5
```

If check 2 or 3 fails, see [`references/prerequisites.md`](references/prerequisites.md).
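
To make "do not proceed if any check fails" mechanical, the three checks can be wrapped so that the first failure stops the run. This is a sketch under assumptions: `has_nvidia_runtime` and `preflight` are hypothetical names, and the runtime check assumes `docker info` prints a `Runtimes:` line that lists `nvidia` when the toolkit is configured.

```bash
# Hypothetical helper: succeed if the text on stdin (expected: `docker info`
# output) has a "Runtimes" line that mentions nvidia.
has_nvidia_runtime() {
  grep -i 'runtimes' | grep -qi 'nvidia'
}

# Hypothetical wrapper: run the three pre-flight checks in order and return
# non-zero at the first failure.
preflight() {
  nvidia-smi --query-gpu=index,name --format=csv,noheader \
    || { echo "FAIL: GPU not visible" >&2; return 1; }
  docker info 2>/dev/null | has_nvidia_runtime \
    || { echo "FAIL: no NVIDIA runtime registered in Docker" >&2; return 1; }
  docker run --rm --gpus all ubuntu:22.04 nvidia-smi >/dev/null 2>&1 \
    || { echo "FAIL: NVIDIA runtime does not work end-to-end" >&2; return 1; }
  echo "pre-flight OK"
}
```

A caller would run `preflight || exit 1` before starting the deployment flow.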

Deployment Flow

Always follow this sequence. Never skip the dry-run.

Step 0 — Tear down any existing deployment

If a deployment already exists, tear it down first. Full procedure (resolved.yml-driven path, container-name catch-all patterns covering dev-profile compose files, why leftovers cause `/sensor/list` 502s) lives in `references/teardown.md`.

```bash
# If a resolved.yml from a prior deploy exists, prefer it, since it
# knows about all compose-profile services that were brought up.
if [ -f "$REPO/deployments/resolved.yml" ]; then
  docker compose -f "$REPO/deployments/resolved.yml" down --remove-orphans
fi

# Catch-all: remove every VSS-stack container the dev-profile compose
# files bring up. Without this, leftovers from a prior deploy linger
# (especially the *-smc set, which the alerts compose profile shares
# with the *-dev set on host networking and port 30000) and either:
#   - bind ports the new deploy needs → second sensor-ms fails to bind
#     → /sensor/list returns 502 (issue #151), or
#   - pass the new deploy's container-name health checks while serving
#     stale data from the prior deploy's DB.
# The patterns below cover everything declared in
# deployments/vst/{2d,3d,smc,developer,ps}/, deployments/foundational/,
# deployments/agents/, deployments/proxy/, and the dev-profile-*
# compose files.
docker ps -a --format '{{.Names}}' \
  | grep -E '^(vss-|mdx-|perception-|rtvi-|alert-|nvstreamer-|sensor-ms-|vst-ingress-|vst-mcp-|vst-file-proxy|centralizedb-|storage-ms-|streamprocessing-ms-|sdr-(http|streamprocessing)-|envoy-(http|streamprocessing)-|rtspserver-ms-|recorder-ms-|replaystream-ms-|livestream-ms-|metropolis-vss-ui|phoenix)' \
  | xargs -r docker rm -f
```

If this is the host's first deploy, the `docker compose down` line is a no-op (exit 0 with no containers to stop), so it is safe to run unconditionally.
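
Because the catch-all ends in `docker rm -f`, it can help to preview which names the pattern selects before removing anything. The sketch below reuses the same pattern; `VSS_NAME_RE` and `filter_vss_containers` are hypothetical names introduced here for illustration.

```bash
# Same name pattern as the catch-all teardown command, kept in one variable.
VSS_NAME_RE='^(vss-|mdx-|perception-|rtvi-|alert-|nvstreamer-|sensor-ms-|vst-ingress-|vst-mcp-|vst-file-proxy|centralizedb-|storage-ms-|streamprocessing-ms-|sdr-(http|streamprocessing)-|envoy-(http|streamprocessing)-|rtspserver-ms-|recorder-ms-|replaystream-ms-|livestream-ms-|metropolis-vss-ui|phoenix)'

# Hypothetical helper: read container names on stdin and print only the
# ones the teardown pattern would remove.
filter_vss_containers() {
  grep -E "$VSS_NAME_RE"
}
```

Running `docker ps -a --format '{{.Names}}' | filter_vss_containers` lists the candidates without deleting them; unrelated containers on the same host should not appear in that list.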

Step 1 — Gather context

Discover what's available on the host and cross-reference with the VSS prerequisites page to choose a deployment shape that fits.

| Value | How to determine |
|---|---|
| Profile | Match user intent to routing table above. Default: `base` |
| Repo path | Find `video-search-and-summarization/` on disk |
| Hardware | `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` → look up per-GPU VRAM against the prerequisites page |
| LLM/VLM placement | Pick `local_shared`, `local`, or `remote` per LLM/VLM based on available GPUs + `$LLM_REMOTE_URL` / `$VLM_REMOTE_URL` / `$NGC_CLI_API_KEY`. If no combination on this host satisfies the prerequisites, stop and report the blocker instead of silently picking another shape. |
| API keys | `NGC_CLI_API_KEY` for local NIMs, `NVIDIA_API_KEY` for remote |
| Host IP | `hostname -I \| awk '{print $1}'` |
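
For the Hardware value, the CSV output of `nvidia-smi` can be condensed into the two numbers that matter when reading the prerequisites page: GPU count and the smallest per-GPU VRAM. This is a sketch only; `gpu_summary` is a hypothetical helper and it assumes each input line has the form `name, <number> MiB`.

```bash
# Hypothetical helper: read "name, <N> MiB" lines on stdin and print the
# GPU count and the smallest per-GPU memory in MiB.
gpu_summary() {
  awk -F', *' '
    { mem = $2 + 0; if (NR == 1 || mem < min) min = mem }
    END { printf "gpus=%d min_vram_mib=%d\n", NR, min }
  '
}
```

Typical use would be `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | gpu_summary`.
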

Hardware profile mapping:

| GPU name contains | `HARDWARE_PROFILE` | Recommended LLM path |
|---|---|---|
| H100 | `H100` | Nano 9B v2 (NIM) |
| L40S | `L40S` | Nano 9B v2 (NIM) |
| RTX 6000 Ada, RTX PRO 6000 | `RTXPRO6000BW` | Nano 9B v2 (NIM) |
| GB10 (DGX Spark) | `DGX-SPARK` | Edge 4B (vLLM), see `references/edge.md` |
| IGX | `IGX-THOR` | Edge 4B (vLLM), see `references/edge.md` |
| AGX | `AGX-THOR` | Edge 4B (vLLM), see `references/edge.md` |
| Other | `OTHER` | |
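
The mapping above is a substring match on the GPU name, which a `case` statement expresses directly. The function name `hardware_profile` is hypothetical, and the exact name strings that `nvidia-smi` reports on each platform are an assumption; only the substrings come from the table.

```bash
# Hypothetical helper: map a GPU name (as reported by nvidia-smi) to the
# HARDWARE_PROFILE value from the mapping table.
hardware_profile() {
  case "$1" in
    *H100*) echo H100 ;;
    *L40S*) echo L40S ;;
    *"RTX 6000 Ada"*|*"RTX PRO 6000"*) echo RTXPRO6000BW ;;
    *GB10*) echo DGX-SPARK ;;
    *IGX*) echo IGX-THOR ;;
    *AGX*) echo AGX-THOR ;;
    *) echo OTHER ;;
  esac
}
```
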

Minimum GPU count per (profile × mode × platform). Canonical source is the VSS prerequisites page; reproduced here so the skill can fail fast when the host is too small:

| Profile | Mode | H100 / RTX PRO 6000 (Blackwell) | L40S | DGX-Spark / IGX-Thor / AGX-Thor |
|---|---|---|---|---|
| `base` | shared (`local_shared` LLM + VLM) | 1 | not supported (48 GB/GPU too small) | 1 (Edge 4B + VLM, unified memory) |
| `base` | dedicated (`local` LLM + VLM) | 2 | 2 | |
| `base` | remote-llm | 1 (VLM local) | 1 (VLM local) | 1 (remote LLM only) |
| `base` | remote-vlm | 1 (LLM local) | 1 (LLM local) | |
| `base` | remote-all | 0 | 0 | 0 |
| `lvs` | shared | 1 | | - |
| `lvs` | dedicated | 2 | 2 | |
| `lvs` | remote-llm/vlm | 1 | 1 | - |
| `lvs` | remote-all | 0 | 0 | - |
| `alerts` (verification / CV) | shared | 2 | | |
| `alerts` (verification / CV) | dedicated | 3 | 3 | |
| `alerts` (verification / CV) | remote-all | 1 | 1 | 1 |
| `alerts` (verification / CV) | remote-llm/vlm | 2 | 2 | 1 |
| `alerts` (real-time / VLM) | shared | 2 | | |
| `alerts` (real-time / VLM) | dedicated | 3 | 3 | |
| `alerts` (real-time / VLM) | remote-llm | 2 | 2 | 1 |
| `search` | shared | 2 | | - |
| `search` | dedicated | 3 | 3 | |
| `search` | remote-* | 2 | 2 | - |

A few hard rules encoded in the table:
  • L40S can't do `shared`. 48 GB is not enough VRAM for LLM + VLM on a single GPU. Fall back to `dedicated` or a `remote-*` mode.
  • L40S needs +1 GPU for alerts / search vs H100 because the shared-on-one-GPU trick doesn't work: RT-CV / Embed1 must take their own GPU, and LLM+VLM still need a second.
  • DGX-Spark / Thor are early-access for most profiles. Only `base` + `lvs` are expected to fully land locally; `alerts` / `search` currently require a remote LLM. See `references/edge.md`.

If the host's (GPU count × VRAM) combination doesn't appear above, stop and report the blocker; don't silently pick a different mode.
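
A fail-fast check can encode rows of the table as a lookup. The sketch below covers only the `base` profile on H100 and L40S and is deliberately partial; `min_gpus_base` is a hypothetical helper, and the VSS prerequisites page remains the canonical source for every number.

```bash
# Hypothetical helper: print the minimum GPU count for the base profile,
# given a mode and a platform. Returns non-zero when the combination is
# unsupported or is not covered by this partial lookup.
min_gpus_base() {
  mode=$1 platform=$2
  case "$platform:$mode" in
    H100:shared) echo 1 ;;
    L40S:shared)
      echo "L40S cannot run shared mode (48 GB/GPU too small)" >&2
      return 1 ;;
    H100:dedicated|L40S:dedicated) echo 2 ;;
    H100:remote-llm|L40S:remote-llm) echo 1 ;;
    H100:remote-vlm|L40S:remote-vlm) echo 1 ;;
    H100:remote-all|L40S:remote-all) echo 0 ;;
    *)
      echo "no entry for $platform/$mode: stop and report the blocker" >&2
      return 1 ;;
  esac
}
```

A caller would compare the printed value with the detected GPU count and stop, rather than switch modes, when the host falls short.
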

Edge shared mode requires Edge 4B + `HF_TOKEN`. On DGX Spark and AGX/IGX Thor, both LLM and VLM must fit in unified memory, AND the standard `nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2:1` image has a broken arm64 manifest. You must run `NVIDIA-Nemotron-Edge-4B-v2.1-EA-020126_FP8` as a standalone vLLM container on port 30081 with the agent pointed at it via `--use-remote-llm`. Full recipe and the mandatory `HF_TOKEN` verification step are in `references/edge.md`.

Step 1b — Prepare the data directory

The data directory layout (asset paths, ownership, mount points, profile-specific subdirs) is documented in `references/data-directory.md`. Read that file before deploying for the first time on a host or when changing profiles.

```bash
# Profile-specific subdirs:
#   alerts → mkdir -p "$DATA/data_log/vss_video_analytics_api" "$DATA/videos/dev-profile-alerts" "$DATA/models/rtdetr-its" "$DATA/models/gdino"
#   search → mkdir -p "$DATA/models"

chmod -R 777 "$DATA/data_log" "$DATA/agent_eval"

# If you created $DATA/models above, also: chmod -R 777 "$DATA/models"
```


> **FORBIDDEN: `chown -R ubuntu:ubuntu $MDX_DATA_DIR` (or any recursive chown).**
>
> This is "good housekeeping" to a shell-admin instinct but is **the** deploy-
> breaking command in this stack. You will observe a "healthy" deploy
> (containers Up, endpoints 200) while the video pipeline is silently broken.
> Use `chmod -R 777` on the specific subdirs above — nothing else.

**Known per-container uid gotchas** (each uses a bind mount under `$DATA`):

| Container | Image | Runs as | Mount path | Symptom if permissions wrong |
|---|---|---|---|---|
| `centralizedb-dev` | postgres:17.6-alpine | uid **70** | `$DATA/data_log/vst/postgres/db` | Can't read own PGDATA → VST `sensor_details` query fails → uploaded videos never appear in `/vst/api/v1/sensor/streams` → warehouse E2E check returns empty |
| `mdx-redis` | redis:8.2.2-alpine | uid **999** | `$DATA/data_log/redis/log`, `/redis/data` | "Can't open the log file: Permission denied" → redis dies → `envoy-streamprocessing` dies (needs Redis Lua script) → stream pipeline broken |
| `elasticsearch` | elasticsearch | uid **1000** | `$DATA/data_log/elastic/{data,logs}` | "AccessDeniedException" on startup → ES refuses to start |
| `vst` / `sensor-ms-dev` | vst | uid **1000** | `$DATA/data_log/vst/*` (videos, clips) | 403 on ingest or stream write |

`chmod -R 777 $DATA/data_log` covers all of these. Do NOT chown them to
individual uids — containers that init their own dirs on first start (like
postgres) will then re-chown to their uid and a later chown back to ubuntu
breaks them.

**If postgres is already broken** (common when redeploying without a clean
`data-dir`):
```bash
sudo rm -rf "$DATA/data_log/vst/postgres"  # postgres re-initializes on next start
docker restart centralizedb-dev
```

Step 1c — If deploying on Brev, set up secure-link env vars

Brev-specific env vars (`BREV_ENV_ID`, secure-link patterns) are documented in `references/brev.md`.

Step 2 — Build env_overrides

Produce an `env_overrides` dict from the user request and the gathered context: choose remote/local LLM/VLM, set credentials, point at endpoints, set platform-specific flags. The full mapping (every override key, when it applies, defaults, profile-specific differences) lives in `references/env-overrides.md`.

Step 3 — Config / dry-run

Env file location: `<repo>/deployments/developer-workflow/dev-profile-<profile>/.env`

This is the authoritative `.env`. Every verifier, healthcheck, and post-deploy tool reads from this path. When you apply env overrides (from Step 2 or from the user's prompt), write them directly to this file, not to `generated.env`. `generated.env` is a scratchpad that `dev-profile.sh` produces during its own internal flow; it is NOT read by the verifier and is wiped on the next invocation. An agent that uses `dev-profile.sh` as a one-shot deploy but leaves the base `.env` untouched will silently fail env checks even when the stack comes up cleanly. If you used `dev-profile.sh` and see `generated.env` on disk, copy its key/value lines back into the base `.env`, or re-apply your `sed` commands against the base `.env` after the fact. The base `.env` is the source of truth.

```bash
REPO=/path/to/video-search-and-summarization
PROFILE=base
ENV_FILE="$REPO/deployments/developer-workflow/dev-profile-$PROFILE/.env"

# Read current .env, apply overrides, write back
# (read lines, update matching keys, append new keys, write)

# Resolve compose
cd "$REPO/deployments"
docker compose --env-file "$ENV_FILE" config > resolved.yml
```

The resolved YAML is saved to `<repo>/deployments/resolved.yml`.
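
The "read lines, update matching keys, append new keys, write" step can be done with a small helper. This is one possible sketch, not the repo's own tooling: `set_env_var` is a hypothetical name, and it assumes plain `KEY=value` lines with no `export` prefix.

```bash
# Hypothetical helper: set KEY=VALUE in an env file. An existing KEY= line
# is replaced in place; otherwise the pair is appended at the end.
# Values are passed through the environment so awk does not interpret
# backslashes in them.
set_env_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  K="$key" V="$value" awk -F'=' '
    $1 == ENVIRON["K"] { print ENVIRON["K"] "=" ENVIRON["V"]; seen = 1; next }
    { print }
    END { if (!seen) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the original path so its permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For instance, `set_env_var "$ENV_FILE" HARDWARE_PROFILE L40S` would update the base `.env` directly, which is the file the verifiers read. The actual override keys to set are listed in `references/env-overrides.md`.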

Step 3b — Verify resolved.yml has no unexpanded ${...} tokens

Unexpanded `${VAR}` tokens in `resolved.yml` mean compose did not see those env values. Diagnostic procedure and common culprits live in `references/troubleshooting.md`.
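
A quick scan for leftover tokens might look like the sketch below. `find_unexpanded` is a hypothetical helper; it assumes unresolved values appear literally as `${NAME}` in the file, and it skips `$${...}`, which compose uses for an escaped literal dollar sign.

```bash
# Hypothetical helper: print (with line numbers) every line of the given
# file that still contains a ${...} token. Exit status is 0 when at least
# one token is found and 1 when the file is clean.
find_unexpanded() {
  grep -nE '(^|[^$])\$\{[^}]+\}' "$1"
}
```

In the deploy flow this could be used as `if find_unexpanded resolved.yml; then echo "fix the .env before deploying" >&2; fi`.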

Step 4 — Review

Show the user a summary of what will be deployed:
  • Profile name and hardware
  • LLM/VLM models and mode (local/remote/local_shared)
  • Services that will start
  • GPU device assignment
  • Key endpoints (UI port, agent port)

Ask: "Looks good, deploy now?" and wait for confirmation before Step 5.

Exception: autonomous mode. If the user's request already asks you to run autonomously (e.g. "deploy X autonomously", "run without confirmation", "non-interactive"), skip the confirmation prompt and proceed straight to Step 5. This path exists so automated eval / CI invocations don't hang waiting for a human reply they'll never get. In all other cases, a human must approve.
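
The confirmation gate and its autonomous-mode exception can be sketched as follows. Both function names are hypothetical, and the phrase list is an assumption taken from the three example phrasings in this step.

```bash
# Hypothetical helper: succeed if the request text asks for an unattended run.
is_autonomous() {
  printf '%s' "$1" | grep -qiE 'autonomous|without confirmation|non-interactive'
}

# Hypothetical gate: return 0 when the deploy may proceed. Autonomous
# requests pass straight through; otherwise a human must answer "y".
confirm_deploy() {
  if is_autonomous "$1"; then
    return 0
  fi
  printf 'Looks good, deploy now? [y/N] '
  read -r answer
  [ "$answer" = "y" ] || [ "$answer" = "Y" ]
}
```

With this shape, a CI invocation such as "deploy base autonomously" never blocks on input, while an interactive request stops until someone approves.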

Step 5 — Deploy

```bash
cd $REPO/deployments
docker compose -f resolved.yml up -d
```

Do NOT use `--force-recreate` on retries. It destroys already-warm NIM containers, forcing another 3–5 min torch.compile + CUDA-graph capture per NIM. If the previous `up -d` partially failed, fix the root cause (usually perms or an env typo) and just re-run `up -d`; Docker will re-create only the containers whose config changed or that are down.

Deploy takes ~10-20 min on first run (image pulls + model downloads). Monitor:

```bash
# Container status
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'

# Logs for a specific service
docker compose -f $REPO/deployments/resolved.yml logs --tail 50 <service>
```

Deploy is complete when all `mdx-*` containers show `Up` status.
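
The completion criterion (all `mdx-*` containers `Up`) can be polled instead of checked by eye. The helpers below are a sketch with hypothetical names; the 20-minute default timeout and 15-second poll interval are assumptions based on the first-run estimate above.

```bash
# Hypothetical helper: read "<name> <status...>" lines on stdin and succeed
# only if there is at least one mdx-* container and every mdx-* container
# has a status that starts with "Up".
all_mdx_up() {
  awk '
    $1 ~ /^mdx-/ { n++; if ($2 != "Up") bad++ }
    END { exit (n > 0 && bad == 0) ? 0 : 1 }
  '
}

# Hypothetical helper: poll docker until all_mdx_up succeeds or the
# timeout (seconds, default 1200) elapses.
wait_for_mdx() {
  timeout_s=${1:-1200}
  waited=0
  until docker ps -a --format '{{.Names}} {{.Status}}' | all_mdx_up; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "timed out waiting for mdx-* containers" >&2
      return 1
    fi
    sleep 15
    waited=$((waited + 15))
  done
}
```

`Up` only means the container process is running, so the quick checks and the end-to-end check in the debugging section still apply afterwards.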

Step 6 — Report endpoints

| Profile | Agent UI | REST API | Other |
|---|---|---|---|
| `base` | `:3000` | `:8000` (Swagger at `/docs`) | |
| `alerts` | `:3000` | `:8000` | VIOS dashboard `:30888/vst/` |
| `lvs` | `:3000` | `:8000` | |
| `search` | `:3000` | `:8000` | |

Use workflow skills after deployment:
  • alerts / incident-report → alert management and incident queries
  • video-search → semantic video search
  • video-summarization → long video summarization
  • vios → camera/stream management via VIOS
  • video-analytics → Elasticsearch queries
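
The endpoint report can be generated from the profile name and host IP gathered in Step 1. This is a sketch: `report_endpoints` is a hypothetical helper, and the `http://` scheme is an assumption based on the `curl` checks used elsewhere in this document.

```bash
# Hypothetical helper: print the endpoints for a profile on a given host
# (default: localhost), following the ports in the endpoint table.
report_endpoints() {
  profile=$1 host=${2:-localhost}
  echo "Agent UI: http://$host:3000"
  case "$profile" in
    base) echo "REST API: http://$host:8000 (Swagger at /docs)" ;;
    *)    echo "REST API: http://$host:8000" ;;
  esac
  if [ "$profile" = alerts ]; then
    echo "VIOS dashboard: http://$host:30888/vst/"
  fi
}
```

For example, `report_endpoints alerts "$(hostname -I | awk '{print $1}')"` prints three lines, the last being the VIOS dashboard URL.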

Tear Down

```bash
cd $REPO/deployments
docker compose -f resolved.yml down
```

Debugging a Deployment

Use this workflow when the user asks to "debug the deploy", "verify it's working", "why is the agent not responding", or similar. The goal is to confirm the full video-ingestion-to-agent-answer path, not just that containers are "Up".

Each profile reference doc (e.g. `references/base.md`) has a Debugging section listing the exact commands to run for that profile.

Quick checks (all profiles)

```bash
# 1. All expected containers Up
docker ps --format 'table {{.Names}}\t{{.Status}}'

# 2. Agent API + UI responding
curl -sf http://localhost:8000/docs >/dev/null && echo "agent OK"
curl -sf http://localhost:3000/ >/dev/null && echo "ui OK"

# 3. VLM NIM responding (base/lvs profiles)
curl -sf http://localhost:30082/v1/models | python3 -m json.tool

# 4. LLM NIM responding
curl -sf http://localhost:30081/v1/models | python3 -m json.tool
```

End-to-end video sanity check

After the quick checks above pass, drive a real query through the agent, e.g. ask it over the REST API or UI to describe a video you've uploaded to VST. If the agent returns a non-empty answer, the upload → ingest → inference → reply path is healthy. If it fails, `docker logs vss-agent` shows which stage tripped.

Troubleshooting

  • `unknown or invalid runtime name: nvidia` → NVIDIA Container Toolkit not installed or Docker not restarted. See `references/prerequisites.md`.
  • NGC auth error → re-export `NGC_CLI_API_KEY` or follow `references/ngc.md`.
  • GPU not detected → run `sudo modprobe nvidia && sudo modprobe nvidia_uvm`, then retry.
  • `docker compose up` fails with "no resolved.yml" → run the dry-run (`docker compose config > resolved.yml`, Step 3) first.
  • cosmos-reason2-8b crash → must redeploy the full stack (known issue: NIM cannot restart alone).