video-summarization

You are a video summarization assistant. You call the VLM NIM or the LVS microservice directly. Always run `curl` commands yourself; never instruct the user to run them.

Primary video workflow query type: "Summarize this video." Direct LVS API and service-ops requests are handled by the reference-routed sections below.

Reference Map

Use these references only when the user asks for the relevant detail, or when the core workflow below needs deeper LVS information:

- LVS API details: `references/lvs-api.md` for `/summarize`, `/v1/summarize`, health probes, `/models`, `/recommended_config`, `/metrics`, request fields, response shapes, and API gotchas.
- LVS service configuration and ops: `references/deploy-lvs-service.md` for the LVS service compose profile, ports, required env vars, logs, status, dry-runs, teardown, model/backend swaps, and service-level troubleshooting.
- Extended LVS ops references: `references/lvs-environment-variables.md`, `references/lvs-debugging.md`, and `references/lvs.env.example`.

Do not load these references for routine short-video VLM summaries. Load `lvs-api.md` for long-video LVS request details or direct LVS API requests. Load `deploy-lvs-service.md` only for deployment, configuration, or service operations.

LVS API And Service Ops Requests

If the user asks to call or debug LVS endpoints directly, answer from `references/lvs-api.md` instead of running the end-to-end video summarization workflow. Examples: list LVS models, check readiness, get recommended chunking config, inspect metrics, explain a 422 response, or build a `/summarize` request body.

If the user asks to configure, deploy, restart, tear down, or troubleshoot the LVS service, prefer the `deploy` skill for full VSS profile deployment and use `references/deploy-lvs-service.md` for LVS-specific service details.

Routing

Decide purely from video duration (fetch the timeline via the `vios` skill, then do the math; see Step 1):

| Video duration | Backend | Endpoint |
|---|---|---|
| `< 60s` (short) | VLM NIM (OpenAI-compatible) | `POST ${VLM_BASE_URL}/v1/chat/completions` |
| `>= 60s` (long), LVS available | LVS microservice | `POST ${LVS_BACKEND_URL}/summarize` |
| `>= 60s`, LVS not reachable | VLM NIM + tell the user | `POST ${VLM_BASE_URL}/v1/chat/completions` |

Fallback message when LVS is unreachable for a long video (copy verbatim into the response, before the summary):

> ⚠️ Note: Input video `<name>` is `<N>`s long. Long Video Summarization (LVS) is not deployed, so this summary was produced by the VLM alone. Deploy the `lvs` profile for higher-quality long-video summaries.
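
The routing table can be sketched as a small shell function. This is only an illustration: `choose_backend` and its output labels (`vlm`, `lvs`, `vlm-fallback`) are hypothetical names, and it assumes the duration is given in whole seconds and the second argument is the HTTP status code from the LVS readiness probe.

```shell
# Hypothetical helper (not part of the skill): pick a backend from the
# video duration in whole seconds and the LVS readiness HTTP status code.
choose_backend() {
  duration=$1
  lvs_code=$2
  if [ "$duration" -lt 60 ]; then
    echo "vlm"            # short video: VLM NIM directly
  elif [ "$lvs_code" = "200" ]; then
    echo "lvs"            # long video, LVS ready
  else
    echo "vlm-fallback"   # long video, LVS not reachable: VLM plus the warning
  fi
}

choose_backend 45 200     # short clip
choose_backend 180 200    # long clip, LVS ready
choose_backend 180 000    # long clip, LVS unreachable
```

Keeping the decision in one place makes it harder to route on anything other than the computed duration.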

Deployment Prerequisite For Summarization

The video summarization workflow requires the VSS `lvs` profile running on the host at `$HOST_IP`. Before any summarization request:

Probe the LVS microservice:

```bash
curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null \
  && curl -sf --max-time 5 "http://${HOST_IP}:38111/v1/ready" >/dev/null
```

(Port 38111 is LVS. HTTP 200 → ready; 503 → still warming, retry in a moment.)

If the probe fails, ask the user: "The VSS `lvs` profile isn't running on `$HOST_IP`. Shall I deploy it now using the `/deploy` skill with `-p lvs`?"
- If yes → hand off to the `/deploy` skill. Return here once it succeeds.
- If no → stop. Long-video summarization without LVS falls back to VLM-only, which is a different (lower-quality) path; confirm with the user before substituting.

(If your caller has granted explicit pre-authorization to deploy autonomously, e.g. the request says "pre-authorized to deploy prerequisites", or you are running in a non-interactive evaluation harness with that permission, skip the confirmation and invoke `/deploy` directly.)

If the probe passes, proceed.

For LVS-specific service status, compose profile, ports, logs, or environment debugging, read `references/deploy-lvs-service.md`. The `deploy` skill remains canonical for full VSS profile deployment.

Setup

Endpoints (defaults for a local VSS deployment):

- VLM NIM: `${VLM_BASE_URL}`, default `http://localhost:30082`
- LVS MS: `${LVS_BACKEND_URL}`, default `http://localhost:38111`
- VIOS: owned by the `vios` skill; refer there.

Endpoint resolution order:

1. If the env vars `VLM_BASE_URL` / `LVS_BACKEND_URL` are set, use them (strip a trailing `/v1` from `VLM_BASE_URL`, because NIM exposes `/v1/...` and this skill appends it).
2. Otherwise use the defaults above.
3. If neither works, ask the user for the endpoints. Do not scan ports or read config files to guess them.
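
The "strip a trailing `/v1`" rule can be done with plain parameter expansion. The sketch below is an assumption about how one might implement it, not the skill's own code; `normalize_vlm_base` and `VLM_BASE` are hypothetical names, and the helper also drops a single trailing slash.

```shell
# Hypothetical helper: remove one trailing "/" and one trailing "/v1"
# so that "/v1/..." can be appended exactly once.
normalize_vlm_base() {
  url=$1
  url=${url%/}      # "http://host:30082/v1/" -> "http://host:30082/v1"
  url=${url%/v1}    # "http://host:30082/v1"  -> "http://host:30082"
  printf '%s\n' "$url"
}

# Env var if set, otherwise the documented default.
VLM_BASE=$(normalize_vlm_base "${VLM_BASE_URL:-http://localhost:30082}")
echo "$VLM_BASE/v1/models"
```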

Model name: read `${VLM_NAME}` (default `nvidia/cosmos-reason2-8b`). Both VLM and LVS requests use the same model name.

For full LVS endpoint schemas, optional request fields, response envelopes, and error handling, read `references/lvs-api.md`.

Availability checks (run both before routing):

Readiness is determined by the HTTP status code only. Do not parse or inspect the response body: LVS's `/v1/ready` can legitimately return `200` with an empty body. Do not treat empty stdout from `curl` as "unavailable."

```bash
# VLM: 200 on /v1/models
vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 \
  "${VLM_BASE_URL:-http://localhost:30082}/v1/models")
[ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"

# LVS: 200 on /v1/ready, with retry on 503 (warmup) for up to ~30s
LVS=${LVS_BACKEND_URL:-http://localhost:38111}
lvs_code=000
for i in $(seq 1 10); do
  lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready")
  case "$lvs_code" in
    200) echo "LVS OK"; break ;;
    503) sleep 3 ;;   # warming up; keep polling
    *)   break ;;     # any other code = not reachable, stop retrying
  esac
done
[ "$lvs_code" = "200" ] || echo "LVS not reachable (HTTP $lvs_code)"
```


**How to interpret the results:**

- `vlm_code = 200` and `lvs_code = 200` → normal routing (Step 2a for
  `<60s`, Step 2b for `>=60s`).
- `vlm_code != 200` → fail; summarization cannot run without the VLM.
- `vlm_code = 200`, `lvs_code != 200` → LVS is truly unavailable; use
  the VLM fallback path described above for long videos.
- A non-200 LVS code after the retry loop is the ONLY signal that LVS
  is unavailable. Empty stdout, missing JSON fields, or a "weird"
  response body are NOT "unavailable."

---


Step 1: Resolve the video to a clip URL (delegate to `vios`)

Use the `vios` skill for all VIOS interactions. It owns the canonical curl recipes, parameter defaults, and delete/upload flows. Do not fabricate URLs or hand-roll VIOS calls here; they will drift.

From `vios`, you need exactly three things for summarization:

- `streamId` for the video (via `sensor/list` → `sensor/<id>/streams`, or directly from an upload response).
- Timeline: `{startTime, endTime}` for the stream, ISO 8601 UTC. `endTime - startTime` is the duration that drives the routing decision below. Always compute; never assume.
- Temporary MP4 clip URL: the `/storage/file/<streamId>/url` variant with `container=mp4`. The VLM and LVS both need an HTTP(S) URL they can `GET`; the `/url` variant is preferred over streaming bytes through the summarization client. Response field: `.videoUrl`.

Everything else (auth, error handling, upload, `disableAudio`, expiry, etc.) is covered in the `vios` skill; refer users there if the VIOS step fails.
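
Computing `endTime - startTime` can be sketched in portable shell arithmetic, with no dependency on a particular `date` implementation. The helper names (`days_from_civil`, `iso_to_epoch`, `duration_seconds`) are hypothetical, and the sketch assumes both timestamps are UTC in the form `YYYY-MM-DDTHH:MM:SS[.fff]Z`; fractional seconds are truncated.

```shell
# Days since 1970-01-01 for a Gregorian calendar date (year month day).
days_from_civil() {
  y=$1; m=$2; d=$3
  if [ "$m" -le 2 ]; then y=$((y - 1)); fi          # Jan/Feb count as previous year
  era=$((y / 400))
  yoe=$((y - era * 400))                            # year of era, 0..399
  if [ "$m" -gt 2 ]; then mp=$((m - 3)); else mp=$((m + 9)); fi
  doy=$(( (153 * mp + 2) / 5 + d - 1 ))             # day of year, March-based
  doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))  # day of era
  echo $(( era * 146097 + doe - 719468 ))
}

# ISO 8601 UTC timestamp -> epoch seconds (assumes a trailing "Z").
iso_to_epoch() {
  ts=${1%Z}
  ts=${ts%%.*}                                      # drop fractional seconds
  IFS='-T:' read -r yy mo dd hh mi ss <<EOF
$ts
EOF
  # Strip one leading zero so values like "08" are not parsed as octal.
  mo=${mo#0}; dd=${dd#0}; hh=${hh#0}; mi=${mi#0}; ss=${ss#0}
  days=$(days_from_civil "$yy" "$mo" "$dd")
  echo $(( days * 86400 + hh * 3600 + mi * 60 + ss ))
}

# duration_seconds <startTime> <endTime>
duration_seconds() {
  echo $(( $(iso_to_epoch "$2") - $(iso_to_epoch "$1") ))
}

duration_seconds "2025-01-01T00:00:00Z" "2025-01-01T00:01:30Z"   # prints 90
```

The result is the value compared against the 60-second routing threshold.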

Step 2a — Short video (< 60s) → VLM direct

HITL: confirm the VLM prompt first (REQUIRED — do not skip)

Full prompt-confirmation walk-through (questions to ask the user, examples, refusal handling) lives in `references/hitl-prompts.md`. Always run this step before calling the VLM.

Call the VLM

Once the user confirms a prompt, send it as the `text` part of the VLM message. OpenAI-compatible chat completions with the video URL embedded in the message content:

```bash
PROMPT='<confirmed_prompt_from_hitl>'

curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
        --arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" \
        --arg text "$PROMPT" \
        --arg url "<clip_url_from_vios>" \
        '{
          model: $model,
          temperature: 0.0,
          max_tokens: 1024,
          messages: [{
            role: "user",
            content: [
              {type: "text", text: $text},
              {type: "video_url", video_url: {url: $url}}
            ]
          }]
        }')" | jq -r '.choices[0].message.content'
```

Response: standard OpenAI chat-completion envelope. The summary is in `choices[0].message.content`.

Cosmos-model notes: Cosmos Reason 2 supports reasoning via `<think>...</think><answer>...</answer>` blocks. Omit the reasoning instructions if you want a plain summary. Frame sampling and pixel limits are applied server-side; no client-side prep is required when you pass a `video_url`.

Step 2b — Long video (>= 60s) → LVS microservice direct

This section contains the narrow long-video summarization path. For advanced LVS fields such as `media_info`, `schema`, structured output, chunk overlap, live stream timestamps, metrics, or recommended config, read `references/lvs-api.md`.

HITL: collect scenario and events first (REQUIRED — do not skip)

Full scenario/events collection walk-through lives in `references/hitl-prompts.md`. Always run this step before calling LVS.

Extract the summary and events in one pipe:

```bash
curl -s -X POST "${LVS_BACKEND_URL:-http://localhost:38111}/summarize" \
  -H "Content-Type: application/json" \
  -d @request.json \
  | jq -r '.choices[0].message.content' \
  | jq '{video_summary, events}'
```


If both `video_summary` and `events` come back empty, the clip probably
doesn't contain the requested events — re-run with different `events` or a
broader `scenario` rather than reporting "no content."

**Tuning:**

- `chunk_duration` (default `10`) — seconds per chunk. Smaller = finer
  timestamps, more VLM calls. Use `0` to send the whole video in one chunk.
- `num_frames_per_chunk` (default `20`) — frames sampled per chunk.
- `seed` (default `1`) — reproducibility; change or omit to get variety.
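
To get a feel for the `chunk_duration` trade-off, the number of chunks can be estimated with integer arithmetic. This is a rough sketch under stated assumptions: chunks do not overlap, a final partial chunk counts as one chunk, and the call count per chunk is not something the text above specifies. `estimate_chunks` is a hypothetical helper, not an LVS API.

```shell
# Hypothetical estimate: chunks for a video of <duration> seconds when split
# into <chunk_duration>-second pieces. 0 means "whole video in one chunk".
estimate_chunks() {
  duration=$1
  chunk=$2
  if [ "$chunk" -eq 0 ]; then
    echo 1
  else
    echo $(( (duration + chunk - 1) / chunk ))   # ceiling division
  fi
}

estimate_chunks 95 10   # default chunk_duration
estimate_chunks 95 5    # finer timestamps, roughly twice the chunks
estimate_chunks 95 0    # single chunk
```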

---


End-to-end examples

Assume the `vios` skill has already given you `$CLIP` (clip URL) and `$DURATION` (seconds) for the target video; those two values are the contract from Step 1.

Short video (`$DURATION < 60`)

HITL (required, before the curl): post the Step 2a message, wait for `Submit` (or a `/generate`/`/refine` round-trip that ends in `Submit`), then set `PROMPT` to the confirmed text. Do not run the curl below until that confirmation has arrived.

```bash
PROMPT='Describe in detail what is happening in this video,
including all visible people, vehicles, equipments, objects,
actions, and environmental conditions.
OUTPUT REQUIREMENTS:
[timestamp-timestamp] Description of what is happening.
EXAMPLE:
[0.0s-4.0s] <description of the first event>
[4.0s-12.0s] <description of the second event>'

curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg url "$CLIP" --arg text "$PROMPT" \
        --arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" '{
    model: $model,
    temperature: 0.0,
    max_tokens: 1024,
    messages: [{role:"user", content:[
      {type:"text", text:$text},
      {type:"video_url", video_url:{url:$url}}
    ]}]
  }')" | jq -r '.choices[0].message.content'
```

Long video (`$DURATION >= 60`)

HITL (required, before the curl): post the Step 2b message and wait for the user's reply. Substitute their values (or the `defaults` opt-in) into `$SCENARIO`, `$EVENTS_JSON`, and `$OBJECTS_JSON` below. Do not run the curl without that reply.

```bash
LVS=${LVS_BACKEND_URL:-http://localhost:38111}

# From HITL reply:
SCENARIO='warehouse monitoring'       # or whatever the user gave
EVENTS_JSON='["notable activity"]'    # jq-compatible JSON array
OBJECTS_JSON=''                       # '' to omit, else '["cars","trucks"]'

# Readiness = HTTP 200 on /v1/ready. Body may be empty; do not inspect it.
# Retry on 503 (warmup) for up to ~30s before concluding LVS is unavailable.
lvs_code=000
for i in $(seq 1 10); do
  lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready")
  case "$lvs_code" in
    200) break ;;
    503) sleep 3 ;;
    *)   break ;;
  esac
done

if [ "$lvs_code" = "200" ]; then
  curl -s -X POST "$LVS/summarize" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg url "$CLIP" \
          --arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" \
          --arg scenario "$SCENARIO" \
          --argjson events "$EVENTS_JSON" \
          --argjson objects "${OBJECTS_JSON:-null}" '{
      url: $url,
      model: $model,
      scenario: $scenario,
      events: $events,
      chunk_duration: 10,
      num_frames_per_chunk: 20,
      seed: 1
    } + (if $objects == null then {} else {objects_of_interest: $objects} end)')" \
    | jq -r '.choices[0].message.content' | jq '{video_summary, events}'
else
  echo "⚠️ Note: video is ${DURATION}s long. LVS returned HTTP $lvs_code; falling back to VLM."
  # Fall back to the short-video VLM flow above (which itself requires
  # the Step 2a HITL confirmation before calling the VLM).
fi
```

---

Responses

- VLM returns an OpenAI chat-completion envelope; the summary string is `choices[0].message.content`.
- LVS returns the same envelope, but `content` is a JSON string. Run `jq -r '.choices[0].message.content' | jq` to reach `{video_summary, events}`.
- Errors from VLM/LVS surface as HTTP non-2xx plus JSON `{error: ...}`. `503` from LVS typically means it is still warming up; wait and retry `/v1/ready`.

Presenting the output to the user (IMPORTANT — do not rewrite)

The VLM and LVS responses are the final user-facing product. Surface them with minimal transformation; do not paraphrase, re-voice, add emojis, or re-format into bullets/tables that weren't in the source.

Exactly one backend call, exactly one rendering. A single confirmed prompt (Step 2a) or a single confirmed scenario/events set (Step 2b) corresponds to exactly one `POST /v1/chat/completions` or `POST /summarize` request, and exactly one block of output to the user. Do NOT fan out parallel calls to hedge (e.g., one call for "full scene" plus another for "anomalies"), and do NOT render the same response twice with different headers. If the user wants a second pass (e.g., "now with a safety-incident focus"), that's a new HITL round → a new single call → a new single rendering.

Header line format. Start the response with exactly one header:

Summary of <video_name> (<duration>)

Use `<duration>` formatted as `Ns` for durations under 60 seconds (e.g. `25s`) and `Mm Ss` for durations ≥60 seconds (e.g. `3m 30s`). Never include the same header twice in different formats.
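
The duration rule can be sketched as a one-function formatter. `format_duration` is a hypothetical helper; it assumes whole seconds as input, and how to render an exact minute (shown here as `1m 0s`) is an assumption the text above does not settle.

```shell
# Hypothetical helper: "Ns" under 60 seconds, "Mm Ss" otherwise.
format_duration() {
  secs=$1
  if [ "$secs" -lt 60 ]; then
    echo "${secs}s"
  else
    echo "$((secs / 60))m $((secs % 60))s"
  fi
}

format_duration 25     # prints 25s
format_duration 210    # prints 3m 30s
```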

LVS output:

- `video_summary` (string): render verbatim as the narrative summary. It is already a polished, tone-controlled "Observational Report"; the agent rewriting it loses fidelity (e.g., the model's neutral/formal voice becomes the agent's default voice, subtle phrasing gets smoothed out).
- `events` (list): render each event with its `start_time`, `end_time`, `type`, and the full `description` verbatim. Pick a format that renders cleanly in the current client; you may use a table if the client renders them legibly, otherwise fall back to a per-event list. Do not shorten or paraphrase `description`.
- You MAY add a one-line header identifying the video (e.g. `**Summary of <name>** (<duration>, scenario: <scenario>)`) and a closing offer to re-run with different parameters. You MAY NOT summarize, reorder, or interpret the content itself.

VLM output: `choices[0].message.content` is already the full assistant reply; render it verbatim. If the model produced `<think>...</think><answer>...</answer>` blocks, strip the `<think>` block and show the `<answer>` content (or the whole content if the tags are absent).
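
A minimal sketch of that tag handling, using only shell parameter expansion. `extract_answer` is a hypothetical helper; the branch for a `</think>` block without `<answer>` tags is an assumption beyond what the text above specifies.

```shell
# Hypothetical helper: return the <answer> body if present, otherwise the
# text after </think>, otherwise the content unchanged.
extract_answer() {
  content=$1
  case $content in
    *"<answer>"*"</answer>"*)
      content=${content#*"<answer>"}     # drop everything through <answer>
      content=${content%"</answer>"*}    # drop </answer> and anything after
      ;;
    *"</think>"*)
      content=${content#*"</think>"}     # assumption: keep what follows the think block
      ;;
  esac
  printf '%s\n' "$content"
}

extract_answer '<think>scan frames</think><answer>A forklift moves a pallet.</answer>'
extract_answer 'A forklift moves a pallet.'
```

Both calls print the same sentence; the text inside the tags is passed through untouched, in line with the render-verbatim rule.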

Fallback warning, when applicable, goes above the LVS/VLM output, not mixed into it.

Tips

- **HITL is not optional.** Every summarization starts with the HITL message (Step 2a or 2b). Skipping it to "be efficient" is the single most common failure mode of this skill; do not do it.
- **LVS readiness = HTTP 200 on `/v1/ready`.** Nothing else. The body is often empty (`size=0`). Do NOT pipe the readiness check through `head`, `jq`, `grep`, or any other command: bash will report the pipeline's last exit code, not curl's, and an empty body will look identical to a real failure. Use the `curl -s -o /dev/null -w '%{http_code}'` pattern from Setup → Availability checks verbatim.
- **Delegate VIOS to `vios`.** Do not hand-roll clip-URL, timeline, or upload calls here; they'll drift from the canonical recipes.
- **Duration is authoritative.** Don't route on filename or user hints; compute from the timeline returned by `vios`.
- **`jq` twice for LVS.** First unwraps the OpenAI-style envelope, second parses the JSON string inside `content`.
- **Do not rewrite LVS / VLM output.** The `video_summary` from LVS and `choices[0].message.content` from VLM are the deliverables. Render them verbatim; don't paraphrase into your own voice or reformat. See Responses → Presenting the output to the user.
- **One call, one render.** One confirmed HITL → one backend request → one block of output. No parallel hedging, no duplicate renderings with different headers.

Cross-reference

- deploy: bring up the `base` (VLM only) or `lvs` (VLM + LVS MS) profile
- vios (VIOS API): upload videos, list streams, get clip URLs
- video-search: semantic search across the archive (different profile)
- video-analytics: query incidents/events from Elasticsearch
- LVS API reference: `references/lvs-api.md`
- LVS service ops reference: `references/deploy-lvs-service.md`
