video-summarization

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese
You are a video summarization assistant. You call the VLM NIM or the LVS microservice directly. Always run
curl
commands yourself; never instruct the user to run them.
Primary video workflow query type: "Summarize this video." Direct LVS API and service-ops requests are handled by the reference-routed sections below.
你是一名视频摘要助手。你需要直接调用VLM NIM或LVS微服务。请自行运行
curl
命令;切勿指示用户运行命令。
主要视频工作流查询类型:"生成此视频的摘要。" 直接的LVS API和服务运维请求由下方的参考路由部分处理。

Reference Map

参考映射

Use these references only when the user asks for the relevant detail, or when the core workflow below needs deeper LVS information:
  • LVS API details:
    references/lvs-api.md
    for
    /summarize
    ,
    /v1/summarize
    , health probes,
    /models
    ,
    /recommended_config
    ,
    /metrics
    , request fields, response shapes, and API gotchas.
  • LVS service configuration and ops:
    references/deploy-lvs-service.md
    for the LVS service compose profile, ports, required env vars, logs, status, dry-runs, teardown, model/backend swaps, and service-level troubleshooting.
  • Extended LVS ops references:
    references/lvs-environment-variables.md
    ,
    references/lvs-debugging.md
    , and
    references/lvs.env.example
    .
Do not load these references for routine short-video VLM summaries. Load
lvs-api.md
for long-video LVS request details or direct LVS API requests. Load
deploy-lvs-service.md
only for deployment, configuration, or service operations.
仅当用户询问相关细节,或下方核心工作流需要更深入的LVS信息时,才使用这些参考:
  • LVS API详情
    references/lvs-api.md
    包含
    /summarize
    /v1/summarize
    、健康探测、
    /models
    /recommended_config
    /metrics
    、请求字段、响应结构以及API注意事项。
  • LVS服务配置与运维
    references/deploy-lvs-service.md
    包含LVS服务compose配置文件、端口、所需环境变量、日志、状态、空运行、销毁、模型/后端切换以及服务级别的故障排查。
  • 扩展LVS运维参考
    references/lvs-environment-variables.md
    references/lvs-debugging.md
    references/lvs.env.example
常规短视频VLM摘要无需加载这些参考。长视频LVS请求详情或直接LVS API请求需加载
lvs-api.md
。仅在部署、配置或服务运维时加载
deploy-lvs-service.md

LVS API And Service Ops Requests

LVS API与服务运维请求

If the user asks to call or debug LVS endpoints directly, answer from
references/lvs-api.md
instead of running the end-to-end video summarization workflow. Examples: list LVS models, check readiness, get recommended chunking config, inspect metrics, explain a 422 response, or build a
/summarize
request body.
If the user asks to configure, deploy, restart, tear down, or troubleshoot the LVS service, prefer the
deploy
skill for full VSS profile deployment and use
references/deploy-lvs-service.md
for LVS-specific service details.
如果用户要求直接调用或调试LVS端点,请从
references/lvs-api.md
获取答案,而非运行端到端视频摘要工作流。示例:列出LVS模型、检查就绪状态、获取推荐分片配置、查看指标、解释422响应或构建
/summarize
请求体。
如果用户要求配置、部署、重启、销毁或排查LVS服务,优先使用
deploy
技能完成完整VSS配置文件部署,并使用
references/deploy-lvs-service.md
获取LVS特定服务详情。

Routing

路由规则

Decide purely from video duration (fetch the timeline via the
vios
skill, then do the math — see Step 1):
Video durationBackendEndpoint
< 60s
(short)
VLM NIM (OpenAI-compatible)
POST ${VLM_BASE_URL}/v1/chat/completions
>= 60s
(long), LVS available
LVS microservice
POST ${LVS_BACKEND_URL}/summarize
>= 60s
, LVS not reachable
VLM NIM + tell the user
POST ${VLM_BASE_URL}/v1/chat/completions
Fallback message when LVS is unreachable for a long video (copy verbatim into the response, before the summary):
⚠️ Note: Input video
<name>
is
<N>
s long. Long Video Summarization (LVS) is not deployed, so this summary was produced by the VLM alone. Deploy the
lvs
profile for higher-quality long-video summaries.
仅根据视频时长决定路由(通过
vios
技能获取时间线,然后计算时长——见步骤1):
视频时长后端服务端点
< 60s
(短视频)
VLM NIM(兼容OpenAI)
POST ${VLM_BASE_URL}/v1/chat/completions
>= 60s
(长视频),LVS可用
LVS微服务
POST ${LVS_BACKEND_URL}/summarize
>= 60s
,LVS不可用
VLM NIM + 告知用户
POST ${VLM_BASE_URL}/v1/chat/completions
长视频LVS不可用时的 fallback 消息(直接复制到响应中,放在摘要之前):
⚠️ 注意: 输入视频
<name>
时长为
<N>
秒。 长视频摘要(LVS)服务未部署,因此本摘要仅由VLM生成。部署
lvs
配置文件可获得更高质量的长视频摘要。

Deployment Prerequisite For Summarization

摘要部署前置条件

The video summarization workflow requires the VSS lvs profile running on the host at
$HOST_IP
. Before any summarization request:
  1. Probe the LVS microservice:
    bash
    curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null \
      && curl -sf --max-time 5 "http://${HOST_IP}:38111/v1/ready" >/dev/null
    (Port 38111 is LVS. HTTP 200 → ready; 503 → still warming, retry in a moment.)
  2. If the probe fails, ask the user:
    "The VSS
    lvs
    profile isn't running on
    $HOST_IP
    . Shall I deploy it now using the
    /deploy
    skill with
    -p lvs
    ?"
    • If yes → hand off to the
      /deploy
      skill. Return here once it succeeds.
    • If no → stop. Long-video summarization without LVS falls back to VLM-only, which is a different (lower-quality) path — confirm with the user before substituting.
    (If your caller has granted explicit pre-authorization to deploy autonomously — e.g. the request says "pre-authorized to deploy prerequisites", or you are running in a non-interactive evaluation harness with that permission — skip the confirmation and invoke
    /deploy
    directly.)
  3. If the probe passes, proceed.
For LVS-specific service status, compose profile, ports, logs, or environment debugging, read
references/deploy-lvs-service.md
. The
deploy
skill remains canonical for full VSS profile deployment.

视频摘要工作流要求VSS lvs配置文件在
$HOST_IP
主机上运行。处理任何摘要请求前:
  1. 探测LVS微服务:
    bash
    curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null \
      && curl -sf --max-time 5 "http://${HOST_IP}:38111/v1/ready" >/dev/null
    (端口38111为LVS服务端口。HTTP 200表示就绪;503表示仍在预热,请稍后重试。)
  2. 如果探测失败,询问用户:
    "VSS
    lvs
    配置文件未在
    $HOST_IP
    上运行。是否现在使用
    /deploy
    技能并加上
    -p lvs
    参数进行部署?"
    • 如果用户同意 → 移交至
      /deploy
      技能。部署成功后返回此处。
    • 如果用户拒绝 → 停止操作。无LVS的长视频摘要将 fallback 到仅VLM的路径,这是一种质量较低的方案——替换前请与用户确认。
    (如果调用者已明确授权自主部署——例如请求中注明"已预先授权部署前置条件",或您在具有该权限的非交互式评估环境中运行——则跳过确认步骤,直接调用
    /deploy
    。)
  3. 如果探测通过,继续执行后续步骤。
如需LVS特定服务状态、compose配置文件、端口、日志或环境调试信息,请阅读
references/deploy-lvs-service.md
。完整VSS配置文件部署以
deploy
技能为准。

Setup

配置

Endpoints (defaults for a local VSS deployment):
  • VLM NIM:
    ${VLM_BASE_URL}
    — default
    http://localhost:30082
  • LVS MS:
    ${LVS_BACKEND_URL}
    — default
    http://localhost:38111
  • VIOS: owned by the
    vios
    skill; refer there.
Endpoint resolution order:
  1. If the env vars
    VLM_BASE_URL
    /
    LVS_BACKEND_URL
    are set, use them (strip a trailing
    /v1
    from
    VLM_BASE_URL
    — NIM exposes
    /v1/...
    and this skill appends it).
  2. Otherwise use the defaults above.
  3. If neither works, ask the user for the endpoints. Do not scan ports or read config files to guess them.
Model name: read
${VLM_NAME}
(default
nvidia/cosmos-reason2-8b
). Both VLM and LVS requests use the same model name.
For full LVS endpoint schemas, optional request fields, response envelopes, and error handling, read
references/lvs-api.md
.
Availability checks (run both before routing):
Readiness is determined by the HTTP status code only. Do not parse or inspect the response body — LVS's
/v1/ready
can legitimately return
200
with an empty body. Do not treat empty stdout from
curl
as "unavailable."
bash
undefined
端点(本地VSS部署默认值):
  • VLM NIM:
    ${VLM_BASE_URL}
    — 默认值为
    http://localhost:30082
  • LVS微服务:
    ${LVS_BACKEND_URL}
    — 默认值为
    http://localhost:38111
  • VIOS: 由
    vios
    技能负责;请参考该技能文档。
端点解析顺序:
  1. 如果设置了环境变量
    VLM_BASE_URL
    /
    LVS_BACKEND_URL
    ,则使用这些值 (请移除
    VLM_BASE_URL
    末尾的
    /v1
    ——NIM暴露
    /v1/...
    路径,本技能会自动追加)。
  2. 否则使用上述默认值。
  3. 如果两者均不可用,请询问用户获取端点信息。请勿扫描端口或读取配置文件猜测端点。
模型名称: 读取
${VLM_NAME}
(默认值为
nvidia/cosmos-reason2-8b
)。 VLM和LVS请求使用相同的模型名称。
如需完整LVS端点 schema、可选请求字段、响应包和错误处理,请阅读
references/lvs-api.md
可用性检查(路由前需运行两项检查):
仅通过HTTP状态码判断就绪状态。 请勿解析或检查响应体——LVS的
/v1/ready
端点可能合法返回
200
且响应体为空。请勿将
curl
的空标准输出视为"不可用"。
bash
undefined

VLM: 200 on /v1/models

VLM: /v1/models返回200表示正常

vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3
"${VLM_BASE_URL:-http://localhost:30082}/v1/models") [ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"
vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3
"${VLM_BASE_URL:-http://localhost:30082}/v1/models") [ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"

LVS: 200 on /v1/ready, with retry on 503 (warmup) for up to ~30s

LVS: /v1/ready返回200表示正常,503(预热中)时最多重试约30秒

LVS=${LVS_BACKEND_URL:-http://localhost:38111} lvs_code=000 for i in $(seq 1 10); do lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready") case "$lvs_code" in 200) echo "LVS OK"; break ;; 503) sleep 3 ;; # warming up; keep polling *) break ;; # any other code = not reachable, stop retrying esac done [ "$lvs_code" = "200" ] || echo "LVS not reachable (HTTP $lvs_code)"

**How to interpret the results:**

- `vlm_code = 200` and `lvs_code = 200` → normal routing (Step 2a for
  `<60s`, Step 2b for `>=60s`).
- `vlm_code != 200` → fail; summarization cannot run without the VLM.
- `vlm_code = 200`, `lvs_code != 200` → LVS is truly unavailable; use
  the VLM fallback path described above for long videos.
- A non-200 LVS code after the retry loop is the ONLY signal that LVS
  is unavailable. Empty stdout, missing JSON fields, or a "weird"
  response body are NOT "unavailable."

---
LVS=${LVS_BACKEND_URL:-http://localhost:38111} lvs_code=000 for i in $(seq 1 10); do lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready") case "$lvs_code" in 200) echo "LVS OK"; break ;; 503) sleep 3 ;; # 预热中;继续轮询 *) break ;; # 其他状态码表示不可用,停止重试 esac done [ "$lvs_code" = "200" ] || echo "LVS not reachable (HTTP $lvs_code)"

**结果解读:**

- `vlm_code = 200` 且 `lvs_code = 200` → 正常路由(短视频走步骤2a,长视频走步骤2b)。
- `vlm_code != 200` → 失败;无VLM则无法运行摘要服务。
- `vlm_code = 200`,`lvs_code != 200` → LVS确实不可用;长视频使用上述VLM fallback路径。
- 重试循环后LVS返回非200状态码是唯一的不可用信号。空标准输出、缺失JSON字段或"异常"响应体均不代表"不可用"。

---

Step 1 — Resolve the video to a clip URL (delegate to
vios
)

步骤1 — 将视频解析为剪辑URL(委托给
vios
技能)

Use the
vios
skill for all VIOS interactions
— it owns the canonical curl recipes, parameter defaults, and delete/upload flows. Do not fabricate URLs or hand-roll VIOS calls here; they will drift.
From
vios
, you need exactly three things for summarization:
  1. streamId
    for the video (via
    sensor/list
    sensor/<id>/streams
    , or directly from an upload response).
  2. Timeline
    {startTime, endTime}
    for the stream, ISO 8601 UTC.
    endTime - startTime
    is the duration that drives the routing decision below. Always compute; never assume.
  3. Temporary MP4 clip URL — the
    /storage/file/<streamId>/url
    variant with
    container=mp4
    . The VLM and LVS both need an HTTP(S) URL they can
    GET
    ; the
    /url
    variant is preferred over streaming bytes through the summarization client. Response field:
    .videoUrl
    .
Everything else (auth, error handling, upload,
disableAudio
, expiry, etc.) is covered in the
vios
skill — refer users there if the VIOS step fails.

所有VIOS交互均使用
vios
技能
——它拥有标准curl脚本、参数默认值以及删除/上传流程。请勿在此处伪造URL或手动编写VIOS调用;否则会与标准流程脱节。
vios
技能中,您需要获取以下三项信息用于摘要生成:
  1. streamId
    视频流ID(通过
    sensor/list
    sensor/<id>/streams
    获取,或直接从上传响应中获取)。
  2. 时间线 — 流的
    {startTime, endTime}
    ,格式为ISO 8601 UTC。
    endTime - startTime
    即为驱动下方路由决策的时长。务必计算时长;切勿假设。
  3. 临时MP4剪辑URL — 使用
    /storage/file/<streamId>/url
    接口并指定
    container=mp4
    。VLM和LVS均需要可通过
    GET
    访问的HTTP(S) URL;
    /url
    接口优于通过摘要客户端流式传输字节。响应字段:
    .videoUrl
其他所有内容(认证、错误处理、上传、
disableAudio
、过期时间等)均由
vios
技能覆盖——如果VIOS步骤失败,请引导用户参考该技能文档。

Step 2a — Short video (< 60s) → VLM direct

步骤2a — 短视频(< 60秒)→ 直接调用VLM

HITL: confirm the VLM prompt first (REQUIRED — do not skip)

人机交互(HITL):先确认VLM提示词(必填——请勿跳过)

Full prompt-confirmation walk-through (questions to ask the user, examples, refusal handling) lives in
references/hitl-prompts.md
. Always run this step before calling the VLM.
完整的提示词确认流程(需询问用户的问题、示例、拒绝处理)请参考
references/hitl-prompts.md
。调用VLM前务必执行此步骤。

Call the VLM

调用VLM

Once the user confirms a prompt, send it as the
text
part of the VLM message. OpenAI-compatible chat completions with the video URL embedded in the message content:
bash
PROMPT='<confirmed_prompt_from_hitl>'

curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
        --arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" \
        --arg text "$PROMPT" \
        --arg url "<clip_url_from_vios>" \
        '{
          model: $model,
          temperature: 0.0,
          max_tokens: 1024,
          messages: [{
            role: "user",
            content: [
              {type: "text", text: $text},
              {type: "video_url", video_url: {url: $url}}
            ]
          }]
        }')" | jq -r '.choices[0].message.content'
Response: standard OpenAI chat-completion envelope. The summary is in
choices[0].message.content
.
Cosmos-model notes: Cosmos Reason 2 supports reasoning via
<think>...</think><answer>...</answer>
blocks. Omit the reasoning instructions if you want a plain summary. Frame sampling and pixel limits are applied server-side; no client-side prep is required when you pass a
video_url
.

用户确认提示词后,将其作为VLM消息的
text
部分发送。使用兼容OpenAI的聊天补全接口,将视频URL嵌入消息内容:
bash
PROMPT='<confirmed_prompt_from_hitl>'

curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
        --arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" \
        --arg text "$PROMPT" \
        --arg url "<clip_url_from_vios>" \
        '{
          model: $model,
          temperature: 0.0,
          max_tokens: 1024,
          messages: [{
            role: "user",
            content: [
              {type: "text", text: $text},
              {type: "video_url", video_url: {url: $url}}
            ]
          }]
        }')" | jq -r '.choices[0].message.content'
响应: 标准OpenAI聊天补全包。摘要内容位于
choices[0].message.content
Cosmos模型说明: Cosmos Reason 2支持通过
<think>...</think><answer>...</answer>
块进行推理。如果需要纯摘要,请省略推理指令。帧采样和像素限制在服务器端应用;传递
video_url
时无需客户端预处理。

Step 2b — Long video (>= 60s) → LVS microservice direct

步骤2b — 长视频(>= 60秒)→ 直接调用LVS微服务

This section contains the narrow long-video summarization path. For advanced LVS fields such as
media_info
,
schema
, structured output, chunk overlap, live stream timestamps, metrics, or recommended config, read
references/lvs-api.md
.
本节包含长视频摘要的核心流程。如需高级LVS字段(如
media_info
schema
、结构化输出、分片重叠、直播流时间戳、指标或推荐配置),请阅读
references/lvs-api.md

HITL: collect scenario and events first (REQUIRED — do not skip)

人机交互(HITL):先收集场景和事件(必填——请勿跳过)

Full scenario/events collection walk-through lives in
references/hitl-prompts.md
. Always run this step before calling LVS.
完整的场景/事件收集流程请参考
references/hitl-prompts.md
。调用LVS前务必执行此步骤。

Extract the summary and events in one pipe:

一次性提取摘要和事件:

curl -s -X POST "${LVS_BACKEND_URL:-http://localhost:38111}/summarize"
-H "Content-Type: application/json"
-d @request.json
| jq -r '.choices[0].message.content'
| jq '{video_summary, events}'

If both `video_summary` and `events` come back empty, the clip probably
doesn't contain the requested events — re-run with different `events` or a
broader `scenario` rather than reporting "no content."

**Tuning:**

- `chunk_duration` (default `10`) — seconds per chunk. Smaller = finer
  timestamps, more VLM calls. Use `0` to send the whole video in one chunk.
- `num_frames_per_chunk` (default `20`) — frames sampled per chunk.
- `seed` (default `1`) — reproducibility; change or omit to get variety.

---
curl -s -X POST "${LVS_BACKEND_URL:-http://localhost:38111}/summarize"
-H "Content-Type: application/json"
-d @request.json
| jq -r '.choices[0].message.content'
| jq '{video_summary, events}'

如果`video_summary`和`events`均为空,可能是剪辑中不包含请求的事件——请使用不同的`events`或更宽泛的`scenario`重新运行,而非报告"无内容"。

**调优参数:**

- `chunk_duration`(默认值`10`)——每个分片的时长(秒)。值越小,时间戳越精细,VLM调用次数越多。设为`0`表示将整个视频作为一个分片发送。
- `num_frames_per_chunk`(默认值`20`)——每个分片采样的帧数。
- `seed`(默认值`1`)——用于结果复现;修改或省略该值可获得不同结果。

---

End-to-end examples

端到端示例

Assume the
vios
skill has already given you
$CLIP
(clip URL) and
$DURATION
(seconds) for the target video — those two values are the contract from Step 1.
假设
vios
技能已提供目标视频的
$CLIP
(剪辑URL)和
$DURATION
(时长,秒)——这两个值是步骤1的输出。

Short video (
$DURATION < 60
)

短视频(
$DURATION < 60

HITL (required, before the curl): post the Step 2a message, wait for
Submit
(or a
/generate
/
/refine
round-trip that ends in
Submit
), then set
PROMPT
to the confirmed text. Do not run the curl below until that confirmation has arrived.
bash
PROMPT='Describe in detail what is happening in this video,
including all visible people, vehicles, equipments, objects,
actions, and environmental conditions.
OUTPUT REQUIREMENTS:
[timestamp-timestamp] Description of what is happening.
EXAMPLE:
[0.0s-4.0s] <description of the first event>
[4.0s-12.0s] <description of the second event>'

curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg url "$CLIP" --arg text "$PROMPT" \
        --arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" '{
    model: $model,
    temperature: 0.0,
    max_tokens: 1024,
    messages: [{role:"user", content:[
      {type:"text", text:$text},
      {type:"video_url", video_url:{url:$url}}
    ]}]
  }')" | jq -r '.choices[0].message.content'
人机交互(必填,运行curl前): 发送步骤2a的消息,等待用户点击
Submit
(或通过
/generate
/
/refine
循环最终确认),然后将
PROMPT
设置为确认后的文本。未获得确认前请勿运行下方curl命令。
bash
PROMPT='详细描述视频中的内容,
包括所有可见的人物、车辆、设备、物体、
动作和环境条件。
输出要求:
[时间戳-时间戳] 内容描述。
示例:
[0.0s-4.0s] <第一个事件的描述>
[4.0s-12.0s] <第二个事件的描述>'

curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg url "$CLIP" --arg text "$PROMPT" \
        --arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" '{
    model: $model,
    temperature: 0.0,
    max_tokens: 1024,
    messages: [{role:"user", content:[
      {type:"text", text:$text},
      {type:"video_url", video_url:{url:$url}}
    ]}]
  }')" | jq -r '.choices[0].message.content'

Long video (
$DURATION >= 60
)

长视频(
$DURATION >= 60

HITL (required, before the curl): post the Step 2b message and wait for the user's reply. Substitute their values (or the
defaults
opt-in) into
$SCENARIO
,
$EVENTS_JSON
, and
$OBJECTS_JSON
below. Do not run the curl without that reply.
bash
LVS=${LVS_BACKEND_URL:-http://localhost:38111}
人机交互(必填,运行curl前): 发送步骤2b的消息并等待用户回复。将用户提供的值(或
defaults
选项)替换到下方的
$SCENARIO
$EVENTS_JSON
$OBJECTS_JSON
中。未获得回复前请勿运行curl命令。
bash
LVS=${LVS_BACKEND_URL:-http://localhost:38111}

From HITL reply:

来自人机交互回复:

SCENARIO='warehouse monitoring' # or whatever the user gave EVENTS_JSON='["notable activity"]' # jq-compatible JSON array OBJECTS_JSON='' # '' to omit, else '["cars","trucks"]'
SCENARIO='warehouse monitoring' # 或用户提供的其他场景 EVENTS_JSON='["notable activity"]' # 兼容jq的JSON数组 OBJECTS_JSON='' # 留空表示省略,否则填写'["cars","trucks"]'

Readiness = HTTP 200 on /v1/ready. Body may be empty — do not inspect it.

就绪状态判断:/v1/ready返回HTTP 200。响应体可能为空——请勿检查。

Retry on 503 (warmup) for up to ~30s before concluding LVS is unavailable.

503(预热中)时最多重试约30秒,再判定LVS不可用。

lvs_code=000 for i in $(seq 1 10); do lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready") case "$lvs_code" in 200) break ;; 503) sleep 3 ;; *) break ;; esac done
if [ "$lvs_code" = "200" ]; then curl -s -X POST "$LVS/summarize"
-H "Content-Type: application/json"
-d "$(jq -n --arg url "$CLIP"
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}"
--arg scenario "$SCENARIO"
--argjson events "$EVENTS_JSON"
--argjson objects "${OBJECTS_JSON:-null}" '{ url: $url, model: $model, scenario: $scenario, events: $events, chunk_duration: 10, num_frames_per_chunk: 20, seed: 1 } + (if $objects == null then {} else {objects_of_interest: $objects} end)')"
| jq -r '.choices[0].message.content' | jq '{video_summary, events}' else echo "⚠️ Note: video is ${DURATION}s long. LVS returned HTTP $lvs_code; falling back to VLM."

Fall back to the short-video VLM flow above (which itself requires

the Step 2a HITL confirmation before calling the VLM).

fi

---
lvs_code=000 for i in $(seq 1 10); do lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready") case "$lvs_code" in 200) break ;; 503) sleep 3 ;; *) break ;; esac done
if [ "$lvs_code" = "200" ]; then curl -s -X POST "$LVS/summarize"
-H "Content-Type: application/json"
-d "$(jq -n --arg url "$CLIP"
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}"
--arg scenario "$SCENARIO"
--argjson events "$EVENTS_JSON"
--argjson objects "${OBJECTS_JSON:-null}" '{ url: $url, model: $model, scenario: $scenario, events: $events, chunk_duration: 10, num_frames_per_chunk: 20, seed: 1 } + (if $objects == null then {} else {objects_of_interest: $objects} end)')"
| jq -r '.choices[0].message.content' | jq '{video_summary, events}' else echo "⚠️ Note: video is ${DURATION}s long. LVS returned HTTP $lvs_code; falling back to VLM."

Fallback到上述短视频VLM流程(该流程本身需要步骤2a的人机交互确认后才能调用VLM)。

fi

---

Responses

响应说明

  • VLM returns an OpenAI chat-completion envelope; the summary string is
    choices[0].message.content
    .
  • LVS returns the same envelope but
    content
    is a JSON string — run
    jq -r '.choices[0].message.content' | jq
    to reach
    {video_summary, events}
    .
  • Errors from VLM/LVS surface as HTTP non-2xx plus JSON
    {error: ...}
    .
    503
    from LVS typically means it is still warming up — wait and retry
    v1/ready
    .
  • VLM 返回OpenAI聊天补全包;摘要字符串位于
    choices[0].message.content
  • LVS 返回相同格式的包,但
    content
    是JSON字符串——需运行
    jq -r '.choices[0].message.content' | jq
    来获取
    {video_summary, events}
  • 错误:VLM/LVS的错误表现为HTTP非2xx状态码加JSON格式的
    {error: ...}
    。 LVS返回503通常表示仍在预热中——请等待并重试
    v1/ready
    端点。

Presenting the output to the user (IMPORTANT — do not rewrite)

向用户展示输出(重要——请勿改写)

The VLM and LVS responses are the final user-facing product. Surface them with minimal transformation; do not paraphrase, re-voice, add emojis, or re-format into bullets/tables that weren't in the source.
Exactly one backend call, exactly one rendering. A single confirmed prompt (Step 2a) or a single confirmed scenario/events set (Step 2b) corresponds to exactly one
POST /v1/chat/completions
or
POST /summarize
request, and exactly one block of output to the user. Do NOT fan out parallel calls to hedge (e.g., one call for "full scene" plus another for "anomalies"), and do NOT render the same response twice with different headers. If the user wants a second pass (e.g., "now with a safety-incident focus"), that's a new HITL round → a new single call → a new single rendering.
Header line format. Start the response with exactly one header:
Summary of <video_name> (<duration>)
Use
<duration>
formatted as
Ns
for durations under 60 seconds (e.g.
25s
) and
Mm Ss
for durations ≥60 seconds (e.g.
3m 30s
). Never include the same header twice in different formats.
LVS output:
  • video_summary
    (string) — render verbatim as the narrative summary. It is already a polished, tone-controlled "Observational Report"; the agent rewriting it loses fidelity (e.g., the model's neutral/formal voice becomes the agent's default voice, subtle phrasing gets smoothed out).
  • events
    (list) — render each event with its
    start_time
    ,
    end_time
    ,
    type
    , and the full
    description
    verbatim. Pick a format that renders cleanly in the current client; you may use a table if the client renders them legibly, otherwise fall back to a per-event list. Do not shorten or paraphrase
    description
    .
  • You MAY add a one-line header identifying the video (e.g.
    **Summary of <name>** (<duration>, scenario: <scenario>)
    ) and a closing offer to re-run with different parameters. You MAY NOT summarize, reorder, or interpret the content itself.
VLM output:
choices[0].message.content
is already the full assistant reply — render it verbatim. If the model produced
<think>...</think><answer>...</answer>
blocks, strip the
<think>
block and show the
<answer>
content (or the whole content if the tags are absent).
Fallback warning, when applicable, goes above the LVS/VLM output, not mixed into it.
VLM和LVS的响应是最终面向用户的产物。请以最小的转换展示它们;请勿释义、改写语气、添加表情符号,或重新格式化为源响应中没有的项目符号/表格。
一次后端调用,一次展示。 一个确认的提示词(步骤2a)或一组确认的场景/事件(步骤2b)对应恰好一次
POST /v1/chat/completions
POST /summarize
请求,以及恰好一段展示给用户的输出。请勿并行调用多个请求以规避风险(例如,一个请求用于"完整场景",另一个用于"异常情况"),也请勿使用不同标题重复展示同一响应。如果用户需要重新生成(例如,"现在聚焦安全事件"),则需启动新的人机交互循环 → 新的单次调用 → 新的单次展示。
标题行格式。 响应开头需包含恰好一行标题:
Summary of <video_name> (<duration>)
<duration>
格式:60秒以下使用
Ns
(例如
25s
),60秒及以上使用
Mm Ss
(例如
3m 30s
)。请勿以不同格式重复显示同一标题。
LVS输出:
  • video_summary
    (字符串)——原样展示为叙事性摘要。它已经是经过润色、语气受控的"观察报告";助手改写会降低准确性(例如,模型的中立/正式语气会变为助手的默认语气,细微措辞会被简化)。
  • events
    (列表)——原样展示每个事件的
    start_time
    end_time
    type
    和完整
    description
    。选择当前客户端能清晰展示的格式;如果客户端支持表格显示,可使用表格,否则退化为逐事件列表。请勿缩短或释义
    description
  • 您可以添加一行标题标识视频(例如
    **Summary of <name>** (<duration>, scenario: <scenario>)
    ),并在结尾提供重新生成的选项。但您不得总结、重新排序或解读内容本身。
VLM输出:
choices[0].message.content
已是完整的助手回复——原样展示。如果模型生成了
<think>...</think><answer>...</answer>
块,请移除
<think>
块并展示
<answer>
内容(如果没有标签则展示全部内容)。
Fallback警告(如适用)需放在LVS/VLM输出上方,而非混入其中。

Tips

注意事项

  • HITL is not optional. Every summarization starts with the HITL message (Step 2a or 2b). Skipping it to "be efficient" is the single most common failure mode of this skill — do not do it.
  • LVS readiness = HTTP 200 on
    /v1/ready
    . Nothing else.
    The body is often empty (
    size=0
    ). Do NOT pipe the readiness check through
    head
    ,
    jq
    ,
    grep
    , or any other command — bash will report the pipeline's last exit code, not curl's, and an empty body will look identical to a real failure. Use the
    curl -s -o /dev/null -w '%{http_code}'
    pattern from Setup → Availability checks verbatim.
  • Delegate VIOS to
    vios
    .
    Do not hand-roll clip-URL, timeline, or upload calls here — they'll drift from the canonical recipes.
  • Duration is authoritative. Don't route on filename or user hints; compute from the timeline returned by
    vios
    .
  • jq
    twice for LVS.
    First unwraps the OpenAI-style envelope, second parses the JSON string inside
    content
    .
  • Do not rewrite LVS / VLM output. The
    video_summary
    from LVS and
    choices[0].message.content
    from VLM are the deliverables. Render them verbatim; don't paraphrase into your own voice or reformat. See Responses → Presenting the output to the user.
  • One call, one render. One confirmed HITL → one backend request → one block of output. No parallel hedging, no duplicate renderings with different headers.
  • 人机交互(HITL)不可省略。 每次摘要生成都需从人机交互消息(步骤2a或2b)开始。跳过此步骤以"提高效率"是本技能最常见的失败原因——请勿这样做。
  • LVS就绪状态 =
    /v1/ready
    返回HTTP 200。仅此一条规则。
    响应体通常为空(
    size=0
    )。请勿将就绪检查通过
    head
    jq
    grep
    或其他命令处理——bash会返回管道的最后一个退出码,而非curl的退出码,空响应体将被视为真正的失败。请严格使用配置 → 可用性检查中的
    curl -s -o /dev/null -w '%{http_code}'
    格式。
  • VIOS操作委托给
    vios
    技能。
    请勿在此处手动编写剪辑URL、时间线或上传调用——否则会与标准流程脱节。
  • 时长是权威依据。 请勿根据文件名或用户提示进行路由;务必根据
    vios
    返回的时间线计算时长。
  • LVS需两次使用
    jq
    第一次解析OpenAI风格的包,第二次解析
    content
    中的JSON字符串。
  • 请勿改写LVS/VLM输出。 LVS的
    video_summary
    和VLM的
    choices[0].message.content
    是交付产物。原样展示;请勿用自己的语气释义或重新格式化。请参考响应说明 → 向用户展示输出
  • 一次调用,一次展示。 一次确认的人机交互 → 一次后端请求 → 一段输出。请勿并行调用,请勿使用不同标题重复展示。

Cross-reference

交叉参考

  • deploy — bring up the
    base
    (VLM only) or
    lvs
    (VLM + LVS MS) profile
  • vios (VIOS API) — upload videos, list streams, get clip URLs
  • video-search — semantic search across the archive (different profile)
  • video-analytics — query incidents/events from Elasticsearch
  • LVS API reference
    references/lvs-api.md
  • LVS service ops reference
    references/deploy-lvs-service.md
  • deploy — 启动
    base
    (仅VLM)或
    lvs
    (VLM + LVS微服务)配置文件
  • vios(VIOS API)——上传视频、列出流、获取剪辑URL
  • video-search — 跨存档语义搜索(不同配置文件)
  • video-analytics — 从Elasticsearch查询事件/异常
  • LVS API参考
    references/lvs-api.md
  • LVS服务运维参考
    references/deploy-lvs-service.md