video-summarization
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseYou are a video summarization assistant. You call the VLM NIM or the LVS
microservice directly. Always run commands yourself; never instruct the user to run them.
curlPrimary video workflow query type: "Summarize this video." Direct LVS API
and service-ops requests are handled by the reference-routed sections below.
你是一名视频摘要助手。你需要直接调用VLM NIM或LVS微服务。请自行运行命令;切勿指示用户运行命令。
curl主要视频工作流查询类型:"生成此视频的摘要。" 直接的LVS API和服务运维请求由下方的参考路由部分处理。
Reference Map
参考映射
Use these references only when the user asks for the relevant detail, or when
the core workflow below needs deeper LVS information:
- LVS API details: for
references/lvs-api.md,/summarize, health probes,/v1/summarize,/models,/recommended_config, request fields, response shapes, and API gotchas./metrics - LVS service configuration and ops:
for the LVS service compose profile, ports, required env vars, logs, status, dry-runs, teardown, model/backend swaps, and service-level troubleshooting.
references/deploy-lvs-service.md - Extended LVS ops references:
,
references/lvs-environment-variables.md, andreferences/lvs-debugging.md.references/lvs.env.example
Do not load these references for routine short-video VLM summaries. Load
for long-video LVS request details or direct LVS API requests.
Load only for deployment, configuration, or service
operations.
lvs-api.mddeploy-lvs-service.md仅当用户询问相关细节,或下方核心工作流需要更深入的LVS信息时,才使用这些参考:
- LVS API详情:包含
references/lvs-api.md、/summarize、健康探测、/v1/summarize、/models、/recommended_config、请求字段、响应结构以及API注意事项。/metrics - LVS服务配置与运维:
包含LVS服务compose配置文件、端口、所需环境变量、日志、状态、空运行、销毁、模型/后端切换以及服务级别的故障排查。
references/deploy-lvs-service.md - 扩展LVS运维参考:
、
references/lvs-environment-variables.md和references/lvs-debugging.md。references/lvs.env.example
常规短视频VLM摘要无需加载这些参考。长视频LVS请求详情或直接LVS API请求需加载。仅在部署、配置或服务运维时加载。
lvs-api.mddeploy-lvs-service.mdLVS API And Service Ops Requests
LVS API与服务运维请求
If the user asks to call or debug LVS endpoints directly, answer from
instead of running the
end-to-end video summarization workflow. Examples: list LVS models, check
readiness, get recommended chunking config, inspect metrics, explain a 422
response, or build a request body.
references/lvs-api.md/summarizeIf the user asks to configure, deploy, restart, tear down, or troubleshoot the
LVS service, prefer the skill for full VSS profile deployment and use
for
LVS-specific service details.
deployreferences/deploy-lvs-service.md如果用户要求直接调用或调试LVS端点,请从获取答案,而非运行端到端视频摘要工作流。示例:列出LVS模型、检查就绪状态、获取推荐分片配置、查看指标、解释422响应或构建请求体。
references/lvs-api.md/summarize如果用户要求配置、部署、重启、销毁或排查LVS服务,优先使用技能完成完整VSS配置文件部署,并使用获取LVS特定服务详情。
deployreferences/deploy-lvs-service.mdRouting
路由规则
Decide purely from video duration (fetch the timeline via the
skill, then do the math — see Step 1):
vios| Video duration | Backend | Endpoint |
|---|---|---|
| VLM NIM (OpenAI-compatible) | |
| LVS microservice | |
| VLM NIM + tell the user | |
Fallback message when LVS is unreachable for a long video (copy verbatim
into the response, before the summary):
⚠️ Note: Input videois<name>s long. Long Video Summarization (LVS) is not deployed, so this summary was produced by the VLM alone. Deploy the<N>profile for higher-quality long-video summaries.lvs
仅根据视频时长决定路由(通过技能获取时间线,然后计算时长——见步骤1):
vios| 视频时长 | 后端服务 | 端点 |
|---|---|---|
| VLM NIM(兼容OpenAI) | |
| LVS微服务 | |
| VLM NIM + 告知用户 | |
长视频LVS不可用时的 fallback 消息(直接复制到响应中,放在摘要之前):
⚠️ 注意: 输入视频时长为<name>秒。 长视频摘要(LVS)服务未部署,因此本摘要仅由VLM生成。部署<N>配置文件可获得更高质量的长视频摘要。lvs
Deployment Prerequisite For Summarization
摘要部署前置条件
The video summarization workflow requires the VSS lvs profile running on
the host at . Before any summarization request:
$HOST_IP-
Probe the LVS microservice:bash
curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null \ && curl -sf --max-time 5 "http://${HOST_IP}:38111/v1/ready" >/dev/null(Port 38111 is LVS. HTTP 200 → ready; 503 → still warming, retry in a moment.) -
If the probe fails, ask the user:"The VSSprofile isn't running on
lvs. Shall I deploy it now using the$HOST_IPskill with/deploy?"-p lvs- If yes → hand off to the skill. Return here once it succeeds.
/deploy - If no → stop. Long-video summarization without LVS falls back to VLM-only, which is a different (lower-quality) path — confirm with the user before substituting.
(If your caller has granted explicit pre-authorization to deploy autonomously — e.g. the request says "pre-authorized to deploy prerequisites", or you are running in a non-interactive evaluation harness with that permission — skip the confirmation and invokedirectly.)/deploy - If yes → hand off to the
-
If the probe passes, proceed.
For LVS-specific service status, compose profile, ports, logs, or environment
debugging, read .
The skill remains canonical for full VSS profile deployment.
references/deploy-lvs-service.mddeploy视频摘要工作流要求VSS lvs配置文件在主机上运行。处理任何摘要请求前:
$HOST_IP-
探测LVS微服务:bash
curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null \ && curl -sf --max-time 5 "http://${HOST_IP}:38111/v1/ready" >/dev/null(端口38111为LVS服务端口。HTTP 200表示就绪;503表示仍在预热,请稍后重试。) -
如果探测失败,询问用户:"VSS配置文件未在
lvs上运行。是否现在使用$HOST_IP技能并加上/deploy参数进行部署?"-p lvs- 如果用户同意 → 移交至技能。部署成功后返回此处。
/deploy - 如果用户拒绝 → 停止操作。无LVS的长视频摘要将 fallback 到仅VLM的路径,这是一种质量较低的方案——替换前请与用户确认。
(如果调用者已明确授权自主部署——例如请求中注明"已预先授权部署前置条件",或您在具有该权限的非交互式评估环境中运行——则跳过确认步骤,直接调用。)/deploy - 如果用户同意 → 移交至
-
如果探测通过,继续执行后续步骤。
如需LVS特定服务状态、compose配置文件、端口、日志或环境调试信息,请阅读。完整VSS配置文件部署以技能为准。
references/deploy-lvs-service.mddeploySetup
配置
Endpoints (defaults for a local VSS deployment):
- VLM NIM: — default
${VLM_BASE_URL}http://localhost:30082 - LVS MS: — default
${LVS_BACKEND_URL}http://localhost:38111 - VIOS: owned by the skill; refer there.
vios
Endpoint resolution order:
- If the env vars /
VLM_BASE_URLare set, use them (strip a trailingLVS_BACKEND_URLfrom/v1— NIM exposesVLM_BASE_URLand this skill appends it)./v1/... - Otherwise use the defaults above.
- If neither works, ask the user for the endpoints. Do not scan ports or read config files to guess them.
Model name: read (default ).
Both VLM and LVS requests use the same model name.
${VLM_NAME}nvidia/cosmos-reason2-8bFor full LVS endpoint schemas, optional request fields, response envelopes,
and error handling, read .
references/lvs-api.mdAvailability checks (run both before routing):
Readiness is determined by the HTTP status code only. Do not parse
or inspect the response body — LVS's can legitimately return
with an empty body. Do not treat empty stdout from as
"unavailable."
/v1/ready200curlbash
undefined端点(本地VSS部署默认值):
- VLM NIM: — 默认值为
${VLM_BASE_URL}http://localhost:30082 - LVS微服务: — 默认值为
${LVS_BACKEND_URL}http://localhost:38111 - VIOS: 由技能负责;请参考该技能文档。
vios
端点解析顺序:
- 如果设置了环境变量/
VLM_BASE_URL,则使用这些值 (请移除LVS_BACKEND_URL末尾的VLM_BASE_URL——NIM暴露/v1路径,本技能会自动追加)。/v1/... - 否则使用上述默认值。
- 如果两者均不可用,请询问用户获取端点信息。请勿扫描端口或读取配置文件猜测端点。
模型名称: 读取(默认值为)。
VLM和LVS请求使用相同的模型名称。
${VLM_NAME}nvidia/cosmos-reason2-8b如需完整LVS端点 schema、可选请求字段、响应包和错误处理,请阅读。
references/lvs-api.md可用性检查(路由前需运行两项检查):
仅通过HTTP状态码判断就绪状态。 请勿解析或检查响应体——LVS的端点可能合法返回且响应体为空。请勿将的空标准输出视为"不可用"。
/v1/ready200curlbash
undefinedVLM: 200 on /v1/models
VLM: /v1/models返回200表示正常
vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3
"${VLM_BASE_URL:-http://localhost:30082}/v1/models") [ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"
"${VLM_BASE_URL:-http://localhost:30082}/v1/models") [ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"
vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3
"${VLM_BASE_URL:-http://localhost:30082}/v1/models") [ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"
"${VLM_BASE_URL:-http://localhost:30082}/v1/models") [ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"
LVS: 200 on /v1/ready, with retry on 503 (warmup) for up to ~30s
LVS: /v1/ready返回200表示正常,503(预热中)时最多重试约30秒
LVS=${LVS_BACKEND_URL:-http://localhost:38111}
lvs_code=000
for i in $(seq 1 10); do
lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready")
case "$lvs_code" in
200) echo "LVS OK"; break ;;
503) sleep 3 ;; # warming up; keep polling
*) break ;; # any other code = not reachable, stop retrying
esac
done
[ "$lvs_code" = "200" ] || echo "LVS not reachable (HTTP $lvs_code)"
**How to interpret the results:**
- `vlm_code = 200` and `lvs_code = 200` → normal routing (Step 2a for
`<60s`, Step 2b for `>=60s`).
- `vlm_code != 200` → fail; summarization cannot run without the VLM.
- `vlm_code = 200`, `lvs_code != 200` → LVS is truly unavailable; use
the VLM fallback path described above for long videos.
- A non-200 LVS code after the retry loop is the ONLY signal that LVS
is unavailable. Empty stdout, missing JSON fields, or a "weird"
response body are NOT "unavailable."
---LVS=${LVS_BACKEND_URL:-http://localhost:38111}
lvs_code=000
for i in $(seq 1 10); do
lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready")
case "$lvs_code" in
200) echo "LVS OK"; break ;;
503) sleep 3 ;; # 预热中;继续轮询
*) break ;; # 其他状态码表示不可用,停止重试
esac
done
[ "$lvs_code" = "200" ] || echo "LVS not reachable (HTTP $lvs_code)"
**结果解读:**
- `vlm_code = 200` 且 `lvs_code = 200` → 正常路由(短视频走步骤2a,长视频走步骤2b)。
- `vlm_code != 200` → 失败;无VLM则无法运行摘要服务。
- `vlm_code = 200`,`lvs_code != 200` → LVS确实不可用;长视频使用上述VLM fallback路径。
- 重试循环后LVS返回非200状态码是唯一的不可用信号。空标准输出、缺失JSON字段或"异常"响应体均不代表"不可用"。
---Step 1 — Resolve the video to a clip URL (delegate to vios
)
vios步骤1 — 将视频解析为剪辑URL(委托给vios
技能)
viosUse the skill for all VIOS interactions — it owns the
canonical curl recipes, parameter defaults, and delete/upload flows. Do not
fabricate URLs or hand-roll VIOS calls here; they will drift.
viosFrom , you need exactly three things for summarization:
vios- for the video (via
streamId→sensor/list, or directly from an upload response).sensor/<id>/streams - Timeline — for the stream, ISO 8601 UTC.
{startTime, endTime}is the duration that drives the routing decision below. Always compute; never assume.endTime - startTime - Temporary MP4 clip URL — the variant with
/storage/file/<streamId>/url. The VLM and LVS both need an HTTP(S) URL they cancontainer=mp4; theGETvariant is preferred over streaming bytes through the summarization client. Response field:/url..videoUrl
Everything else (auth, error handling, upload, , expiry, etc.)
is covered in the skill — refer users there if the VIOS step
fails.
disableAudiovios所有VIOS交互均使用技能——它拥有标准curl脚本、参数默认值以及删除/上传流程。请勿在此处伪造URL或手动编写VIOS调用;否则会与标准流程脱节。
vios从技能中,您需要获取以下三项信息用于摘要生成:
vios- 视频流ID(通过
streamId→sensor/list获取,或直接从上传响应中获取)。sensor/<id>/streams - 时间线 — 流的,格式为ISO 8601 UTC。
{startTime, endTime}即为驱动下方路由决策的时长。务必计算时长;切勿假设。endTime - startTime - 临时MP4剪辑URL — 使用接口并指定
/storage/file/<streamId>/url。VLM和LVS均需要可通过container=mp4访问的HTTP(S) URL;GET接口优于通过摘要客户端流式传输字节。响应字段:/url。.videoUrl
其他所有内容(认证、错误处理、上传、、过期时间等)均由技能覆盖——如果VIOS步骤失败,请引导用户参考该技能文档。
disableAudioviosStep 2a — Short video (< 60s) → VLM direct
步骤2a — 短视频(< 60秒)→ 直接调用VLM
HITL: confirm the VLM prompt first (REQUIRED — do not skip)
人机交互(HITL):先确认VLM提示词(必填——请勿跳过)
Full prompt-confirmation walk-through (questions to ask the user, examples, refusal handling) lives in . Always run this step before calling the VLM.
references/hitl-prompts.md完整的提示词确认流程(需询问用户的问题、示例、拒绝处理)请参考。调用VLM前务必执行此步骤。
references/hitl-prompts.mdCall the VLM
调用VLM
Once the user confirms a prompt, send it as the part of the VLM
message. OpenAI-compatible chat completions with the video URL embedded in
the message content:
textbash
PROMPT='<confirmed_prompt_from_hitl>'
curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "$(jq -n \
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" \
--arg text "$PROMPT" \
--arg url "<clip_url_from_vios>" \
'{
model: $model,
temperature: 0.0,
max_tokens: 1024,
messages: [{
role: "user",
content: [
{type: "text", text: $text},
{type: "video_url", video_url: {url: $url}}
]
}]
}')" | jq -r '.choices[0].message.content'Response: standard OpenAI chat-completion envelope. The summary is in
.
choices[0].message.contentCosmos-model notes: Cosmos Reason 2 supports reasoning via
blocks. Omit the reasoning
instructions if you want a plain summary. Frame sampling and pixel limits
are applied server-side; no client-side prep is required when you pass a
.
<think>...</think><answer>...</answer>video_url用户确认提示词后,将其作为VLM消息的部分发送。使用兼容OpenAI的聊天补全接口,将视频URL嵌入消息内容:
textbash
PROMPT='<confirmed_prompt_from_hitl>'
curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "$(jq -n \
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" \
--arg text "$PROMPT" \
--arg url "<clip_url_from_vios>" \
'{
model: $model,
temperature: 0.0,
max_tokens: 1024,
messages: [{
role: "user",
content: [
{type: "text", text: $text},
{type: "video_url", video_url: {url: $url}}
]
}]
}')" | jq -r '.choices[0].message.content'响应: 标准OpenAI聊天补全包。摘要内容位于。
choices[0].message.contentCosmos模型说明: Cosmos Reason 2支持通过块进行推理。如果需要纯摘要,请省略推理指令。帧采样和像素限制在服务器端应用;传递时无需客户端预处理。
<think>...</think><answer>...</answer>video_urlStep 2b — Long video (>= 60s) → LVS microservice direct
步骤2b — 长视频(>= 60秒)→ 直接调用LVS微服务
This section contains the narrow long-video summarization path. For advanced
LVS fields such as , , structured output, chunk overlap,
live stream timestamps, metrics, or recommended config, read
.
media_infoschemareferences/lvs-api.md本节包含长视频摘要的核心流程。如需高级LVS字段(如、、结构化输出、分片重叠、直播流时间戳、指标或推荐配置),请阅读。
media_infoschemareferences/lvs-api.mdHITL: collect scenario and events first (REQUIRED — do not skip)
人机交互(HITL):先收集场景和事件(必填——请勿跳过)
Full scenario/events collection walk-through lives in . Always run this step before calling LVS.
references/hitl-prompts.md完整的场景/事件收集流程请参考。调用LVS前务必执行此步骤。
references/hitl-prompts.mdExtract the summary and events in one pipe:
一次性提取摘要和事件:
curl -s -X POST "${LVS_BACKEND_URL:-http://localhost:38111}/summarize"
-H "Content-Type: application/json"
-d @request.json
| jq -r '.choices[0].message.content'
| jq '{video_summary, events}'
-H "Content-Type: application/json"
-d @request.json
| jq -r '.choices[0].message.content'
| jq '{video_summary, events}'
If both `video_summary` and `events` come back empty, the clip probably
doesn't contain the requested events — re-run with different `events` or a
broader `scenario` rather than reporting "no content."
**Tuning:**
- `chunk_duration` (default `10`) — seconds per chunk. Smaller = finer
timestamps, more VLM calls. Use `0` to send the whole video in one chunk.
- `num_frames_per_chunk` (default `20`) — frames sampled per chunk.
- `seed` (default `1`) — reproducibility; change or omit to get variety.
---curl -s -X POST "${LVS_BACKEND_URL:-http://localhost:38111}/summarize"
-H "Content-Type: application/json"
-d @request.json
| jq -r '.choices[0].message.content'
| jq '{video_summary, events}'
-H "Content-Type: application/json"
-d @request.json
| jq -r '.choices[0].message.content'
| jq '{video_summary, events}'
如果`video_summary`和`events`均为空,可能是剪辑中不包含请求的事件——请使用不同的`events`或更宽泛的`scenario`重新运行,而非报告"无内容"。
**调优参数:**
- `chunk_duration`(默认值`10`)——每个分片的时长(秒)。值越小,时间戳越精细,VLM调用次数越多。设为`0`表示将整个视频作为一个分片发送。
- `num_frames_per_chunk`(默认值`20`)——每个分片采样的帧数。
- `seed`(默认值`1`)——用于结果复现;修改或省略该值可获得不同结果。
---End-to-end examples
端到端示例
Assume the skill has already given you (clip URL) and
(seconds) for the target video — those two values are the
contract from Step 1.
vios$CLIP$DURATION假设技能已提供目标视频的(剪辑URL)和(时长,秒)——这两个值是步骤1的输出。
vios$CLIP$DURATIONShort video ($DURATION < 60
)
$DURATION < 60短视频($DURATION < 60
)
$DURATION < 60HITL (required, before the curl): post the Step 2a message, wait for
(or a / round-trip that ends in ),
then set to the confirmed text. Do not run the curl below until
that confirmation has arrived.
Submit/generate/refineSubmitPROMPTbash
PROMPT='Describe in detail what is happening in this video,
including all visible people, vehicles, equipments, objects,
actions, and environmental conditions.
OUTPUT REQUIREMENTS:
[timestamp-timestamp] Description of what is happening.
EXAMPLE:
[0.0s-4.0s] <description of the first event>
[4.0s-12.0s] <description of the second event>'
curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg url "$CLIP" --arg text "$PROMPT" \
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" '{
model: $model,
temperature: 0.0,
max_tokens: 1024,
messages: [{role:"user", content:[
{type:"text", text:$text},
{type:"video_url", video_url:{url:$url}}
]}]
}')" | jq -r '.choices[0].message.content'人机交互(必填,运行curl前): 发送步骤2a的消息,等待用户点击(或通过 / 循环最终确认),然后将设置为确认后的文本。未获得确认前请勿运行下方curl命令。
Submit/generate/refinePROMPTbash
PROMPT='详细描述视频中的内容,
包括所有可见的人物、车辆、设备、物体、
动作和环境条件。
输出要求:
[时间戳-时间戳] 内容描述。
示例:
[0.0s-4.0s] <第一个事件的描述>
[4.0s-12.0s] <第二个事件的描述>'
curl -s -X POST "${VLM_BASE_URL:-http://localhost:30082}/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg url "$CLIP" --arg text "$PROMPT" \
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}" '{
model: $model,
temperature: 0.0,
max_tokens: 1024,
messages: [{role:"user", content:[
{type:"text", text:$text},
{type:"video_url", video_url:{url:$url}}
]}]
}')" | jq -r '.choices[0].message.content'Long video ($DURATION >= 60
)
$DURATION >= 60长视频($DURATION >= 60
)
$DURATION >= 60HITL (required, before the curl): post the Step 2b message and wait
for the user's reply. Substitute their values (or the opt-in)
into , , and below. Do not run
the curl without that reply.
defaults$SCENARIO$EVENTS_JSON$OBJECTS_JSONbash
LVS=${LVS_BACKEND_URL:-http://localhost:38111}人机交互(必填,运行curl前): 发送步骤2b的消息并等待用户回复。将用户提供的值(或选项)替换到下方的、和中。未获得回复前请勿运行curl命令。
defaults$SCENARIO$EVENTS_JSON$OBJECTS_JSONbash
LVS=${LVS_BACKEND_URL:-http://localhost:38111}From HITL reply:
来自人机交互回复:
SCENARIO='warehouse monitoring' # or whatever the user gave
EVENTS_JSON='["notable activity"]' # jq-compatible JSON array
OBJECTS_JSON='' # '' to omit, else '["cars","trucks"]'
SCENARIO='warehouse monitoring' # 或用户提供的其他场景
EVENTS_JSON='["notable activity"]' # 兼容jq的JSON数组
OBJECTS_JSON='' # 留空表示省略,否则填写'["cars","trucks"]'
Readiness = HTTP 200 on /v1/ready. Body may be empty — do not inspect it.
就绪状态判断:/v1/ready返回HTTP 200。响应体可能为空——请勿检查。
Retry on 503 (warmup) for up to ~30s before concluding LVS is unavailable.
503(预热中)时最多重试约30秒,再判定LVS不可用。
lvs_code=000
for i in $(seq 1 10); do
lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready")
case "$lvs_code" in 200) break ;; 503) sleep 3 ;; *) break ;; esac
done
if [ "$lvs_code" = "200" ]; then
curl -s -X POST "$LVS/summarize"
-H "Content-Type: application/json"
-d "$(jq -n --arg url "$CLIP"
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}"
--arg scenario "$SCENARIO"
--argjson events "$EVENTS_JSON"
--argjson objects "${OBJECTS_JSON:-null}" '{ url: $url, model: $model, scenario: $scenario, events: $events, chunk_duration: 10, num_frames_per_chunk: 20, seed: 1 } + (if $objects == null then {} else {objects_of_interest: $objects} end)')"
| jq -r '.choices[0].message.content' | jq '{video_summary, events}' else echo "⚠️ Note: video is ${DURATION}s long. LVS returned HTTP $lvs_code; falling back to VLM."
-H "Content-Type: application/json"
-d "$(jq -n --arg url "$CLIP"
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}"
--arg scenario "$SCENARIO"
--argjson events "$EVENTS_JSON"
--argjson objects "${OBJECTS_JSON:-null}" '{ url: $url, model: $model, scenario: $scenario, events: $events, chunk_duration: 10, num_frames_per_chunk: 20, seed: 1 } + (if $objects == null then {} else {objects_of_interest: $objects} end)')"
| jq -r '.choices[0].message.content' | jq '{video_summary, events}' else echo "⚠️ Note: video is ${DURATION}s long. LVS returned HTTP $lvs_code; falling back to VLM."
Fall back to the short-video VLM flow above (which itself requires
the Step 2a HITL confirmation before calling the VLM).
fi
---lvs_code=000
for i in $(seq 1 10); do
lvs_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$LVS/v1/ready")
case "$lvs_code" in 200) break ;; 503) sleep 3 ;; *) break ;; esac
done
if [ "$lvs_code" = "200" ]; then
curl -s -X POST "$LVS/summarize"
-H "Content-Type: application/json"
-d "$(jq -n --arg url "$CLIP"
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}"
--arg scenario "$SCENARIO"
--argjson events "$EVENTS_JSON"
--argjson objects "${OBJECTS_JSON:-null}" '{ url: $url, model: $model, scenario: $scenario, events: $events, chunk_duration: 10, num_frames_per_chunk: 20, seed: 1 } + (if $objects == null then {} else {objects_of_interest: $objects} end)')"
| jq -r '.choices[0].message.content' | jq '{video_summary, events}' else echo "⚠️ Note: video is ${DURATION}s long. LVS returned HTTP $lvs_code; falling back to VLM."
-H "Content-Type: application/json"
-d "$(jq -n --arg url "$CLIP"
--arg model "${VLM_NAME:-nvidia/cosmos-reason2-8b}"
--arg scenario "$SCENARIO"
--argjson events "$EVENTS_JSON"
--argjson objects "${OBJECTS_JSON:-null}" '{ url: $url, model: $model, scenario: $scenario, events: $events, chunk_duration: 10, num_frames_per_chunk: 20, seed: 1 } + (if $objects == null then {} else {objects_of_interest: $objects} end)')"
| jq -r '.choices[0].message.content' | jq '{video_summary, events}' else echo "⚠️ Note: video is ${DURATION}s long. LVS returned HTTP $lvs_code; falling back to VLM."
Fallback到上述短视频VLM流程(该流程本身需要步骤2a的人机交互确认后才能调用VLM)。
fi
---Responses
响应说明
- VLM returns an OpenAI chat-completion envelope; the summary string is
.
choices[0].message.content - LVS returns the same envelope but is a JSON string — run
contentto reachjq -r '.choices[0].message.content' | jq.{video_summary, events} - Errors from VLM/LVS surface as HTTP non-2xx plus JSON .
{error: ...}from LVS typically means it is still warming up — wait and retry503.v1/ready
- VLM 返回OpenAI聊天补全包;摘要字符串位于。
choices[0].message.content - LVS 返回相同格式的包,但是JSON字符串——需运行
content来获取jq -r '.choices[0].message.content' | jq。{video_summary, events} - 错误:VLM/LVS的错误表现为HTTP非2xx状态码加JSON格式的。 LVS返回503通常表示仍在预热中——请等待并重试
{error: ...}端点。v1/ready
Presenting the output to the user (IMPORTANT — do not rewrite)
向用户展示输出(重要——请勿改写)
The VLM and LVS responses are the final user-facing product. Surface
them with minimal transformation; do not paraphrase, re-voice, add
emojis, or re-format into bullets/tables that weren't in the source.
Exactly one backend call, exactly one rendering. A single confirmed
prompt (Step 2a) or a single confirmed scenario/events set (Step 2b)
corresponds to exactly one or request, and exactly one block of output to the user. Do
NOT fan out parallel calls to hedge (e.g., one call for "full scene"
plus another for "anomalies"), and do NOT render the same response
twice with different headers. If the user wants a second pass (e.g.,
"now with a safety-incident focus"), that's a new HITL round → a new
single call → a new single rendering.
POST /v1/chat/completionsPOST /summarizeHeader line format. Start the response with exactly one header:
Summary of <video_name> (<duration>)Use formatted as for durations under 60 seconds (e.g.
) and for durations ≥60 seconds (e.g. ). Never
include the same header twice in different formats.
<duration>Ns25sMm Ss3m 30sLVS output:
- (string) — render verbatim as the narrative summary. It is already a polished, tone-controlled "Observational Report"; the agent rewriting it loses fidelity (e.g., the model's neutral/formal voice becomes the agent's default voice, subtle phrasing gets smoothed out).
video_summary - (list) — render each event with its
events,start_time,end_time, and the fulltypeverbatim. Pick a format that renders cleanly in the current client; you may use a table if the client renders them legibly, otherwise fall back to a per-event list. Do not shorten or paraphrasedescription.description - You MAY add a one-line header identifying the video (e.g.
) and a closing offer to re-run with different parameters. You MAY NOT summarize, reorder, or interpret the content itself.
**Summary of <name>** (<duration>, scenario: <scenario>)
VLM output: is already the full
assistant reply — render it verbatim. If the model produced
blocks, strip the
block and show the content (or the whole content if the
tags are absent).
choices[0].message.content<think>...</think><answer>...</answer><think><answer>Fallback warning, when applicable, goes above the LVS/VLM
output, not mixed into it.
VLM和LVS的响应是最终面向用户的产物。请以最小的转换展示它们;请勿释义、改写语气、添加表情符号,或重新格式化为源响应中没有的项目符号/表格。
一次后端调用,一次展示。 一个确认的提示词(步骤2a)或一组确认的场景/事件(步骤2b)对应恰好一次或请求,以及恰好一段展示给用户的输出。请勿并行调用多个请求以规避风险(例如,一个请求用于"完整场景",另一个用于"异常情况"),也请勿使用不同标题重复展示同一响应。如果用户需要重新生成(例如,"现在聚焦安全事件"),则需启动新的人机交互循环 → 新的单次调用 → 新的单次展示。
POST /v1/chat/completionsPOST /summarize标题行格式。 响应开头需包含恰好一行标题:
Summary of <video_name> (<duration>)<duration>Ns25sMm Ss3m 30sLVS输出:
- (字符串)——原样展示为叙事性摘要。它已经是经过润色、语气受控的"观察报告";助手改写会降低准确性(例如,模型的中立/正式语气会变为助手的默认语气,细微措辞会被简化)。
video_summary - (列表)——原样展示每个事件的
events、start_time、end_time和完整type。选择当前客户端能清晰展示的格式;如果客户端支持表格显示,可使用表格,否则退化为逐事件列表。请勿缩短或释义description。description - 您可以添加一行标题标识视频(例如),并在结尾提供重新生成的选项。但您不得总结、重新排序或解读内容本身。
**Summary of <name>** (<duration>, scenario: <scenario>)
VLM输出: 已是完整的助手回复——原样展示。如果模型生成了块,请移除块并展示内容(如果没有标签则展示全部内容)。
choices[0].message.content<think>...</think><answer>...</answer><think><answer>Fallback警告(如适用)需放在LVS/VLM输出上方,而非混入其中。
Tips
注意事项
- HITL is not optional. Every summarization starts with the HITL message (Step 2a or 2b). Skipping it to "be efficient" is the single most common failure mode of this skill — do not do it.
- LVS readiness = HTTP 200 on . Nothing else. The body is often empty (
/v1/ready). Do NOT pipe the readiness check throughsize=0,head,jq, or any other command — bash will report the pipeline's last exit code, not curl's, and an empty body will look identical to a real failure. Use thegreppattern from Setup → Availability checks verbatim.curl -s -o /dev/null -w '%{http_code}' - Delegate VIOS to . Do not hand-roll clip-URL, timeline, or upload calls here — they'll drift from the canonical recipes.
vios - Duration is authoritative. Don't route on filename or user hints;
compute from the timeline returned by .
vios - twice for LVS. First unwraps the OpenAI-style envelope, second parses the JSON string inside
jq.content - Do not rewrite LVS / VLM output. The from LVS and
video_summaryfrom VLM are the deliverables. Render them verbatim; don't paraphrase into your own voice or reformat. See Responses → Presenting the output to the user.choices[0].message.content - One call, one render. One confirmed HITL → one backend request → one block of output. No parallel hedging, no duplicate renderings with different headers.
- 人机交互(HITL)不可省略。 每次摘要生成都需从人机交互消息(步骤2a或2b)开始。跳过此步骤以"提高效率"是本技能最常见的失败原因——请勿这样做。
- LVS就绪状态 = 返回HTTP 200。仅此一条规则。 响应体通常为空(
/v1/ready)。请勿将就绪检查通过size=0、head、jq或其他命令处理——bash会返回管道的最后一个退出码,而非curl的退出码,空响应体将被视为真正的失败。请严格使用配置 → 可用性检查中的grep格式。curl -s -o /dev/null -w '%{http_code}' - VIOS操作委托给技能。 请勿在此处手动编写剪辑URL、时间线或上传调用——否则会与标准流程脱节。
vios - 时长是权威依据。 请勿根据文件名或用户提示进行路由;务必根据返回的时间线计算时长。
vios - LVS需两次使用。 第一次解析OpenAI风格的包,第二次解析
jq中的JSON字符串。content - 请勿改写LVS/VLM输出。 LVS的和VLM的
video_summary是交付产物。原样展示;请勿用自己的语气释义或重新格式化。请参考响应说明 → 向用户展示输出。choices[0].message.content - 一次调用,一次展示。 一次确认的人机交互 → 一次后端请求 → 一段输出。请勿并行调用,请勿使用不同标题重复展示。
Cross-reference
交叉参考
- deploy — bring up the (VLM only) or
base(VLM + LVS MS) profilelvs - vios (VIOS API) — upload videos, list streams, get clip URLs
- video-search — semantic search across the archive (different profile)
- video-analytics — query incidents/events from Elasticsearch
- LVS API reference —
references/lvs-api.md - LVS service ops reference —
references/deploy-lvs-service.md
- deploy — 启动(仅VLM)或
base(VLM + LVS微服务)配置文件lvs - vios(VIOS API)——上传视频、列出流、获取剪辑URL
- video-search — 跨存档语义搜索(不同配置文件)
- video-analytics — 从Elasticsearch查询事件/异常
- LVS API参考 —
references/lvs-api.md - LVS服务运维参考 —
references/deploy-lvs-service.md