# RTVI VLM Usage API (VSS 3.1)
RTVI VLM is NVIDIA's real-time vision-language microservice: decode video (file or
RTSP) → segment into chunks → run a VLM (`cosmos-reason1`, `cosmos-reason2`, or any
OpenAI-compatible model) → stream dense captions back over SSE/HTTP and publish
captions, incident alerts, and errors to Kafka. Use this skill whenever you need to hit
any `/v1/...` endpoint on the VSS 3.1 rtvi-vlm microservice: caption generation, file
upload, live-stream management, health checks, NIM-compatible chat completions, and
Prometheus metrics. API reference: https://docs.nvidia.com/vss/latest/real-time-vlm-api.html.
## Setup
```bash
export BASE_URL="http://localhost:8000"  # RTVI VLM host:port; matches $RTVI_VLM_PORT in compose
export API_KEY="$NGC_API_KEY"            # Bearer token (an NGC key works if the service was deployed with NGC auth)
```

Every request below uses `Authorization: Bearer $API_KEY`. Health endpoints
(`/v1/health/*`, `/v1/ready`, `/v1/live`, `/v1/startup`) typically work without auth.

Smoke test before use:

```bash
curl -fsS "$BASE_URL/v1/health/ready" && curl -fsS "$BASE_URL/v1/models" | jq
```
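During startup the readiness probe can keep failing while the model is still downloading (see the 503 row in the Error Reference). A minimal polling sketch, assuming `/v1/health/ready` answers with a 2xx status once the service is ready; `wait_for_ready` and its default attempt count and delay are hypothetical choices, not part of the API:

```bash
# wait_for_ready [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls the readiness probe until it returns 2xx.
# Assumes /v1/health/ready returns a non-2xx status (e.g. 503) until startup completes.
wait_for_ready() {
  local attempts=${1:-60} delay=${2:-5} i
  local url="${BASE_URL:?set BASE_URL first}/v1/health/ready"
  for ((i = 1; i <= attempts; i++)); do
    # -f turns HTTP 4xx/5xx into a non-zero exit status
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    # Sleep between polls, but not after the final attempt
    (( i < attempts )) && sleep "$delay"
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}
```

Usage: `wait_for_ready 60 5 && curl -fsS "$BASE_URL/v1/models" | jq`.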
## Quick Start: dense captions from a local video
```bash
# 1. Upload the video, capture its file id
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@/path/to/warehouse.mp4" \
  -F "purpose=vision" \
  -F "media_type=video" | jq -r '.id')

# 2. Generate captions + alerts (SSE stream of chunked responses)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
        \"id\": \"$FILE_ID\",
        \"prompt\": \"Write a concise dense caption for each 10-second segment of this warehouse video.\",
        \"model\": \"cosmos-reason1\",
        \"chunk_duration\": 10,
        \"stream\": true
      }"
```
## Endpoints
### Captions

Generate VLM captions and alerts for videos and live streams.

#### `POST /v1/generate_captions_alerts`: generate VLM captions (and alerts) for a video or stream

Required:
| Field | Type | Description |
|---|---|---|
| `id` | string \| array | UUID of a previously uploaded file, or the id of an active live stream. Accepts a list of ids for batch requests. |
| `prompt` | string | User prompt to the VLM (e.g. a dense-caption instruction) |
| `model` | string | Model name; see `GET /v1/models` |
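Key optional fields:

| Field | Type | Default | Description |
|---|---|---|---|
| `system_prompt` | string | none | System prompt; use it to constrain the answer format (e.g. Yes/No for alert prompts) |
| `enable_reasoning` | boolean | false | Turn on reasoning for Cosmos Reason models |
| … | boolean | false | Transcribe audio (via Riva) and fold it into the captions |
| `chunk_duration` | integer | none | Segment the video into N-second chunks (`0` disables chunking) |
| `chunk_overlap_duration` | integer | 0 | Overlap between consecutive chunks, in seconds |
| `num_frames_per_second_or_fixed_frames_chunk` | number | none | Frames per second (if `use_fps_for_chunking` is true) or a fixed frame count per chunk |
| `use_fps_for_chunking` | boolean | false | Interpret `num_frames_per_second_or_fixed_frames_chunk` as FPS instead of a fixed frame count |
| … | int | none | Resize frames before inference (0 = native) |
| … | object | none | … |
| `stream` | boolean | false | SSE: emit per-chunk caption deltas as Server-Sent Events |
| … | | | Standard sampling controls |
| … | object | none | Query response format object |
| … | object | none | Extra kwargs for the multimodal processor (e.g. size, shortest/longest edge) |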
```bash
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "prompt": "Dense-caption this warehouse video, one sentence per 10s chunk.",
    "model": "cosmos-reason1",
    "chunk_duration": 10,
    "stream": true
  }'
```

Response (200, SSE when `stream=true`): each event payload has `start_ts`, `end_ts`,
and `content`, followed by a terminal `{"status": "completed"}` event.

Response (200, non-stream): `{ "id", "object": "caption", "choices": [{...}], "usage": {...} }`.
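A client-side reader for the SSE response fits in plain bash. This is a minimal sketch, assuming each event arrives as a single `data:` line as described in the Gotchas; `read_sse_payloads` is a hypothetical helper name. It prints one JSON payload per line and stops at the terminal `{"status": "completed"}` event, so the output can be piped to `jq`:

```bash
# read_sse_payloads < sse_stream
# Hypothetical helper: print each SSE `data:` payload on its own line.
# Returns 0 after the terminal {"status": "completed"} event, 1 if the stream ends first.
read_sse_payloads() {
  local line payload
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                 # tolerate CRLF line endings
    [[ $line == data:* ]] || continue  # skip blank event separators and other SSE fields
    payload=${line#data:}
    payload=${payload# }               # SSE allows one optional space after "data:"
    if [[ $payload =~ \"status\"[[:space:]]*:[[:space:]]*\"completed\" ]]; then
      return 0
    fi
    printf '%s\n' "$payload"
  done
  return 1
}
```

Usage: `curl -N -X POST ... | read_sse_payloads | jq -r '.content'`. Because the end-of-stream check is a text match, a caption whose text contains that exact JSON fragment would end the loop early.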
#### `DELETE /v1/generate_captions_alerts/{stream_id}`: stop caption generation for a live stream

Stops inference while leaving the stream registered. Pair with
`DELETE /v1/streams/delete/{stream_id}` to also un-register the RTSP source.

```bash
curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```
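Because stopping captions and un-registering the stream are separate calls, a small wrapper keeps them together. A minimal sketch; `teardown_stream` is a hypothetical helper, and continuing after a failed first call is a design choice that assumes the caption job may already have stopped:

```bash
# teardown_stream STREAM_ID
# Hypothetical helper: stop caption generation, then un-register the RTSP stream.
teardown_stream() {
  local id=${1:?usage: teardown_stream STREAM_ID}
  local base=${BASE_URL:?set BASE_URL first}
  local auth="Authorization: Bearer ${API_KEY:?set API_KEY first}"
  # 1. Stop inference. Tolerate failure (e.g. no caption job running) so step 2 still runs.
  if ! curl -fsS -X DELETE "$base/v1/generate_captions_alerts/$id" -H "$auth"; then
    echo "warning: could not stop caption generation for $id" >&2
  fi
  # 2. Un-register the stream; skipping this leaks RTSP connection resources.
  curl -fsS -X DELETE "$base/v1/streams/delete/$id" -H "$auth"
}
```

Usage: `teardown_stream "$STREAM_ID"`. The function's exit status is that of the un-register call, since that is the step that must succeed.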
### Files
Upload and manage media files consumed by `/v1/generate_captions_alerts`.

#### `POST /v1/files`: upload a media file (multipart)

```bash
curl -X POST "$BASE_URL/v1/files" -H "Authorization: Bearer $API_KEY" \
  -F "file=@./video.mp4" -F "purpose=vision" -F "media_type=video"
```

Response: `{ "id", "object": "file", "bytes", "created_at", "filename", "purpose" }`.
#### `GET /v1/files?purpose=vision`: list uploaded files

#### `GET /v1/files/{file_id}`: file metadata

#### `GET /v1/files/{file_id}/content`: download the original file content

#### `DELETE /v1/files/{file_id}`: delete a file (releases asset storage)

### Live Stream
RTSP stream lifecycle.

#### `POST /v1/streams/add`: register one or more RTSP streams

Required per stream: `liveStreamUrl` (must start with `rtsp://`), `description`.
Optional: `username`, `password`, `sensor_name`, and placement metadata
(`place_name`, `place_type`, `place_lat`, `place_lon`, `place_alt`,
`place_coordinate_x`, `place_coordinate_y`).

```bash
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://cam:8554/live","description":"warehouse cam 1"}]}' \
  | jq -r '.results[0].id')
```
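The `rtsp://` requirement can be checked client-side before the request is sent. A minimal sketch, assuming the URL and description contain no double quotes or backslashes (the body is built without JSON escaping); `register_stream` is a hypothetical helper that prints the raw JSON response:

```bash
# register_stream RTSP_URL DESCRIPTION
# Hypothetical helper: validate the URL scheme, then POST a one-stream body to /v1/streams/add.
register_stream() {
  local url=${1:?usage: register_stream RTSP_URL DESCRIPTION}
  local desc=${2:?usage: register_stream RTSP_URL DESCRIPTION}
  # liveStreamUrl must start with rtsp://; fail fast instead of round-tripping to the server
  if [[ $url != rtsp://* ]]; then
    echo "liveStreamUrl must start with rtsp:// (got: $url)" >&2
    return 2
  fi
  # No JSON escaping here: assumes url and desc contain no " or \ characters
  local body="{\"streams\":[{\"liveStreamUrl\":\"$url\",\"description\":\"$desc\"}]}"
  curl -fsS -X POST "${BASE_URL:?set BASE_URL first}/v1/streams/add" \
    -H "Authorization: Bearer ${API_KEY:?set API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

Usage: `STREAM_ID=$(register_stream rtsp://cam:8554/live "warehouse cam 1" | jq -r '.results[0].id')`.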
#### `GET /v1/streams/get-stream-info`: list active streams

#### `DELETE /v1/streams/delete/{stream_id}`: remove a single stream

#### `DELETE /v1/streams/delete-batch`: remove many streams (body: `{"stream_ids":[...]}`)
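The batch body can be assembled from a list of ids. A minimal sketch, assuming stream ids are plain UUID-style strings that need no JSON escaping; `delete_streams_batch` is a hypothetical helper name:

```bash
# delete_streams_batch ID [ID...]
# Hypothetical helper: build {"stream_ids":[...]} and send it to /v1/streams/delete-batch.
delete_streams_batch() {
  (( $# > 0 )) || { echo "usage: delete_streams_batch ID [ID...]" >&2; return 2; }
  local body='{"stream_ids":[' sep="" id
  for id in "$@"; do
    body+="${sep}\"${id}\""   # ids assumed to need no JSON escaping
    sep=","
  done
  body+=']}'
  curl -fsS -X DELETE "${BASE_URL:?set BASE_URL first}/v1/streams/delete-batch" \
    -H "Authorization: Bearer ${API_KEY:?set API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

Usage: `delete_streams_batch "$STREAM_ID_1" "$STREAM_ID_2"`.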
### NIM Compatible
OpenAI-compatible endpoints for interop with OpenAI/NVIDIA-API clients.

#### `POST /v1/chat/completions`: OpenAI-compatible chat (text + multimodal)

Required: `messages`, `model`. Text-only requests omit `id` / `video_url` / `image_url`.

```bash
curl -X POST "$BASE_URL/v1/chat/completions" -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cosmos-reason1","messages":[{"role":"user","content":"Summarize this scene."}]}'
```
#### `POST /v1/completions`: OpenAI-compatible legacy completions

#### `GET /v1/version`: returns `{ "version": "3.1.0-..." }`

#### `GET /v1/license`: license text

#### `GET /v1/manifest`: NIM manifest

#### `GET /v1/health/live` · `GET /v1/health/ready`: NIM-style probes

### Models · Metadata · Metrics · Health Check
#### `GET /v1/models`: list loaded VLMs

Returns `{ "data": [{ "id", "object": "model", "owned_by" }] }`.

#### `GET /v1/metadata`: service metadata (build, release, image tag)

#### `GET /v1/metrics`: Prometheus metrics (plain text)

#### `GET /v1/ready` · `GET /v1/live` · `GET /v1/startup`: Kubernetes-style probes

## Common Workflows
The four scenarios from the VSS 3.1 RT-VLM Usage Skill requirements.
### 1. Dense captions from a stored video file

```bash
# Upload → capture file id → generate captions (SSE stream)
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@warehouse.mp4" -F "purpose=vision" -F "media_type=video" | jq -r '.id')

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{\"id\": \"$FILE_ID\", \"prompt\": \"Describe warehouse events in 1 sentence per 10s chunk.\", \"model\": \"cosmos-reason1\", \"chunk_duration\": 10, \"stream\": true}"

# When done, free storage:
curl -X DELETE "$BASE_URL/v1/files/$FILE_ID" -H "Authorization: Bearer $API_KEY"
```

### 2. Dense captions from an RTSP live stream
```bash
# Register the stream
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

# Start continuous caption generation in the background (runs until the stream stops or DELETE is called)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{\"id\": \"$STREAM_ID\", \"prompt\": \"Describe each event; start each sentence with a timestamp.\", \"model\": \"cosmos-reason1\", \"chunk_duration\": 10, \"num_frames_per_second_or_fixed_frames_chunk\": 2, \"use_fps_for_chunking\": true, \"stream\": true}" &

# Tear down when finished:
curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
curl -X DELETE "$BASE_URL/v1/streams/delete/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```

Pre-req for Kafka alert publishing: the container was started with:
```
RTVI_VLM_KAFKA_ENABLED=true
RTVI_VLM_KAFKA_TOPIC=vision-llm-messages
RTVI_VLM_KAFKA_INCIDENT_TOPIC=vision-llm-events-incidents
RTVI_VLM_ERROR_MESSAGE_TOPIC=vision-llm-errors
HOST_IP=<kafka-host>
```
```bash
# Register the stream
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

# Caption with an alert-oriented prompt: each chunk ends with a Yes/No anomaly verdict
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{\"id\": \"$STREAM_ID\", \"prompt\": \"You are a warehouse monitoring system. Describe the scene in one sentence, then on a new line output exactly:\nAnomaly Detected: Yes/No\nReason: <one sentence>\nFlag an anomaly if any worker is missing a hard hat or high-vis vest.\", \"system_prompt\": \"Answer the user's question correctly in yes or no.\", \"model\": \"cosmos-reason2\", \"chunk_duration\": 60, \"chunk_overlap_duration\": 10, \"stream\": true}"
```
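Because the 3.1 trigger is a token match on `"yes"` / `"true"` rather than a per-request flag (see Gotchas), it can help to check candidate prompts against sample model outputs offline. A rough client-side approximation, assuming a case-insensitive whole-word match; the server's actual matcher may differ, and `would_alert` is a hypothetical helper:

```bash
# would_alert TEXT
# Hypothetical approximation of the 3.1 alert trigger: succeeds if TEXT contains
# "yes" or "true" as a whole word, ignoring case. Not the server's implementation.
would_alert() {
  grep -qiwE 'yes|true' <<<"$1"
}

would_alert "Anomaly Detected: Yes" && echo "alert"                                     # prints: alert
would_alert $'Anomaly Detected: No\nReason: all workers wear vests.' || echo "no alert"  # prints: no alert
```

Under this approximation, a `No` verdict whose free-text reason happens to contain the word "true" would still match, which is one reason to keep the answer constrained with `system_prompt`.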
**Consume alerts from Kafka** (when using the VSS foundational Kafka container).
Kafka values are NvSchema protobuf payloads, so use `print.value=false` for a
clean validation pass that shows timestamp, key, and headers without dumping
binary payload bytes:

```bash
docker exec mdx-kafka kafka-console-consumer \
  --bootstrap-server 127.0.0.1:9092 \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

If Kafka is not running in the VSS `mdx-kafka` container, use the Kafka CLI from
the host running the broker:

```bash
kafka-console-consumer \
  --bootstrap-server "$HOST_IP:9092" \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

Incident protobuf (`ext.proto :: Incident`) key fields: `sensorId`, `timestamp`, `end`,
`objectIds`, `frameIds`, `place`, `analyticsModule`, `category`, `isAnomaly` (`true` for
alerts), `llm` (nested VisionLLM), and an `info` map including `triggerPhrase`, `verdict`,
`requestId`, `chunkIdx`, `streamId`, and `alertCategory` (if the deployment supports the
`alert_category` query field, i.e. post-3.1).
### 3. Kafka workflows (alerts + message bus)
Dense captioning with alerts on an RTSP stream and the HTTP-vs-Kafka response model are documented in `references/kafka-workflows.md`.

## Error Reference
| Code | Meaning | Common cause |
|---|---|---|
| 400 | Bad Request | Missing required field (e.g. `id`, `prompt`, `model`) |
| 401 | Unauthorized | Missing or invalid `Authorization: Bearer` header |
| 404 | Not Found | … |
| 413 | Payload Too Large | Uploaded file exceeds the server's upload limit |
| 422 | Unprocessable Entity | Pydantic schema violation, e.g. a JSON (`-d`) body sent to the multipart `POST /v1/files` |
| 429 | Rate Limited | Too many concurrent streams; raise the server's concurrent-stream limit |
| 500 | Server Error | VLM inference exception (OOM, model unavailable); check the service logs |
| 503 | Service Busy | Startup not complete (model still downloading) or upstream NIM dependency unhealthy |
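To surface these codes while scripting, curl can report the status separately from the body with `-w '%{http_code}'`. A minimal sketch; `api_call` and `status_hint` are hypothetical helpers, and the hint strings only restate the table above:

```bash
# status_hint CODE: one-line reminder drawn from the Error Reference table.
status_hint() {
  case $1 in
    400) echo "Bad Request: missing required field (id, prompt, model)" ;;
    401) echo "Unauthorized: missing or invalid Authorization: Bearer header" ;;
    404) echo "Not Found" ;;
    413) echo "Payload Too Large: uploaded file exceeds the server limit" ;;
    422) echo "Unprocessable Entity: schema violation (e.g. JSON body on multipart /v1/files)" ;;
    429) echo "Rate Limited: too many concurrent streams" ;;
    500) echo "Server Error: VLM inference exception (OOM, model unavailable)" ;;
    503) echo "Service Busy: startup not complete or upstream NIM dependency unhealthy" ;;
    000) echo "No HTTP response: is BASE_URL reachable?" ;;
    *)   echo "Unexpected status" ;;
  esac
}

# api_call CURL_ARGS...: print the body on 2xx; otherwise print status + hint + body to stderr.
api_call() {
  local body status
  body=$(mktemp) || return 1
  status=$(curl -sS -o "$body" -w '%{http_code}' \
    -H "Authorization: Bearer ${API_KEY:?set API_KEY first}" "$@")
  if [[ $status == 2?? ]]; then
    cat "$body"
    rm -f "$body"
    return 0
  fi
  echo "HTTP $status: $(status_hint "$status")" >&2
  cat "$body" >&2
  rm -f "$body"
  return 1
}
```

Usage: `api_call "$BASE_URL/v1/models" | jq`. Extra curl options (`-X`, `-F`, `-d`) pass straight through.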
## Gotchas
- **3.1 GA endpoint is `/v1/generate_captions_alerts`, not `/v1/generate_captions`.** The rename lands in a post-3.1 build. For VSS 3.1 releases (`rtvi_vlm/26.01.x`–`26.02.3`), always use the `_alerts` suffix. https://docs.nvidia.com/vss/latest/real-time-vlm-api.html is the canonical reference.
- **No URL-based input in 3.1 GA.** The `url` / `media_type` / `creation_time` fields were added post-3.1. You must upload via `POST /v1/files` first and then pass the returned `id`.
- **Alert trigger = the tokens `"yes"` or `"true"` in the VLM response (case-insensitive).** There is no per-request alert flag. Design prompts with an explicit `Anomaly Detected: Yes/No` line and set `system_prompt` to constrain the model to Yes/No answers (per the VSS docs). Every chunk is published to `KAFKA_TOPIC`; matched chunks additionally go to `KAFKA_INCIDENT_TOPIC` with `isAnomaly=true`, `info["triggerPhrase"]` set to the matched tokens, and `info["verdict"]="confirmed"`.
- **No `alert_category` query field in the 3.1 OpenAPI spec.** On 3.1 the Kafka incident topic defaults to `incident.category = "vlm-alert"`. Post-3.1 builds expose an optional `alert_category` request field to override `incident.category`.
- **Kafka topics are server-side config, not per-request.** The `KAFKA_*` env vars (via compose `RTVI_VLM_KAFKA_*` rewrites) are fixed at container start; clients can't override topics per request. Kafka publishing is in addition to the HTTP response, never a replacement.
- **`stream=true` returns Server-Sent Events, not chunked JSON.** Use `curl -N` (no buffering). Each event is `data: {"content": "...", "start_ts": ..., "end_ts": ...}\n\n`, terminated by `data: {"status":"completed"}\n\n`. Without `stream=true` the server buffers until the full video is processed: fine for short clips (<1 min), but avoid it for live streams.
- **`chunk_duration=0` disables chunking:** the entire video is sent to the VLM in one shot. Only meaningful for short clips; long videos will OOM or exceed `max_model_len`.
- **Default frame budget caps at `VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES` (256).** Requesting an FPS that implies more than 256 frames per chunk is silently capped; drop the FPS or shorten `chunk_duration` to stay within budget.
- **`enable_reasoning` requires a Cosmos Reason model.** Passing it with Qwen3-VL or other non-reasoning models is a no-op.
- **`/v1/metrics` requires auth, unlike `/v1/health/*`.** Prometheus scrapers need the Bearer token.
- **File upload is multipart, not JSON.** Use `-F file=@path -F purpose=vision -F media_type=video`; a `-d` body returns 422.
- **Live-stream lifecycle requires two deletes to fully tear down:** `DELETE /v1/generate_captions_alerts/{stream_id}` stops inference; `DELETE /v1/streams/delete/{stream_id}` un-registers the stream. Skipping the second leaks RTSP connection resources.
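The frame-budget gotcha comes down to simple arithmetic. A minimal sketch, assuming that with `use_fps_for_chunking=true` the sampled frames per chunk are roughly `num_frames_per_second_or_fixed_frames_chunk × chunk_duration`, and taking the 256-frame default of `VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES` stated above; both helper names are hypothetical:

```bash
# frames_per_chunk FPS CHUNK_DURATION [BUDGET]
# Prints the estimated frame count (fps x seconds, truncated) and fails if it exceeds BUDGET.
frames_per_chunk() {
  awk -v fps="$1" -v dur="$2" -v budget="${3:-256}" 'BEGIN {
    frames = fps * dur
    printf "%d\n", frames
    if (frames > budget) exit 1
  }'
}

# max_fps_for_chunk CHUNK_DURATION [BUDGET]
# Prints the highest FPS that keeps one chunk within BUDGET frames.
max_fps_for_chunk() {
  awk -v dur="$1" -v budget="${2:-256}" 'BEGIN {
    if (dur <= 0) exit 2   # chunk_duration=0 disables chunking; no per-chunk budget to compute
    printf "%.2f\n", budget / dur
  }'
}
```

For example, `max_fps_for_chunk 60` prints `4.27`, so under these assumptions a 60 s chunk stays within budget at up to about 4 FPS, while `frames_per_chunk 8 60` prints `480` and fails.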