rt-vlm


RTVI VLM Usage API (VSS 3.1)


RTVI VLM is NVIDIA's real-time vision-language microservice: decode video (file or RTSP) → segment into chunks → run a VLM (`cosmos-reason1`, `cosmos-reason2`, or any OpenAI-compatible model) → stream dense captions back over SSE/HTTP and publish captions + incident alerts + errors to Kafka. Use this skill whenever you need to hit any `/v1/...` endpoint on the VSS 3.1 rtvi-vlm microservice: caption generation, file upload, live-stream management, health checks, NIM-compatible chat completions, Prometheus metrics. API reference: https://docs.nvidia.com/vss/latest/real-time-vlm-api.html.

Setup


```bash
export BASE_URL="http://localhost:8000"     # RTVI VLM host:port; matches $RTVI_VLM_PORT in compose
export API_KEY="$NGC_API_KEY"               # Bearer token (an NGC key works if the service was deployed with NGC auth)
```

Every request below uses `Authorization: Bearer $API_KEY`. Health endpoints (`/v1/health/*`, `/v1/ready`, `/v1/live`, `/v1/startup`) typically work without auth.

Smoke test before use:

```bash
curl -fsS "$BASE_URL/v1/health/ready" && curl -fsS "$BASE_URL/v1/models" | jq
```
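Because the service can still be starting when you first call it, it may help to poll the readiness probe instead of running the smoke test once. The function below is a minimal sketch of that idea. The name `wait_ready`, the attempt count, and the delay are illustrative assumptions, not values from the VSS docs; only the `/v1/health/ready` path and `BASE_URL` come from this page.

```bash
# Poll GET /v1/health/ready until it succeeds or the attempts run out.
# wait_ready, the default attempt count, and the default delay are
# hypothetical choices for illustration.
wait_ready() {
  local attempts="${1:-30}" delay="${2:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors, so any failure keeps us polling.
    if curl -fsS -o /dev/null "$BASE_URL/v1/health/ready"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A possible use is `wait_ready 60 5 && curl -fsS "$BASE_URL/v1/models" | jq`, which waits up to roughly five minutes before listing models.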

Quick Start — dense captions from a local video

```bash
# 1. Upload the video, capture its file id
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@/path/to/warehouse.mp4" \
  -F "purpose=vision" \
  -F "media_type=video" | jq -r '.id')

# 2. Generate captions + alerts (SSE stream of chunked responses)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$FILE_ID\",
    \"prompt\": \"Write a concise dense caption for each 10-second segment of this warehouse video.\",
    \"model\": \"cosmos-reason1\",
    \"chunk_duration\": 10,
    \"stream\": true
  }"
```

Endpoints

Captions

Generate VLM captions and alerts for videos and live streams.

`POST /v1/generate_captions_alerts`: Generate VLM captions (and alerts) for video/stream

Required:

| Field | Type | Description |
|---|---|---|
| `id` | string \| array | UUID of a previously uploaded file, or id of an active live stream. Accepts a list of ids for batch processing |
| `prompt` | string | User prompt to the VLM (e.g. a dense-caption instruction) |
| `model` | string | Model name; see `GET /v1/models` |

Key optional fields:

| Field | Type | Default | Description |
|---|---|---|---|
| `system_prompt` | string | | System prompt; use `<think></think><answer></answer>` tags to enable reasoning on Cosmos Reason |
| `enable_reasoning` | boolean | false | Turn on reasoning for Cosmos Reason models |
| `enable_audio` | boolean | false | Transcribe audio (via Riva) and fold into captions |
| `chunk_duration` | integer | | Segment video into N-second chunks (`0` = no chunking) |
| `chunk_overlap_duration` | integer | 0 | Overlap between consecutive chunks |
| `num_frames_per_second_or_fixed_frames_chunk` | number | | FPS (if `use_fps_for_chunking=true`) or fixed frames per chunk |
| `use_fps_for_chunking` | boolean | false | Interpret `num_frames_per_second_or_fixed_frames_chunk` as FPS (true) or as a fixed frame count (false) |
| `vlm_input_width` / `vlm_input_height` | int | | Resize frames before inference (0 = native) |
| `media_info` | object | | `{"start_offset_ms": ..., "end_offset_ms": ...}` to process a slice of a file (not live streams) |
| `stream` | boolean | false | SSE: emit per-chunk caption deltas as `data:` events (recommended for long videos) |
| `max_tokens` / `temperature` / `top_p` / `top_k` / `seed` / `ignore_eos` | | | Standard sampling controls |
| `response_format` | object | | Query response format object |
| `mm_processor_kwargs` | object | | Extra kwargs for the multimodal processor (e.g. size, shortest/longest edge) |

```bash
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "prompt": "Dense-caption this warehouse video, one sentence per 10s chunk.",
    "model": "cosmos-reason1",
    "chunk_duration": 10,
    "stream": true
  }'
```

Response (200, SSE when `stream=true`): each event payload has `start_ts`, `end_ts`, and `content`, and the stream ends with a terminal `{"status": "completed"}` event.

Response (200, non-stream): `{ "id", "object": "caption", "choices": [{...}], "usage": {...} }`.
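To consume the streamed response from a script, the `data:` prefix has to be stripped and the terminal event detected. The filter below is a rough sketch of one way to do that with plain shell tools. The function name `sse_payloads` is hypothetical, and the completion check is a simple string match on the terminal event described above, so treat it as an approximation and not a full SSE parser.

```bash
# Read an SSE stream on stdin and print one JSON payload per line.
# Stops at the terminal {"status": "completed"} event.
# sse_payloads is a hypothetical helper name; the string match on the
# terminal event is an assumption about how it is serialized.
sse_payloads() {
  tr -d '\r' | while IFS= read -r line; do
    case "$line" in
      "data: "*) payload="${line#data: }" ;;
      "data:"*)  payload="${line#data:}" ;;
      *) continue ;;   # blank separators, comments, and other SSE fields
    esac
    case "$payload" in
      *'"status"'*'"completed"'*) break ;;
    esac
    printf '%s\n' "$payload"
  done
}
```

With the request above this could be used as `curl -N ... | sse_payloads | jq -r '.content'` to print only the caption text of each chunk.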

`DELETE /v1/generate_captions_alerts/{stream_id}`: Stop caption generation for a live stream

Stops inference while leaving the stream registered. Pair with `DELETE /v1/streams/delete/{stream_id}` to also un-register the RTSP source.

```bash
curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```

Files

Upload and manage media files consumed by `/v1/generate_captions_alerts`.

`POST /v1/files`: Upload a media file (multipart)

```bash
curl -X POST "$BASE_URL/v1/files" -H "Authorization: Bearer $API_KEY" \
  -F "file=@./video.mp4" -F "purpose=vision" -F "media_type=video"
```

Response: `{ "id", "object": "file", "bytes", "created_at", "filename", "purpose" }`.

`GET /v1/files?purpose=vision`: List uploaded files

`GET /v1/files/{file_id}`: File metadata

`GET /v1/files/{file_id}/content`: Download original file content

`DELETE /v1/files/{file_id}`: Delete file (releases asset storage)

Live Stream

RTSP stream lifecycle.

`POST /v1/streams/add`: Register one or more RTSP streams

Required per stream: `liveStreamUrl` (must start with `rtsp://`), `description`. Optional: `username`, `password`, `sensor_name`, and placement metadata (`place_name`, `place_type`, `place_lat`, `place_lon`, `place_alt`, `place_coordinate_x`, `place_coordinate_y`).

```bash
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://cam:8554/live","description":"warehouse cam 1"}]}' \
  | jq -r '.results[0].id')
```
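Since `liveStreamUrl` must start with `rtsp://`, it can be worth checking the URL on the client before sending the request. The sketch below shows one way to do that and to build the request body with `jq` so that quoting in the description is handled for you. The function names `is_rtsp_url` and `stream_payload` are hypothetical; the field names are the two required ones listed above.

```bash
# Succeeds only for URLs that start with rtsp:// and have something after it.
# is_rtsp_url is a hypothetical helper, not part of the service.
is_rtsp_url() {
  case "$1" in
    rtsp://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a /v1/streams/add body for one stream, or fail for a non-RTSP URL.
# Uses jq so that quotes in the description are escaped correctly.
stream_payload() {
  is_rtsp_url "$1" || { echo "not an rtsp:// URL: $1" >&2; return 1; }
  jq -cn --arg url "$1" --arg desc "$2" \
    '{streams: [{liveStreamUrl: $url, description: $desc}]}'
}
```

The output can then be passed to curl, for example `-d "$(stream_payload "rtsp://cam:8554/live" "warehouse cam 1")"`.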

`GET /v1/streams/get-stream-info`: List active streams

`DELETE /v1/streams/delete/{stream_id}`: Remove a single stream

`DELETE /v1/streams/delete-batch`: Remove many (`{"stream_ids":[...]}`)
NIM Compatible

OpenAI-compatible endpoints for interop with OpenAI/NVIDIA-API clients.

`POST /v1/chat/completions`: OpenAI-compatible chat (text + multimodal)

Required: `messages`, `model`. Text-only requests omit `id` / `video_url` / `image_url`.

```bash
curl -X POST "$BASE_URL/v1/chat/completions" -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cosmos-reason1","messages":[{"role":"user","content":"Summarize this scene."}]}'
```

`POST /v1/completions`: OpenAI-compatible legacy completions

`GET /v1/version`: returns `{ "version": "3.1.0-..." }`

`GET /v1/license`: license text

`GET /v1/manifest`: NIM manifest

`GET /v1/health/live` · `GET /v1/health/ready`: NIM-style probes

Models · Metadata · Metrics · Health Check

`GET /v1/models`: List loaded VLMs: `{ "data": [{ "id", "object": "model", "owned_by" }] }`

`GET /v1/metadata`: Service metadata (build, release, image tag)

`GET /v1/metrics`: Prometheus metrics (plain text)

`GET /v1/ready` · `GET /v1/live` · `GET /v1/startup`: Kubernetes-style probes



Common Workflows

The four scenarios from the VSS 3.1 RT-VLM Usage Skill requirements.

1. Dense captions from a stored video file

```bash
# Upload → capture file id → generate captions (SSE stream)
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@warehouse.mp4" -F "purpose=vision" -F "media_type=video" | jq -r '.id')

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$FILE_ID\",
    \"prompt\": \"Describe warehouse events in 1 sentence per 10s chunk.\",
    \"model\": \"cosmos-reason1\",
    \"chunk_duration\": 10,
    \"stream\": true
  }"

# When done, free storage:
curl -X DELETE "$BASE_URL/v1/files/$FILE_ID" -H "Authorization: Bearer $API_KEY"
```

2. Dense captions from an RTSP live stream

```bash
# Register the stream
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

# Start continuous caption generation (runs until stream stops or DELETE)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$STREAM_ID\",
    \"prompt\": \"Describe each event; start each sentence with a timestamp.\",
    \"model\": \"cosmos-reason1\",
    \"chunk_duration\": 10,
    \"num_frames_per_second_or_fixed_frames_chunk\": 2,
    \"use_fps_for_chunking\": true,
    \"stream\": true
  }" &

# Tear down when finished:
curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
curl -X DELETE "$BASE_URL/v1/streams/delete/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```
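The two-step teardown is easy to get half right, so it may be convenient to wrap both DELETE calls in one function. The sketch below assumes `BASE_URL` and `API_KEY` are exported as in Setup. The name `teardown_stream` is hypothetical, and continuing to the un-register step when the stop call fails is a design choice made here, not documented service behavior.

```bash
# Stop inference for a stream, then un-register its RTSP source.
# teardown_stream is a hypothetical helper; it still attempts the second
# DELETE if the first fails, so a stream that was never started is cleaned up.
teardown_stream() {
  local stream_id="$1"
  # Step 1: stop caption generation.
  curl -fsS -X DELETE "$BASE_URL/v1/generate_captions_alerts/$stream_id" \
    -H "Authorization: Bearer $API_KEY" \
    || echo "stop request failed; un-registering anyway" >&2
  # Step 2: un-register the stream; this call decides the exit status.
  curl -fsS -X DELETE "$BASE_URL/v1/streams/delete/$stream_id" \
    -H "Authorization: Bearer $API_KEY"
}
```

Calling `teardown_stream "$STREAM_ID"` then replaces the two separate curl commands.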

```bash
# Pre-req: the container was started with:
#   RTVI_VLM_KAFKA_ENABLED=true
#   RTVI_VLM_KAFKA_TOPIC=vision-llm-messages
#   RTVI_VLM_KAFKA_INCIDENT_TOPIC=vision-llm-events-incidents
#   RTVI_VLM_ERROR_MESSAGE_TOPIC=vision-llm-errors
#   HOST_IP=<kafka-host>
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$STREAM_ID\",
    \"prompt\": \"You are a warehouse monitoring system. Describe the scene in one sentence, then on a new line output exactly:\nAnomaly Detected: Yes/No\nReason: <one sentence>\nFlag an anomaly if any worker is missing a hard hat or high-vis vest.\",
    \"system_prompt\": \"Answer the user's question correctly in yes or no.\",
    \"model\": \"cosmos-reason2\",
    \"chunk_duration\": 60,
    \"chunk_overlap_duration\": 10,
    \"stream\": true
  }"
```

**Consume alerts from Kafka** (when using the VSS foundational Kafka container).
Kafka values are NvSchema protobuf payloads, so use `print.value=false` for a
clean validation pass that shows timestamp, key, and headers without dumping
binary payload bytes:
```bash
docker exec mdx-kafka kafka-console-consumer \
  --bootstrap-server 127.0.0.1:9092 \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

If Kafka is not running in the VSS `mdx-kafka` container, use the Kafka CLI from the host running the broker:

```bash
kafka-console-consumer \
  --bootstrap-server "$HOST_IP:9092" \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

Incident protobuf (`ext.proto :: Incident`) key fields: `sensorId`, `timestamp`, `end`, `objectIds`, `frameIds`, `place`, `analyticsModule`, `category`, `isAnomaly` (`true` for alerts), `llm` (nested VisionLLM), and an `info` map including `triggerPhrase`, `verdict`, `requestId`, `chunkIdx`, `streamId`, `alertCategory` (if the deployment supports the `alert_category` query field, which is post-3.1).
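The Gotchas section states that an alert fires when the VLM response contains the tokens "yes" or "true", case-insensitively. To sanity-check a prompt before pointing it at Kafka, a caption can be tested locally with a similar rule. The sketch below is only an approximation: the function name `would_trigger_alert` is hypothetical, and whole-word matching is an assumption, since this page does not say how the service tokenizes the response.

```bash
# Succeeds if the text contains "yes" or "true" as a whole word,
# ignoring case. This approximates the documented trigger rule; the
# whole-word behaviour (-w) is an assumption, not confirmed server logic.
would_trigger_alert() {
  printf '%s\n' "$1" | grep -Eiqw 'yes|true'
}
```

For example, `would_trigger_alert "Anomaly Detected: Yes" && echo "would alert"` prints the message, while a caption ending in `Anomaly Detected: No` does not.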

3. Kafka workflows (alerts + message bus)

Dense captioning with alerts on an RTSP stream and the HTTP-vs-Kafka response model are documented in `references/kafka-workflows.md`.
Error Reference

| Code | Meaning | Common Cause |
|---|---|---|
| 400 | Bad Request | Missing required field (`id`, `prompt`, `model`); unsupported `media_type`; unknown `model` name |
| 401 | Unauthorized | Missing/invalid `Authorization: Bearer $API_KEY`, or wrong key format (expect `nvapi-...`) |
| 404 | Not Found | `file_id` deleted / `stream_id` not registered / wrong endpoint path (note: `{stream_id}` is required on `DELETE /v1/streams/delete/{stream_id}`) |
| 413 | Payload Too Large | Uploaded file exceeds server `MAX_FILE_SIZE`; increase the limit or pre-chunk the video |
| 422 | Unprocessable Entity | Pydantic schema violation, e.g. `use_fps_for_chunking=true` without `num_frames_per_second_or_fixed_frames_chunk`; stream ids supplied to a file-only field like `media_info` |
| 429 | Rate Limited | Too many concurrent streams; raise `VLM_BATCH_SIZE` or spread across instances |
| 500 | Server Error | VLM inference exception (OOM, model unavailable); check `docker logs rtvi-vlm-*` |
| 503 | Service Busy | Startup not complete (model still downloading) or upstream NIM dependency unhealthy |
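In a script, the table above can be reduced to a coarse decision about what to do with a status code. The sketch below is one possible mapping. The function name `classify_status` and the three action labels are hypothetical, and treating 429 and 503 as retryable is a reading of the listed causes, not a documented retry policy.

```bash
# Map an HTTP status code to a coarse action based on the error table.
# classify_status and its labels are hypothetical; the retry grouping
# (429, 503) is an interpretation of the causes, not an official policy.
classify_status() {
  case "$1" in
    2??) echo ok ;;
    429|503) echo retry ;;                       # load or startup: may clear on its own
    400|401|404|413|422) echo fix-request ;;     # the request itself needs changing
    *) echo investigate ;;                       # e.g. 500: check the container logs
  esac
}
```

The status code itself can be captured with curl's `-w '%{http_code}'` option, for example `code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/v1/models")` followed by `classify_status "$code"`.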


Gotchas

  • 3.1 GA endpoint is `/v1/generate_captions_alerts`, not `/v1/generate_captions`. The rename lands in a post-3.1 build. For VSS 3.1 releases (`rtvi_vlm/26.01.x`, `26.02.3`), always use the `_alerts` suffix. https://docs.nvidia.com/vss/latest/real-time-vlm-api.html is the canonical reference.
  • No URL-based input in 3.1 GA: the `url` / `media_type` / `creation_time` fields were added post-3.1. You must upload via `POST /v1/files` first and then pass the returned `id`.
  • Alert trigger = the tokens `"yes"` or `"true"` in the VLM response (case-insensitive). There is no per-request alert flag. Design prompts with an explicit `Anomaly Detected: Yes/No` line and set `system_prompt` to constrain the model to Yes/No answers (per the VSS docs). Every chunk is published to `KAFKA_TOPIC`; matched chunks additionally go to `KAFKA_INCIDENT_TOPIC` with `isAnomaly=true`, `info["triggerPhrase"]` set to the matched tokens, and `info["verdict"]="confirmed"`.
  • No `alert_category` query field in the 3.1 OpenAPI spec. On 3.1, incidents on the Kafka incident topic default to `incident.category = "vlm-alert"`. Post-3.1 builds expose an optional `alert_category` request field to override `incident.category`.
  • Kafka topics are server-side config, not per-request. The `KAFKA_*` env vars (via compose `RTVI_VLM_KAFKA_*` rewrites) are fixed at container start, so clients can't override topics on a per-request basis. Kafka publish is additive to the HTTP response, never a replacement.
  • `stream=true` returns Server-Sent Events, not chunked JSON. Use `curl -N` (no buffering). Each event is `data: {"content": "...", "start_ts": ..., "end_ts": ...}\n\n`, terminated by `data: {"status":"completed"}\n\n`. Without `stream=true` the server buffers until the full video is processed: fine for short clips (<1 min), avoid for live streams.
  • `chunk_duration=0` disables chunking: the entire video is sent to the VLM in one shot. Only meaningful for short clips; long videos will OOM or exceed `max_model_len`.
  • Default frame budget caps at `VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES` (256). Requesting an FPS that implies more than 256 frames per chunk is silently capped; drop FPS or shorten `chunk_duration` to stay within budget.
  • `enable_reasoning` requires a Cosmos Reason model. Passing it with Qwen3-VL or other non-reasoning models is a no-op.
  • `/v1/metrics` requires auth, unlike `/v1/health/*`. Prometheus scrapers need the Bearer token.
  • File upload is multipart, not JSON. Use `-F file=@path -F purpose=vision -F media_type=video`; a `-d` body returns 422.
  • Live-stream lifecycle requires two deletes to fully tear down: `DELETE /v1/generate_captions_alerts/{stream_id}` stops inference; `DELETE /v1/streams/delete/{stream_id}` un-registers the stream. Skipping the second leaks RTSP connection resources.
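The frame-budget note above comes down to simple arithmetic: frames per chunk is FPS times `chunk_duration`, capped at the budget. The sketch below makes that check explicit before a request is sent. The function name `frames_per_chunk` is hypothetical, the default budget of 256 is the value quoted above, and shell arithmetic limits this to integer FPS values.

```bash
# Frames the VLM would receive for one chunk: fps * chunk_duration,
# capped at the frame budget (256 by default, per the note above).
# frames_per_chunk is a hypothetical helper; it handles integer FPS only.
frames_per_chunk() {
  local fps="$1" chunk_duration="$2" budget="${3:-256}"
  local frames=$((fps * chunk_duration))
  if [ "$frames" -gt "$budget" ]; then
    frames="$budget"   # mirrors the silent cap applied by the server
  fi
  echo "$frames"
}
```

With this, `frames_per_chunk 2 10` gives 20, while `frames_per_chunk 30 10` gives 256, which signals that the requested FPS would be capped and that either the FPS or `chunk_duration` should come down.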