rt-vlm

RTVI VLM Usage API (VSS 3.1)


RTVI VLM is NVIDIA's real-time vision-language microservice: decode video (file or RTSP) → segment into chunks → run a VLM (`cosmos-reason1`, `cosmos-reason2`, or any OpenAI-compatible model) → stream dense captions back over SSE/HTTP and publish captions, incident alerts, and errors to Kafka. Use this skill whenever you need to call any `/v1/...` endpoint on the VSS 3.1 rtvi-vlm microservice: caption generation, file upload, live-stream management, health checks, NIM-compatible chat completions, Prometheus metrics. API reference: https://docs.nvidia.com/vss/latest/real-time-vlm-api.html.


Setup


bash

export BASE_URL="http://localhost:8000"     # RTVI VLM host:port — matches $RTVI_VLM_PORT in compose
export API_KEY="$NGC_API_KEY"               # Bearer token (NGC key works if the service was deployed with NGC auth)

Every request below uses `Authorization: Bearer $API_KEY`. Health endpoints (`/v1/health/*`, `/v1/ready`, `/v1/live`, `/v1/startup`) typically work without auth.

Smoke test before use:

bash

curl -fsS "$BASE_URL/v1/health/ready" && curl -fsS "$BASE_URL/v1/models" | jq
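Because the service can answer 503 while the model is still loading (see the Error Reference), a readiness wait can precede the smoke test. The following is a minimal sketch, assuming a POSIX shell; the function name `wait_ready` and its retry-count and delay arguments are illustrative and not part of the RTVI VLM API.

```bash
# Hypothetical helper: poll /v1/health/ready until it succeeds or attempts run out.
# Usage: wait_ready BASE_URL [MAX_TRIES] [DELAY_SECONDS]
wait_ready() {
  base=$1
  tries=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl exit non-zero on HTTP errors such as 503
    if curl -fs -o /dev/null "$base/v1/health/ready"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A call such as `wait_ready "$BASE_URL" 60 5` would then gate the smoke test above; the right timeout depends on how long model download takes in a given deployment.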


Quick Start — dense captions from a local video

bash

# 1. Upload the video, capture its file id
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@/path/to/warehouse.mp4" \
  -F "purpose=vision" \
  -F "media_type=video" | jq -r '.id')

# 2. Generate captions + alerts (SSE stream of chunked responses)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "'"$FILE_ID"'",
    "prompt": "Write a concise dense caption for each 10-second segment of this warehouse video.",
    "model": "cosmos-reason1",
    "chunk_duration": 10,
    "stream": true
  }'

Endpoints

Captions

Generate VLM captions and alerts for videos and live streams.

`POST /v1/generate_captions_alerts`: Generate VLM captions (and alerts) for video/stream

Required:

Field	Type	Description
`id`	string \| array	UUID of a previously-uploaded file, or id of an active live stream. Accepts a list of ids for batch processing
`prompt`	string	User prompt to the VLM (e.g. dense-caption instruction)
`model`	string	Model name — see `GET /v1/models`

Key optional fields:

Field	Type	Default	Description
`system_prompt`	string	—	System prompt; use `<think></think><answer></answer>` tags to enable reasoning on Cosmos Reason
`enable_reasoning`	boolean	false	Turn on reasoning for Cosmos Reason models
`enable_audio`	boolean	false	Transcribe audio (via Riva) and fold into captions
`chunk_duration`	integer	—	Segment video into N-second chunks ( `0` = no chunking)
`chunk_overlap_duration`	integer	0	Overlap between consecutive chunks
`num_frames_per_second_or_fixed_frames_chunk`	number	—	FPS (if `use_fps_for_chunking=true` ) or fixed frames per chunk
`use_fps_for_chunking`	boolean	false	Interpret above as FPS vs. fixed-frame count
`vlm_input_width` / `vlm_input_height`	int	—	Resize frames before inference (0 = native)
`media_info`	object	—	`{"start_offset_ms": ..., "end_offset_ms": ...}` to process a slice of a file (not live streams)
`stream`	boolean	false	SSE: emit per-chunk caption deltas as `data:` events (recommended for long videos)
`max_tokens` / `temperature` / `top_p` / `top_k` / `seed` / `ignore_eos`			Standard sampling controls
`response_format`	object	—	Query response format object
`mm_processor_kwargs`	object	—	Extra kwargs for the multimodal processor (e.g. size, shortest/longest edge)

bash

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "prompt": "Dense-caption this warehouse video, one sentence per 10s chunk.",
    "model": "cosmos-reason1",
    "chunk_duration": 10,
    "stream": true
  }'

Response (200, SSE when `stream=true`): each event payload has `start_ts`, `end_ts`, and `content`; the stream ends with a terminal `{"status": "completed"}` event.

Response (200, non-stream): `{ "id", "object": "caption", "choices": [{...}], "usage": {...} }`
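The SSE framing described above can be unwrapped on the client with a few lines of shell. This is a rough sketch, assuming each event arrives as a single `data: ` line and that only the terminal event carries a `"status"` of `"completed"`; the function name `sse_payloads` is hypothetical, and the string match on the terminal event is a heuristic rather than a JSON parse.

```bash
# Hypothetical helper: read an SSE stream on stdin, print one JSON payload per line,
# and stop at the terminal {"status": "completed"} event.
sse_payloads() {
  cr=$(printf '\r')
  while IFS= read -r line; do
    line=${line%"$cr"}              # tolerate CRLF line endings
    case "$line" in
      "data: "*) payload=${line#"data: "} ;;
      *) continue ;;                # skip blank separators and non-data fields
    esac
    case "$payload" in
      *'"status"'*'"completed"'*) return 0 ;;   # heuristic match on the terminal event
    esac
    printf '%s\n' "$payload"
  done
}
```

Piping the streaming request into it, for example `curl -N ... | sse_payloads | jq -r '.content'`, would print one caption per chunk and stop once the terminal event arrives.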


`DELETE /v1/generate_captions_alerts/{stream_id}`: Stop caption generation for a live stream

Stops inference while leaving the stream registered. Pair with `DELETE /v1/streams/delete/{stream_id}` to also un-register the RTSP source.

bash

curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"


Files

Upload and manage media files consumed by `/v1/generate_captions_alerts`.

`POST /v1/files`: Upload a media file (multipart)

bash

curl -X POST "$BASE_URL/v1/files" -H "Authorization: Bearer $API_KEY" \
  -F "file=@./video.mp4" -F "purpose=vision" -F "media_type=video"

Response: `{ "id", "object": "file", "bytes", "created_at", "filename", "purpose" }`

`GET /v1/files?purpose=vision`: List uploaded files

`GET /v1/files/{file_id}`: File metadata

`GET /v1/files/{file_id}/content`: Download original file content

`DELETE /v1/files/{file_id}`: Delete file (releases asset storage)

Live Stream

RTSP stream lifecycle.

`POST /v1/streams/add`: Register one or more RTSP streams

Required per stream: `liveStreamUrl` (must start with `rtsp://`) and `description`. Optional: `username`, `password`, `sensor_name`, and placement metadata (`place_name`, `place_type`, `place_lat`, `place_lon`, `place_alt`, `place_coordinate_x`, `place_coordinate_y`).

bash

STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://cam:8554/live","description":"warehouse cam 1"}]}' \
  | jq -r '.results[0].id')


`GET /v1/streams/get-stream-info`: List active streams

`DELETE /v1/streams/delete/{stream_id}`: Remove a single stream

`DELETE /v1/streams/delete-batch`: Remove many (`{"stream_ids":[...]}`)

NIM Compatible

OpenAI-compatible endpoints for interop with OpenAI/NVIDIA-API clients.

`POST /v1/chat/completions`: OpenAI-compatible chat (text + multimodal)

Required: `messages`, `model`. Text-only requests omit `id`, `video_url`, and `image_url`.

bash

curl -X POST "$BASE_URL/v1/chat/completions" -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cosmos-reason1","messages":[{"role":"user","content":"Summarize this scene."}]}'


`POST /v1/completions`: OpenAI-compatible legacy completions

`GET /v1/version`: Returns `{ "version": "3.1.0-..." }`

`GET /v1/license`: License text

`GET /v1/manifest`: NIM manifest

`GET /v1/health/live` · `GET /v1/health/ready`: NIM-style probes

Models · Metadata · Metrics · Health Check

`GET /v1/models`: List loaded VLMs, returning `{ "data": [{ "id", "object": "model", "owned_by" }] }`

`GET /v1/metadata`: Service metadata (build, release, image tag)

`GET /v1/metrics`: Prometheus metrics (plain text)

`GET /v1/ready` · `GET /v1/live` · `GET /v1/startup`: Kubernetes-style probes

Common Workflows


The four scenarios from the VSS 3.1 RT-VLM Usage Skill requirements.


1. Dense captions from a stored video file

bash

# Upload → capture file id → generate captions (SSE stream)
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@warehouse.mp4" -F "purpose=vision" -F "media_type=video" | jq -r '.id')

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{
    "id": "'"$FILE_ID"'",
    "prompt": "Describe warehouse events in 1 sentence per 10s chunk.",
    "model": "cosmos-reason1",
    "chunk_duration": 10,
    "stream": true
  }'

# When done, free storage:
curl -X DELETE "$BASE_URL/v1/files/$FILE_ID" -H "Authorization: Bearer $API_KEY"

2. Dense captions from an RTSP live stream

bash

# Register the stream
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

# Start continuous caption generation (runs until stream stops or DELETE)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{
    "id": "'"$STREAM_ID"'",
    "prompt": "Describe each event; start each sentence with a timestamp.",
    "model": "cosmos-reason1",
    "chunk_duration": 10,
    "num_frames_per_second_or_fixed_frames_chunk": 2,
    "use_fps_for_chunking": true,
    "stream": true
  }' &

# Tear down when finished:
curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
curl -X DELETE "$BASE_URL/v1/streams/delete/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
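Since a full teardown needs both DELETE calls (see Gotchas), a script can bundle them so neither is forgotten. A minimal sketch, assuming `BASE_URL` and `API_KEY` are exported as in Setup; the function name `teardown_stream` is illustrative.

```bash
# Hypothetical helper: stop inference, then un-register the stream.
# Both calls are attempted even if the first one fails.
teardown_stream() {
  sid=$1
  rc=0
  curl -fsS -X DELETE "$BASE_URL/v1/generate_captions_alerts/$sid" \
    -H "Authorization: Bearer $API_KEY" || rc=1
  curl -fsS -X DELETE "$BASE_URL/v1/streams/delete/$sid" \
    -H "Authorization: Bearer $API_KEY" || rc=1
  return "$rc"
}
```

Registering it right after the stream is added, with `trap 'teardown_stream "$STREAM_ID"' EXIT`, would run the cleanup even when the script is interrupted.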

Pre-req: the container was started with:

RTVI_VLM_KAFKA_ENABLED=true
RTVI_VLM_KAFKA_TOPIC=vision-llm-messages
RTVI_VLM_KAFKA_INCIDENT_TOPIC=vision-llm-events-incidents
RTVI_VLM_ERROR_MESSAGE_TOPIC=vision-llm-errors
HOST_IP=<kafka-host>

bash

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "id": "$STREAM_ID",
  "prompt": "You are a warehouse monitoring system. Describe the scene in one sentence, then on a new line output exactly:\nAnomaly Detected: Yes/No\nReason: <one sentence>\nFlag an anomaly if any worker is missing a hard hat or high-vis vest.",
  "system_prompt": "Answer the user's question correctly in yes or no.",
  "model": "cosmos-reason2",
  "chunk_duration": 60,
  "chunk_overlap_duration": 10,
  "stream": true
}
EOF
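Per the Gotchas, an alert fires when the VLM response contains the token "yes" or "true" (case-insensitive), so it can help to check sample responses locally while tuning a prompt. The sketch below is only an approximation: it assumes whole-word matching, which may differ from the service's actual matcher, and the function name `would_trigger_alert` is hypothetical.

```bash
# Hypothetical local check: succeeds if the text contains "yes" or "true"
# as a whole word, ignoring case. Assumed behavior, not the server's code.
would_trigger_alert() {
  printf '%s\n' "$1" | grep -Eiqw 'yes|true'
}
```

Under that assumption, a caption such as "It is true that the aisle is clear" would also match, which is one reason to keep the response format as constrained as the prompt above does.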


**Consume alerts from Kafka** (when using the VSS foundational Kafka container).
Kafka values are NvSchema protobuf payloads, so use `print.value=false` for a
clean validation pass that shows timestamp, key, and headers without dumping
binary payload bytes:
```bash
docker exec mdx-kafka kafka-console-consumer \
  --bootstrap-server 127.0.0.1:9092 \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

If Kafka is not running in the VSS `mdx-kafka` container, use the Kafka CLI from the host running the broker:

bash

kafka-console-consumer \
  --bootstrap-server "$HOST_IP:9092" \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false

Incident protobuf (`ext.proto :: Incident`) key fields: `sensorId`, `timestamp`, `end`, `objectIds`, `frameIds`, `place`, `analyticsModule`, `category`, `isAnomaly` (`true` for alerts), `llm` (nested VisionLLM), and an `info` map including `triggerPhrase`, `verdict`, `requestId`, `chunkIdx`, `streamId`, and `alertCategory` (only if the deployment supports the post-3.1 `alert_category` query field).


3. Kafka workflows (alerts + message bus)

Dense captioning with alerts on an RTSP stream and the HTTP-vs-Kafka response model are documented in `references/kafka-workflows.md`.

Error Reference


Code	Meaning	Common Cause
400	Bad Request	Missing required field ( `id` , `prompt` , `model` ); unsupported `media_type` ; unknown `model` name
401	Unauthorized	Missing/invalid `Authorization: Bearer $API_KEY` — or wrong key format (expect `nvapi-...` )
404	Not Found	`file_id` deleted / stream_id not registered / wrong endpoint path (note: `{stream_id}` is required on `DELETE /v1/streams/delete/{stream_id}` )
413	Payload Too Large	Uploaded file exceeds server `MAX_FILE_SIZE` ; increase or pre-chunk the video
422	Unprocessable Entity	Pydantic schema violation — e.g. `use_fps_for_chunking=true` without `num_frames_per_second_or_fixed_frames_chunk` ; stream ids supplied to a file-only field like `media_info`
429	Rate Limited	Too many concurrent streams — raise `VLM_BATCH_SIZE` or spread across instances
500	Server Error	VLM inference exception (OOM, model unavailable) — check `docker logs rtvi-vlm-*`
503	Service Busy	Startup not complete (model still downloading) or upstream NIM dependency unhealthy
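When scripting against the service, the table above suggests treating 429 and 503 as transient and everything else as a request or server problem to fix. A minimal sketch of that classification follows; the function name `is_retryable` is hypothetical, and leaving 500 out of the retry set is a judgment call, not documented behavior.

```bash
# Hypothetical helper: exit 0 for status codes the table describes as transient.
is_retryable() {
  case "$1" in
    429|503) return 0 ;;   # rate limited / service busy: retry after a pause
    *) return 1 ;;         # validation, auth, and inference errors need a fix first
  esac
}
```

The status code itself can be captured with curl's `-w '%{http_code}'` option, for example `status=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/v1/health/ready")`.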


Gotchas


- **3.1 GA endpoint is `/v1/generate_captions_alerts`, not `/v1/generate_captions`.** The rename lands in a post-3.1 build. For VSS 3.1 releases (`rtvi_vlm/26.01.x`–`26.02.3`), always use the `_alerts` suffix. `https://docs.nvidia.com/vss/latest/real-time-vlm-api.html` is the canonical reference.
- **No URL-based input in 3.1 GA.** The `url` / `media_type` / `creation_time` fields were added post-3.1. You must upload via `POST /v1/files` first and then pass the returned `id`.
- **Alert trigger = the tokens "yes" or "true" in the VLM response (case-insensitive).** There is no per-request alert flag. Design prompts with an explicit `Anomaly Detected: Yes/No` line and set `system_prompt` to constrain the model to Yes/No answers (per the VSS docs). Every chunk is published to `KAFKA_TOPIC`; matched chunks additionally go to `KAFKA_INCIDENT_TOPIC` with `isAnomaly=true`, `info["triggerPhrase"]` set to the matched tokens, and `info["verdict"]="confirmed"`.
- **No `alert_category` query field in the 3.1 OpenAPI spec.** Incidents on the Kafka incident topic default to `incident.category = "vlm-alert"` on 3.1. Post-3.1 builds expose an optional `alert_category` request field to override `incident.category`.
- **Kafka topics are server-side config, not per-request.** The `KAFKA_*` env vars (via compose `RTVI_VLM_KAFKA_*` rewrites) are fixed at container start, so clients can't override topics on a per-request basis. Kafka publish is additive to the HTTP response, never a replacement.
- **`stream=true` returns Server-Sent Events, not chunked JSON.** Use `curl -N` (no buffering). Each event is `data: {"content": "...", "start_ts": ..., "end_ts": ...}\n\n`, terminated by `data: {"status":"completed"}\n\n`. Without `stream=true` the server buffers until the full video is processed: fine for short clips (<1 min), avoid for live streams.
- **`chunk_duration=0` disables chunking.** The entire video is sent to the VLM as one shot. Only meaningful for short clips; long videos will OOM or exceed `max_model_len`.
- **Default frame budget caps at `VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES` (256).** Requesting FPS that implies >256 frames per chunk is silently capped; drop FPS or shorten `chunk_duration` to stay within budget.
- **`enable_reasoning` requires a Cosmos Reason model.** Passing it with Qwen3-VL or other non-reasoning models is a no-op.
- **`/v1/metrics` requires auth, unlike `/v1/health/*`.** Prometheus scrapers need the Bearer token.
- **File upload is multipart, not JSON.** Use `-F file=@path -F purpose=vision -F media_type=video`; a `-d` body returns 422.
- **Live-stream lifecycle requires two deletes to fully tear down.** `DELETE /v1/generate_captions_alerts/{stream_id}` stops inference; `DELETE /v1/streams/delete/{stream_id}` un-registers the stream. Skipping the second leaks RTSP connection resources.
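The frame-budget gotcha comes down to simple arithmetic: with `use_fps_for_chunking=true`, the frames requested per chunk are FPS times `chunk_duration`. With 2 FPS and 10-second chunks that is 20 frames, while 30 FPS over the same chunk would imply 300 and be capped at 256. A small sketch of that check, assuming integer FPS values; both function names are hypothetical.

```bash
# Hypothetical helper: frames requested per chunk = FPS x chunk duration (integers only).
frames_per_chunk() {
  echo $(( $1 * $2 ))
}

# Succeeds when the requested sampling fits the budget (third argument, default 256).
within_frame_budget() {
  [ "$(frames_per_chunk "$1" "$2")" -le "${3:-256}" ]
}
```

For instance, `within_frame_budget 2 10 || echo "lower FPS or shorten chunk_duration"` stays silent, whereas the same check with 30 FPS prints the hint.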

