Use this skill when working with the RTVI VLM or RT-VLM microservice API on VSS 3.1. Generate dense captions and alerts for stored video files and live RTSP streams via `/v1/generate_captions_alerts`; upload media via `/v1/files`; add and remove live streams with `/v1/streams/add` and `/v1/streams/delete/{stream_id}`; call OpenAI-compatible `/v1/chat/completions`; consume Kafka caption, incident, and error topics; or debug rtvi-vlm responses. For deployment, read `references/deploy-rt-vlm-service.md` first.
```bash
npx skill4agent add nvidia/skills rt-vlm
```

Model names: `cosmos-reason1`, `cosmos-reason2` (check `GET /v1/models` for what your deployment serves). All endpoints are under `/v1/...`.

```bash
export BASE_URL="http://localhost:8000"   # RTVI VLM host:port; matches $RTVI_VLM_PORT in compose
export API_KEY="$NGC_API_KEY"             # Bearer token (NGC key works if the service was deployed with NGC auth)
```

API requests send `Authorization: Bearer $API_KEY`. Health and probe endpoints: `/v1/health/*`, `/v1/ready`, `/v1/live`, `/v1/startup`. Check that the service is ready and list its models:

```bash
curl -fsS "$BASE_URL/v1/health/ready" && curl -fsS "$BASE_URL/v1/models" | jq
```

Quickstart for a stored video:

```bash
# 1. Upload the video, capture its file id
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@/path/to/warehouse.mp4" \
  -F "purpose=vision" \
  -F "media_type=video" | jq -r '.id')

# 2. Generate captions + alerts (SSE stream of chunked responses)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$FILE_ID\",
    \"prompt\": \"Write a concise dense caption for each 10-second segment of this warehouse video.\",
    \"model\": \"cosmos-reason1\",
    \"chunk_duration\": 10,
    \"stream\": true
  }"
```

Generate VLM captions and alerts for videos and live streams.
**`POST /v1/generate_captions_alerts`**

Required fields:

| Field | Type | Description |
|---|---|---|
| `id` | string or array | UUID of a previously uploaded file, or the id of an active live stream. Accepts a list of ids for batch requests |
| `prompt` | string | User prompt to the VLM (e.g. a dense-caption instruction) |
| `model` | string | Model name; list available models with `GET /v1/models` |

Optional fields:

| Field | Type | Default | Description |
|---|---|---|---|
| `system_prompt` | string | none | System prompt |
| `enable_reasoning` | boolean | false | Turn on reasoning for Cosmos Reason models |
| … | boolean | false | Transcribe audio (via Riva) and fold the transcript into captions |
| `chunk_duration` | integer | none | Split the video into N-second chunks |
| … | integer | 0 | Overlap between consecutive chunks |
| `num_frames_per_second_or_fixed_frames_chunk` | number | none | Frames per second (if `use_fps_for_chunking` is true) or a fixed frame count per chunk |
| `use_fps_for_chunking` | boolean | false | Interpret the field above as FPS rather than a fixed frame count |
| … | int | none | Resize frames before inference (0 = native) |
| … | object | none | … |
| `stream` | boolean | false | SSE: emit per-chunk caption deltas as `data:` events |
| … | | | Standard sampling controls |
| … | object | none | Query response format object |
| … | object | none | Extra kwargs for the multimodal processor (e.g. size, shortest/longest edge) |
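The two frame-sampling fields interact, and the frames per chunk drive how much context each request consumes. Below is a minimal sketch for estimating that frame count. It assumes, without confirmation from this page, that with `use_fps_for_chunking=true` the server samples `fps * chunk_duration` frames per chunk and otherwise sends a fixed count. `frames_per_chunk` is a hypothetical helper for sanity-checking requests and is not part of the API.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the RTVI VLM API.
# ASSUMPTION: use_fps_for_chunking=true  -> frames = fps * chunk_duration
#             use_fps_for_chunking=false -> frames = fixed count per chunk
frames_per_chunk() {
  local chunk_duration=$1 value=$2 use_fps=$3
  if [[ $use_fps == true ]]; then
    # awk allows fractional FPS (e.g. 0.5); %d truncates to an integer
    awk -v d="$chunk_duration" -v f="$value" 'BEGIN { printf "%d\n", d * f }'
  else
    printf '%d\n' "$value"
  fi
}

# Settings from the live-stream example: 10 s chunks at 2 FPS
frames_per_chunk 10 2 true   # prints 20
```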
Example:

```bash
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "prompt": "Dense-caption this warehouse video, one sentence per 10s chunk.",
    "model": "cosmos-reason1",
    "chunk_duration": 10,
    "stream": true
  }'
```

With `stream=true`, each SSE event carries one chunk's `start_ts`, `end_ts`, and `content`, and the stream ends with `{"status": "completed"}`. A non-streaming response has the shape `{ "id", "object": "caption", "choices": [{...}], "usage": {...} }`.

**`DELETE /v1/generate_captions_alerts/{stream_id}`** stops caption generation on a live stream. The stream itself is removed separately with `DELETE /v1/streams/delete/{stream_id}`.

```bash
curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```

Upload and manage media files consumed by `/v1/generate_captions_alerts`.
**`POST /v1/files`**

```bash
curl -X POST "$BASE_URL/v1/files" -H "Authorization: Bearer $API_KEY" \
  -F "file=@./video.mp4" -F "purpose=vision" -F "media_type=video"
```

Response: `{ "id", "object": "file", "bytes", "created_at", "filename", "purpose" }`

Other file endpoints: `GET /v1/files?purpose=vision`, `GET /v1/files/{file_id}`, `GET /v1/files/{file_id}/content`, `DELETE /v1/files/{file_id}`.

RTSP stream lifecycle.
**`POST /v1/streams/add`**: stream entry fields are `liveStreamUrl` (an `rtsp://` URL), `description`, `username`, `password`, `sensor_name`, `place_name`, `place_type`, `place_lat`, `place_lon`, `place_alt`, `place_coordinate_x`, and `place_coordinate_y`.

```bash
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://cam:8554/live","description":"warehouse cam 1"}]}' \
  | jq -r '.results[0].id')
```

Other stream endpoints: `GET /v1/streams/get-stream-info`, `DELETE /v1/streams/delete/{stream_id}`, and `DELETE /v1/streams/delete-batch` with body `{"stream_ids":[...]}`.

OpenAI-compatible endpoints for interop with OpenAI/NVIDIA-API clients.
**`POST /v1/chat/completions`**: relevant fields are `messages`, `model`, `id`, `video_url`, and `image_url`.

```bash
curl -X POST "$BASE_URL/v1/chat/completions" -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cosmos-reason1","messages":[{"role":"user","content":"Summarize this scene."}]}'
```

Other endpoints:

- `POST /v1/completions`
- `GET /v1/version` returns `{ "version": "3.1.0-..." }`
- `GET /v1/license`
- `GET /v1/manifest`
- `GET /v1/health/live`, `GET /v1/health/ready`
- `GET /v1/models` returns `{ "data": [{ "id", "object": "model", "owned_by" }] }`
- `GET /v1/metadata`
- `GET /v1/metrics`
- `GET /v1/ready`, `GET /v1/live`, `GET /v1/startup`

Workflow for a stored video:

```bash
# Upload → capture file id → generate captions (SSE stream)
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@warehouse.mp4" -F "purpose=vision" -F "media_type=video" | jq -r '.id')

curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$FILE_ID\",
    \"prompt\": \"Describe warehouse events in 1 sentence per 10s chunk.\",
    \"model\": \"cosmos-reason1\",
    \"chunk_duration\": 10,
    \"stream\": true
  }"

# When done, free storage:
curl -X DELETE "$BASE_URL/v1/files/$FILE_ID" -H "Authorization: Bearer $API_KEY"
```

Workflow for a live RTSP stream:

```bash
# Register the stream
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

# Start continuous caption generation in the background
# (runs until the stream stops or the DELETE below)
curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$STREAM_ID\",
    \"prompt\": \"Describe each event; start each sentence with a timestamp.\",
    \"model\": \"cosmos-reason1\",
    \"chunk_duration\": 10,
    \"num_frames_per_second_or_fixed_frames_chunk\": 2,
    \"use_fps_for_chunking\": true,
    \"stream\": true
  }" &

# Tear down when finished: stop captioning first, then unregister the stream
curl -X DELETE "$BASE_URL/v1/generate_captions_alerts/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
curl -X DELETE "$BASE_URL/v1/streams/delete/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```
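A live stream needs two DELETE calls. To make sure both happen, a script can register them as an exit trap right after `/v1/streams/add` succeeds. That way an interrupted run does not leave the stream registered. Below is a minimal sketch. `teardown_stream` is a hypothetical helper name, and `BASE_URL` and `API_KEY` are assumed to come from the setup section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: release a live stream with the two DELETE calls used above.
# Expects BASE_URL and API_KEY from the setup section.
teardown_stream() {
  local stream_id=$1
  # 1. Stop caption/alert generation for the stream
  curl -fsS -X DELETE "$BASE_URL/v1/generate_captions_alerts/$stream_id" \
    -H "Authorization: Bearer $API_KEY" >/dev/null \
    || echo "warn: could not stop captions for $stream_id" >&2
  # 2. Unregister the RTSP stream
  curl -fsS -X DELETE "$BASE_URL/v1/streams/delete/$stream_id" \
    -H "Authorization: Bearer $API_KEY" >/dev/null \
    || echo "warn: could not delete stream $stream_id" >&2
}

# Usage in a script, right after STREAM_ID is captured:
#   trap 'teardown_stream "$STREAM_ID"' EXIT
#   trap 'exit 130' INT TERM   # Ctrl-C exits, which fires the EXIT trap once
```

Trapping `EXIT`, and converting `INT`/`TERM` into an exit, means the cleanup runs exactly once. Trapping all three signals with the cleanup itself would run it twice on Ctrl-C.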
**Consume alerts from Kafka** (when using the VSS foundational Kafka container).
Kafka values are NvSchema protobuf payloads, so use `print.value=false` for a
clean validation pass that shows timestamp, key, and headers without dumping
binary payload bytes:
```bash
docker exec mdx-kafka kafka-console-consumer \
  --bootstrap-server 127.0.0.1:9092 \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

Outside the `mdx-kafka` container, run `kafka-console-consumer` against the broker's host address:

```bash
kafka-console-consumer \
  --bootstrap-server "$HOST_IP:9092" \
  --topic vision-llm-events-incidents \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

Incident messages use the `ext.proto :: Incident` schema. Its fields are `sensorId`, `timestamp`, `end`, `objectIds`, `frameIds`, `place`, `analyticsModule`, `category`, `isAnomaly` (`true`), `llm`, and `info`. Related keys are `triggerPhrase`, `verdict`, `requestId`, `chunkIdx`, `streamId`, and `alertCategory` (`alert_category`). See `references/kafka-workflows.md`.

Common HTTP errors:

| Code | Meaning | Common Cause |
|---|---|---|
| 400 | Bad Request | Missing required field (`id`, `prompt`, or `model`) |
| 401 | Unauthorized | Missing or invalid `Authorization: Bearer` header |
| 404 | Not Found | Unknown endpoint, file id, or stream id |
| 413 | Payload Too Large | Uploaded file exceeds the server's upload size limit |
| 422 | Unprocessable Entity | Pydantic schema violation (for example, a field with the wrong type) |
| 429 | Too Many Requests | Too many concurrent streams; raise the server's concurrent-stream limit |
| 500 | Server Error | VLM inference exception (OOM, model unavailable); check the service logs |
| 503 | Service Unavailable | Startup not complete (model still downloading) or an upstream NIM dependency is unhealthy |
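A 503 during startup usually clears once the model finishes loading, so scripts can poll the readiness probe before sending work. Below is a minimal sketch. The helper name, attempt count and interval are arbitrary choices. It treats an HTTP error status (>= 400, via `curl -f`) or a connection failure as "not ready".

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll /v1/health/ready until it succeeds or attempts run out.
# Expects BASE_URL from the setup section.
wait_until_ready() {
  local attempts=${1:-60} interval=${2:-10} i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl exit non-zero on HTTP >= 400, so a 503 counts as not ready
    if curl -fsS -o /dev/null "$BASE_URL/v1/health/ready"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}

# Usage: wait_until_ready 60 10 && curl -N -X POST "$BASE_URL/v1/generate_captions_alerts" ...
```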
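Pitfalls and notes:

- Endpoint name: `/v1/generate_captions` and `/v1/generate_captions_alerts` (with the `_alerts` suffix) correspond to different `rtvi_vlm/` image tags (`26.01.x` vs. `26.02.3`). This skill uses `/v1/generate_captions_alerts`; confirm the name for your tag at https://docs.nvidia.com/vss/latest/real-time-vlm-api.html.
- `POST /v1/files`: related fields are `url`, `media_type`, and `creation_time`. Reference the upload afterwards by the returned `id`.
- Alert answers: the values `"yes"` and `"true"` matter for alert detection. Asking for an explicit `Anomaly Detected: Yes/No` line in the `system_prompt` keeps the model's answer parseable.
- Kafka topics: `KAFKA_TOPIC` and `KAFKA_INCIDENT_TOPIC`. Incidents carry `isAnomaly=true`, `info["triggerPhrase"]`, and `info["verdict"]="confirmed"`.
- Categories: `alert_category` feeds `incident.category`; the default is `incident.category = "vlm-alert"`.
- Environment variables: Kafka settings appear as both `KAFKA_*` and `RTVI_VLM_KAFKA_*`; check which prefix your deployment reads.
- Streaming: with `stream=true`, use `curl -N` so output is not buffered. Events arrive as `data: {"content": "...", "start_ts": ..., "end_ts": ...}\n\n`, and the last one is `data: {"status":"completed"}\n\n`.
- Context limits: each request must fit the model's `max_model_len`. The frames sent per request depend on `VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES` and `chunk_duration` (including `chunk_duration=0` with `stream=true`), and `enable_reasoning` adds output tokens.
- Monitoring: `/v1/metrics` and `/v1/health/*`.
- Uploads are multipart form data (`-F file=@path -F purpose=vision -F media_type=video`), not a `-d` JSON body.
- Live-stream teardown takes two calls: `DELETE /v1/generate_captions_alerts/{stream_id}`, then `DELETE /v1/streams/delete/{stream_id}`.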
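The SSE framing described above can be consumed without extra tooling. Below is a minimal plain-bash sketch. `read_sse_events` is a hypothetical helper: it prints each event's JSON payload on its own line, stops at the completion marker, and returns non-zero if the stream ends without that marker. It detects the marker by substring match, so it assumes caption text never contains a literal `"status"` ... `"completed"` pair.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SSE lines on stdin, print one JSON payload per
# event, return 0 on the completion marker and 1 if the stream ends first.
read_sse_events() {
  local line payload
  while IFS= read -r line; do
    line=${line%$'\r'}                  # tolerate CRLF line endings
    [[ $line == data:* ]] || continue   # skip blank separators and other fields
    payload=${line#data:}
    payload=${payload# }                # SSE allows one optional space after the colon
    if [[ $payload == *'"status"'*'"completed"'* ]]; then
      return 0                          # completion marker: stop reading
    fi
    printf '%s\n' "$payload"
  done
  return 1
}
```

Typical use is `curl -N ... | read_sse_events | jq -r '.content'`, which prints one caption per line as chunks arrive.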