Use this skill to bring a vision model from HuggingFace or NVIDIA NGC into an NVIDIA DeepStream pipeline with end-to-end automation: ONNX download or SafeTensors-to-ONNX export, TensorRT (TRT) engine build, custom nvinfer bbox parser, multi-stream benchmark, and PDF report. Only object detection models are supported.
npx skill4agent add nvidia/skills deepstream-import-vision-model

`config.json`

| Step | Phase | Reference | What it does |
|---|---|---|---|
| 1–3 | Model Acquire | references/model-acquire.md | Browse HF/NGC, detect format, download ONNX or export SafeTensors |
| 4–5 | Engine Build | references/engine-build.md | Build dynamic TRT engine, run trtexec BS=1 and BS=MAX_BS |
| 6–7 | DS Pipeline | references/pipeline-run.md | Custom bbox parser, nvinfer config, single-stream + multi-stream benchmarks |
| 8 | Report | references/report-generation.md | 5 charts, HTML, PDF benchmark report |
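
For the Report phase, the source checks embedded charts with `grep -o 'data:image/png' benchmark_report.html | wc -l`. The sketch below wraps that check in two helpers. The function names `count_embedded_pngs` and `check_report_charts` are hypothetical, and the default of 5 is assumed from the "5 charts" in the table above.

```shell
# Count base64-embedded PNG images in an HTML file.
# grep -o prints one line per match, so wc -l counts matches, not lines.
count_embedded_pngs() {
    local html_file="$1"
    # "|| true" keeps a zero-match grep from failing the pipeline under "set -e".
    { grep -o 'data:image/png' "$html_file" || true; } | wc -l | tr -d ' '
}

# Compare the embedded-chart count with the expected number (default 5, an assumption).
# Returns 0 on match, 1 on mismatch, 2 if the file does not exist.
check_report_charts() {
    local html_file="$1" expected="${2:-5}" found
    [ -f "$html_file" ] || { echo "missing: $html_file" >&2; return 2; }
    found=$(count_embedded_pngs "$html_file")
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found charts embedded"
    else
        echo "FAIL: expected $expected embedded charts, found $found" >&2
        return 1
    fi
}
```

A call such as `check_report_charts benchmark_report.html` could run right after HTML generation, before the PDF is produced.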
# 1. GPU and drivers
nvidia-smi
# 2. TensorRT version match (must match between builder and DS runtime)
trtexec 2>&1 | head -3
dpkg -l | grep libnvinfer-bin
# 3. Shared Python venv — create once, reuse across all models
mkdir -p build
VENV=build/.venv_optimum
if [ ! -x "$VENV/bin/python3" ]; then
    python3 -m venv "$VENV"
    "$VENV/bin/pip" install --upgrade pip -q
    "$VENV/bin/pip" install "optimum[exporters]>=1.20,<2.0" "torch<2.12" \
        transformers onnxruntime matplotlib numpy markdown -q
fi
# 4. System tools
which wkhtmltopdf || apt-get install -y wkhtmltopdf
which mediainfo || apt-get install -y mediainfo
which deepstream-app # required for KITTI dump (Step 6g) and benchmark perf-measurement (Step 7c); shipped with DeepStream SDK
# 5. Sample video — only check default path when user has not provided a custom DS_VIDEO
if [ -z "$DS_VIDEO" ]; then
    [ -f /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ] || \
        echo "WARNING: sample_720p.mp4 not found. Install DeepStream samples or set DS_VIDEO=/path/to/your.mp4"
fi

`MODEL_NAME`

models/{model_name}/
    model/ <- ONNX file(s)
    parser/ <- .cpp, Makefile, .so
    config/ <- nvinfer config, ds-app config, labels.txt
    scripts/ <- run helper scripts
    benchmarks/
        engines/ <- _dynamic_b{MAX_BS}.engine, timing.cache, build logs
        b1/ <- trtexec BS=1 log
        b{MAX_BS}/ <- trtexec BS=MAX_BS log
        ds/ <- DS benchmark logs
    reports/ <- benchmark_report.md, .html, .pdf, benchmark_data.json
        charts/ <- chart_*.png (5 charts)
    samples/ <- output .mp4 or .ogv (theoraenc fallback), test frames
        kitti_output/ <- KITTI detection .txt files

mkdir -p models/$MODEL_NAME/{model,parser,config,scripts,benchmarks/engines,benchmarks/ds,reports/charts,samples/kitti_output}

`{model}_dynamic_b{MAX_BS}.engine`, `model_dynamic.engine`, `batch-size`, `trtexec_b1.log`, `trtexec_b${MAX_BS}.log`, `ds_s${N}_run1.log`, `ds_s${N}_run2.log`, `NvDsInferObjectDetectionInfo obj = {};`, `obj;`, `rotation_angle`, `build/.venv_optimum`, `--noDataTransfers`, `skills/deepstream-import-vision-model/scripts/report/md-to-html-pdf.py`, `wkhtmltopdf`, `config.json`, `x264enc`, `openh264enc`, `theoraenc + oggmux`, `.ogv`, `theoraenc`, `oggmux`, `DS_SINGLE_STREAM_MODE=skipped`, `nvv4l2h264enc`, `theoraenc-fallback`, `skipped`, `sample_720p.mp4`, `sample_1080p_h264.mp4`, `DS_VIDEO`

STEP_START=$(date +%s.%N)
# ... step commands ...
STEP_END=$(date +%s.%N)
STEP_DURATION=$(echo "$STEP_END - $STEP_START" | bc)
echo "[Step N] completed in ${STEP_DURATION}s"

`PIPELINE_START`, `PIPELINE_END`, `benchmark_report.md`, `benchmark_report.html`, `benchmark_report_{model_name}.pdf`, `md-to-html-pdf.py`, `data:image/png`

grep -o 'data:image/png' benchmark_report.html | wc -l

source build/.venv_optimum/bin/activate

| Document | Use When |
|---|---|
| references/model-acquire.md | Steps 1–3: HF/NGC URL parsing, format detection, ONNX download, SafeTensors export, label extraction |
| references/engine-build.md | Steps 4–5: trtexec engine build, benchmarks, PEAK_GPU_STREAMS derivation, iterative scaling |
| references/pipeline-run.md | Steps 6–7: custom bbox parser, nvinfer config, single-stream validation, KITTI dump, multi-stream benchmark |
| references/report-generation.md | Step 8: benchmark_data.json, 5 charts, 12-section markdown report, HTML + PDF |
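
The per-step timing pattern shown earlier (`STEP_START`, `STEP_END`, `STEP_DURATION`) repeats for every step, so it may be convenient to wrap it once. This is a minimal sketch, not part of the skill: `run_timed_step` is a hypothetical name, and it uses `awk` rather than `bc` for the subtraction so that no extra package is required. It assumes GNU `date`, which supports `%N`.

```shell
# run_timed_step STEP_NUMBER COMMAND [ARGS...]
# Runs the command, prints its wall-clock duration, and returns its exit status.
run_timed_step() {
    local step="$1"; shift
    local start end duration status=0
    start=$(date +%s.%N)
    # Capture the exit status without aborting when "set -e" is active.
    "$@" || status=$?
    end=$(date +%s.%N)
    # awk handles the floating-point subtraction (the source uses bc here).
    duration=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
    if [ "$status" -eq 0 ]; then
        echo "[Step $step] completed in ${duration}s"
    else
        echo "[Step $step] failed with exit code $status after ${duration}s" >&2
    fi
    return "$status"
}
```

For example, `run_timed_step 1 nvidia-smi` would time the GPU check from the prerequisites and print a line in the same `[Step N] completed in ...s` format.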

scripts/

| Script | Phase | Purpose |
|---|---|---|
| | 1–3 | List HuggingFace repo files |
| | 1–3 | Download config.json from HF |
| | 1–3 | List NGC model files |
| | 1–3 | Download NGC model archive |
| | 1–3 | Export SafeTensors → ONNX via optimum-cli |
| | 1–5 | Inspect ONNX input/output shapes |
| | 4–5 | Bake batch dim into ONNX |
| | Any | Remove staging dirs, preserve shared venv |
| | 4–5 | Run trtexec with standard flags |
| | 6–7 | Single-stream visual validation (NVENC primary; theoraenc+oggmux fallback; skip if neither) |
| | 6–7 | 2-phase batch size sweep |
| | 6–7 | Fixed-stream DS benchmark |
| | 6–7 | KITTI detection dump via deepstream-app |
| | 7 | Step 7c two-run benchmark; wraps |
| | 6–7 | Extract sample frames from output video |
| | 8 | Generate 5 benchmark PNG charts |
| | 8 | Markdown → styled HTML → PDF (canonical benchmark report path) |
| | Any | Markdown → PDF via pandoc/pdflatex; for design docs and references only, NOT for benchmark reports (use md-to-html-pdf.py for those) |
| | 8 | CSS for HTML report |
| | 8 | Mermaid diagram → PNG |
| | 8 | Vetted Puppeteer config for Mermaid (sandboxed; non-root) |
| | 8 | Vetted Puppeteer config for Mermaid (used when running as root) |
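
The single-stream validation row above describes a three-way choice: NVENC first, `theoraenc` + `oggmux` as the fallback, and skip if neither is available. The sketch below shows one way that decision could be made with `gst-inspect-1.0 --exists`. The function names are hypothetical. Using the mode strings `nvv4l2h264enc`, `theoraenc-fallback`, and `skipped` as values of `DS_SINGLE_STREAM_MODE` is an assumption based on identifiers that appear in this document, not a confirmed contract of the skill's scripts.

```shell
# True if the named GStreamer element is registered on this machine.
# A missing gst-inspect-1.0 binary is treated the same as a missing element.
has_gst_element() {
    gst-inspect-1.0 --exists "$1" >/dev/null 2>&1
}

# Print the single-stream output mode: NVENC first, Theora/Ogg fallback, else skipped.
select_single_stream_mode() {
    if has_gst_element nvv4l2h264enc; then
        echo "nvv4l2h264enc"
    elif has_gst_element theoraenc && has_gst_element oggmux; then
        echo "theoraenc-fallback"
    else
        echo "skipped"
    fi
}

DS_SINGLE_STREAM_MODE=$(select_single_stream_mode)
echo "DS_SINGLE_STREAM_MODE=$DS_SINGLE_STREAM_MODE"
```

An element being registered does not guarantee that the GPU has a working hardware encoder, so a real script would probably still need to handle a pipeline failure at run time.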

| Error | Fix |
|---|---|
| Tilted/diagonal bounding boxes | Parser struct not zero-initialized; use `NvDsInferObjectDetectionInfo obj = {};` |
| Zero KITTI files | |
| Engine rebuilds every DS run | |
| | Add |
| | Use |
| ForeignNode build failure (DETR) | Use dynamo export path or run |
| Zero detections | Wrong |
| | Install into venv: |
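
The directory layout and the `{model}_dynamic_b{MAX_BS}.engine` naming described earlier can be captured in two small helpers so that every script derives the same paths. This is a sketch under assumptions: `scaffold_model_dirs` and `engine_path` are hypothetical names, the engine is assumed to live under `benchmarks/engines/`, and `{model}` is assumed to be the same value as `MODEL_NAME`. The loop is a portable equivalent of the `mkdir -p` brace expansion shown earlier.

```shell
# Create the per-model directory tree under ROOT (default: models).
scaffold_model_dirs() {
    local model_name="$1" root="${2:-models}" sub
    for sub in model parser config scripts \
               benchmarks/engines benchmarks/ds \
               reports/charts samples/kitti_output; do
        mkdir -p "$root/$model_name/$sub"
    done
}

# Print the expected engine path for a model and maximum batch size.
# Assumption: the engine file name embeds the model name and MAX_BS.
engine_path() {
    local model_name="$1" max_bs="$2" root="${3:-models}"
    echo "$root/$model_name/benchmarks/engines/${model_name}_dynamic_b${max_bs}.engine"
}
```

Running `scaffold_model_dirs "$MODEL_NAME"` once and then reusing `engine_path "$MODEL_NAME" "$MAX_BS"` in both the build step and the nvinfer config generation would keep the file name consistent between the two.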