deepstream-import-vision-model

DeepStream Import Vision Model

When this skill is active, read the relevant reference document before starting each phase. Do not rely on memory — reference documents contain exact script paths, bash variable conventions, log filename contracts, and critical parsing rules.

Current scope: Object detection models only. Fail fast on classification, segmentation, or other architectures detected in `config.json`.
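The scope check can be sketched as a small shell gate. This is a minimal sketch, not the skill's actual implementation: it assumes a Hugging Face style `config.json` in which detection models list a class name containing `ForObjectDetection` under `architectures`, and the function name `require_detection_model` is hypothetical.

```bash
# Hypothetical helper: fail fast unless config.json declares a detection architecture.
# Assumption: detection models list a class name containing "ForObjectDetection"
# (for example DetrForObjectDetection) under "architectures".
require_detection_model() {
  config_path="$1"
  if [ ! -f "$config_path" ]; then
    echo "ERROR: $config_path not found" >&2
    return 2
  fi
  if grep -q 'ForObjectDetection' "$config_path"; then
    echo "OK: detection architecture found in $config_path"
    return 0
  fi
  echo "ERROR: no object detection architecture in $config_path; aborting" >&2
  return 1
}
```

A caller would typically stop the pipeline on a non-zero status, for example `require_detection_model path/to/config.json || exit 1` (the path is illustrative).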

Pipeline Overview

| Step | Phase | Reference | What it does |
|---|---|---|---|
| 1–3 | Model Acquire | references/model-acquire.md | Browse HF/NGC, detect format, download ONNX or export SafeTensors |
| 4–5 | Engine Build | references/engine-build.md | Build dynamic TRT engine, run trtexec BS=1 and BS=MAX_BS |
| 6–7 | DS Pipeline | references/pipeline-run.md | Custom bbox parser, nvinfer config, single-stream + multi-stream benchmarks |
| 8 | Report | references/report-generation.md | 5 charts, HTML, PDF benchmark report |

Run the full pipeline autonomously without pausing for confirmation at each step.

Pre-flight Checks

Run before starting:

```bash
# 1. GPU and drivers
nvidia-smi

# 2. TensorRT version match (must match between builder and DS runtime)
trtexec 2>&1 | head -3
dpkg -l | grep libnvinfer-bin

# 3. Shared Python venv: create once, reuse across all models
mkdir -p build
VENV=build/.venv_optimum
if [ ! -x "$VENV/bin/python3" ]; then
  python3 -m venv "$VENV"
  "$VENV/bin/pip" install --upgrade pip -q
  "$VENV/bin/pip" install "optimum[exporters]>=1.20,<2.0" "torch<2.12" \
    transformers onnxruntime matplotlib numpy markdown -q
fi

# 4. System tools
which wkhtmltopdf || apt-get install -y wkhtmltopdf
which mediainfo || apt-get install -y mediainfo
which deepstream-app  # required for KITTI dump (Step 6g) and benchmark perf-measurement (Step 7c); shipped with DeepStream SDK

# 5. Sample video: only check default path when user has not provided a custom DS_VIDEO
if [ -z "$DS_VIDEO" ]; then
  [ -f /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ] || \
    echo "WARNING: sample_720p.mp4 not found. Install DeepStream samples or set DS_VIDEO=/path/to/your.mp4"
fi
```
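One way to make the tool checks fail loudly, instead of relying on someone reading the `which` output, is a small collector function. This is a sketch only: `missing_tools` is a hypothetical helper, and the tool list in the usage comment simply mirrors the commands named in the pre-flight checks.

```bash
# Hypothetical helper: report every command from the argument list that is not on PATH.
# Returns 0 when all are present, 1 otherwise.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing tools:$missing" >&2
    return 1
  fi
  return 0
}

# Example usage with the tools referenced by the pre-flight checks:
# missing_tools nvidia-smi trtexec wkhtmltopdf mediainfo deepstream-app || exit 1
```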

Mandatory Output Structure

Create once `MODEL_NAME` is known (Step 1). Never dump files flat.

```
models/{model_name}/
  model/           <- ONNX file(s)
  parser/          <- .cpp, Makefile, .so
  config/          <- nvinfer config, ds-app config, labels.txt
  scripts/         <- run helper scripts
  benchmarks/
    engines/       <- _dynamic_b{MAX_BS}.engine, timing.cache, build logs
    b1/            <- trtexec BS=1 log
    b{MAX_BS}/     <- trtexec BS=MAX_BS log
    ds/            <- DS benchmark logs
  reports/         <- benchmark_report.md, .html, .pdf, benchmark_data.json
    charts/        <- chart_*.png (5 charts)
  samples/         <- output .mp4 or .ogv (theoraenc fallback), test frames
    kitti_output/  <- KITTI detection .txt files
```

```bash
mkdir -p models/$MODEL_NAME/{model,parser,config,scripts,benchmarks/engines,benchmarks/ds,reports/charts,samples/kitti_output}
```
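The `mkdir` line relies on bash brace expansion and does not create the `b1/` and `b{MAX_BS}/` benchmark directories shown in the tree. A possible sketch that creates the full layout once `MAX_BS` is known follows. `make_model_tree` is a hypothetical name, and creating the trtexec log directories at this point is an assumption rather than something the skill specifies.

```bash
# Hypothetical helper: create the mandatory layout for one model.
# Usage: make_model_tree <model_name> <max_bs>
make_model_tree() {
  model_name="$1"
  max_bs="$2"
  root="models/$model_name"
  for sub in model parser config scripts \
             benchmarks/engines benchmarks/b1 "benchmarks/b$max_bs" benchmarks/ds \
             reports/charts samples/kitti_output; do
    mkdir -p "$root/$sub" || return 1
  done
  # Print the root so callers can capture it.
  echo "$root"
}
```

The explicit loop avoids brace expansion, so the function also works in plain POSIX `sh`.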

Critical Rules

- Engine naming: always `{model}_dynamic_b{MAX_BS}.engine`. Never bare `model_dynamic.engine`.
- batch_size == num_streams: in DS runs, `batch-size` and stream count are always equal.
- Log filenames are fixed: `trtexec_b1.log`, `trtexec_b${MAX_BS}.log`, `ds_s${N}_run1.log`, `ds_s${N}_run2.log`. No timestamps. Report generation reads exact paths.
- Parser zero-init: always `NvDsInferObjectDetectionInfo obj = {};`. Required for DS 9.0 OBB support; bare `obj;` leaves `rotation_angle` uninitialized, causing tilted bounding boxes.
- KITTI validation gate: do NOT proceed to Step 7 if KITTI frame count is zero or detection rate < 90%.
- Shared venv: `build/.venv_optimum` is reused across all models. Never create per-model venvs.
- trtexec `--noDataTransfers`: GPU-only compute matches DeepStream's GPU-to-GPU data flow.
- Report HTML+PDF: always use `skills/deepstream-import-vision-model/scripts/report/md-to-html-pdf.py`. Never write a custom HTML generator or call `wkhtmltopdf` directly.
- Object detection only: reject non-detection architectures from `config.json` before building anything.
- Encoder fallback (MANDATORY): `x264enc` and `openh264enc` are prohibited. On NVENC-unavailable systems, use `theoraenc + oggmux` (LGPL; ships in gst-plugins-base; output is `.ogv`). If `theoraenc`/`oggmux` are absent, skip video creation (`DS_SINGLE_STREAM_MODE=skipped`). Report which mode was used: `nvv4l2h264enc` / `theoraenc-fallback` / `skipped`.
- Video source (MANDATORY): default is always `sample_720p.mp4` (1280×720). Never autonomously substitute `sample_1080p_h264.mp4` or any other file. Only use a different video when the user explicitly provides a path (via `DS_VIDEO` env var or script argument).
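The naming contracts and the KITTI gate can be expressed as small shell helpers. This is an illustrative sketch, not code from the skill: all function names are hypothetical, and the detection rate is assumed here to mean the share of KITTI `.txt` files that contain at least one detection line, which the rules do not define.

```bash
# Hypothetical helpers encoding the fixed naming rules.
engine_name() { echo "${1}_dynamic_b${2}.engine"; }   # <model> <max_bs>
trtexec_log_name() { echo "trtexec_b${1}.log"; }      # <batch_size>
ds_log_name() { echo "ds_s${1}_run${2}.log"; }        # <num_streams> <run: 1 or 2>

# Hypothetical KITTI gate: succeed only if the directory holds at least one
# .txt file and at least 90% of them are non-empty.
# Assumption: "detection rate" = non-empty KITTI files / total KITTI files.
kitti_gate() {
  kitti_dir="$1"
  total=0
  with_det=0
  for f in "$kitti_dir"/*.txt; do
    [ -f "$f" ] || continue        # the glob matched nothing
    total=$((total + 1))
    if [ -s "$f" ]; then
      with_det=$((with_det + 1))
    fi
  done
  if [ "$total" -eq 0 ]; then
    echo "KITTI gate: FAIL (zero frames)" >&2
    return 1
  fi
  # Integer comparison avoids floating point: with_det/total >= 90/100
  if [ $((with_det * 100)) -lt $((total * 90)) ]; then
    echo "KITTI gate: FAIL ($with_det/$total frames with detections)" >&2
    return 1
  fi
  echo "KITTI gate: PASS ($with_det/$total frames with detections)"
}
```

Keeping the names in one place means the benchmark scripts and the report parser cannot drift apart on filenames, which is the failure the fixed-name rule is meant to prevent.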

Pipeline Timing

Wrap every step:

```bash
STEP_START=$(date +%s.%N)
# ... step commands ...
STEP_END=$(date +%s.%N)
STEP_DURATION=$(echo "$STEP_END - $STEP_START" | bc)
echo "[Step N] completed in ${STEP_DURATION}s"
```

Track `PIPELINE_START` (before Step 1) and `PIPELINE_END` (after Step 8). Report all durations in the benchmark report.
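The timing pattern can also be wrapped in a function so that every step is timed the same way and its exit status is preserved. This is a minimal sketch, assuming GNU `date` (for `%N`) and using `awk` for the subtraction instead of `bc`; `run_step` is a hypothetical name.

```bash
# Hypothetical wrapper: time one step and keep its exit status.
# Usage: run_step <step_label> <command> [args...]
run_step() {
  step_label="$1"
  shift
  step_start=$(date +%s.%N)
  "$@"
  step_status=$?
  step_end=$(date +%s.%N)
  # awk does the floating-point subtraction and rounds to milliseconds.
  step_duration=$(awk -v s="$step_start" -v e="$step_end" 'BEGIN { printf "%.3f", e - s }')
  echo "[Step $step_label] completed in ${step_duration}s (exit $step_status)"
  return "$step_status"
}
```

For example, `run_step 4 ./build_engine.sh` (script name illustrative) prints the duration line and returns whatever status the build command returned.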

Report Output (MANDATORY — all 3 formats)

- `benchmark_report.md`: markdown source (12 mandatory sections)
- `benchmark_report.html`: styled HTML (charts base64-inlined, no local file access)
- `benchmark_report_{model_name}.pdf`: generated via `md-to-html-pdf.py`; verify charts are embedded by counting `data:image/png` occurrences in the HTML output: `grep -o 'data:image/png' benchmark_report.html | wc -l` should equal 5

Run charts and report scripts with the shared venv active: `source build/.venv_optimum/bin/activate`
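The chart check can be turned into a hard gate rather than a number to eyeball. This is a small sketch under the stated expectation of five inlined PNG charts; `check_embedded_charts` is a hypothetical name.

```bash
# Hypothetical helper: verify the HTML report inlines the expected number of PNG charts.
# Usage: check_embedded_charts <html_file> [expected_count]
check_embedded_charts() {
  html_file="$1"
  expected="${2:-5}"
  if [ ! -f "$html_file" ]; then
    echo "ERROR: $html_file not found" >&2
    return 2
  fi
  # grep -o prints one line per match, so wc -l counts occurrences, not lines.
  found=$(grep -o 'data:image/png' "$html_file" | wc -l | tr -d ' ')
  if [ "$found" -ne "$expected" ]; then
    echo "ERROR: found $found embedded charts, expected $expected" >&2
    return 1
  fi
  echo "OK: $found embedded charts"
}
```

Counting occurrences with `grep -o` matters because several inlined images can share one line of HTML, in which case a plain `grep -c` would undercount.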

Reference Documents

IMPORTANT: Read the relevant reference before starting each phase. Do NOT generate code from memory.

| Document | Use When |
|---|---|
| references/model-acquire.md | Steps 1–3: HF/NGC URL parsing, format detection, ONNX download, SafeTensors export, label extraction |
| references/engine-build.md | Steps 4–5: trtexec engine build, benchmarks, PEAK_GPU_STREAMS derivation, iterative scaling |
| references/pipeline-run.md | Steps 6–7: custom bbox parser, nvinfer config, single-stream validation, KITTI dump, multi-stream benchmark |
| references/report-generation.md | Step 8: benchmark_data.json, 5 charts, 12-section markdown report, HTML + PDF |

Scripts

Located in `scripts/`.

| Script | Phase | Purpose |
|---|---|---|
| `model/hf-list-files.sh` | 1–3 | List HuggingFace repo files |
| `model/hf-download-config.sh` | 1–3 | Download config.json from HF |
| `model/ngc-list-files.sh` | 1–3 | List NGC model files |
| `model/ngc-download.sh` | 1–3 | Download NGC model archive |
| `model/safetensors-to-onnx.sh` | 1–3 | Export SafeTensors → ONNX via optimum-cli |
| `model/inspect-onnx.py` | 1–5 | Inspect ONNX input/output shapes |
| `model/make-static-batch-onnx.py` | 4–5 | Bake batch dim into ONNX |
| `model/cleanup.sh` | Any | Remove staging dirs, preserve shared venv |
| `engine/benchmark-trtexec.sh` | 4–5 | Run trtexec with standard flags |
| `deepstream/ds-single-stream.sh` | 6–7 | Single-stream visual validation (NVENC primary; theoraenc+oggmux fallback; skip if neither) |
| `deepstream/ds-sweep.sh` | 6–7 | 2-phase batch size sweep |
| `deepstream/benchmark-ds.sh` | 6–7 | Fixed-stream DS benchmark |
| `deepstream/ds-kitti-dump.sh` | 6–7 | KITTI detection dump via deepstream-app |
| `deepstream/ds-perf-run.sh` | 7 | Step 7c two-run benchmark: wraps `deepstream-app` with `enable-perf-measurement=1`, writes fixed-name log for the report parser |
| `deepstream/extract-frame.sh` | 6–7 | Extract sample frames from output video (`.mp4` NVENC path or `.ogv` theoraenc fallback) |
| `report/generate-benchmark-charts.py` | 8 | Generate 5 benchmark PNG charts |
| `report/md-to-html-pdf.py` | 8 | Markdown → styled HTML → PDF (canonical benchmark report path) |
| `report/md-to-pdf.sh` | Any | Markdown → PDF via pandoc/pdflatex; for design docs and references only, NOT for benchmark reports (use md-to-html-pdf.py for those) |
| `report/report-style.css` | 8 | CSS for HTML report |
| `report/render-mermaid-for-pdf.py` | 8 | Mermaid diagram → PNG |
| `report/mermaid-puppeteer.json` | 8 | Vetted Puppeteer config for Mermaid (sandboxed; non-root) |
| `report/mermaid-puppeteer-root.json` | 8 | Vetted Puppeteer config for Mermaid (used when running as root) |

Quick Error Reference

| Error | Fix |
|---|---|
| Tilted/diagonal bounding boxes | Parser struct not zero-initialized; use `NvDsInferObjectDetectionInfo obj = {};` |
| Zero KITTI files | `gie-kitti-output-dir` not read by nvinfer; use `ds-kitti-dump.sh` (wraps `deepstream-app`) |
| Engine rebuilds every DS run | `model-engine-file` path wrong; check relative path from `config/` dir |
| `setDimensions` negative dims | Add `infer-dims=3;H;W` to nvinfer config for dynamic ONNX models |
| `--memPoolSize` workspace 0.03 MiB | Use `M` suffix not `MiB`, e.g. `--memPoolSize=workspace:32768M` |
| ForeignNode build failure (DETR) | Use dynamo export path or run `onnxsim`; see references/engine-build.md |
| Zero detections | Wrong `net-scale-factor`; check model family table in references/pipeline-run.md |
| `No module named 'pyservicemaker'` | Install into venv: `pip install /opt/nvidia/deepstream/.../pyservicemaker*.whl` |
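For the engine-rebuild row, a quick check is to resolve `model-engine-file` from the config's own directory and confirm the file exists. This is a sketch under assumptions: `check_engine_path` is a hypothetical helper, it handles only a simple `key=value` line, and it assumes relative paths are resolved against the directory containing the nvinfer config.

```bash
# Hypothetical helper: confirm model-engine-file in an nvinfer config points at a real file.
# Usage: check_engine_path <nvinfer_config>
check_engine_path() {
  cfg="$1"
  # Take the first uncommented model-engine-file line and strip the key.
  engine=$(sed -n 's/^[[:space:]]*model-engine-file[[:space:]]*=[[:space:]]*//p' "$cfg" | head -n 1)
  if [ -z "$engine" ]; then
    echo "ERROR: model-engine-file not set in $cfg" >&2
    return 2
  fi
  case "$engine" in
    /*) resolved="$engine" ;;                      # absolute path: use as-is
    *)  resolved="$(dirname "$cfg")/$engine" ;;    # relative: resolve from config dir
  esac
  if [ ! -f "$resolved" ]; then
    echo "ERROR: engine not found at $resolved" >&2
    return 1
  fi
  echo "OK: $resolved"
}
```

With the mandatory layout, a config under `config/` would be expected to reference the engine with a path such as `../benchmarks/engines/{model}_dynamic_b{MAX_BS}.engine`.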
