deepstream-import-vision-model


DeepStream Import Vision Model


When this skill is active, read the relevant reference document before starting each phase. Do not rely on memory: the reference documents contain exact script paths, bash variable conventions, log filename contracts, and critical parsing rules.
Current scope: object detection models only. Fail fast on classification, segmentation, or other non-detection architectures detected in `config.json`.

Pipeline Overview


| Step | Phase | Reference | What it does |
|---|---|---|---|
| 1–3 | Model Acquire | `references/model-acquire.md` | Browse HF/NGC, detect format, download ONNX or export SafeTensors |
| 4–5 | Engine Build | `references/engine-build.md` | Build a dynamic TRT engine; run trtexec at BS=1 and BS=MAX_BS |
| 6–7 | DS Pipeline | `references/pipeline-run.md` | Custom bbox parser, nvinfer config, single-stream + multi-stream benchmarks |
| 8 | Report | `references/report-generation.md` | 5 charts; HTML and PDF benchmark report |
Run the full pipeline autonomously without pausing for confirmation at each step.

Pre-flight Checks


Run before starting:

```bash
# 1. GPU and drivers
nvidia-smi

# 2. TensorRT version match (must match between builder and DS runtime)
trtexec 2>&1 | head -3
dpkg -l | grep libnvinfer-bin

# 3. Shared Python venv: create once, reuse across all models
mkdir -p build
VENV=build/.venv_optimum
if [ ! -x "$VENV/bin/python3" ]; then
  python3 -m venv "$VENV"
  "$VENV/bin/pip" install --upgrade pip -q
  "$VENV/bin/pip" install "optimum[exporters]>=1.20,<2.0" "torch<2.12" \
    transformers onnxruntime matplotlib numpy markdown -q
fi

# 4. System tools
which wkhtmltopdf || apt-get install -y wkhtmltopdf
which mediainfo || apt-get install -y mediainfo
which deepstream-app  # required for KITTI dump (Step 6g) and benchmark perf-measurement (Step 7c); shipped with DeepStream SDK

# 5. Sample video: only check the default path when the user has not provided a custom DS_VIDEO
if [ -z "$DS_VIDEO" ]; then
  [ -f /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ] ||
    echo "WARNING: sample_720p.mp4 not found. Install DeepStream samples or set DS_VIDEO=/path/to/your.mp4"
fi
```
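Eyeballing the `trtexec` and `dpkg` output in check 2 is error-prone. A minimal sketch of an explicit comparison, assuming both versions are available as Debian-style strings such as `10.3.0.30-1+cuda12.5`. `trt_versions_match` is a hypothetical helper, not one of the skill's scripts, and it compares only the upstream version (the part before the first `-`); it does not check the GPU model or the CUDA variant.

```bash
# Hypothetical helpers (not shipped in scripts/).
# Strip the Debian revision, e.g. "10.3.0.30-1+cuda12.5" -> "10.3.0.30".
trt_upstream_version() {
  printf '%s\n' "${1%%-*}"
}

# Args: <builder version string> <runtime version string>
# Returns 0 only if both upstream versions are non-empty and identical.
trt_versions_match() {
  local builder runtime
  builder=$(trt_upstream_version "$1")
  runtime=$(trt_upstream_version "$2")
  if [ -n "$builder" ] && [ "$builder" = "$runtime" ]; then
    echo "TensorRT versions match: $builder"
    return 0
  fi
  echo "TensorRT version mismatch: builder=$builder runtime=$runtime" >&2
  return 1
}
```

For example, `trt_versions_match "$(dpkg-query -W -f='${Version}' libnvinfer-bin)" "$RUNTIME_TRT_VERSION"`, where `RUNTIME_TRT_VERSION` is a placeholder for however you obtain the TensorRT version your DeepStream runtime links against. Requiring a non-empty value keeps a missing package from counting as a match.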

Mandatory Output Structure


Create this layout as soon as `MODEL_NAME` is known (Step 1). Never dump files flat.

```
models/{model_name}/
  model/           <- ONNX file(s)
  parser/          <- .cpp, Makefile, .so
  config/          <- nvinfer config, ds-app config, labels.txt
  scripts/         <- run helper scripts
  benchmarks/
    engines/       <- _dynamic_b{MAX_BS}.engine, timing.cache, build logs
    b1/            <- trtexec BS=1 log
    b{MAX_BS}/     <- trtexec BS=MAX_BS log
    ds/            <- DS benchmark logs
  reports/         <- benchmark_report.md, .html, .pdf, benchmark_data.json
    charts/        <- chart_*.png (5 charts)
  samples/         <- output .mp4 or .ogv (theoraenc fallback), test frames
    kitti_output/  <- KITTI detection .txt files
```

```bash
mkdir -p "models/$MODEL_NAME"/{model,parser,config,scripts,benchmarks/engines,benchmarks/ds,reports/charts,samples/kitti_output}
```
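Later steps write into these directories, so it can help to fail early if one is missing. A minimal sketch with a hypothetical `check_model_layout` function (not part of `scripts/`) that checks the same directories the `mkdir` command above creates; run it from the directory that contains `models/`.

```bash
# Hypothetical helper: confirm the mandatory layout exists for <model_name>.
# Prints each missing directory to stderr and returns 1 if any is missing.
check_model_layout() {
  local root="models/$1" missing=0 d
  for d in model parser config scripts benchmarks/engines benchmarks/ds \
           reports/charts samples/kitti_output; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $root/$d" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use is `check_model_layout "$MODEL_NAME" || exit 1` at the start of each step.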

Critical Rules


1. Engine naming: always `{model}_dynamic_b{MAX_BS}.engine`. Never a bare `model_dynamic.engine`.
2. batch_size == num_streams: in DS runs, `batch-size` and the stream count are always equal.
3. Log filenames are fixed: `trtexec_b1.log`, `trtexec_b${MAX_BS}.log`, `ds_s${N}_run1.log`, `ds_s${N}_run2.log`. No timestamps; report generation reads these exact paths.
4. Parser zero-init: always `NvDsInferObjectDetectionInfo obj = {};`. This is required for DS 9.0 OBB support; a bare `obj;` declaration leaves `rotation_angle` uninitialized, causing tilted bounding boxes.
5. KITTI validation gate: do NOT proceed to Step 7 if the KITTI frame count is zero or the detection rate is below 90%.
6. Shared venv: `build/.venv_optimum` is reused across all models. Never create per-model venvs.
7. trtexec `--noDataTransfers`: use it so trtexec measures GPU-only compute, which matches DeepStream's GPU-to-GPU data flow.
8. Report HTML+PDF: always use `skills/deepstream-import-vision-model/scripts/report/md-to-html-pdf.py`. Never write a custom HTML generator or call `wkhtmltopdf` directly.
9. Object detection only: reject non-detection architectures found in `config.json` before building anything.
10. Encoder fallback (MANDATORY): `x264enc` and `openh264enc` are prohibited. On systems without NVENC, use `theoraenc + oggmux` (LGPL; ships in gst-plugins-base; output is `.ogv`). If `theoraenc`/`oggmux` are absent, skip video creation (`DS_SINGLE_STREAM_MODE=skipped`). Report which mode was used: `nvv4l2h264enc`, `theoraenc-fallback`, or `skipped`.
11. Video source (MANDATORY): the default is always `sample_720p.mp4` (1280×720). Never autonomously substitute `sample_1080p_h264.mp4` or any other file. Only use a different video when the user explicitly provides a path (via the `DS_VIDEO` env var or a script argument).
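Rules 1, 2, 3 and 5 are mechanical enough to check in shell. The helpers below are a minimal sketch, not part of the skill's `scripts/`: all function names are hypothetical, the log locations follow the directory tree in Mandatory Output Structure, and the KITTI gate assumes one `.txt` file per frame in which an empty file means "no detections" (the skill's exact definition of detection rate may differ).

```bash
# Hypothetical helpers for Critical Rules 1, 2, 3 and 5.

# Rule 1: engine_name yolov8s 16 -> yolov8s_dynamic_b16.engine
engine_name() {
  printf '%s_dynamic_b%s.engine\n' "$1" "$2"
}

# Rule 2: in DS runs, batch-size must equal the stream count.
check_batch_matches_streams() {
  if [ "$1" -ne "$2" ]; then
    echo "batch-size ($1) != stream count ($2)" >&2
    return 1
  fi
}

# Rule 3: fixed log paths (no timestamps). Args: <benchmarks-dir> <MAX_BS> <N>
expected_logs() {
  local bench=$1 max_bs=$2 n=$3
  printf '%s\n' \
    "$bench/b1/trtexec_b1.log" \
    "$bench/b${max_bs}/trtexec_b${max_bs}.log" \
    "$bench/ds/ds_s${n}_run1.log" \
    "$bench/ds/ds_s${n}_run2.log"
}

# Rule 5: fail on zero frame files, or when fewer than min_pct percent
# (default 90) of frame files contain at least one detection line.
kitti_gate() {
  local dir=$1 min_pct=${2:-90}
  local total=0 with_dets=0 pct f
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue   # glob matched nothing
    total=$((total + 1))
    if [ -s "$f" ]; then
      with_dets=$((with_dets + 1))
    fi
  done
  if [ "$total" -eq 0 ]; then
    echo "KITTI gate FAILED: no frame files in $dir" >&2
    return 1
  fi
  pct=$((100 * with_dets / total))   # integer percent, rounded down
  if [ "$pct" -lt "$min_pct" ]; then
    echo "KITTI gate FAILED: $with_dets/$total frames with detections (${pct}% < ${min_pct}%)" >&2
    return 1
  fi
  echo "KITTI gate passed: $with_dets/$total frames (${pct}%)"
}
```

For example, `kitti_gate "models/$MODEL_NAME/samples/kitti_output" || exit 1` placed before Step 7 enforces rule 5. Rounding the percentage down keeps the gate conservative: 9 of 10 frames passes, 9 of 11 does not.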

Pipeline Timing


Wrap every step:

```bash
STEP_START=$(date +%s.%N)

# ... step commands ...

STEP_END=$(date +%s.%N)
STEP_DURATION=$(echo "$STEP_END - $STEP_START" | bc)
echo "[Step N] completed in ${STEP_DURATION}s"
```

Track `PIPELINE_START` (before Step 1) and `PIPELINE_END` (after Step 8). Report all durations in the benchmark report.
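The timing pattern can be folded into a reusable wrapper. A sketch under stated assumptions: `run_step` is a hypothetical name, it uses awk instead of `bc` for the subtraction (so `bc` need not be installed), it preserves the wrapped command's exit status, and like the original snippet it relies on GNU `date` for `%N`.

```bash
# Hypothetical wrapper around the STEP_START/STEP_END pattern.
# Usage: run_step <label> <command> [args...]
run_step() {
  local label=$1 start end rc
  shift
  start=$(date +%s.%N)
  # Capture the exit status without tripping `set -e` in the caller.
  if "$@"; then rc=0; else rc=$?; fi
  end=$(date +%s.%N)
  # awk does the floating-point subtraction, so bc is not required.
  awk -v s="$start" -v e="$end" -v l="$label" \
    'BEGIN { printf "[Step %s] completed in %.3fs\n", l, e - s }'
  return "$rc"
}
```

For example, `run_step 1 true` prints a line of the form `[Step 1] completed in 0.002s`; the exact duration will vary.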

Report Output (MANDATORY — all 3 formats)


1. `benchmark_report.md`: markdown source (12 mandatory sections)
2. `benchmark_report.html`: styled HTML (charts base64-inlined, no local file access)
3. `benchmark_report_{model_name}.pdf`: generated via `md-to-html-pdf.py`. Verify that the charts are embedded by counting `data:image/png` occurrences in the HTML output; `grep -o 'data:image/png' benchmark_report.html | wc -l` should print 5.

Run the chart and report scripts with the shared venv active: `source build/.venv_optimum/bin/activate`.
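A minimal post-Step 8 check, assuming all three files sit in one reports directory (per the output tree, `models/$MODEL_NAME/reports/`). `verify_report` is a hypothetical helper that combines existence checks for the three formats with the `data:image/png` count.

```bash
# Hypothetical helper: verify that all three report formats exist and that
# the HTML inlines exactly 5 base64 PNG charts.
# Args: <reports-dir> <model_name>
verify_report() {
  local dir=$1 model=$2 f charts
  for f in benchmark_report.md benchmark_report.html "benchmark_report_${model}.pdf"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  # grep -o prints one match per line, so wc -l counts occurrences, not lines.
  charts=$(grep -o 'data:image/png' "$dir/benchmark_report.html" | wc -l)
  if [ "$charts" -ne 5 ]; then
    echo "expected 5 inlined charts, found $charts" >&2
    return 1
  fi
  echo "report OK: $dir"
}
```

For example, `verify_report "models/$MODEL_NAME/reports" "$MODEL_NAME"` as the last command of Step 8.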

Reference Documents


IMPORTANT: Read the relevant reference before starting each phase. Do NOT generate code from memory.

| Document | Use When |
|---|---|
| `references/model-acquire.md` | Steps 1–3: HF/NGC URL parsing, format detection, ONNX download, SafeTensors export, label extraction |
| `references/engine-build.md` | Steps 4–5: trtexec engine build, benchmarks, PEAK_GPU_STREAMS derivation, iterative scaling |
| `references/pipeline-run.md` | Steps 6–7: custom bbox parser, nvinfer config, single-stream validation, KITTI dump, multi-stream benchmark |
| `references/report-generation.md` | Step 8: benchmark_data.json, 5 charts, 12-section markdown report, HTML + PDF |

Scripts


Located in `scripts/`.

| Script | Phase | Purpose |
|---|---|---|
| `model/hf-list-files.sh` | 1–3 | List HuggingFace repo files |
| `model/hf-download-config.sh` | 1–3 | Download `config.json` from HF |
| `model/ngc-list-files.sh` | 1–3 | List NGC model files |
| `model/ngc-download.sh` | 1–3 | Download NGC model archive |
| `model/safetensors-to-onnx.sh` | 1–3 | Export SafeTensors → ONNX via optimum-cli |
| `model/inspect-onnx.py` | 1–5 | Inspect ONNX input/output shapes |
| `model/make-static-batch-onnx.py` | 4–5 | Bake the batch dimension into the ONNX model |
| `model/cleanup.sh` | Any | Remove staging dirs, preserve the shared venv |
| `engine/benchmark-trtexec.sh` | 4–5 | Run trtexec with standard flags |
| `deepstream/ds-single-stream.sh` | 6–7 | Single-stream visual validation (NVENC primary; theoraenc+oggmux fallback; skip if neither is available) |
| `deepstream/ds-sweep.sh` | 6–7 | Two-phase batch size sweep |
| `deepstream/benchmark-ds.sh` | 6–7 | Fixed-stream DS benchmark |
| `deepstream/ds-kitti-dump.sh` | 6–7 | KITTI detection dump via `deepstream-app` |
| `deepstream/ds-perf-run.sh` | 7 | Step 7c two-run benchmark: wraps `deepstream-app` with `enable-perf-measurement=1` and writes a fixed-name log for the report parser |
| `deepstream/extract-frame.sh` | 6–7 | Extract sample frames from the output video (`.mp4` on the NVENC path or `.ogv` on the theoraenc fallback) |
| `report/generate-benchmark-charts.py` | 8 | Generate the 5 benchmark PNG charts |
| `report/md-to-html-pdf.py` | 8 | Markdown → styled HTML → PDF (canonical benchmark report path) |
| `report/md-to-pdf.sh` | Any | Markdown → PDF via pandoc/pdflatex; for design docs and references only, NOT for benchmark reports (use `md-to-html-pdf.py` for those) |
| `report/report-style.css` | 8 | CSS for the HTML report |
| `report/render-mermaid-for-pdf.py` | 8 | Mermaid diagram → PNG |
| `report/mermaid-puppeteer.json` | 8 | Vetted Puppeteer config for Mermaid (sandboxed; non-root) |
| `report/mermaid-puppeteer-root.json` | 8 | Vetted Puppeteer config for Mermaid (used when running as root) |
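The encoder chain used by `ds-single-stream.sh` (Critical Rule 10) can be sketched as a selector function. This is an illustration, not the script's actual code: `select_encoder_mode`, `has_element` and the `HAS_ELEMENT` override are hypothetical. Presence of an element in `gst-inspect-1.0` is only a proxy; on a GPU without NVENC hardware the `nvv4l2h264enc` element may still be registered and then fail at runtime, so a real implementation may also need a short test encode.

```bash
# Hypothetical helpers. has_element asks gst-inspect-1.0 (which exits
# non-zero for unknown elements); HAS_ELEMENT can name a substitute
# command or function, e.g. for testing.
has_element() {
  "${HAS_ELEMENT:-gst-inspect-1.0}" "$1" >/dev/null 2>&1
}

# Prints one of: nvv4l2h264enc | theoraenc-fallback | skipped
# x264enc and openh264enc are deliberately never considered (Rule 10).
select_encoder_mode() {
  if has_element nvv4l2h264enc; then
    echo nvv4l2h264enc
  elif has_element theoraenc && has_element oggmux; then
    echo theoraenc-fallback
  else
    echo skipped
  fi
}
```

For example, `DS_SINGLE_STREAM_MODE=$(select_encoder_mode)`, assuming the same variable that records `skipped` also records the other two modes for the report.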

Quick Error Reference


| Error | Fix |
|---|---|
| Tilted/diagonal bounding boxes | Parser struct not zero-initialized; use `NvDsInferObjectDetectionInfo obj = {};` |
| Zero KITTI files | `gie-kitti-output-dir` is not read by nvinfer; use `ds-kitti-dump.sh` (wraps `deepstream-app`) |
| Engine rebuilds on every DS run | `model-engine-file` path is wrong; check the relative path from the `config/` dir |
| `setDimensions` negative dims | Add `infer-dims=3;H;W` to the nvinfer config for dynamic ONNX models |
| `--memPoolSize` workspace of 0.03 MiB | Use the `M` suffix, not `MiB`, e.g. `--memPoolSize=workspace:32768M` |
| ForeignNode build failure (DETR) | Use the dynamo export path or run `onnxsim`; see `references/engine-build.md` |
| Zero detections | Wrong `net-scale-factor`; check the model family table in `references/pipeline-run.md` |
| `No module named 'pyservicemaker'` | Install into the venv: `pip install /opt/nvidia/deepstream/.../pyservicemaker*.whl` |
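The `infer-dims` and `net-scale-factor` fixes both concern nvinfer preprocessing keys in the `[property]` group. nvinfer normalizes each pixel as y = net-scale-factor × (x − mean), so a model that expects inputs in [0, 1] with no mean subtraction uses 1/255. The helper below is a hypothetical sketch that prints both keys for a given input height and width; the 1/255 default is an assumption, and the correct factor for a given model family comes from the table in `references/pipeline-run.md`.

```bash
# Hypothetical helper: print nvinfer [property] preprocessing keys for a
# dynamic-shape ONNX detector. Args: <H> <W> [net-scale-factor]
# Default scale 1/255 assumes the model expects pixel values in [0, 1].
emit_preprocess_keys() {
  local h=$1 w=$2 scale=${3:-}
  if [ -z "$scale" ]; then
    scale=$(awk 'BEGIN { printf "%.10f", 1 / 255 }')
  fi
  printf 'infer-dims=3;%s;%s\n' "$h" "$w"
  printf 'net-scale-factor=%s\n' "$scale"
}
```

For a 640×640 input, `emit_preprocess_keys 640 640` prints `infer-dims=3;640;640` followed by `net-scale-factor=0.0039215686`; pass a third argument to override the factor.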