Agent Skill · NVIDIA NIM

deepstream-import-vision-model

Use this skill to bring an object detection model from HuggingFace or NVIDIA NGC into an NVIDIA DeepStream pipeline with end-to-end automation: ONNX download or SafeTensors export, TRT engine build, custom nvinfer bbox parser, multi-stream benchmark, and PDF report. Only object detection models are supported.

Provider: NVIDIA NIM
Path in repo: skills/deepstream-import-vision-model/SKILL.md

Skill body

DeepStream Import Vision Model

When this skill is active, read the relevant reference document before starting each phase. Do not rely on memory — reference documents contain exact script paths, bash variable conventions, log filename contracts, and critical parsing rules.

Current scope: Object detection models only. Fail fast on classification, segmentation, or other architectures detected in config.json.
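
The fail-fast scope check could be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual logic: it assumes config.json follows the Hugging Face Transformers convention of an "architectures" list whose detection classes end in "ForObjectDetection", and the function name require_detection_model is hypothetical. It uses a crude text match; a real JSON parser would be more robust. references/model-acquire.md remains the authority on format detection.

```shell
# Sketch: reject non-detection models before any build work starts.
# Assumption: config.json carries a Hugging Face style "architectures" list
# and detection classes end in "ForObjectDetection".
require_detection_model() {
  local config="$1" archs
  [ -f "$config" ] || { echo "ERROR: $config not found" >&2; return 2; }

  # Flatten the file to one line, then pull out the "architectures": [...] span.
  archs=$(tr -d '\n' < "$config" | grep -o '"architectures"[^]]*]' || true)

  case "$archs" in
    *ForObjectDetection*)
      echo "OK: detection architecture found: $archs"
      ;;
    *)
      echo "ERROR: object detection only; got: ${archs:-no architectures field}" >&2
      return 1
      ;;
  esac
}
```

Calling `require_detection_model config.json || exit 1` right after the config download would stop the pipeline before any engine build is attempted.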

Pipeline Overview

Step	Phase	Reference	What it does
1–3	Model Acquire	references/model-acquire.md	Browse HF/NGC, detect format, download ONNX or export SafeTensors
4–5	Engine Build	references/engine-build.md	Build dynamic TRT engine, run trtexec BS=1 and BS=MAX_BS
6–7	DS Pipeline	references/pipeline-run.md	Custom bbox parser, nvinfer config, single-stream + multi-stream benchmarks
8	Report	references/report-generation.md	5 charts, HTML, PDF benchmark report

Run the full pipeline autonomously without pausing for confirmation at each step.

Pre-flight Checks

Run before starting:

# 1. GPU and drivers
nvidia-smi

# 2. TensorRT version match (must match between builder and DS runtime)
trtexec 2>&1 | head -3
dpkg -l | grep libnvinfer-bin

# 3. Shared Python venv — create once, reuse across all models
mkdir -p build
VENV=build/.venv_optimum
if [ ! -x "$VENV/bin/python3" ]; then
  python3 -m venv "$VENV"
  "$VENV/bin/pip" install --upgrade pip -q
  "$VENV/bin/pip" install "optimum[exporters]>=1.20,<2.0" "torch<2.12" \
    transformers onnxruntime matplotlib numpy markdown -q
fi

# 4. System tools
which wkhtmltopdf || apt-get install -y wkhtmltopdf
which mediainfo    || apt-get install -y mediainfo
which deepstream-app  # required for KITTI dump (Step 6g) and benchmark perf-measurement (Step 7c); shipped with DeepStream SDK

# 5. Sample video — only check default path when user has not provided a custom DS_VIDEO
if [ -z "$DS_VIDEO" ]; then
  [ -f /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ] || \
    echo "WARNING: sample_720p.mp4 not found. Install DeepStream samples or set DS_VIDEO=/path/to/your.mp4"
fi

Mandatory Output Structure

Create this tree as soon as MODEL_NAME is known (Step 1). Never write output files into a flat directory.

models/{model_name}/
  model/           <- ONNX file(s)
  parser/          <- .cpp, Makefile, .so
  config/          <- nvinfer config, ds-app config, labels.txt
  scripts/         <- run helper scripts
  benchmarks/
    engines/       <- _dynamic_b{MAX_BS}.engine, timing.cache, build logs
    b1/            <- trtexec BS=1 log
    b{MAX_BS}/     <- trtexec BS=MAX_BS log
    ds/            <- DS benchmark logs
  reports/         <- benchmark_report.md, .html, .pdf, benchmark_data.json
    charts/        <- chart_*.png (5 charts)
  samples/         <- output .mp4 or .ogv (theoraenc fallback), test frames
    kitti_output/  <- KITTI detection .txt files

mkdir -p "models/$MODEL_NAME"/{model,parser,config,scripts,benchmarks/engines,benchmarks/ds,reports/charts,samples/kitti_output}
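
Because the report step reads exact paths, it may help to derive every artifact path from MODEL_NAME and MAX_BS in one place. The helpers below are a sketch: all function names are hypothetical, the model name rtdetr in the usage note is only an example, and the sketch assumes the trtexec logs sit in the b1/ and b{MAX_BS}/ directories shown in the tree, with DS logs in benchmarks/ds/. The file names themselves come from the Critical Rules.

```shell
# Sketch: fixed artifact paths derived from MODEL_NAME ($1) and batch size ($2).
# Assumption: logs live in the per-batch directories shown in the tree above.
engine_path() { echo "models/$1/benchmarks/engines/${1}_dynamic_b${2}.engine"; }
trtexec_log() { echo "models/$1/benchmarks/b${2}/trtexec_b${2}.log"; }

# $2 = stream count N, $3 = run number (1 or 2).
ds_log()      { echo "models/$1/benchmarks/ds/ds_s${2}_run${3}.log"; }

# Create the per-batch-size log directories once MAX_BS is known.
make_batch_dirs() {
  local model="$1" max_bs="$2"
  mkdir -p "models/$model/benchmarks/b1" "models/$model/benchmarks/b${max_bs}"
}
```

For example, `engine_path rtdetr 8` prints `models/rtdetr/benchmarks/engines/rtdetr_dynamic_b8.engine`.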

Critical Rules

Engine naming — always {model}_dynamic_b{MAX_BS}.engine. Never bare model_dynamic.engine.
batch_size == num_streams — in DS runs, batch-size and stream count are always equal.
Log filenames are fixed — trtexec_b1.log, trtexec_b${MAX_BS}.log, ds_s${N}_run1.log, ds_s${N}_run2.log. No timestamps. Report generation reads exact paths.
Parser zero-init — always NvDsInferObjectDetectionInfo obj = {};. Required for DS 9.0 OBB support; bare obj; leaves rotation_angle uninitialized, causing tilted bounding boxes.
KITTI validation gate — do NOT proceed to Step 7 if KITTI frame count is zero or detection rate < 90%.
Shared venv — build/.venv_optimum reused across all models. Never create per-model venvs.
trtexec --noDataTransfers — GPU-only compute matches DeepStream’s GPU-to-GPU data flow.
Report HTML+PDF — always use skills/deepstream-import-vision-model/scripts/report/md-to-html-pdf.py. Never write a custom HTML generator or call wkhtmltopdf directly.
Object detection only — reject non-detection architectures from config.json before building anything.
Encoder fallback (MANDATORY) — x264enc and openh264enc are prohibited. On NVENC-unavailable systems, use theoraenc + oggmux (LGPL; ships in gst-plugins-base; output is .ogv). If theoraenc/oggmux are absent, skip video creation (DS_SINGLE_STREAM_MODE=skipped). Report which mode was used: nvv4l2h264enc / theoraenc-fallback / skipped.
Video source (MANDATORY) — default is always sample_720p.mp4 (1280×720). Never autonomously substitute sample_1080p_h264.mp4 or any other file. Only use a different video when the user explicitly provides a path (via DS_VIDEO env var or script argument).
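
The KITTI validation gate from the rules above could look something like the sketch below. The source does not define "detection rate" here, so this sketch assumes it means the share of per-frame KITTI .txt files that contain at least one detection line, and that frames without detections still produce an (empty) file. The function name kitti_gate is hypothetical; references/pipeline-run.md holds the authoritative definition.

```shell
# Sketch: block Step 7 unless the KITTI dump looks healthy.
# Assumption: detection rate = non-empty frame files / all frame files.
kitti_gate() {
  local dir="$1" min_pct="${2:-90}"
  local total=0 with_det=0 f

  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue            # glob matched nothing
    total=$((total + 1))
    if [ -s "$f" ]; then               # non-empty file: at least one detection
      with_det=$((with_det + 1))
    fi
  done

  if [ "$total" -eq 0 ]; then
    echo "KITTI gate FAILED: no frame files in $dir" >&2
    return 1
  fi

  echo "KITTI frames: $total, with detections: $with_det"
  # Integer comparison of with_det/total >= min_pct/100 without rounding.
  if [ $((with_det * 100)) -lt $((min_pct * total)) ]; then
    echo "KITTI gate FAILED: detection rate below ${min_pct}%" >&2
    return 1
  fi
}
```

A call such as `kitti_gate "models/$MODEL_NAME/samples/kitti_output" || exit 1` would then stop the run before the multi-stream benchmark.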

Pipeline Timing

Wrap every step:

STEP_START=$(date +%s.%N)
# ... step commands ...
STEP_END=$(date +%s.%N)
STEP_DURATION=$(echo "$STEP_END - $STEP_START" | bc)
echo "[Step N] completed in ${STEP_DURATION}s"

Track PIPELINE_START (before Step 1) and PIPELINE_END (after Step 8). Report all durations in the benchmark report.
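
To avoid repeating the timing boilerplate, the wrapper could be folded into a small helper. This is a sketch only: run_step and the TIMINGS_FILE variable (with its default file name) are hypothetical, and it swaps bc for awk so that sub-second durations print with a leading zero (bc prints 0.5 as .5).

```shell
# Sketch: time one pipeline step and append the result to a timings file.
# Assumptions: run_step and TIMINGS_FILE are illustrative names; GNU date
# is available for the %N (nanoseconds) format.
run_step() {
  local name="$1" start end dur rc=0
  shift

  start=$(date +%s.%N)
  "$@" || rc=$?                        # run the step, remember its exit code
  end=$(date +%s.%N)

  # awk is used here instead of bc; it formats the difference to milliseconds.
  dur=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')

  printf '%s\t%s\n' "$name" "$dur" >> "${TIMINGS_FILE:-step_timings.tsv}"
  echo "[$name] completed in ${dur}s (exit $rc)"
  return "$rc"
}
```

Usage would be along the lines of `run_step "Step 4" bash build_engine.sh`, where build_engine.sh stands in for whatever the step actually runs. The resulting file gives the report step one place to read all durations from.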

Report Output (MANDATORY — all 3 formats)

benchmark_report.md — markdown source (12 mandatory sections)
benchmark_report.html — styled HTML (charts base64-inlined, no local file access)
benchmark_report_{model_name}.pdf — via md-to-html-pdf.py; verify charts are embedded by counting data:image/png occurrences in the HTML output: grep -o 'data:image/png' benchmark_report.html | wc -l should equal 5

Run charts and report scripts with the shared venv active: source build/.venv_optimum/bin/activate.
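
The chart-embedding check described above can be turned into a pass/fail gate. The sketch below wraps the same grep pipeline; the function name check_embedded_charts is hypothetical, and the expected count defaults to the 5 charts the report requires.

```shell
# Sketch: confirm the HTML report has the expected number of inlined PNG charts.
check_embedded_charts() {
  local html="$1" expected="${2:-5}" count
  [ -f "$html" ] || { echo "ERROR: $html not found" >&2; return 2; }

  # grep -o prints one line per occurrence, so several charts on one
  # HTML line are still counted individually.
  count=$( { grep -o 'data:image/png' "$html" || true; } | wc -l )
  count=$((count + 0))                 # normalise any padding from wc

  if [ "$count" -ne "$expected" ]; then
    echo "ERROR: found $count embedded charts, expected $expected" >&2
    return 1
  fi
  echo "OK: $count embedded charts"
}
```

Running `check_embedded_charts benchmark_report.html` before the PDF is handed over catches a report whose charts were linked rather than inlined.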

Reference Documents

IMPORTANT: Read the relevant reference before starting each phase. Do NOT generate code from memory.

Document	Use When
references/model-acquire.md	Steps 1–3: HF/NGC URL parsing, format detection, ONNX download, SafeTensors export, label extraction
references/engine-build.md	Steps 4–5: trtexec engine build, benchmarks, PEAK_GPU_STREAMS derivation, iterative scaling
references/pipeline-run.md	Steps 6–7: custom bbox parser, nvinfer config, single-stream validation, KITTI dump, multi-stream benchmark
references/report-generation.md	Step 8: benchmark_data.json, 5 charts, 12-section markdown report, HTML + PDF

Scripts

All scripts live under scripts/ in the skill directory (skills/deepstream-import-vision-model/scripts/).

Script	Phase	Purpose
`model/hf-list-files.sh`	1–3	List HuggingFace repo files
`model/hf-download-config.sh`	1–3	Download config.json from HF
`model/ngc-list-files.sh`	1–3	List NGC model files
`model/ngc-download.sh`	1–3	Download NGC model archive
`model/safetensors-to-onnx.sh`	1–3	Export SafeTensors → ONNX via optimum-cli
`model/inspect-onnx.py`	1–5	Inspect ONNX input/output shapes
`model/make-static-batch-onnx.py`	4–5	Bake batch dim into ONNX
`model/cleanup.sh`	Any	Remove staging dirs, preserve shared venv
`engine/benchmark-trtexec.sh`	4–5	Run trtexec with standard flags
`deepstream/ds-single-stream.sh`	6–7	Single-stream visual validation (NVENC primary; theoraenc+oggmux fallback; skip if neither)
`deepstream/ds-sweep.sh`	6–7	2-phase batch size sweep
`deepstream/benchmark-ds.sh`	6–7	Fixed-stream DS benchmark
`deepstream/ds-kitti-dump.sh`	6–7	KITTI detection dump via deepstream-app
`deepstream/ds-perf-run.sh`	7	Step 7c two-run benchmark — wraps `deepstream-app` with `enable-perf-measurement=1`, writes fixed-name log for the report parser
`deepstream/extract-frame.sh`	6–7	Extract sample frames from output video (`.mp4` NVENC path or `.ogv` theoraenc fallback)
`report/generate-benchmark-charts.py`	8	Generate 5 benchmark PNG charts
`report/md-to-html-pdf.py`	8	Markdown → styled HTML → PDF (canonical benchmark report path)
`report/md-to-pdf.sh`	Any	Markdown → PDF via pandoc/pdflatex — for design docs and references only, NOT for benchmark reports (use md-to-html-pdf.py for those)
`report/report-style.css`	8	CSS for HTML report
`report/render-mermaid-for-pdf.py`	8	Mermaid diagram → PNG
`report/mermaid-puppeteer.json`	8	Vetted Puppeteer config for Mermaid (sandboxed; non-root)
`report/mermaid-puppeteer-root.json`	8	Vetted Puppeteer config for Mermaid (used when running as root)
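
The encoder selection that ds-single-stream.sh is described as performing (NVENC first, then theoraenc plus oggmux, otherwise skip) might be sketched as below. This is not the script's actual code: the function names are hypothetical, and probing with gst-inspect-1.0 only shows that an element is registered, not that NVENC hardware is usable on the GPU, so the real script may test more than this.

```shell
# Sketch: pick the single-stream encoder mode in the order the rules require.
# Assumption: element presence is probed with gst-inspect-1.0, which exits
# non-zero when the named element is not installed.
has_gst_element() { gst-inspect-1.0 "$1" >/dev/null 2>&1; }

select_encoder_mode() {
  if has_gst_element nvv4l2h264enc; then
    echo "nvv4l2h264enc"
  elif has_gst_element theoraenc && has_gst_element oggmux; then
    echo "theoraenc-fallback"          # output container is .ogv
  else
    echo "skipped"                     # no permitted encoder available
  fi
}
```

The result maps onto the three reportable modes, for example `DS_SINGLE_STREAM_MODE=$(select_encoder_mode)`. x264enc and openh264enc are deliberately never probed, since the rules prohibit them.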

Quick Error Reference

Error	Fix
Tilted/diagonal bounding boxes	Parser struct not zero-initialized — use `NvDsInferObjectDetectionInfo obj = {};`
Zero KITTI files	`gie-kitti-output-dir` not read by nvinfer — use `ds-kitti-dump.sh` (wraps `deepstream-app`)
Engine rebuilds every DS run	`model-engine-file` path wrong — check relative path from `config/` dir
`setDimensions` negative dims	Add `infer-dims=3;H;W` to nvinfer config for dynamic ONNX models
`--memPoolSize` workspace 0.03 MiB	Use `M` suffix not `MiB` — e.g. `--memPoolSize=workspace:32768M`
ForeignNode build failure (DETR)	Use dynamo export path or run `onnxsim` — see references/engine-build.md
Zero detections	Wrong `net-scale-factor` — check model family table in references/pipeline-run.md
`No module named 'pyservicemaker'`	Install into venv: `pip install /opt/nvidia/deepstream/.../pyservicemaker*.whl`
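
The first error in the table can be caught before the parser is even compiled. The check below is a text-level heuristic, not a C++ parser: it flags declarations of NvDsInferObjectDetectionInfo that end in a bare semicolon with no initializer. The function name check_parser_zero_init is hypothetical.

```shell
# Sketch: flag uninitialized NvDsInferObjectDetectionInfo declarations.
# Heuristic only: matches "<type> <identifier>;" on a single line.
check_parser_zero_init() {
  local src="$1"
  [ -f "$src" ] || { echo "ERROR: $src not found" >&2; return 2; }

  # grep -n prints offending lines with their line numbers.
  if grep -nE 'NvDsInferObjectDetectionInfo[[:space:]]+[A-Za-z_][A-Za-z0-9_]*[[:space:]]*;' "$src"; then
    echo "ERROR: zero-initialize with 'NvDsInferObjectDetectionInfo obj = {};'" >&2
    return 1
  fi
  echo "OK: no bare NvDsInferObjectDetectionInfo declarations in $src"
}
```

Running it over the .cpp in parser/ before `make` turns the tilted-box symptom into an early, explicit failure.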

Skill frontmatter

license: CC-BY-4.0 AND Apache-2.0
metadata:
  author: NVIDIA CORPORATION
  version: "1.2.1"