deepstream-import-vision-model
Use this skill to bring any vision model from HuggingFace or NVIDIA NGC into an NVIDIA DeepStream pipeline with end-to-end automation: ONNX download, SafeTensors export, TRT engine build, custom nvinfer bbox parser, multi-stream benchmark, and PDF report. Object detection models only.
DeepStream Import Vision Model
When this skill is active, read the relevant reference document before starting each phase. Do not rely on memory — reference documents contain exact script paths, bash variable conventions, log filename contracts, and critical parsing rules.
Current scope: Object detection models only. Fail fast on classification, segmentation, or other architectures detected in config.json.
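The fail-fast scope check could look roughly like the sketch below. It assumes the HuggingFace `config.json` lists its model classes under `architectures` and that detection heads follow the Transformers naming pattern `*ForObjectDetection`. Both function names are hypothetical, and the rule in references/model-acquire.md takes precedence over this heuristic.

```python
import json
import sys
from pathlib import Path


def is_object_detection(config: dict) -> bool:
    """Heuristic (assumption): detection heads are named *ForObjectDetection."""
    architectures = config.get("architectures") or []
    return any(name.endswith("ForObjectDetection") for name in architectures)


def check_scope(config_path: str) -> None:
    """Exit non-zero before any download or build if the model is out of scope."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    if not is_object_detection(config):
        archs = config.get("architectures") or ["<none listed>"]
        sys.exit(f"Out of scope (object detection only): {', '.join(archs)}")
```

Calling `check_scope("config.json")` before Step 2 stops the pipeline before anything is built for a classifier or segmentation model.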
Pipeline Overview
| Step | Phase | Reference | What it does |
|---|---|---|---|
| 1–3 | Model Acquire | references/model-acquire.md | Browse HF/NGC, detect format, download ONNX or export SafeTensors |
| 4–5 | Engine Build | references/engine-build.md | Build dynamic TRT engine, run trtexec BS=1 and BS=MAX_BS |
| 6–7 | DS Pipeline | references/pipeline-run.md | Custom bbox parser, nvinfer config, single-stream + multi-stream benchmarks |
| 8 | Report | references/report-generation.md | 5 charts, HTML, PDF benchmark report |
Run the full pipeline autonomously without pausing for confirmation at each step.
Pre-flight Checks
Run before starting:
# 1. GPU and drivers
nvidia-smi
# 2. TensorRT version match (must match between builder and DS runtime)
trtexec 2>&1 | head -3
dpkg -l | grep libnvinfer-bin
# 3. Shared Python venv — create once, reuse across all models
mkdir -p build
VENV=build/.venv_optimum
if [ ! -x "$VENV/bin/python3" ]; then
  python3 -m venv "$VENV"
  "$VENV/bin/pip" install --upgrade pip -q
  "$VENV/bin/pip" install "optimum[exporters]>=1.20,<2.0" "torch<2.12" \
    transformers onnxruntime matplotlib numpy markdown -q
fi
# 4. System tools
which wkhtmltopdf || apt-get install -y wkhtmltopdf
which mediainfo || apt-get install -y mediainfo
which deepstream-app # required for KITTI dump (Step 6g) and benchmark perf-measurement (Step 7c); shipped with DeepStream SDK
# 5. Sample video — only check default path when user has not provided a custom DS_VIDEO
if [ -z "$DS_VIDEO" ]; then
  [ -f /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ] || \
    echo "WARNING: sample_720p.mp4 not found. Install DeepStream samples or set DS_VIDEO=/path/to/your.mp4"
fi
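As a complement to the shell checks above, the tool lookups can be collected into one pass that reports everything missing at once. This is a minimal sketch: the tool list is copied from the pre-flight commands, and `missing_tools` is a hypothetical helper, not one of the skill's scripts.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tool names taken from the pre-flight checks above.
REQUIRED_TOOLS = ["nvidia-smi", "trtexec", "wkhtmltopdf", "mediainfo", "deepstream-app"]


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that are not on PATH, preserving input order."""
    return [tool for tool in tools if which(tool) is None]
```

A caller might abort when `missing_tools()` returns a non-empty list. The `which` parameter exists only so the lookup can be stubbed in tests.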
Mandatory Output Structure
Create once MODEL_NAME is known (Step 1). Never dump files flat.
models/{model_name}/
  model/            <- ONNX file(s)
  parser/           <- .cpp, Makefile, .so
  config/           <- nvinfer config, ds-app config, labels.txt
  scripts/          <- run helper scripts
  benchmarks/
    engines/        <- _dynamic_b{MAX_BS}.engine, timing.cache, build logs
    b1/             <- trtexec BS=1 log
    b{MAX_BS}/      <- trtexec BS=MAX_BS log
    ds/             <- DS benchmark logs
  reports/          <- benchmark_report.md, .html, .pdf, benchmark_data.json
    charts/         <- chart_*.png (5 charts)
  samples/          <- output .mp4 or .ogv (theoraenc fallback), test frames
    kitti_output/   <- KITTI detection .txt files
mkdir -p models/$MODEL_NAME/{model,parser,config,scripts,benchmarks/engines,benchmarks/ds,reports/charts,samples/kitti_output}
Critical Rules
- Engine naming: always `{model}_dynamic_b{MAX_BS}.engine`. Never bare `model_dynamic.engine`.
- batch_size == num_streams: in DS runs, `batch-size` and stream count are always equal.
- Log filenames are fixed: `trtexec_b1.log`, `trtexec_b${MAX_BS}.log`, `ds_s${N}_run1.log`, `ds_s${N}_run2.log`. No timestamps. Report generation reads exact paths.
- Parser zero-init: always `NvDsInferObjectDetectionInfo obj = {};`. Required for DS 9.0 OBB support; bare `obj;` leaves `rotation_angle` uninitialized, causing tilted bounding boxes.
- KITTI validation gate: do NOT proceed to Step 7 if KITTI frame count is zero or detection rate < 90%.
- Shared venv: `build/.venv_optimum` is reused across all models. Never create per-model venvs.
- trtexec `--noDataTransfers`: GPU-only compute matches DeepStream’s GPU-to-GPU data flow.
- Report HTML+PDF: always use `skills/deepstream-import-vision-model/scripts/report/md-to-html-pdf.py`. Never write a custom HTML generator or call `wkhtmltopdf` directly.
- Object detection only: reject non-detection architectures from `config.json` before building anything.
- Encoder fallback (MANDATORY): `x264enc` and `openh264enc` are prohibited. On NVENC-unavailable systems, use `theoraenc + oggmux` (LGPL; ships in gst-plugins-base; output is `.ogv`). If `theoraenc`/`oggmux` are absent, skip video creation (`DS_SINGLE_STREAM_MODE=skipped`). Report which mode was used: `nvv4l2h264enc` / `theoraenc-fallback` / `skipped`.
- Video source (MANDATORY): default is always `sample_720p.mp4` (1280×720). Never autonomously substitute `sample_1080p_h264.mp4` or any other file. Only use a different video when the user explicitly provides a path (via `DS_VIDEO` env var or script argument).
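The naming, gating, and encoder rules above can be expressed as small pure functions, which makes them easy to check before a long run. This is a sketch only: every function name here is hypothetical. How the detection rate is computed is left to references/pipeline-run.md, and the set of available GStreamer elements is assumed to be probed elsewhere.

```python
from typing import Set


def engine_name(model: str, max_bs: int) -> str:
    """Engine filename always carries the max batch size."""
    return f"{model}_dynamic_b{max_bs}.engine"


def trtexec_log_name(batch_size: int) -> str:
    """Fixed trtexec log name, no timestamp."""
    return f"trtexec_b{batch_size}.log"


def ds_log_name(num_streams: int, run: int) -> str:
    """Fixed DeepStream log name; the rules list exactly two runs per stream count."""
    if run not in (1, 2):
        raise ValueError("run must be 1 or 2")
    return f"ds_s{num_streams}_run{run}.log"


def kitti_gate_passes(kitti_frame_count: int, detection_rate: float) -> bool:
    """Gate for Step 7: at least one KITTI frame and a detection rate of 90% or more."""
    return kitti_frame_count > 0 and detection_rate >= 0.90


def choose_encoder_mode(available_elements: Set[str]) -> str:
    """Pick the reporting mode; x264enc and openh264enc are never considered."""
    if "nvv4l2h264enc" in available_elements:
        return "nvv4l2h264enc"
    if {"theoraenc", "oggmux"} <= available_elements:
        return "theoraenc-fallback"
    return "skipped"
```

For example, `engine_name("yolov8n", 16)` yields `yolov8n_dynamic_b16.engine`, and `choose_encoder_mode({"x264enc"})` yields `skipped` because the prohibited encoders are ignored.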
Pipeline Timing
Wrap every step:
STEP_START=$(date +%s.%N)
# ... step commands ...
STEP_END=$(date +%s.%N)
STEP_DURATION=$(echo "$STEP_END - $STEP_START" | bc)
echo "[Step N] completed in ${STEP_DURATION}s"
Track PIPELINE_START (before Step 1) and PIPELINE_END (after Step 8). Report all durations in the benchmark report.
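For steps driven from Python rather than bash, the same start/end bookkeeping can be done with a context manager. The sketch below is an illustrative alternative, not part of the skill's scripts. `timed_step` and the `durations` dict are hypothetical names, and the dict would still need to be written into the report data.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed_step(name: str, durations: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the wrapped block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the step raises, so failed steps still show a duration.
        durations[name] = time.perf_counter() - start
        print(f"[{name}] completed in {durations[name]:.3f}s")
```

Usage follows the shell pattern: wrap each step in `with timed_step("Step 4", durations): ...`, and wrap the whole run once more to get the pipeline total.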
Report Output (MANDATORY — all 3 formats)
- `benchmark_report.md`: markdown source (12 mandatory sections)
- `benchmark_report.html`: styled HTML (charts base64-inlined, no local file access)
- `benchmark_report_{model_name}.pdf`: generated via `md-to-html-pdf.py`; verify charts are embedded by counting `data:image/png` occurrences in the HTML output: `grep -o 'data:image/png' benchmark_report.html | wc -l` should equal 5
Run charts and report scripts with the shared venv active: source build/.venv_optimum/bin/activate.
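The chart-embedding check can also be done from Python, for instance at the end of a report script. This is a minimal sketch that mirrors the `grep -o ... | wc -l` count; the function names are hypothetical, and the expected count of 5 comes from the chart list in this document.

```python
from pathlib import Path

EXPECTED_CHARTS = 5  # the report embeds 5 charts


def count_inlined_pngs(html: str) -> int:
    """Count base64-inlined PNG images, one per 'data:image/png' occurrence."""
    return html.count("data:image/png")


def charts_embedded(html_path: str, expected: int = EXPECTED_CHARTS) -> bool:
    """True when the HTML report contains exactly the expected number of inlined charts."""
    html = Path(html_path).read_text(encoding="utf-8")
    return count_inlined_pngs(html) == expected
```

A `False` result usually means a chart was referenced by file path instead of being inlined, which would leave it missing from the PDF.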
Reference Documents
IMPORTANT: Read the relevant reference before starting each phase. Do NOT generate code from memory.
| Document | Use When |
|---|---|
| references/model-acquire.md | Steps 1–3: HF/NGC URL parsing, format detection, ONNX download, SafeTensors export, label extraction |
| references/engine-build.md | Steps 4–5: trtexec engine build, benchmarks, PEAK_GPU_STREAMS derivation, iterative scaling |
| references/pipeline-run.md | Steps 6–7: custom bbox parser, nvinfer config, single-stream validation, KITTI dump, multi-stream benchmark |
| references/report-generation.md | Step 8: benchmark_data.json, 5 charts, 12-section markdown report, HTML + PDF |
Scripts
Located in scripts/.
| Script | Phase | Purpose |
|---|---|---|
| `model/hf-list-files.sh` | 1–3 | List HuggingFace repo files |
| `model/hf-download-config.sh` | 1–3 | Download config.json from HF |
| `model/ngc-list-files.sh` | 1–3 | List NGC model files |
| `model/ngc-download.sh` | 1–3 | Download NGC model archive |
| `model/safetensors-to-onnx.sh` | 1–3 | Export SafeTensors → ONNX via optimum-cli |
| `model/inspect-onnx.py` | 1–5 | Inspect ONNX input/output shapes |
| `model/make-static-batch-onnx.py` | 4–5 | Bake batch dim into ONNX |
| `model/cleanup.sh` | Any | Remove staging dirs, preserve shared venv |
| `engine/benchmark-trtexec.sh` | 4–5 | Run trtexec with standard flags |
| `deepstream/ds-single-stream.sh` | 6–7 | Single-stream visual validation (NVENC primary; theoraenc+oggmux fallback; skip if neither) |
| `deepstream/ds-sweep.sh` | 6–7 | 2-phase batch size sweep |
| `deepstream/benchmark-ds.sh` | 6–7 | Fixed-stream DS benchmark |
| `deepstream/ds-kitti-dump.sh` | 6–7 | KITTI detection dump via deepstream-app |
| `deepstream/ds-perf-run.sh` | 7 | Step 7c two-run benchmark: wraps deepstream-app with enable-perf-measurement=1, writes fixed-name log for the report parser |
| `deepstream/extract-frame.sh` | 6–7 | Extract sample frames from output video (.mp4 NVENC path or .ogv theoraenc fallback) |
| `report/generate-benchmark-charts.py` | 8 | Generate 5 benchmark PNG charts |
| `report/md-to-html-pdf.py` | 8 | Markdown → styled HTML → PDF (canonical benchmark report path) |
| `report/md-to-pdf.sh` | Any | Markdown → PDF via pandoc/pdflatex; for design docs and references only, NOT for benchmark reports (use md-to-html-pdf.py for those) |
| `report/report-style.css` | 8 | CSS for HTML report |
| `report/render-mermaid-for-pdf.py` | 8 | Mermaid diagram → PNG |
| `report/mermaid-puppeteer.json` | 8 | Vetted Puppeteer config for Mermaid (sandboxed; non-root) |
| `report/mermaid-puppeteer-root.json` | 8 | Vetted Puppeteer config for Mermaid (used when running as root) |
Quick Error Reference
| Error | Fix |
|---|---|
| Tilted/diagonal bounding boxes | Parser struct not zero-initialized — use NvDsInferObjectDetectionInfo obj = {}; |
| Zero KITTI files | gie-kitti-output-dir not read by nvinfer — use ds-kitti-dump.sh (wraps deepstream-app) |
| Engine rebuilds every DS run | model-engine-file path wrong — check relative path from config/ dir |
| `setDimensions` negative dims | Add `infer-dims=3;H;W` to nvinfer config for dynamic ONNX models |
| `--memPoolSize` workspace 0.03 MiB | Use the `M` suffix, not `MiB`, e.g. `--memPoolSize=workspace:32768M` |
| ForeignNode build failure (DETR) | Use dynamo export path or run onnxsim — see references/engine-build.md |
| Zero detections | Wrong net-scale-factor — check model family table in references/pipeline-run.md |
| `No module named 'pyservicemaker'` | Install into venv: `pip install /opt/nvidia/deepstream/.../pyservicemaker*.whl` |
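Three of the fixes above are string-formatting or path mistakes that can be caught before a run. The sketch below uses hypothetical helper names. It assumes, as the table implies, that relative paths in the nvinfer config are resolved from the directory containing the config file, and it takes the `M` suffix and `infer-dims` layout from the table.

```python
import os
from pathlib import Path


def resolve_from_config(config_file: str, value: str) -> Path:
    """Resolve a path the way a config-relative lookup would (assumption: relative to the config dir)."""
    path = Path(value)
    if path.is_absolute():
        return path
    return Path(os.path.normpath(Path(config_file).parent / path))


def infer_dims(height: int, width: int, channels: int = 3) -> str:
    """nvinfer config line for dynamic ONNX models, in C;H;W order."""
    return f"infer-dims={channels};{height};{width}"


def mem_pool_flag(workspace_mib: int) -> str:
    """trtexec workspace flag using the M suffix rather than MiB."""
    return f"--memPoolSize=workspace:{workspace_mib}M"
```

Checking `resolve_from_config(cfg, engine_value).exists()` before launching DeepStream is one way to notice a wrong `model-engine-file` path, which otherwise shows up as an engine rebuild on every run.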