Agent Skill · NVIDIA NIM

physical-ai-video-data-augmentation

Use when running video data augmentation and auto-labeling workflows on OSMO: flow selection, preflight, submit-time interpolation, monitoring, and output retrieval. Trigger keywords: video data augmentation, data enrichment, auto labeling, VDA demo, OSMO workflow, pseudo labeling.

View SKILL.md on GitHub → Source repository Provider profile

Provider: NVIDIA NIM Path in repo: skills/physical-ai-video-data-augmentation/SKILL.md

Skill body

Physical AI Video Data Augmentation Workflow Orchestrator

Default workflow skill for VDA execution on OSMO. It owns flow selection, preflight, cache readiness, inference-path decisions, submit-time interpolation, monitoring, and output retrieval. Component skills are consult-only.

Purpose

Run the end-to-end VDA workflow safely and reproducibly from preflight to output download.

Do NOT use this skill for container-internal tuning-only questions.

Prerequisites

Confirm these before running preflight or any submit. Missing required secrets surface as USER_INPUT_REQUIRED: from scripts/preflight_credentials.sh.

Requirement	How it is satisfied	Used for
NGC API key (optional)	`NGC_API_KEY`, `NGC_CLI_API_KEY`, or compatible `nvapi-*` token in `NVIDIA_API_KEY`/`OPENAI_API_KEY`/`VLM_API_KEY`/`LLM_API_KEY`	Optional for `nvcr_io` credential refresh and NGC REST scope probe; default VDA image refs are validated via workflow registry probes
Hugging Face token	`HF_TOKEN` (or `HUGGING_FACE_HUB_TOKEN`), or a cached token at `~/.cache/huggingface/token`	Creates the OSMO `hf_token` credential; pulls gated Cosmos/SeedVR weights
OSMO CLI access	`osmo` on `PATH`, logged in, with a default profile and a registered DATA credential profile matching `storage_url`	Submitting/monitoring workflows and listing/downloading objects
GPU pool	At least one `ONLINE` pool in `osmo pool list --mode free`; `POD_TEMPLATE` carries GPU toleration/selectors	Scheduling setup + worker tasks

Optional (only for the strict NGC org/team probe): NGC_ORG + NGC_TEAM (or NGC_CLI_ORG / NGC_CLI_TEAM). External VLM/LLM endpoint keys are validated separately, not by preflight.

Key handling rule: nvapi-* tokens are first-class inputs for nvcr_io. Never reject by token prefix alone; use workflow registry probe results as source of truth.

Instructions

Select the workflow (auto_labeling, augmentation_and_al, e2e, e2e_super_resolution) from user intent.
Provide a tentative execution-time overview before starting run actions.
Run preflight and readiness checks before submit.
Derive submit-time values from the active dataset backend (never guess storage_url).
Submit the workflow with explicit interpolation values and monitor to completion.
Retrieve outputs, provide side-by-side comparison evidence for augmented flows, and summarize task outcomes.

Use run_script(...) for script execution. Canonical examples:

run_script("bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/augmentation_and_al.yaml")
run_script("python3 scripts/pre_submit_guard.py --workflow assets/configs/osmo/auto_labeling.yaml")
run_script("bash scripts/prepare_demo_assets.sh /srv/sdg/data/vda_inputs")

Available Scripts

Use script-level --help for exact arguments.

Script	Role
`scripts/preflight_credentials.sh`	Secrets/control-plane preflight and workflow image access checks
`scripts/pre_submit_guard.py`	Submit-time interpolation, cache, and dataset safety checks
`scripts/prepare_demo_assets.sh`	Demo video pull + flatten for default demo path
`scripts/generate_configs.py`	Setup-time config and cookbook projection generation
`scripts/cosmos_worker.sh`	Augmentation worker execution
`scripts/pl_original_worker.sh`	Original-video auto-labeling worker execution
`scripts/pl_augmented_worker.sh`	Augmented-video auto-labeling worker execution
`scripts/osmo_barrier.py`	Multi-node barrier synchronization
`scripts/stage_run_artifacts.sh`	Local mirror of full run output + input video
`scripts/render_side_by_side.sh`	Side-by-side comparison render from local artifacts

Supported Flows

Flow	OSMO YAML	Group sequence	Typical use
`augmentation_and_al`	`assets/configs/osmo/augmentation_and_al.yaml`	setup -> augmentation -> auto_labeling_augmented	Augment one or more videos, then auto-label augmented outputs
`auto_labeling`	`assets/configs/osmo/auto_labeling.yaml`	setup -> auto_labeling	Label original videos only
`e2e`	`assets/configs/osmo/e2e.yaml`	setup -> (auto_labeling_original + augmentation) -> auto_labeling_augmented	Throughput-first path
`e2e_super_resolution`	`assets/configs/osmo/e2e_super_resolution.yaml`	setup -> auto_labeling_original -> augmentation -> auto_labeling_augmented	Sequential path with SR gate before augmentation

Legacy alias assets/configs/osmo/augmentation_and_pl.yaml remains for backwards compatibility.

Pick the right workflow for the user’s request

User intent	Workflow
“Label my source videos” / “PL-only” / “no augmentation”	`auto_labeling`
“Create augmented videos and label them”	`augmentation_and_al`
“Run the full pipeline quickly”	`e2e`
“Run full pipeline, but gate on SR-enhanced originals first”	`e2e_super_resolution`

Disambiguation: handle vague requests before committing

Default to autonomy: ask only when missing information blocks execution.

Autonomous defaults (do NOT ask)

If dataset source is absent, run VDA demo path (scripts/prepare_demo_assets.sh) and continue with dataset=vda-demo.
If flow is not explicitly requested, default to augmentation_and_al.
If endpoint mode is unspecified, default to in-cluster persistent NIM reuse and automatic NIM deploy/repair when unhealthy.
If cache is missing, run setup_model_cache.yaml, rerun pre-submit guard, and continue automatically on success.
After any stage completes successfully, continue to the next stage immediately. Do not pause with “Ready when you are” or equivalent approval prompts.

Triggers that should pause for disambiguation

Missing input	Why it matters	Ask
`USER_INPUT_REQUIRED` from preflight	Required secret is missing	Ask one concise unblock question for exactly the missing value(s)
Storage backend prefix cannot be derived from the active dataset/upload root	Wrong scheme causes runtime storage auth mismatch	“What is the backend-native root prefix for this run?”
No ONLINE GPU pool/platform can be selected	Workflow cannot schedule setup/workers	“Which GPU pool/platform should this run target?”

When NOT to disambiguate

Do not ask for cookbook unless user explicitly asks to change scene profile.
Do not offer external endpoints by default.
Do not ask A/B cache strategy questions; default is automatic cache setup.
Do not ask to scale down existing NIMs; this is forbidden.
Do not invent, scrape, or generate random videos when input is missing.
Do not use non-VDA demo sources (for example Carline adaptation assets) unless the user explicitly requests a different dataset.

Step 0: Select Flow and Gather Inputs

Input video policy (non-negotiable)

Always preserve user-provided video inputs (dataset URL, local path, or upload folder) as first-class and preferred.
Never replace an explicit user video with demo assets or any other source.
If no video input is provided, default to VDA demo assets via scripts/prepare_demo_assets.sh (HF dataset flow) without asking extra source-selection questions.
If the user explicitly mentions an input video or dataset, prefer and use that input instead of demo assets.
Use only VDA demo assets (nvidia/video-data-augmentation-demo) for the default demo path.
Never propose arbitrary web clip downloads or placeholder videos unless the user explicitly requests that behavior.

Collect only missing values:

Dataset source (prefer explicit user-provided dataset_url or local upload folder; otherwise default to VDA demo assets and proceed).
Flow (auto_labeling, augmentation_and_al, e2e, e2e_super_resolution); default to augmentation_and_al when unspecified.
OSMO gpu_platform for all VDA resources (auto-select an ONLINE platform when unambiguous; ask only when no valid option exists).
Endpoint mode (default in-cluster NIM reuse/deploy unless explicitly overridden).

Do not guess gpu_platform (for example microk8s). Use the exact current platform label shown by osmo pool list --mode free (for example gpu).

Generate run stamp before each submit:

STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
RUN_ID="run-$STAMP"

Execution Time Overview (required before run)

Before running any mutating command (osmo credential set, NIM install/repair, cache workflow submit, or target VDA workflow submit), provide a short ETA overview to the user.

Keep it concise (one short paragraph or 4-6 bullets) and include:

whether this looks like a cold start (NIM/cache missing) or warm start (NIM/cache already healthy),
major phases with approximate durations,
a total expected range for the selected workflow.

Baseline ranges (from observed MicroK8s + OSMO runs):

Phase	Typical duration
Credentials + preflight	~1-2 min
NIM deploy/download/warmup (if needed)	~10-15 min
Demo assets download/upload (if demo path)	~1-3 min
Model cache population (if needed)	~15-25 min
Workflow submit + queue/start	~1-3 min

Workflow runtime ranges after submit:

Flow	Typical runtime
`auto_labeling`	~6-15 min
`augmentation_and_al`	~20-35 min
`e2e`	~22-40 min
`e2e_super_resolution`	~25-45 min

Cold-start end-to-end runs are commonly ~45-80 min; warm-start runs are usually ~20-45 min depending on flow and video length.

Common Preconditions (all flows)

Credential and control-plane preflight
```
bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/<mode>.yaml
```
Restricted egress:
```
bash scripts/preflight_credentials.sh --no-probe --workflow assets/configs/osmo/<mode>.yaml
```
Preflight does not require a workload-local .env. Runtime interpolation is driven by submit-time values (dataset, run_id, gpu_platform, video, storage_url, skills_dir) supplied in one --set-string list.

Passing --workflow validates pull access for the active workflow image refs (workflow.groups[].tasks[].image) using anonymous bearer access with credential fallback when provided. If replacement NGC/HF secrets are provided in env, preflight refreshes existing nvcr_io / hf_token automatically when present. Use --refresh to force overwrite even when no new env secrets were supplied:
```
bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/<mode>.yaml --refresh
```
If output contains USER_INPUT_REQUIRED:, ask one concise unblock question and stop.

On workflow image 401/403, report registry access failure after probe checks on the listed image refs; do not claim a key family (for example nvapi-*) is categorically unsupported.
Storage interpolation policy

storage_url must be derived from the actual dataset/upload backend for the current run.
```
dataset_url=azure://storiondevxah69/osmo-workflows/datasets/vda-demo
storage_url=azure://storiondevxah69/osmo-workflows
dataset=vda-demo
```
Never silently default to stale s3:// values on non-S3 backends.
Inference policy (non-negotiable)
- Reuse healthy in-cluster persistent NIM endpoints by default.
- If missing/unhealthy, deploy automatically — this is a prerequisite, not a user decision. Do NOT pause to ask; run the install with the VDA allow-list:
```
export NIM_SERVICES="qwen3-vl qwen25-14b"
skills/physical-ai-infrastructure-setup-and-resilient-scaling/components/inference-nim-operator/scripts/install.sh
```
- See references/nim/README.md for full endpoint docs and health checks.
- External endpoints are opt-in only (explicit request or explicit URLs); only then skip the in-cluster deploy.
- Never infer external mode from credential presence.
- Never scale down/delete existing NIMs to free GPUs.

Readiness guard

osmo pool list --mode free
osmo config show POD_TEMPLATE
python3 scripts/pre_submit_guard.py --workflow assets/configs/osmo/<mode>.yaml

Cache auto-remediation

If pre_submit_guard.py reports cache failure, default action is to run:
```
osmo workflow submit assets/configs/osmo/setup_model_cache.yaml \
  --set-string storage_url=<backend-prefix> path=data
```
Then rerun pre_submit_guard.py and submit the target VDA flow only after it passes. Ask user only when backend/prefix is ambiguous or cache setup fails.
Scheduling policy

VDA templates schedule setup and workers on gpu_platform (no system pool dependency for user workloads).

Submit (all flows)

Every flow uses the same submit shape; only the workflow YAML changes. Choose the YAML for the requested flow, then run the command below. Full per-flow walkthroughs (stage matrix and flow details) live in the linked references.

Flow	Workflow YAML	Walkthrough
Augmentation + auto-labeling	`assets/configs/osmo/augmentation_and_al.yaml`	`references/flows/augmentation_and_al.md`
Auto-labeling only	`assets/configs/osmo/auto_labeling.yaml`	`references/flows/auto_labeling.md`
E2E (parallel)	`assets/configs/osmo/e2e.yaml`	`references/flows/e2e.md`
E2E (super-resolution gated)	`assets/configs/osmo/e2e_super_resolution.yaml`	`references/flows/e2e_super_resolution.md`

SKILLS_DIR="$(cd "$(git rev-parse --show-toplevel)/skills/physical-ai-video-data-augmentation" && pwd)"
STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
osmo workflow submit assets/configs/osmo/<flow>.yaml \
  --pool <pool> \
  --set-string \
    dataset=<dataset> \
    run_id=run-$STAMP \
    storage_url=<backend-prefix> \
    gpu_platform=<gpu-platform> \
    video=<video-stem> \
    cosmos_model_cache_url=<backend-prefix>/data/models/cosmos_transfer \
    auto_labeling_model_cache_url=<backend-prefix>/data/models/auto_labeling \
    skills_dir="$SKILLS_DIR"

Compatibility note:

Use exactly one --set-string flag and pass all the key/value pairs after it.
Do not repeat --set/--set-string flags in the same command; some OSMO builds only honor the last occurrence.
Do not mix --set and --set-string in one submit command.
Pass explicit *_model_cache_url values to avoid nested-template interpolation differences across OSMO environments.
Do not brute-force permutations of flags. Use this shape directly.

Common optional overrides (append key/value pairs to the same --set-string list):

cookbook=<scene_profile> \
vlm_url=<openai_base_url> \
llm_url=<openai_base_url> \
cosmos_model_cache_url=<url> \
auto_labeling_model_cache_url=<url>

The auto-labeling-only flow has no augmentation stage, so it omits cosmos_model_cache_url at runtime; passing it is harmless and keeps one submit shape across flows.

OSMO Monitoring

# Workflow status + task states
osmo workflow query <workflow_id> --format-type json \
  | jq '{status, tasks: [.groups[].tasks[] | {name, status, exit_code}]}'

# Logs for a specific task
osmo workflow logs <workflow_id> --task <task_name> -n 200

# Output retrieval
osmo data list --no-pager <output_url>
osmo data download <output_url> <local_dir>/

For completion artifacts, always mirror the full run output into workspace:

ROOT="$(git rev-parse --show-toplevel)"
RUN_LOCAL_DIR="$ROOT/media/vda/runs/<run_id>"
mkdir -p "$RUN_LOCAL_DIR"
osmo data download "<storage_url>/datasets/<dataset>-outputs/<run_id>/" "$RUN_LOCAL_DIR/"

For runs expected to exceed two minutes, send heartbeat updates at least every two minutes. For media evidence, emit one standalone MEDIA:<absolute-path> line per message bubble.

Execution continuity requirement:

Heartbeats must report progress while continuing work; they are status updates, not permission prompts.
Do not stop between green stages waiting for approval.
Pause only on blocking failures or explicit user stop/redirect.
If submit fails on interpolation, rerun once with the same canonical single-flag shape and corrected values; do not loop through ad-hoc flag experiments.

MEDIA formatting is strict:

Emit exactly one line: MEDIA:/absolute/path/to/file.mp4
Keep MEDIA: contiguous on a single line (never split across lines).
No extra text in the same bubble.
No code fences, bullets, or quotes around the directive.
If render fails: retry once from a stable workspace path, then emit PNG fallback.

Post-Run Comparison Evidence (required for augmented flows)

Applies to augmentation_and_al, e2e, and e2e_super_resolution after a successful run.

Required completion output (do not stop at raw output URLs):

Stage full outputs + input video into workspace-local path:

bash scripts/stage_run_artifacts.sh \
  --storage-url <storage_url> --dataset <dataset> --run-id <run_id> --video <video>

Render side-by-side from that local run copy:

bash scripts/render_side_by_side.sh \
  --run-local-dir "<repo>/media/vda/runs/<run_id>" --dataset <dataset> --video <video>

Emit MEDIA from the local run copy and include:
- augmentation summary from <run_local_dir>/setup_b0/configs/manifest.yaml (sampled_vars for <video>_aug0)
- auto-labeling summary from <run_local_dir>/outputs/pseudo_labeled_augmented/<video>_aug0
- for e2e / e2e_super_resolution, original-label summary from <run_local_dir>/outputs/pseudo_labeled/<video>

If ffmpeg is unavailable, emit input and augmented MEDIA from the same local run copy and still provide augmentation + auto-labeling summaries.

For demo runs (no user video provided), explicitly state that input came from nvidia/video-data-augmentation-demo.

Supporting files

Use these canonical locations:

Workflows: assets/configs/osmo/*.yaml
Runtime scripts: scripts/*.sh, scripts/*.py
Flow walkthroughs: references/flows/*.md
Setup and triage: references/setup.md, references/troubleshooting.md
Images and endpoint policy: references/container-images.md, references/nim/README.md
Cookbook tuning: assets/cookbooks/TUNING_GUIDE.md

Skill frontmatter

license: CC-BY-4.0 AND Apache-2.0 metadata: {"owner" => "NVIDIA", "service" => "data", "version" => "1.0.0", "reviewed" => "2026-05-26", "author" => "NVIDIA", "tags" => ["physical-ai", "video-data-augmentation", "auto-labeling", "cosmos"]}