Agent Skill · NVIDIA NIM

physical-ai-video-data-augmentation

Use when running video data augmentation and auto-labeling workflows on OSMO: flow selection, preflight, submit-time interpolation, monitoring, and output retrieval. Trigger keywords: video data augmentation, data enrichment, auto labeling, VDA demo, OSMO workflow, pseudo labeling.

Provider: NVIDIA NIM Path in repo: skills/physical-ai-video-data-augmentation/SKILL.md

Skill body

Physical AI Video Data Augmentation Workflow Orchestrator

Default workflow skill for VDA execution on OSMO. It owns flow selection, preflight, cache readiness, inference-path decisions, submit-time interpolation, monitoring, and output retrieval. Component skills are consult-only.

Purpose

Run the end-to-end VDA workflow safely and reproducibly from preflight to output download.

Do NOT use this skill for container-internal tuning-only questions.

Prerequisites

Confirm these before running preflight or any submit. Missing required secrets surface as USER_INPUT_REQUIRED: from scripts/preflight_credentials.sh.

Requirement How it is satisfied Used for
NGC API key (optional) NGC_API_KEY, NGC_CLI_API_KEY, or compatible nvapi-* token in NVIDIA_API_KEY/OPENAI_API_KEY/VLM_API_KEY/LLM_API_KEY Optional for nvcr_io credential refresh and NGC REST scope probe; default VDA image refs are validated via workflow registry probes
Hugging Face token HF_TOKEN (or HUGGING_FACE_HUB_TOKEN), or a cached token at ~/.cache/huggingface/token Creates the OSMO hf_token credential; pulls gated Cosmos/SeedVR weights
OSMO CLI access osmo on PATH, logged in, with a default profile and a registered DATA credential profile matching storage_url Submitting/monitoring workflows and listing/downloading objects
GPU pool At least one ONLINE pool in osmo pool list --mode free; POD_TEMPLATE carries GPU toleration/selectors Scheduling setup + worker tasks

Optional (only for the strict NGC org/team probe): NGC_ORG + NGC_TEAM (or NGC_CLI_ORG / NGC_CLI_TEAM). External VLM/LLM endpoint keys are validated separately, not by preflight.

Key handling rule: nvapi-* tokens are first-class inputs for nvcr_io. Never reject by token prefix alone; use workflow registry probe results as source of truth.

Instructions

  1. Select the workflow (auto_labeling, augmentation_and_al, e2e, e2e_super_resolution) from user intent.
  2. Provide a tentative execution-time overview before starting run actions.
  3. Run preflight and readiness checks before submit.
  4. Derive submit-time values from the active dataset backend (never guess storage_url).
  5. Submit the workflow with explicit interpolation values and monitor to completion.
  6. Retrieve outputs, provide side-by-side comparison evidence for augmented flows, and summarize task outcomes.

Use run_script(...) for script execution. Canonical examples:

run_script("bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/augmentation_and_al.yaml")
run_script("python3 scripts/pre_submit_guard.py --workflow assets/configs/osmo/auto_labeling.yaml")
run_script("bash scripts/prepare_demo_assets.sh /srv/sdg/data/vda_inputs")

Available Scripts

Use script-level --help for exact arguments.

Script Role
scripts/preflight_credentials.sh Secrets/control-plane preflight and workflow image access checks
scripts/pre_submit_guard.py Submit-time interpolation, cache, and dataset safety checks
scripts/prepare_demo_assets.sh Demo video pull + flatten for default demo path
scripts/generate_configs.py Setup-time config and cookbook projection generation
scripts/cosmos_worker.sh Augmentation worker execution
scripts/pl_original_worker.sh Original-video auto-labeling worker execution
scripts/pl_augmented_worker.sh Augmented-video auto-labeling worker execution
scripts/osmo_barrier.py Multi-node barrier synchronization
scripts/stage_run_artifacts.sh Local mirror of full run output + input video
scripts/render_side_by_side.sh Side-by-side comparison render from local artifacts

Supported Flows

Flow OSMO YAML Group sequence Typical use
augmentation_and_al assets/configs/osmo/augmentation_and_al.yaml setup -> augmentation -> auto_labeling_augmented Augment one or more videos, then auto-label augmented outputs
auto_labeling assets/configs/osmo/auto_labeling.yaml setup -> auto_labeling Label original videos only
e2e assets/configs/osmo/e2e.yaml setup -> (auto_labeling_original + augmentation) -> auto_labeling_augmented Throughput-first path
e2e_super_resolution assets/configs/osmo/e2e_super_resolution.yaml setup -> auto_labeling_original -> augmentation -> auto_labeling_augmented Sequential path with SR gate before augmentation

Legacy alias assets/configs/osmo/augmentation_and_pl.yaml remains for backwards compatibility.

Pick the right workflow for the user’s request

User intent Workflow
“Label my source videos” / “PL-only” / “no augmentation” auto_labeling
“Create augmented videos and label them” augmentation_and_al
“Run the full pipeline quickly” e2e
“Run full pipeline, but gate on SR-enhanced originals first” e2e_super_resolution

Disambiguation: handle vague requests before committing

Default to autonomy: ask only when missing information blocks execution.

Autonomous defaults (do NOT ask)

Triggers that should pause for disambiguation

Missing input Why it matters Ask
USER_INPUT_REQUIRED from preflight Required secret is missing Ask one concise unblock question for exactly the missing value(s)
Storage backend prefix cannot be derived from the active dataset/upload root Wrong scheme causes runtime storage auth mismatch “What is the backend-native root prefix for this run?”
No ONLINE GPU pool/platform can be selected Workflow cannot schedule setup/workers “Which GPU pool/platform should this run target?”

When NOT to disambiguate

Step 0: Select Flow and Gather Inputs

Input video policy (non-negotiable)

Collect only missing values:

  1. Dataset source (prefer explicit user-provided dataset_url or local upload folder; otherwise default to VDA demo assets and proceed).
  2. Flow (auto_labeling, augmentation_and_al, e2e, e2e_super_resolution); default to augmentation_and_al when unspecified.
  3. OSMO gpu_platform for all VDA resources (auto-select an ONLINE platform when unambiguous; ask only when no valid option exists).
  4. Endpoint mode (default in-cluster NIM reuse/deploy unless explicitly overridden).

Do not guess gpu_platform (for example microk8s). Use the exact current platform label shown by osmo pool list --mode free (for example gpu).

Generate run stamp before each submit:

STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
RUN_ID="run-$STAMP"

Execution Time Overview (required before run)

Before running any mutating command (osmo credential set, NIM install/repair, cache workflow submit, or target VDA workflow submit), provide a short ETA overview to the user.

Keep it concise (one short paragraph or 4-6 bullets) and include:

Baseline ranges (from observed MicroK8s + OSMO runs):

Phase Typical duration
Credentials + preflight ~1-2 min
NIM deploy/download/warmup (if needed) ~10-15 min
Demo assets download/upload (if demo path) ~1-3 min
Model cache population (if needed) ~15-25 min
Workflow submit + queue/start ~1-3 min

Workflow runtime ranges after submit:

Flow Typical runtime
auto_labeling ~6-15 min
augmentation_and_al ~20-35 min
e2e ~22-40 min
e2e_super_resolution ~25-45 min

Cold-start end-to-end runs are commonly ~45-80 min; warm-start runs are usually ~20-45 min depending on flow and video length.

Common Preconditions (all flows)

  1. Credential and control-plane preflight

    bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/<mode>.yaml
    

    Restricted egress:

    bash scripts/preflight_credentials.sh --no-probe --workflow assets/configs/osmo/<mode>.yaml
    

    Preflight does not require a workload-local .env. Runtime interpolation is driven by submit-time values (dataset, run_id, gpu_platform, video, storage_url, skills_dir) supplied in one --set-string list.

    Passing --workflow validates pull access for the active workflow image refs (workflow.groups[].tasks[].image) using anonymous bearer access with credential fallback when provided. If replacement NGC/HF secrets are provided in env, preflight refreshes existing nvcr_io / hf_token automatically when present. Use --refresh to force overwrite even when no new env secrets were supplied:

    bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/<mode>.yaml --refresh
    

    If output contains USER_INPUT_REQUIRED:, ask one concise unblock question and stop.

    On workflow image 401/403, report registry access failure after probe checks on the listed image refs; do not claim a key family (for example nvapi-*) is categorically unsupported.

  2. Storage interpolation policy

    storage_url must be derived from the actual dataset/upload backend for the current run.

    dataset_url=azure://storiondevxah69/osmo-workflows/datasets/vda-demo
    storage_url=azure://storiondevxah69/osmo-workflows
    dataset=vda-demo
    

    Never silently default to stale s3:// values on non-S3 backends.

  3. Inference policy (non-negotiable)

    • Reuse healthy in-cluster persistent NIM endpoints by default.
    • If missing/unhealthy, deploy automatically — this is a prerequisite, not a user decision. Do NOT pause to ask; run the install with the VDA allow-list:
    export NIM_SERVICES="qwen3-vl qwen25-14b"
    skills/physical-ai-infrastructure-setup-and-resilient-scaling/components/inference-nim-operator/scripts/install.sh
    
    • See references/nim/README.md for full endpoint docs and health checks.
    • External endpoints are opt-in only (explicit request or explicit URLs); only then skip the in-cluster deploy.
    • Never infer external mode from credential presence.
    • Never scale down/delete existing NIMs to free GPUs.
  4. Readiness guard

    osmo pool list --mode free
    osmo config show POD_TEMPLATE
    python3 scripts/pre_submit_guard.py --workflow assets/configs/osmo/<mode>.yaml
    
  5. Cache auto-remediation

    If pre_submit_guard.py reports cache failure, default action is to run:

    osmo workflow submit assets/configs/osmo/setup_model_cache.yaml \
      --set-string storage_url=<backend-prefix> path=data
    

    Then rerun pre_submit_guard.py and submit the target VDA flow only after it passes. Ask user only when backend/prefix is ambiguous or cache setup fails.

  6. Scheduling policy

    VDA templates schedule setup and workers on gpu_platform (no system pool dependency for user workloads).

Submit (all flows)

Every flow uses the same submit shape; only the workflow YAML changes. Choose the YAML for the requested flow, then run the command below. Full per-flow walkthroughs (stage matrix and flow details) live in the linked references.

Flow Workflow YAML Walkthrough
Augmentation + auto-labeling assets/configs/osmo/augmentation_and_al.yaml references/flows/augmentation_and_al.md
Auto-labeling only assets/configs/osmo/auto_labeling.yaml references/flows/auto_labeling.md
E2E (parallel) assets/configs/osmo/e2e.yaml references/flows/e2e.md
E2E (super-resolution gated) assets/configs/osmo/e2e_super_resolution.yaml references/flows/e2e_super_resolution.md
SKILLS_DIR="$(cd "$(git rev-parse --show-toplevel)/skills/physical-ai-video-data-augmentation" && pwd)"
STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
osmo workflow submit assets/configs/osmo/<flow>.yaml \
  --pool <pool> \
  --set-string \
    dataset=<dataset> \
    run_id=run-$STAMP \
    storage_url=<backend-prefix> \
    gpu_platform=<gpu-platform> \
    video=<video-stem> \
    cosmos_model_cache_url=<backend-prefix>/data/models/cosmos_transfer \
    auto_labeling_model_cache_url=<backend-prefix>/data/models/auto_labeling \
    skills_dir="$SKILLS_DIR"

Compatibility note:

Common optional overrides (append key/value pairs to the same --set-string list):

cookbook=<scene_profile> \
vlm_url=<openai_base_url> \
llm_url=<openai_base_url> \
cosmos_model_cache_url=<url> \
auto_labeling_model_cache_url=<url>

The auto-labeling-only flow has no augmentation stage, so it omits cosmos_model_cache_url at runtime; passing it is harmless and keeps one submit shape across flows.

OSMO Monitoring

# Workflow status + task states
osmo workflow query <workflow_id> --format-type json \
  | jq '{status, tasks: [.groups[].tasks[] | {name, status, exit_code}]}'

# Logs for a specific task
osmo workflow logs <workflow_id> --task <task_name> -n 200

# Output retrieval
osmo data list --no-pager <output_url>
osmo data download <output_url> <local_dir>/

For completion artifacts, always mirror the full run output into workspace:

ROOT="$(git rev-parse --show-toplevel)"
RUN_LOCAL_DIR="$ROOT/media/vda/runs/<run_id>"
mkdir -p "$RUN_LOCAL_DIR"
osmo data download "<storage_url>/datasets/<dataset>-outputs/<run_id>/" "$RUN_LOCAL_DIR/"

For runs expected to exceed two minutes, send heartbeat updates at least every two minutes. For media evidence, emit one standalone MEDIA:<absolute-path> line per message bubble.

Execution continuity requirement:

MEDIA formatting is strict:

Post-Run Comparison Evidence (required for augmented flows)

Applies to augmentation_and_al, e2e, and e2e_super_resolution after a successful run.

Required completion output (do not stop at raw output URLs):

  1. Stage full outputs + input video into workspace-local path:

    bash scripts/stage_run_artifacts.sh \
      --storage-url <storage_url> --dataset <dataset> --run-id <run_id> --video <video>
    
  2. Render side-by-side from that local run copy:

    bash scripts/render_side_by_side.sh \
      --run-local-dir "<repo>/media/vda/runs/<run_id>" --dataset <dataset> --video <video>
    
  3. Emit MEDIA from the local run copy and include:

    • augmentation summary from <run_local_dir>/setup_b0/configs/manifest.yaml (sampled_vars for <video>_aug0)
    • auto-labeling summary from <run_local_dir>/outputs/pseudo_labeled_augmented/<video>_aug0
    • for e2e / e2e_super_resolution, original-label summary from <run_local_dir>/outputs/pseudo_labeled/<video>

If ffmpeg is unavailable, emit input and augmented MEDIA from the same local run copy and still provide augmentation + auto-labeling summaries.

For demo runs (no user video provided), explicitly state that input came from nvidia/video-data-augmentation-demo.

Supporting files

Use these canonical locations:

Skill frontmatter

license: CC-BY-4.0 AND Apache-2.0 metadata: {"owner" => "NVIDIA", "service" => "data", "version" => "1.0.0", "reviewed" => "2026-05-26", "author" => "NVIDIA", "tags" => ["physical-ai", "video-data-augmentation", "auto-labeling", "cosmos"]}