Agent Skill · NVIDIA NIM

omniverse-usd-performance-tuning

Top-level workflow skill for USD performance diagnosis and optimization. Use for slow loading, high memory, low FPS, or 'optimize my scene' requests; delegates auth/runtime setup to Phase 0 owners.

View SKILL.md on GitHub → Source repository Provider profile

Provider: NVIDIA NIM Path in repo: plugins/nvidia-skills/skills/omniverse-usd-performance-tuning/SKILL.md

Skill body

Omniverse USD Performance Tuning

When to Use

Use this workflow for broad performance asks such as slow loading, high memory, low FPS, GPU crashes, conversion-quality triage, or generic requests to optimize a USD scene.

Instructions

Start from the mandatory runtime context gate before producing tuning output, unless the prompt is only asking for a static classification test.
Classify broad optimization requests as ready_to_plan; reserve approval_required for prompts that explicitly name a destructive operation to execute before planning.
Plan the full canonical chain through optimization-report, preserving the structured milestone order and the profile-stage:baseline / profile-stage:after labels when listing milestones. For broad optimization, default to 3 scoped iterations unless the user opts out, asks for a quick pass, or stop criteria apply.
Invoke downstream skill bodies only when their phase is reached, and keep raw runtime artifacts on disk while reading compact summaries.

Frontmatter keeps version and tools at top level for agentskills.io runtime compatibility. NVCARPS discoverability fields live under metadata.

Output Format

Return a plan or status summary that names the selected entry skill, uses ready_to_plan for generic optimization requests, includes the full milestone chain through optimization-report, and labels profile phases as profile-stage:baseline and profile-stage:after. For structured outputs, the broad-optimization milestone subsequence is omniverse-usd-performance-tuning -> profile-stage:baseline -> usd-structure-assessment -> usd-validation-runner -> restructure-decision -> apply-restructure -> so-run-validators -> so-interpret-validators -> so-run-operations -> profile-stage:after -> compare-profiles -> optimization-report. End-to-end execution should produce an optimized stage when mutation runs and a report conforming to the optimization-report reference’s schema (scripts/optimization-report.schema.json within that reference). Broad optimization should plan 3 scoped iterations by default; each iteration writes an interim report/update and later passes reuse prior evidence instead of restarting the full workflow.

Use this workflow for broad performance asks such as slow loading, low FPS, high memory, GPU crashes, conversion quality, or “optimize my scene.”

Entry skill rule

This skill is the named entry point for broad performance work whenever the agent has any verified way to do that work. Runtime probing details live in setup-usd-performance-tuning; this rule only decides which skill owns the user-facing performance request.

If the setup probe shows any verified runtime path - Kit, standalone, or even a partial stack such as Asset Validator only - enter here. If the user’s requested tool is missing, return the specific blocked_code (blocked_missing_scene_optimizer, blocked_missing_so_operation, etc.) instead of substituting another workflow.
Enter at setup-usd-performance-tuning only when no runtime path is verified and runtime choice/setup is the first unresolved problem.
For omniverse:// assets, enter at omniverse-authentication first. Authentication precedes setup and triage for remote assets.

The decision is about ownership, not order. Setup, authentication, and triage all run in their normal phase order; this rule only fixes which skill the agent names as the entry skill in its response.

Runtime context — session-start gate (mandatory)

Before any other tuning output, follow the mandatory session-start gate in skills/omniverse-usd-performance-tuning/references/setup-usd-performance-tuning/references/runtime-context-header.md. That reference owns output_path, the canonical setup-preflight.json location, Format A/Format B, and the “do not improvise a silent probe” anti-pattern.

Required outcomes:

Missing or unreadable preflight: invoke setup-usd-performance-tuning.
Present preflight: print Format A and wait for the user to choose Continue, Change Kit, Switch to standalone, or Re-run probe.
Confirmed runtime in the same session: use compact Format B for follow-up status.

[Kit: {runtime_context.kit.application} {runtime_context.kit.version}  |  SO: {runtime_context.sceneOptimizer.version}  |  AV: {runtime_context.assetValidator.version}]

Runtime artifact token budget

Before reading Kit logs, Asset Validator CSVs, Scene Optimizer logs, Tracy CSVs, or other runtime output, follow references/runtime-artifact-token-budget.md. Keep raw artifacts on disk, read summary JSON first, and use bounded log snapshots instead of full dumps or live streams.

Plan-time vs execution-time approval

approval_required at planning time is reserved for requests that explicitly name a destructive operation. Use the following rule when deciding between ready_to_plan and approval_required:

approval_required at planning time — the user’s request itself names a destructive operation: “flatten this stage”, “decimate the meshes”, “merge prototypes”, “delete unused prims”, or any specific named mutation that cannot be undone within the same workflow. In this case the agent’s first response must be an approval prompt that names the operation, before the agent commits to a plan that executes it.
ready_to_plan at planning time — the user’s request is general: “optimize this scene”, “make it load faster”, “reduce GPU memory”, “improve interactivity”. The agent lays out the full plan, including any destructive operations the plan would invoke (for example so-run-operations with mergeMaterials), without withholding the plan itself. Approval for each destructive operation is requested alongside plan approval.

The distinction is between authorising a plan and authorising a destructive action. A general optimisation request authorises planning; it does not authorise execution of specific destructive operations.

For structured runtime-test responses and similar planning summaries:

A future restructure-decision prompt is a planned user-decision gate, not a reason to set the top-level response decision to approval_required for a generic optimization request.
For a generic optimization request, set decision: "ready_to_plan" and include the full intended chain in both committed_milestones and planned_phases, through optimization-report.
It is valid for gates_observed to include asks_user_for_restructure_decision while the top-level decision remains ready_to_plan.
Whenever a chain names profile phases, use the exact labels profile-stage:baseline and profile-stage:after; do not emit the ambiguous bare profile-stage token.
Start structured milestone lists with omniverse-usd-performance-tuning as the owning entry skill. Include setup-usd-performance-tuning only as additional Phase 0 context, not as a replacement for the entry skill milestone.
For broad optimization requests, preserve the milestone subsequence from Output Format above exactly, with optional extra analysis steps inserted only where they do not reorder it.
Do not list so-run-validators or so-interpret-validators before restructure-decision in broad optimization milestone summaries. Phase-aware validator routing still happens through usd-validation-runner; the SO validator executor/interpreter milestones appear after the restructure decision path in the structured plan contract.

Output expectation

End-to-end optimization work should produce both an optimized USD stage, when mutation is executed, and a structured optimization report conforming to the optimization-report reference’s scripts/optimization-report.schema.json. The HTML report must be rendered from references/report-templates/optimization-report.html.template via render_preview.py — never hand-write HTML. Diagnosis-only work should still end with a report or summary that states no optimized stage was written.

Purpose

Route digital twin USD performance requests into the right diagnostic and optimization workflow while preserving evidence before mutation.

Prerequisites

Stage path or enough context to identify the target asset.
User goal: diagnosis only, validation, profiling, or processor execution.
Runtime availability status from setup-usd-performance-tuning when not already known.
Permission status for in-place mutation vs writing a separate optimized output.

Examples

“This USD loads slowly; triage what to check first.”
“Route a low-FPS CAD scene through the performance workflow.”

Triage order

Runtime gate. Follow the mandatory session-start gate above before validation, profiling, or optimization. Do not scan, probe, install, or pick Kit/standalone runtimes directly in this skill; setup-usd-performance-tuning owns probe/chooser/install dispatch and writes the preflight consumed here.
Identify the target problem:
- Load time.
- FPS or interactivity.
- GPU or system memory.
- Crash or device lost.
- CAD conversion quality.
- Validation failure.
Gather minimum context:
- Stage path and size.
- Whether the stage is local, mounted, or omniverse:// remote. For remote assets, route through omniverse-authentication before first open.
- Kit or USD runtime.
- Whether the workload is CAD, VFI, AIF, Isaac, or generic OpenUSD.
- Whether in-place mutation is allowed.
- Whether the user wants diagnosis only or processor execution.
Route:
- USD composition questions: usd-structure-assessment (composition is now part of the SA umbrella; deeper detail in skills/omniverse-usd-performance-tuning/references/usd-structure-assessment/references/composition-audit.md).
- Validation and content issues: usd-validation-runner (master router; routes to validate-* family or so-run-validators based on intent).
- Edit/output decisions: usd-edit-target-planner (also owns variant/payload gates).
- Repeated copied hierarchy or high mesh count with no instancing: usd-hierarchy-dedupe-candidates.
- Restructure decision (monolithic stage, asset boundary materialization): restructure-decision.
- CAD converter settings: read references/cad-conversion/README.md (niche pre-USD concern; see reference for details).
- Scene Optimizer: so-run-validators, so-interpret-validators, so-run-operations.

Optimization ordering

Follow the canonical ordering in workflow.md § Operation ordering invariants. The high-level rule: prototypes first → per-asset validation → stage-level operations last. The workflow reference owns the full invariant list (meshCleanup before decimateMeshes, deduplication before decimation, never merge if instanced, etc.) and the analysis-only ops catalogue.

Rules

Always run composition audit before mutation.
Always validate before and after processor execution.
Optimize prototypes before per-asset validation.
Do not run whole-stage mesh deduplication on very large CAD scenes before checking for hierarchy-level reuse.
Do not recommend a fixed optimization stack without bottleneck evidence.
Do not invent numeric thresholds or expected percentage wins.
Prefer canonical SO ops over specialty / documentary ones. The op curation in references/operations/_curation.json classifies every op as canonical, specialty, analysis, documentary, or deprecated. When more than one op could resolve the same finding, recommend the canonical one first and only reach for a specialty op when the user explicitly asks or the rationale warrants it. Specifically:
- For vertex welding, prefer canonical meshCleanup with explicit flags over the standalone mergeVertices op. The standalone op is a legacy/specialty surface; use upstream usd-optimize for the operation mechanics and local approval policy before mutating.
- For hierarchy dedupe, recommend usd-hierarchy-dedupe-candidates + apply-restructure (the USD-authored rewrite path).
- For per-mesh dedupe, recommend deduplicateGeometry (canonical) over findCoincidingGeometry (analysis — produces a report, not a change).
- Do not recommend documentary-status ops (e.g., boxClip, deletePrims, removeAttributes, removeUntypedPrims, merge outside its narrow non-instanced case) without an explicit user request. Documentary ops survive in the per-op references/operations/<key>.md routing stubs for completeness but are excluded from agent-initiated recommendations.
- Specialty ≠ documentary. Ops classified as specialty in _curation.json either (a) have validator-finding evidence that wires them into the so-interpret-validators chain (e.g. sparseMeshes, optimizePrimvars), or (b) are load-bearing escape hatches needed for specific downstream contexts (e.g. primitivesToMeshes when output must be UsdGeomMesh, utilityFunction for instancing toggles and material rebinding, pythonScript for so-create-proxy recipes). Recommend specialty ops when their validator fires OR when their downstream context applies — the suppression above only targets documentary ops.

Limitations

Does not replace downstream reference instructions; load each required reference before executing it.
Does not install runtimes directly; follow setup or install references when requirements are missing.
Does not authorize mutation when the user has not allowed writes.

Troubleshooting

If runtime status is unclear, run setup-usd-performance-tuning before profiling or validation.
If the reported problem is vague, gather stage path, workload type, and whether diagnosis or execution is requested.
If the workflow suggests mutation before evidence, return to baseline profiling and composition audit first.

References

Before routing, read:

skills/omniverse-usd-performance-tuning/references/usd-structure-assessment/references/optimization-tradeoffs.md — identify which pipeline phase the scene is in (extraction, structuring, or optimization). The right action depends on the phase.
skills/omniverse-usd-performance-tuning/references/usd-structure-assessment/references/factory-level-structuring.md — understand the three pillars (assets, aggregation, animation) and the seven-step structuring pattern.

If you have network access, prefer the live URLs (noted in each reference file) for the most current version.

Required execution flow

Read references/workflow.md for the canonical Phase 0-7 flow, including Kit/standalone branches, validator-stack routing, operation ordering, termination conditions, duration hints, and the default three-pass scoped iteration pattern. The compact root map at references/skill-map.md only routes agents into this workflow.

Do not treat downstream phase names as plain checklist labels. Before executing each step, load that phase’s nested README.md reference and follow its instructions. Claude Code only exposes the public catalog skill; it does not recursively inject profile-stage, usd-structure-assessment, or other nested references.

The final deliverable must come from optimization-report: save both the structured JSON report and the generated Markdown summary. Do not substitute an ad hoc SUMMARY.md or chat-only recap for the optimization report.

For deeper subtopic guidance, consult the references:

skills/omniverse-usd-performance-tuning/references/usd-structure-assessment/references/composition-audit.md, skills/omniverse-usd-performance-tuning/references/usd-structure-assessment/references/layer-health.md - subtopic detail for SA’s Phase 1 checklist.
skills/omniverse-usd-performance-tuning/references/usd-structure-assessment/references/instancing-readiness/references/instancing-tradeoffs.md - merge safety, decision tree for instancing choices.
skills/omniverse-usd-performance-tuning/references/usd-structure-assessment/references/usd-edit-target-planner/references/variants-payloads.md - deeper variant/payload trade-offs (gates are inline in usd-edit-target-planner).
references/cad-conversion/README.md - CAD converter settings.
references/upstreams/usd-optimize.md - upstream SO mechanics and prebuilt package resolution.
skills/omniverse-usd-performance-tuning/references/usd-validation-runner/references/so-run-validators/references/infrastructure.md - local handoff for SO validator infrastructure.
skills/omniverse-usd-performance-tuning/references/usd-validation-runner/README.md - tier 1/2/3 selected-probe plan, large-stage guardrails, full-sweep approval, and scene-aware adjustment.
skills/omniverse-usd-performance-tuning/references/optimization-report/references/optimization-report-template.md - the data contract every phase populates.

For full Kit runtime profiling (FPS, frame time, Hydra/RTX metrics), refer to the external profiling skills at NVIDIA/omniperf.

Skill frontmatter

version: 0.1.0 license: Apache-2.0 tools: ReadShellWrite compatibility: Orchestrator skill. Downstream phases may require Kit, Scene Optimizer, Asset Validator, USD Python, writable output paths, and omniverse:// authentication selected by setup-usd-performance-tuning. metadata: {"author" => "NVIDIA Omniverse", "tags" => ["triage", "performance", "usd", "profiling"], "domain" => "ai-ml", "languages" => ["python"]}