Agent Skill · NVIDIA NIM

jetson-diagnostic

Read-only Jetson health snapshot for identity, memory, GPU, thermal, power, storage, services, and top processes.

Provider: NVIDIA NIM
Path in repo: skills/jetson-diagnostic/SKILL.md

Skill body

Jetson Diagnostic

A unified, agent-friendly view of a running Jetson device. It replaces the need to remember which of tegrastats, jtop, procrank, /sys/kernel/debug/nvmap, nvpmodel, free, swapon, df, and systemctl list-units produces which slice of the truth.

Purpose

Capture a read-only health snapshot from the Jetson host so agents can answer device identity, memory, GPU, thermal, power, storage, and service-state questions using live data instead of guesses.

When to use

Activate when the user asks:

Do not use this skill to change power modes, drop caches, stop services, install packages, serve models, or tune inference flags. Report the observed state, then hand off to the action-oriented skill.

Prerequisites

Available Scripts

| Script | Purpose | Arguments |
|---|---|---|
| scripts/snapshot.sh | Emits the all-in-one JSON snapshot for identity, memory, GPU, thermal, power, disk, top processes, and candidate services. | --human, --tegra-secs N, --top-procs N |
| scripts/mem_summary.sh | Emits a compact human-readable RAM/GPU/swap summary. | --short, --watch, --interval N |
| scripts/detect_jetson.sh | Exports or prints canonical Jetson SKU/generation/product-line fields for this repo. | No arguments |

If your agent runtime supports run_script, use it to run scripts/snapshot.sh or scripts/mem_summary.sh and summarize the returned output. Otherwise run the scripts with bash from the repository root.
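For runtimes without run_script, the bash invocation described above can be wrapped in a few lines. This is a minimal sketch, not part of the skill: the helper names build_snapshot_command and run_snapshot are hypothetical, and the script location under skills/jetson-diagnostic/scripts/ is assumed from the repo path given at the top of this page. Only the --tegra-secs and --top-procs flags from the table are used, since --human presumably switches the output away from JSON.

```python
import json
import subprocess
from pathlib import PurePosixPath

# Assumed location of the skill inside the repository checkout.
SKILL_DIR = PurePosixPath("skills/jetson-diagnostic")


def build_snapshot_command(repo_root, tegra_secs=None, top_procs=None):
    """Build the argv list that runs snapshot.sh through bash."""
    script = PurePosixPath(repo_root) / SKILL_DIR / "scripts" / "snapshot.sh"
    cmd = ["bash", str(script)]
    if tegra_secs is not None:
        cmd += ["--tegra-secs", str(tegra_secs)]
    if top_procs is not None:
        cmd += ["--top-procs", str(top_procs)]
    return cmd


def run_snapshot(repo_root, **options):
    """Run snapshot.sh and parse its stdout as JSON.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        build_snapshot_command(repo_root, **options),
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

Going through bash explicitly avoids depending on the executable bit of the script file, which matches the advice elsewhere in this skill not to chmod files.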

Instructions

  1. Run scripts/snapshot.sh for the all-in-one JSON view (preferred default).
  2. For a quick human-readable memory line, run scripts/mem_summary.sh.
  3. To explain a single tegrastats line the user has pasted, see references/tegrastats-fields.md.
  4. To explain the NvMap clients output, see references/nvmap-clients.md.

Reporting guidance

Run the matching helper script before summarizing device state, and report only fields returned by that script. If direct execution is blocked by the runtime, run it with bash {baseDir}/scripts/<script-name> rather than trying to chmod files.

If your agent runtime does not automatically execute helper scripts relative to this skill directory, resolve script paths with the AgentSkills {baseDir} placeholder:

{baseDir}/scripts/snapshot.sh
{baseDir}/scripts/mem_summary.sh

Do not call jetson-diagnostic as a tool name unless the runtime explicitly registers skills as callable tools; Agent Skills are normally instructions plus files, not direct tool functions.

All scripts source the canonical platform detector at skills/jetson-diagnostic/scripts/detect_jetson.sh, which exports JETSON_SKU, JETSON_GENERATION, JETSON_PRODUCT_LINE, JETSON_VARIANT, JETSON_MEM_GB, JETSON_L4T_VERSION, and JETSON_PRODUCT_MODEL. Other skills may source this detector rather than duplicating Jetson identification logic. Off-platform, the detector exits with status 2 and a remediation message.
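A caller outside bash can still reuse the detector by sourcing it in a child shell and reading back the exported variables. The sketch below is one possible approach under stated assumptions: parse_identity and load_identity are hypothetical helpers, the NAME=value dump format is produced by this wrapper rather than by the detector itself, and the remediation message is assumed to arrive on stderr.

```python
import subprocess

# Variable names listed in the paragraph above.
JETSON_VARS = (
    "JETSON_SKU",
    "JETSON_GENERATION",
    "JETSON_PRODUCT_LINE",
    "JETSON_VARIANT",
    "JETSON_MEM_GB",
    "JETSON_L4T_VERSION",
    "JETSON_PRODUCT_MODEL",
)


def parse_identity(dump_text):
    """Turn NAME=value lines into a dict, ignoring unrelated lines."""
    identity = {}
    for line in dump_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name in JETSON_VARS:
            identity[name] = value
    return identity


def load_identity(detector_path):
    """Source the detector in bash and return the exported fields.

    Raises RuntimeError when the detector reports exit status 2 (off-platform).
    """
    printers = "; ".join(
        f'printf "%s=%s\\n" {name} "${name}"' for name in JETSON_VARS
    )
    script = f'source "$1" >/dev/null || exit $?; {printers}'
    result = subprocess.run(
        ["bash", "-c", script, "bash", detector_path],
        capture_output=True,
        text=True,
    )
    if result.returncode == 2:
        # Assumption: the remediation message is written to stderr.
        raise RuntimeError(f"not a supported Jetson: {result.stderr.strip()}")
    result.check_returncode()
    return parse_identity(result.stdout)
```

All values come back as strings, so a field such as JETSON_MEM_GB needs an explicit int() conversion before any arithmetic.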

Limitations

Error handling

Output contract for snapshot.sh

{
  "sku": "orin-nano",
  "generation": "orin",
  "product_line": "orin-nano",
  "variant": "orin-nano-8gb",
  "mem_total_gb": 8,
  "l4t_version": "36.4.0",
  "product_model": "nvidia jetson orin nano developer kit",
  "memory_kb": { "total": 8123456, "available": 4123456, "swap_total": 0, "swap_free": 0, "cached": 1234567 },
  "tegrastats_sample": "RAM 4011/8138MB (lfb 8x4MB) ...",
  "thermal_c": { "CPU": 52.3, "GPU": 49.0, "AO": 47.0 },
  "power": { "nvpmodel_id": 0, "nvpmodel_name": "MAXN" },
  "disk": [ { "mount": "/", "used_pct": 41 } ],
  "gpu_source": "nvmap:iovmm-clients",
  "gpu_devices": [],
  "gpu_processes": [],
  "nvmap": {
    "readable": true,
    "total_kb": 654321,
    "stats_total_bytes": 669985280,
    "top_clients": [ { "pid": 1234, "cmd": "vlm-server", "kb": 524288 } ]
  },
  "top_processes": [ { "pid": 4321, "cmd": "vllm", "pss_kb": 4000000 } ],
  "candidate_services": { "gdm3": { "active": "inactive", "enabled": "disabled" } }
}

gpu_source names the specific datum the skill used to attribute per-process GPU memory, so the caller can tell exactly what the numbers represent:

The agent should present the salient parts back to the user (SKU, available memory, top GPU consumer per gpu_source, hottest zone, power mode) and offer to drill into specifics (top_processes, gpu_processes / nvmap, services).
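The salient fields named above can be pulled out of a parsed snapshot in a few lines. The following is an illustrative sketch only: summarize_snapshot is a hypothetical helper, it treats memory_kb values as 1024-byte units (as /proc/meminfo does, which is an assumption about this script), and it reads the top GPU consumer only from the nvmap layout shown in the contract, because the entry shape of gpu_processes is not shown there.

```python
def summarize_snapshot(snapshot):
    """Pick the headline fields out of a parsed snapshot.sh result."""
    memory = snapshot.get("memory_kb") or {}
    thermal = snapshot.get("thermal_c") or {}
    power = snapshot.get("power") or {}

    # Skip zones that came back as null instead of a number.
    readings = {
        zone: temp for zone, temp in thermal.items()
        if isinstance(temp, (int, float))
    }
    hottest = max(readings.items(), key=lambda item: item[1]) if readings else None

    top_gpu = None
    if str(snapshot.get("gpu_source") or "").startswith("nvmap"):
        clients = (snapshot.get("nvmap") or {}).get("top_clients") or []
        if clients:
            top_gpu = max(clients, key=lambda client: client["kb"])

    available_kb = memory.get("available")
    return {
        "sku": snapshot.get("sku"),
        "available_gib": (
            None if available_kb is None else round(available_kb / 1024 ** 2, 1)
        ),
        "hottest_zone": hottest,
        "power_mode": power.get("nvpmodel_name"),
        "top_gpu_consumer": top_gpu,
    }


# Trimmed copy of the example contract above.
example = {
    "sku": "orin-nano",
    "memory_kb": {"total": 8123456, "available": 4123456},
    "thermal_c": {"CPU": 52.3, "GPU": 49.0, "AO": 47.0},
    "power": {"nvpmodel_id": 0, "nvpmodel_name": "MAXN"},
    "gpu_source": "nvmap:iovmm-clients",
    "nvmap": {"top_clients": [{"pid": 1234, "cmd": "vlm-server", "kb": 524288}]},
}
summary = summarize_snapshot(example)
```

Every lookup tolerates a missing or null field, so the same function still returns a usable summary when the script could not reach one of its sources.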

Safety

This skill is read-only. It does not change nvpmodel, does not run jetson_clocks, does not modify services. To act on findings, hand off to:

Cross-platform behavior

| Family | Variants the skill recognises | tegrastats | nvidia-smi | nvpmodel | NvMap debugfs |
|---|---|---|---|---|---|
| Jetson Thor | thor-t5000, thor-t4000 | yes | yes (full) | yes | yes (root) |
| Jetson AGX Orin | orin-agx-64gb, orin-agx-32gb, orin-agx-industrial | yes | yes (stub, nvgpu)* | yes | yes (root) |
| Jetson Orin NX | orin-nx-16gb, orin-nx-8gb | yes | yes (stub, nvgpu)* | yes | yes (root) |
| Jetson Orin Nano | orin-nano-8gb, orin-nano-4gb | yes | yes (stub, nvgpu)* | yes | yes (root) |

* On Jetsons whose GPU is driven by the in-tree nvgpu kernel driver, the nvidia-smi binary is present but most fields (Memory-Usage, power, utilisation, compute-process table) report Not Supported / N/A. To decide which source to trust at runtime, the script runs a capability probe, nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits, and uses nvidia-smi for per-process GPU memory only when that query returns a real integer. Otherwise it falls back to /sys/kernel/debug/nvmap/iovmm/clients, which on nvgpu-stack Jetsons is the authoritative per-process GPU-memory source.
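The capability probe described in this note can be sketched as follows. This is not the script's actual code: parse_memory_used, run_probe, and choose_gpu_source are hypothetical names, and the "nvidia-smi" source label is an assumption (only "nvmap:iovmm-clients" appears in the output contract). The probe command itself is the one quoted above.

```python
import re
import subprocess

PROBE = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]


def parse_memory_used(output):
    """Return memory.used as an int, or None when it is not a real integer."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    first = lines[0].strip()  # one line per GPU; the first is enough for a probe
    return int(first) if re.fullmatch(r"[0-9]+", first) else None


def run_probe():
    """Run the probe; return stdout, or "" if nvidia-smi is missing or fails."""
    try:
        result = subprocess.run(PROBE, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout if result.returncode == 0 else ""


def choose_gpu_source(probe_output):
    """Prefer nvidia-smi only when the probe produced a usable number."""
    if parse_memory_used(probe_output) is not None:
        return "nvidia-smi"  # hypothetical label, not taken from the contract
    return "nvmap:iovmm-clients"
```

Calling choose_gpu_source(run_probe()) yields the fallback label both when the binary is absent and when it answers with a placeholder such as [N/A], so the two failure modes need no separate handling.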

The script tolerates missing or unreachable tools and reports null / false for any it cannot reach (typical when the agent isn’t running with the privilege needed for /sys/kernel/debug). Variant detection uses the /proc/device-tree/model string first (recognising names like T5000 / T4000) and falls back to memory-size heuristics when the model string is generic.
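The two-stage variant detection described above might look like the sketch below. The substrings matched and the memory thresholds are illustrative guesses, not the rules in detect_jetson.sh, and read_model and detect_variant are hypothetical names. The variant strings are the ones listed in the table.

```python
def read_model(path="/proc/device-tree/model"):
    """Read the device-tree model string, dropping the trailing NUL byte."""
    with open(path, "rb") as handle:
        raw = handle.read()
    return raw.rstrip(b"\x00").decode("utf-8", errors="replace").strip()


def detect_variant(model, mem_total_kb):
    """Map a model string plus total memory to a variant name, or None."""
    name = model.lower()
    mem_gib = mem_total_kb / 1024 ** 2

    # Stage 1: names that identify the variant outright.
    for token in ("t5000", "t4000"):
        if token in name:
            return f"thor-{token}"

    # Stage 2: generic family names, disambiguated by memory size.
    # Thresholds sit below the nominal size because some RAM is reserved.
    if "orin nano" in name:
        return "orin-nano-8gb" if mem_gib > 5 else "orin-nano-4gb"
    if "orin nx" in name:
        return "orin-nx-16gb" if mem_gib > 10 else "orin-nx-8gb"
    if "agx orin" in name:
        if "industrial" in name:
            return "orin-agx-industrial"
        return "orin-agx-64gb" if mem_gib > 40 else "orin-agx-32gb"
    return None
```

Returning None for an unrecognised board keeps the caller free to report the raw model string instead of guessing a variant.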

Skill frontmatter

version: 0.0.1
license: Apache-2.0
metadata:
  author: Jetson Team
  tags: [jetson, diagnostic, telemetry]
  languages: [bash]
  data-classification: public