Agent Skill · NVIDIA NIM

cupynumeric-install

Install and verify cuPyNumeric for Python — requirements, commands, verification. Source builds are out of scope.

Provider: NVIDIA NIM
Path in repo: skills/cupynumeric-install/SKILL.md

Skill body

cuPyNumeric Install (user)

Purpose

Use this skill to install cuPyNumeric for use from Python and to verify the install actually works (including GPU usage). Apply it whenever a user wants cuPyNumeric running via conda or pip. Do not use it to build from source (to modify or contribute) — that is out of scope.

Mandatory rules

Prerequisites

Confirm these system requirements before recommending any install:

Instructions

Follow these steps in order: confirm the prerequisites, ask the scoping questions, install via the chosen path, then verify.

Ask before installing

  1. Package manager? Check conda --version and pip --version. Prefer conda (upstream-recommended); fall back to pip.
  2. Env target? GPU machine, CPU-only laptop, cloud, container, or remote/server.
  3. CUDA version? Ask only when forcing the GPU variant on a host without a visible GPU. Check with nvidia-smi / nvcc --version.
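
The first question can often be answered mechanically before asking the user. The following is a minimal Python sketch, assuming only that the tools are discoverable on PATH. The helper names `find_tools` and `choose_manager` are illustrative and not part of cuPyNumeric or Legate.

```python
import shutil


def find_tools(names=("conda", "pip", "nvidia-smi", "nvcc"), path=None):
    """Map each tool name to its resolved executable path, or None if absent."""
    return {name: shutil.which(name, path=path) for name in names}


def choose_manager(found):
    """Prefer conda, fall back to pip, and return None if neither is present."""
    if found.get("conda"):
        return "conda"
    if found.get("pip"):
        return "pip"
    return None


if __name__ == "__main__":
    tools = find_tools()
    print("package manager:", choose_manager(tools) or "none (bootstrap needed)")
    print("nvidia-smi on PATH:", bool(tools["nvidia-smi"]))
```

A tool being on PATH does not prove it works, so the `--version` checks listed above remain the authoritative test.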

Bootstrap — install a package manager first

If neither conda nor pip is available, install one. Provide the command and the docs link, but do not run it: curl | bash requires user trust.

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh"

Docs: https://github.com/conda-forge/miniforge

Alternative: Python + pip

Install Python from your OS package manager (apt/dnf/brew) or https://www.python.org/downloads/. If pip is missing on an existing Python: python -m ensurepip --upgrade.

After installing, open a new shell so the binary is on PATH.

Install — conda path

conda create -n cupynumeric -c conda-forge -c legate cupynumeric
conda activate cupynumeric

Into an existing env: conda install -c conda-forge -c legate cupynumeric.

conda auto-selects the GPU vs CPU variant from whether nvidia-smi works at install time. To override that, see below.

Force the GPU variant

Set CONDA_OVERRIDE_CUDA only when no GPU is visible at install time (e.g. building a container for a GPU host). Use the runtime host’s CUDA version:

CONDA_OVERRIDE_CUDA="12.2" conda install -c conda-forge -c legate cupynumeric
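
The override rule in this section can be written as a small decision function. This is a sketch, not the conda solver's actual logic: `plan_conda_install` is a hypothetical helper that only assembles the command and environment variable shown in this section, without running anything.

```python
def plan_conda_install(gpu_visible, want_gpu, cuda_version=None):
    """Return (env_overrides, argv) for the conda install command.

    gpu_visible:  whether a GPU is visible on the install host
    want_gpu:     whether the runtime host has a GPU
    cuda_version: CUDA version of the runtime host, e.g. "12.2"
    """
    argv = ["conda", "install", "-c", "conda-forge", "-c", "legate", "cupynumeric"]
    env = {}
    # Override only when a GPU build is wanted but no GPU is visible at install time.
    if want_gpu and not gpu_visible:
        if not cuda_version:
            raise ValueError("cuda_version is required to force the GPU variant")
        env["CONDA_OVERRIDE_CUDA"] = cuda_version
    return env, argv


if __name__ == "__main__":
    # Example: building a container image for a GPU host on a machine without a GPU.
    env, argv = plan_conda_install(gpu_visible=False, want_gpu=True, cuda_version="12.2")
    prefix = " ".join(f'{k}="{v}"' for k, v in env.items())
    print((prefix + " " + " ".join(argv)).strip())
```

Returning the environment separately from the argument list keeps the sketch usable with `subprocess.run(argv, env={**os.environ, **env})` if a caller chooses to execute it.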

Nightly (less validated)

conda install -c conda-forge -c legate-nightly cupynumeric

Install — pip path

python -m venv .venv
source .venv/bin/activate
pip install nvidia-cupynumeric

Verify

Smoke test (always run)

Run a self-contained script through the legate launcher — no repo checkout needed.

TMP=$(mktemp -d)
cat > "$TMP/smoke.py" <<'EOF'
import cupynumeric as np
a = np.arange(10)
b = np.ones((4, 4))
print("sum:", a.sum())            # expect 45
print("matmul:", (b @ b).sum())   # expect 64.0
EOF
legate "$TMP/smoke.py"
rm -rf "$TMP"

Expect sum: 45 and matmul: 64.0. If legate is missing, the env is not activated — see Troubleshooting.
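
When the smoke test runs unattended, its output can be checked programmatically instead of by eye. The sketch below assumes the two labelled lines appear somewhere in the captured stdout and ignores any other lines the launcher may print. `check_smoke_output` is an illustrative name, not a cuPyNumeric or Legate API.

```python
import re

# Values the smoke script is expected to print.
EXPECTED = {"sum": 45.0, "matmul": 64.0}

_LINE = re.compile(r"^(sum|matmul):\s*(-?\d+(?:\.\d+)?)$")


def check_smoke_output(text):
    """Return the labels whose printed value is missing or wrong (empty list = pass)."""
    seen = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            seen[match.group(1)] = float(match.group(2))
    return [label for label, value in EXPECTED.items() if seen.get(label) != value]


if __name__ == "__main__":
    captured = "sum: 45\nmatmul: 64.0\n"
    problems = check_smoke_output(captured)
    print("smoke test passed" if not problems else f"failed: {problems}")
```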

GPU usage check (mandatory when a supported GPU is present)

A passing smoke test does not prove GPU usage — a CPU-variant install on a GPU box produces correct results too. Run both steps.

1. Force a GPU launch. legate --gpus N requests N GPUs and fails fast if no GPU is visible or the CPU variant is installed.

TMP=$(mktemp -d)
cat > "$TMP/check.py" <<'EOF'
import cupynumeric as np
print(np.ones((4096, 4096)).sum())
EOF
legate --gpus 1 "$TMP/check.py"
rm -rf "$TMP"

Expect 16777216.0. If the error mentions CUDA driver, libcudart, or no GPUs available, the CPU variant is probably installed; reinstall with CONDA_OVERRIDE_CUDA set (see Force the GPU variant).
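
The outcome of the forced GPU launch can be triaged with a simple substring check. This sketch assumes the combined stdout and stderr were captured as one string. The marker strings are the three named in this step; the exact wording of real error messages is not known here, so the match is case-insensitive and deliberately loose. `classify_gpu_launch` is a hypothetical helper.

```python
# Markers taken from this step; real messages may be worded differently.
GPU_FAILURE_MARKERS = ("cuda driver", "libcudart", "no gpus available")


def classify_gpu_launch(returncode, output):
    """Classify a `legate --gpus 1` run from its exit code and captured output."""
    lowered = output.lower()
    if any(marker in lowered for marker in GPU_FAILURE_MARKERS):
        return "cpu-variant-suspected"
    if returncode != 0:
        return "failed-other"
    if "16777216.0" in output:
        return "ok"
    return "unexpected-output"


if __name__ == "__main__":
    print(classify_gpu_launch(0, "16777216.0\n"))
    print(classify_gpu_launch(1, "error: libcudart.so not found\n"))
```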

2. Confirm the GPU was touched. Run a deadline-bounded matmul loop alongside nvidia-smi, all from one shell — no second-terminal race:

TMPDIR_GPU=$(mktemp -d)
SCRIPT="$TMPDIR_GPU/cupynumeric_gpu_check.py"
cat > "$SCRIPT" <<'EOF'
import time
import cupynumeric as np
a = np.ones((10000, 10000))
deadline = time.time() + 20
iters = 0
while time.time() < deadline:
    b = a @ a
    _ = float(b.sum())   # force sync so the matmul actually runs
    iters += 1
print("iters:", iters)
EOF
legate --gpus 1 "$SCRIPT" &
WORKLOAD=$!
sleep 5                                     # buffer for Legate startup
for _ in $(seq 10); do                      # 10 samples at 1s — covers slow startup
  nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
  sleep 1
done
wait "$WORKLOAD"
rm -rf "$TMPDIR_GPU"

Expect memory.used in the GiB range across most samples and non-trivial utilization.gpu in several. If both stay at baseline across every sample, the GPU variant is not installed — check conda list cupynumeric for *_gpu (not *_cpu).
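
The pass criterion for the samples can be made explicit. The sketch below parses lines in the `csv,noheader` form that the query prints (for example `37 %, 4521 MiB`) and applies thresholds that are assumptions chosen for illustration: at least 1 GiB used in more than half the samples, and at least 5 % utilization in three or more. It treats every line as one sample, so on a multi-GPU host the lines would need to be filtered per GPU first.

```python
def parse_sample(line):
    """Parse one line like '37 %, 4521 MiB' into (util_percent, mem_mib)."""
    util_field, mem_field = (field.strip() for field in line.split(","))
    return float(util_field.split()[0]), float(mem_field.split()[0])


def gpu_was_used(lines, min_mem_mib=1024.0, min_util=5.0, min_util_hits=3):
    """Decide whether the sampled GPU showed real memory use and activity."""
    samples = []
    for line in lines:
        try:
            samples.append(parse_sample(line))
        except (ValueError, IndexError):
            continue  # skip blank, malformed, or '[N/A]' lines
    if not samples:
        return False
    mem_hits = sum(1 for _, mem in samples if mem >= min_mem_mib)
    util_hits = sum(1 for util, _ in samples if util >= min_util)
    return mem_hits > len(samples) / 2 and util_hits >= min_util_hits


if __name__ == "__main__":
    captured = ["0 %, 3 MiB", "0 %, 3 MiB"] + ["95 %, 2300 MiB"] * 8
    print("GPU used:", gpu_was_used(captured))
```

The default thresholds are placeholders; a GPU shared with other workloads may need a comparison against a baseline sample taken before the launch.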

Deeper recipes

See verification_examples.md for multi-GPU checks, CPU fallback, container, and troubleshooting.

Limitations

Troubleshooting

See also

Skill frontmatter

license: CC-BY-4.0 OR Apache-2.0
compatibility: linux-x86_64, linux-aarch64, darwin-aarch64, wsl-x86_64
metadata: {"author" => "NVIDIA Corporation ", "version" => "2.0.0", "tags" => ["cupynumeric", "legate", "numpy", "installation", "conda", "gpu", "distributed-computing"], "upstream" => "https://github.com/nv-legate/cupynumeric", "docs" => "https://docs.nvidia.com/cupynumeric/latest/installation.html"}