Agent Skill · NVIDIA NIM

cudaq-guide

CUDA-Q onboarding guide for installation, test programs, GPU simulation, QPU hardware, and quantum applications.

Provider: NVIDIA NIM
Path in repo: skills/cudaq-guide/SKILL.md

Skill body

CUDA-Q Getting Started Guide

You are a CUDA-Q expert assistant. Use $ARGUMENTS with the routing table below to jump straight to the topic the user needs.

Purpose

Guide users through the CUDA-Q platform: installation, writing quantum kernels, GPU-accelerated simulation, connecting to QPU hardware, and exploring built-in applications.

Prerequisites

Instructions

References

| Section | Doc file |
| --- | --- |
| Install | docs/sphinx/using/install/install.rst, docs/sphinx/using/quick_start.rst |
| Test Program | docs/sphinx/using/basics/kernel_intro.rst, docs/sphinx/using/basics/build_kernel.rst |
| GPU Simulation | docs/sphinx/using/backends/sims/svsims.rst, docs/sphinx/using/examples/multi_gpu_workflows.rst |
| QPU | docs/sphinx/using/backends/hardware.rst, docs/sphinx/using/backends/cloud.rst |
| Applications | docs/sphinx/using/applications.rst |
| Parallelize | docs/sphinx/using/examples/multi_gpu_workflows.rst |

Routing by Argument

| Argument | Action |
| --- | --- |
| install | Walk through installation (see Install section) |
| test-program | Build and run a Bell state kernel to verify CUDA-Q is working properly |
| gpu-sim | Explain GPU-accelerated simulation targets (see GPU Simulation section) |
| qpu | Explain how to run on real QPU hardware (see QPU section) |
| applications | Showcase what can be built with CUDA-Q (see Applications section) |
| parallelize | Show how to run circuits in parallel across multiple QPUs (see Parallelize section) |
| (none) | Print the full menu below and ask what they’d like to explore |
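
The routing table amounts to a lookup with a fallback. A minimal sketch in Python follows; the function name `route` and the returned section names are illustrative only, and sending unrecognized arguments to the full menu is an assumption (the table only specifies the no-argument case).

```python
# Hypothetical sketch: map the skill argument to the section it routes to.
ROUTES = {
    "install": "Install",
    "test-program": "Test Program",
    "gpu-sim": "GPU Simulation",
    "qpu": "QPU",
    "applications": "Applications",
    "parallelize": "Parallelize",
}


def route(argument=None):
    """Return the section to present; fall back to the full menu."""
    key = (argument or "").strip().lower()
    # Assumption: anything not in the table is treated like "no argument".
    return ROUTES.get(key, "Full Menu")


print(route("gpu-sim"))  # GPU Simulation
print(route())           # Full Menu
```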

Full Menu (no argument)

Present this when invoked with no argument:

CUDA-Q Getting Started

CUDA-Q is NVIDIA's unified quantum-classical programming model for CPUs, GPUs, and QPUs.
Supports Python and C++. Docs: https://nvidia.github.io/cuda-quantum/

Choose a topic:
  /cudaq-guide install         Install CUDA-Q (Python pip or C++ binary)
  /cudaq-guide test-program    Write and run your first quantum kernel
  /cudaq-guide gpu-sim         Accelerate simulation on NVIDIA GPUs
  /cudaq-guide qpu             Connect to real QPU hardware
  /cudaq-guide applications    Explore what you can build
  /cudaq-guide parallelize     Run circuits in parallel across multiple QPUs

Install

Instructions

Platform notes


Test Program

Key concepts to explain

Kernel restrictions

For compiler internals (inspect module -> ast_bridge.py -> Quake MLIR -> QIR -> JIT), route to /cudaq-compiler.
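
For the test-program route, a Bell state check could look like the sketch below. It assumes the Python package is importable as `cudaq` and uses the default target; the wrapper `run_bell_check` is a hypothetical helper, not part of CUDA-Q, and it returns `None` instead of failing when the package is missing.

```python
import importlib.util


def run_bell_check(shots=1000):
    """Return measured bitstring counts, or None if cudaq is not installed."""
    if importlib.util.find_spec("cudaq") is None:
        return None

    import cudaq

    @cudaq.kernel
    def bell():
        qubits = cudaq.qvector(2)
        h(qubits[0])                  # put qubit 0 in superposition
        x.ctrl(qubits[0], qubits[1])  # entangle qubit 1 with qubit 0
        mz(qubits)                    # measure both qubits

    result = cudaq.sample(bell, shots_count=shots)
    return {bits: count for bits, count in result.items()}


counts = run_bell_check()
if counts is None:
    print("cudaq is not importable; install CUDA-Q first")
else:
    print(counts)  # a noiseless Bell state yields only '00' and '11'
```

On a noiseless simulator the two outcomes should each appear in roughly half of the shots; any other bitstring points to a problem with the kernel or the setup.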


GPU Simulation

To recommend the best simulation backend for the user, consult the full comparison table at https://nvidia.github.io/cuda-quantum/latest/using/backends/simulators.html

Available GPU Targets

| Target | Description | Use when |
| --- | --- | --- |
| nvidia (default) | Single-GPU state vector via cuStateVec (up to ~30 qubits) | Default choice for most simulations on a single GPU |
| nvidia --target-option fp64 | Double-precision single GPU | Higher numerical precision needed (e.g. chemistry, sensitive observables) |
| nvidia --target-option mgpu | Multi-GPU, pools memory across GPUs (>30 qubits) | Circuit exceeds single-GPU memory; requires MPI |
| nvidia --target-option mqpu | Multi-QPU, one virtual QPU per GPU, parallel execution | Running many independent circuits in parallel (e.g. parameter sweeps, VQE gradients) |
| tensornet | Tensor network simulator | Shallow or low-entanglement circuits; qubit count exceeds statevector feasibility |
| qpp-cpu | CPU-only fallback (OpenMP) | No GPU available; macOS; small circuits for testing |
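
The qubit limits in the table follow from state-vector memory: an n-qubit state holds 2^n complex amplitudes, at 8 bytes each in single precision and 16 bytes each in double precision. The rough estimate below ignores simulator workspace and other overhead, so treat it as a lower bound; the helper name is hypothetical.

```python
def statevector_gib(num_qubits, fp64=False):
    """Approximate size of a full state vector in GiB (amplitudes only)."""
    bytes_per_amplitude = 16 if fp64 else 8  # complex128 vs complex64
    return (2 ** num_qubits) * bytes_per_amplitude / 2 ** 30


for n in (28, 30, 32, 34):
    print(n, statevector_gib(n), statevector_gib(n, fp64=True))
```

At 30 qubits the amplitudes alone take 8 GiB in single precision and 16 GiB with fp64, and every extra qubit doubles that. This is why larger circuits call for mgpu or tensornet.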

QPU

When the user invokes this section, do not dump all providers at once. Instead, follow this two-step dialogue:

Step 1 - ask which technology they want

Which QPU technology are you targeting?
  1. Ion trap                (IonQ, Quantinuum)
  2. Superconducting         (IQM, OQC, Anyon, TII, QCI)
  3. Neutral atom            (QuEra, Infleqtion, Pasqal)
  4. Cloud / multi-platform  (AWS Braket, Scaleway)

Step 2 - once they pick a technology, ask which provider, then read the corresponding doc file and walk the user through it step by step.

| Technology | Provider | Doc file |
| --- | --- | --- |
| Ion trap | IonQ | docs/sphinx/using/backends/hardware/iontrap.rst (IonQ section) |
| Ion trap | Quantinuum | docs/sphinx/using/backends/hardware/iontrap.rst (Quantinuum section) |
| Superconducting | IQM | docs/sphinx/using/backends/hardware/superconducting.rst (IQM section) |
| Superconducting | OQC | docs/sphinx/using/backends/hardware/superconducting.rst (OQC section) |
| Superconducting | Anyon | docs/sphinx/using/backends/hardware/superconducting.rst (Anyon section) |
| Superconducting | TII | docs/sphinx/using/backends/hardware/superconducting.rst (TII section) |
| Superconducting | QCI | docs/sphinx/using/backends/hardware/superconducting.rst (QCI section) |
| Neutral atom | Infleqtion | docs/sphinx/using/backends/hardware/neutralatom.rst (Infleqtion section) |
| Neutral atom | QuEra | docs/sphinx/using/backends/hardware/neutralatom.rst (QuEra section) |
| Neutral atom | Pasqal | docs/sphinx/using/backends/hardware/neutralatom.rst (Pasqal section) |
| Cloud | AWS Braket | docs/sphinx/using/backends/cloud/braket.rst |
| Cloud | Scaleway | docs/sphinx/using/backends/cloud/scaleway.rst |
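
The two-step dialogue maps naturally onto a nested lookup: technology first, then provider, then the doc file to read. The sketch below mirrors the table; the names `PROVIDERS`, `providers_for`, and `doc_for` are hypothetical and exist only for illustration.

```python
# Hypothetical lookup mirroring the provider table; paths are repo-relative.
HW = "docs/sphinx/using/backends/hardware/"
CLOUD = "docs/sphinx/using/backends/cloud/"

PROVIDERS = {
    "Ion trap": {name: HW + "iontrap.rst" for name in ("IonQ", "Quantinuum")},
    "Superconducting": {
        name: HW + "superconducting.rst"
        for name in ("IQM", "OQC", "Anyon", "TII", "QCI")
    },
    "Neutral atom": {
        name: HW + "neutralatom.rst"
        for name in ("Infleqtion", "QuEra", "Pasqal")
    },
    "Cloud": {"AWS Braket": CLOUD + "braket.rst",
              "Scaleway": CLOUD + "scaleway.rst"},
}


def providers_for(technology):
    """Step 1 answered: list the providers to offer in step 2."""
    return sorted(PROVIDERS[technology])


def doc_for(technology, provider):
    """Step 2 answered: return the doc file to read for that provider."""
    return PROVIDERS[technology][provider]


print(providers_for("Ion trap"))
print(doc_for("Cloud", "Scaleway"))
```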

After walking through the provider steps, always close with


Applications

CUDA-Q ships with ready-to-run application notebooks:

| Category | Examples |
| --- | --- |
| Optimization | QAOA, ADAPT-QAOA, MaxCut |
| Chemistry | VQE, UCCSD, ADAPT-VQE |
| Error Correction | Surface codes, QEC memory |
| Algorithms | Grover’s, Shor’s, QFT, Deutsch-Jozsa, HHL |
| ML | Quantum neural networks, kernel methods |
| Simulation | Hamiltonian dynamics, Trotter evolution |
| Finance | Portfolio optimization, Monte Carlo |

Parallelize

CUDA-Q supports two distinct multi-GPU parallelization strategies - pick based on what you are trying to scale.

| Goal | Strategy | Target option |
| --- | --- | --- |
| Single circuit too large for one GPU | Pool GPU memory | nvidia --target-option mgpu |
| Many independent circuits at once | Run circuits in parallel | nvidia --target-option mqpu |
| Large Hamiltonian expectation value | Distribute terms across GPUs | mqpu + execution=cudaq.parallel.thread |

Circuit batching with mqpu (sample_async / observe_async)

The mqpu option maps one virtual QPU to each GPU. Pass qpu_id to sample_async or observe_async to send each circuit to a specific virtual QPU, so that all GPUs work concurrently.

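import cudaq

# Assumes `kernel`, `hamiltonian`, and `param_sets` (one argument value per
# circuit evaluation) are defined elsewhere.
cudaq.set_target("nvidia", option="mqpu")
n_qpus = cudaq.get_platform().num_qpus()

# Dispatch round-robin: job i goes to virtual QPU i % n_qpus.
futures = [
    cudaq.observe_async(kernel, hamiltonian, params, qpu_id=i % n_qpus)
    for i, params in enumerate(param_sets)
]
# Block on each future and collect the expectation values.
results = [f.get().expectation() for f in futures]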
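
The `i % n_qpus` expression assigns jobs round-robin. A plain-Python sketch of that assignment, with no CUDA-Q dependency and a hypothetical function name, shows how the jobs spread across the virtual QPUs:

```python
from collections import defaultdict


def round_robin(num_jobs, n_qpus):
    """Group job indices by the virtual QPU that `i % n_qpus` selects."""
    assignment = defaultdict(list)
    for i in range(num_jobs):
        assignment[i % n_qpus].append(i)
    return dict(assignment)


print(round_robin(10, 4))
# {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
```

The per-QPU job counts differ by at most one, which balances the load well when the circuits have similar cost.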

Hamiltonian batching

For a single kernel with a large Hamiltonian, add execution= to the cudaq.observe call; no other code change is needed. As the table above notes, cudaq.parallel.thread is used together with the mqpu target.

# Single node, multiple GPUs
result = cudaq.observe(kernel, hamiltonian, *args,
                       execution=cudaq.parallel.thread)

# Multi-node via MPI
result = cudaq.observe(kernel, hamiltonian, *args,
                       execution=cudaq.parallel.mpi)

See docs/sphinx/using/examples/multi_gpu_workflows.rst (listed under References) for complete working examples of both patterns.
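
Distributing a Hamiltonian works because expectation values are linear: for H = sum_k c_k P_k, the expectation of H is sum_k c_k times the expectation of P_k, so disjoint groups of terms can be evaluated independently and then added. The sketch below illustrates only that idea in plain Python. It does not describe how CUDA-Q implements execution=, and the term values are made-up placeholders.

```python
def split_terms(terms, n_workers):
    """Deal terms into n_workers interleaved chunks."""
    return [terms[w::n_workers] for w in range(n_workers)]


def partial_expectation(chunk):
    """Sum coefficient * <P_k> over one worker's share of the terms."""
    return sum(coeff * value for coeff, value in chunk)


# (coefficient, <P_k>) pairs; placeholder numbers, not a real Hamiltonian.
terms = [(0.5, 1.0), (-0.25, 0.0), (2.0, -1.0), (1.0, 0.5), (0.75, 1.0)]

chunks = split_terms(terms, n_workers=2)
total = sum(partial_expectation(chunk) for chunk in chunks)
print(total)  # -0.25, the same as summing all terms on one worker
```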


Examples


Limitations

Troubleshooting

Skill frontmatter

title: Cuda Quantum
version: 1.0.1
author: CUDA-Q Team
tags: cuda-quantum, quantum-computing, onboarding, getting-started, nvidia
tools: Read, Glob, Grep
license: Apache-2.0
compatibility: Python 3.10+, C++ 20
metadata:
  author: CUDA-Q Team
  tags: cuda-quantum, quantum-computing, onboarding, getting-started, nvidia
  languages: python, c++
  domain: quantum