cuPyNumeric Install (user)
Purpose
Use this skill to install cuPyNumeric for use from Python and to verify the install actually works (including GPU usage). Apply it whenever a user wants cuPyNumeric running via conda or pip. Do not use it to build from source (to modify or contribute) — that is out of scope.
Mandatory rules
- Never run installs. Do not run conda install, pip install, or any other installer. Print the command; let the user run it.
- Always isolate. No installs into base conda, system Python, or shared global envs.
- Detect before recommending. Read-only checks are fine.
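The isolation rule can be checked read-only before recommending anything. The sketch below is one possible way to classify the active environment. It assumes the conda base env reports itself as "base" through the CONDA_DEFAULT_ENV variable, and the function name and labels are hypothetical.

```python
import os
import sys


def isolation_status(prefix, base_prefix, conda_env):
    """Classify the active Python environment without modifying anything.

    prefix / base_prefix mirror sys.prefix / sys.base_prefix; conda_env mirrors
    the CONDA_DEFAULT_ENV variable (None when no conda env is active).
    The returned labels are illustrative, not part of any tool's API.
    """
    if prefix != base_prefix:
        return "venv"          # isolated: a virtual environment is active
    if conda_env == "base":
        return "conda-base"    # not isolated: installs would land in base
    if conda_env:
        return "conda-env"     # isolated: a named conda env is active
    return "system"            # not isolated: system or global Python


if __name__ == "__main__":
    status = isolation_status(
        sys.prefix, sys.base_prefix, os.environ.get("CONDA_DEFAULT_ENV")
    )
    print("environment:", status)
```

A result of "conda-base" or "system" means a fresh env should be proposed before any install command is printed.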
Prerequisites
Confirm these system requirements before recommending any install:
- GPU: Compute Capability ≥ 7.0 (Volta+). CPU-only also supported.
- CUDA: 12.2+.
- OS: Linux (x86_64 / aarch64), macOS aarch64 (pip wheels only), Windows via WSL.
- Python: 3.11 through 3.14 on Linux; 3.11 through 3.13 on macOS aarch64.
- conda: ≥ 24.1 (conda path only).
- Package manager: conda (upstream-recommended) or pip. If neither is present, bootstrap one first (see Instructions).
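As a rough illustration, the OS and Python rows of this list can be encoded as a small read-only check. This is a sketch only: the table below is transcribed from the list above, the function name is hypothetical, and it assumes macOS aarch64 reports its machine type as "arm64" while WSL reports itself as Linux.

```python
import platform
import sys

# Supported Python 3 minor versions per (system, machine), from the list above.
SUPPORTED = {
    ("Linux", "x86_64"): range(11, 15),   # 3.11 through 3.14
    ("Linux", "aarch64"): range(11, 15),  # 3.11 through 3.14
    ("Darwin", "arm64"): range(11, 14),   # 3.11 through 3.13 (pip wheels only)
}


def platform_supported(system, machine, version):
    """Return True if this OS/arch/Python combination is in the table."""
    major, minor = version[0], version[1]
    minors = SUPPORTED.get((system, machine))
    return minors is not None and major == 3 and minor in minors


if __name__ == "__main__":
    ok = platform_supported(
        platform.system(), platform.machine(), sys.version_info[:2]
    )
    print("platform/Python supported:", ok)
```

The GPU and CUDA rows are not covered here; they still need a separate check on the host.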
Instructions
Follow these steps in order: confirm the prerequisites, ask the scoping questions, install via the chosen path, then verify.
Ask before installing
- Package manager? Check whether conda and pip are available. Prefer conda (upstream-recommended); fall back to pip.
- Env target? GPU machine, CPU-only laptop, cloud, container, or remote/server.
- CUDA version? Ask only when forcing the GPU variant on a host without a visible GPU. Check it on the runtime host, for example with nvidia-smi.
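The package-manager question can usually be answered by a read-only PATH lookup rather than by asking. A minimal sketch, assuming both tools are exposed as executables named conda and pip on PATH (the helper names are hypothetical):

```python
import shutil


def detect_managers():
    """Return (has_conda, has_pip) based on a read-only PATH lookup."""
    return shutil.which("conda") is not None, shutil.which("pip") is not None


def choose_manager(has_conda, has_pip):
    """Prefer conda, fall back to pip, and return None if neither exists."""
    if has_conda:
        return "conda"
    if has_pip:
        return "pip"
    return None


if __name__ == "__main__":
    manager = choose_manager(*detect_managers())
    print("recommended path:", manager or "bootstrap a package manager first")
```

A result of None corresponds to the bootstrap case described next.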
Bootstrap — install a package manager first
If neither conda nor pip is available, install one. Provide the command and the docs link; do not run it, because running a downloaded installer requires user trust.
Recommended: Miniforge (full conda, conda-forge default)
bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh"
Alternative: Python + pip
Install Python from your OS package manager (apt/dnf/brew) or
https://www.python.org/downloads/. If pip is missing on an existing Python, run:
python -m ensurepip --upgrade
After installing, open a new shell so the binary is on PATH.
Install — conda path
bash
conda create -n cupynumeric -c conda-forge -c legate cupynumeric
conda activate cupynumeric
Into an existing env:
conda install -c conda-forge -c legate cupynumeric
conda auto-selects the GPU or CPU variant based on whether a GPU is detected at install time. To override that, see Force the GPU variant below.
Force the GPU variant
Set CONDA_OVERRIDE_CUDA only when no GPU is visible at install time (e.g. building a container for a GPU host). Use the runtime host's CUDA version:
bash
CONDA_OVERRIDE_CUDA="12.2" conda install -c conda-forge -c legate cupynumeric
Nightly (less validated)
bash
conda install -c conda-forge -c legate-nightly cupynumeric
Install — pip path
bash
python -m venv .venv
source .venv/bin/activate
pip install nvidia-cupynumeric
Verify
Smoke test (always run)
Run a self-contained script through the legate launcher; no repo checkout is needed.
bash
TMP=$(mktemp -d)
cat > "$TMP/smoke.py" <<'EOF'
import cupynumeric as np
a = np.arange(10)
b = np.ones((4, 4))
print("sum:", a.sum()) # expect 45
print("matmul:", (b @ b).sum()) # expect 64.0
EOF
legate "$TMP/smoke.py"
rm -rf "$TMP"
Expect sum: 45 and matmul: 64.0. If legate is missing, the env is not activated; see Troubleshooting.
GPU usage check (mandatory when a supported GPU is present)
A passing smoke test does not prove GPU usage — a CPU-variant install on a GPU box produces correct results too. Run both steps.
1. Force a GPU launch. --gpus N requests N GPUs; the launch fails fast if no GPU is visible or the CPU variant is installed.
bash
TMP=$(mktemp -d)
cat > "$TMP/check.py" <<'EOF'
import cupynumeric as np
print(np.ones((4096, 4096)).sum())
EOF
legate --gpus 1 "$TMP/check.py"
rm -rf "$TMP"
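Expect 16777216.0. If the launch instead fails with an error about missing GPU or CUDA support, the CPU variant is installed; reinstall with CONDA_OVERRIDE_CUDA (see Force the GPU variant).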
2. Confirm the GPU was touched. Run a deadline-bounded matmul loop alongside nvidia-smi, all from one shell, so there is no second-terminal race:
bash
TMPDIR_GPU=$(mktemp -d)
SCRIPT="$TMPDIR_GPU/cupynumeric_gpu_check.py"
cat > "$SCRIPT" <<'EOF'
import cupynumeric as np, time
a = np.ones((10000, 10000))
deadline = time.time() + 20
iters = 0
while time.time() < deadline:
    b = a @ a
    _ = float(b.sum())  # force sync so the matmul actually runs
    iters += 1
print("iters:", iters)
EOF
legate --gpus 1 "$SCRIPT" &
WORKLOAD=$!
sleep 5 # buffer for Legate startup
for _ in $(seq 10); do # 10 samples at 1s — covers slow startup
  nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
  sleep 1
done
wait "$WORKLOAD"
rm -rf "$TMPDIR_GPU"
Expect memory.used in the GiB range across most samples and non-trivial utilization.gpu in several. If both stay at baseline across every sample, the GPU variant is not installed; check the build string in conda list for a GPU build (not a CPU build).
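Reading the samples by eye works, but the same judgment can be sketched as a small parser. The example below assumes each sample line has the shape printed by the nvidia-smi query above with csv,noheader (for example "37 %, 5120 MiB"). The function names and thresholds are hypothetical, chosen only to mirror "GiB range across most samples" and "non-trivial in several".

```python
def parse_sample(line):
    """Parse one 'UTIL %, MEM MiB' line into (util_percent, mem_mib)."""
    util_field, mem_field = [field.strip() for field in line.split(",")]
    return int(util_field.split()[0]), int(mem_field.split()[0])


def gpu_was_touched(lines, min_mem_mib=1024, min_util_pct=10, min_busy_samples=3):
    """Heuristic: memory high in most samples and utilization high in several.

    All thresholds are illustrative assumptions, not values from any tool.
    """
    samples = [parse_sample(line) for line in lines if line.strip()]
    high_mem = sum(1 for _, mem in samples if mem >= min_mem_mib)
    busy = sum(1 for util, _ in samples if util >= min_util_pct)
    return high_mem > len(samples) / 2 and busy >= min_busy_samples


if __name__ == "__main__":
    example = ["0 %, 3 MiB", "85 %, 2310 MiB", "97 %, 2310 MiB",
               "99 %, 2310 MiB", "98 %, 2310 MiB"]
    print("GPU touched:", gpu_was_touched(example))
```

On a GPU that other processes already use, the memory threshold should be compared against that baseline rather than against zero.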
Deeper recipes
See references/verification_examples.md for multi-GPU checks, CPU fallback, container, and troubleshooting.
Limitations
- Don't mix conda and pip in one env. Mixing overrides the first install and breaks at import. To switch, first run pip uninstall nvidia-cupynumeric or conda remove cupynumeric, whichever matches the existing install.
- Use the legate launcher for multi-GPU / multi-rank runs, e.g. legate --gpus 2 script.py. Plain python runs single-process.
- Force the GPU variant on a CPU-only host with CONDA_OVERRIDE_CUDA. conda otherwise auto-selects the CPU or GPU variant based on GPU detection at install time.
- Require Volta or newer. Pascal (GTX 10xx / P100) is unsupported.
- Verify conda ≥ 24.1. Older releases silently break variant selection.
- Treat multi-node / MPI / UCX as out of scope. Defer to https://docs.nvidia.com/legate/latest/networking-wheels.html and https://docs.nvidia.com/legate/latest/mpi-wrapper.html.
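The conda version limit in this list is easy to check read-only. A minimal sketch, assuming conda --version prints a line such as "conda 24.1.2" (the helper names are hypothetical):

```python
import re
import shutil
import subprocess


def parse_conda_version(text):
    """Extract (major, minor) from text like 'conda 24.1.2', or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def conda_new_enough(text, minimum=(24, 1)):
    """Return True if the reported conda version is at least `minimum`."""
    version = parse_conda_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    conda = shutil.which("conda")
    if conda is None:
        print("conda not found on PATH")
    else:
        # Read-only query; nothing is installed or modified.
        out = subprocess.run(
            [conda, "--version"], capture_output=True, text=True
        ).stdout
        verdict = "ok" if conda_new_enough(out) else "too old or unparseable"
        print(out.strip(), "->", verdict)
```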
Troubleshooting
- ModuleNotFoundError: No module named 'cupynumeric' → Check which python is active, then run pip list | grep cupynumeric (or conda list | grep cupynumeric) from the same shell to find the env mismatch.
- An error mentioning CUDA → Reinstall with CONDA_OVERRIDE_CUDA="<your-cuda-version>"; either the CPU variant is installed on a GPU box, or the CUDA versions are mismatched.
- legate: command not found → Activate the env, then confirm that legate resolves on PATH.
- Slower than NumPy on a laptop → Expect this for small problems (Legate per-task overhead). See the cuPyNumeric FAQ.
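For the first and third cases above, a short read-only report of what the current interpreter can actually see often locates the mismatch. This is a sketch under the assumption that it is run with the same python the user launches scripts with; the function name and report keys are hypothetical.

```python
import importlib.util
import shutil
import sys


def diagnose():
    """Report the active interpreter and where cupynumeric and legate resolve.

    find_spec only locates the package; it does not import it.
    """
    spec = importlib.util.find_spec("cupynumeric")
    return {
        "python": sys.executable,                    # interpreter in use
        "cupynumeric": spec.origin if spec else None,  # None if not visible
        "legate": shutil.which("legate"),            # None if not on PATH
    }


if __name__ == "__main__":
    for name, location in diagnose().items():
        print(f"{name}: {location or 'NOT FOUND'}")
```

If python points outside the env where cupynumeric was installed, the env is not activated in this shell.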
See also
- references/verification_examples.md — verification + troubleshooting recipes.
- Upstream docs: https://docs.nvidia.com/cupynumeric/latest/installation.html
- Legate requirements: https://docs.nvidia.com/legate/latest/installation.html