cupynumeric-install
cuPyNumeric Install (user)
Purpose
Use this skill to install cuPyNumeric for use from Python and to verify the install actually works (including GPU usage). Apply it whenever a user wants cuPyNumeric running via conda or pip. Do not use it to build from source (to modify or contribute) — that is out of scope.
Mandatory rules
- Never run installs. Do not run `conda install`, `pip install`, or any installer. Print the command; let the user run it.
- Always isolate. No installs into base conda, system Python, or shared global envs.
- Detect before recommending. Read-only `--version` checks are fine.
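The "always isolate" rule can be checked read-only before any command is recommended. The sketch below is one possible check, not an upstream tool: the function name `isolation_problem` is hypothetical, and it assumes that a venv shows up as `sys.prefix != sys.base_prefix` and that conda exports `CONDA_DEFAULT_ENV` when an env is activated.

```python
import os
import sys


def isolation_problem(prefix, base_prefix, conda_env):
    """Return a reason string if the env looks shared, else None."""
    if prefix != base_prefix:
        return None  # a venv is active, so pip installs stay local to it
    if conda_env is None:
        return "no venv or conda env is active (system Python?)"
    if conda_env == "base":
        return "the conda base env is active"
    return None  # a named conda env is active


if __name__ == "__main__":
    reason = isolation_problem(
        sys.prefix, sys.base_prefix, os.environ.get("CONDA_DEFAULT_ENV")
    )
    print("OK: isolated env" if reason is None else f"STOP: {reason}")
```

If the script prints `STOP`, recommend creating a dedicated env first rather than installing into the current one.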
Prerequisites
Confirm these system requirements before recommending any install:
- GPU: Compute Capability ≥ 7.0 (Volta+). CPU-only also supported.
- CUDA: 12.2+.
- OS: Linux (x86_64 / aarch64), macOS aarch64 (pip wheels only), Windows via WSL.
- Python: 3.11 through 3.14 on Linux; 3.11 through 3.13 on macOS aarch64.
- conda: ≥ 24.1 (conda path only).
- Package manager: conda (upstream-recommended) or pip. If neither is present, bootstrap one first (see Instructions).
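The version floors in the list above can be compared mechanically once the version strings have been collected. This is a minimal sketch: the constant and function names are illustrative, and it assumes the tools print a dotted version such as `conda 24.1.2`. The macOS upper bound applies to aarch64 only, as stated above.

```python
import platform
import re
import sys

MIN_PYTHON = (3, 11)
MIN_CONDA = (24, 1)
MIN_CUDA = (12, 2)
# Newest supported Python per platform.system() value, from the list above.
NEWEST_PYTHON = {"Linux": (3, 14), "Darwin": (3, 13)}


def parse_version(text):
    """Extract the first dotted number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets(text, minimum):
    """True if the version found in text is at least `minimum`."""
    return parse_version(text)[: len(minimum)] >= minimum


def python_supported(version, system):
    """Check a (major, minor, ...) version against the supported range."""
    newest = NEWEST_PYTHON.get(system)
    if newest is None:
        return False  # unknown platform: do not guess
    return MIN_PYTHON <= tuple(version[:2]) <= newest


if __name__ == "__main__":
    print("python ok:", python_supported(sys.version_info, platform.system()))
    print("conda 24.1.2 ok:", meets("conda 24.1.2", MIN_CONDA))
```

Feed `meets` the text printed by `conda --version`, or a CUDA version string with `MIN_CUDA`.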
Instructions
Follow these steps in order: confirm the prerequisites, ask the scoping questions, install via the chosen path, then verify.
Ask before installing
- Package manager? Check `conda --version` and `pip --version`. Prefer conda (upstream-recommended); fall back to pip.
- Env target? GPU machine, CPU-only laptop, cloud, container, or remote/server.
- CUDA version? Ask only when forcing the GPU variant on a host without a visible GPU. Check with `nvidia-smi` / `nvcc --version`.
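The read-only checks above can be gathered in one pass. The following is a sketch under assumptions: `probe` is a hypothetical helper, and `nvidia-smi -L` (list GPUs) is used here as the GPU visibility check. Nothing in it installs or modifies anything.

```python
import shutil
import subprocess


def probe(tool, args=("--version",)):
    """Return the first line a tool prints, or None if missing or silent."""
    path = shutil.which(tool)
    if path is None:
        return None  # not on PATH
    try:
        result = subprocess.run(
            [path, *args], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    for tool in ("conda", "pip", "nvcc"):
        print(f"{tool}: {probe(tool) or 'not found'}")
    # -L lists the GPUs the driver can see, one per line.
    print(f"nvidia-smi: {probe('nvidia-smi', ('-L',)) or 'no GPU visible'}")
```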
Bootstrap: install a package manager first
If neither `conda` nor `pip` is available, install one. Provide the command and the docs link; do not run it, because `curl | bash` requires user trust.
Recommended: Miniforge (full conda, conda-forge default)
```bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh"
```
Alternative: Python + pip
Install Python from your OS package manager (apt/dnf/brew) or https://www.python.org/downloads/. If pip is missing on an existing Python: `python -m ensurepip --upgrade`.
After installing, open a new shell so the binary is on PATH.
Install: conda path
```bash
conda create -n cupynumeric -c conda-forge -c legate cupynumeric
conda activate cupynumeric
```
Into an existing env: `conda install -c conda-forge -c legate cupynumeric`.
conda auto-selects the GPU vs CPU variant from whether `nvidia-smi` works at install time. To override that, see below.
Force the GPU variant
Set `CONDA_OVERRIDE_CUDA` only when no GPU is visible at install time (e.g. building a container for a GPU host). Use the runtime host's CUDA version:
```bash
CONDA_OVERRIDE_CUDA="12.2" conda install -c conda-forge -c legate cupynumeric
```
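Because installs are printed and never run, it can help to assemble the command as text. A minimal sketch, assuming a hypothetical helper `conda_install_command`; the channels and package name are the ones shown above, and the function only returns a string.

```python
import shlex


def conda_install_command(cuda_version=None, env_name=None):
    """Build the conda command as text; this function never executes it."""
    if env_name is None:
        parts = ["conda", "install"]
    else:
        parts = ["conda", "create", "-n", env_name]
    parts += ["-c", "conda-forge", "-c", "legate", "cupynumeric"]
    command = " ".join(shlex.quote(part) for part in parts)
    if cuda_version is not None:
        # The prefix form sets the variable for this one command only.
        command = f"CONDA_OVERRIDE_CUDA={shlex.quote(cuda_version)} {command}"
    return command


if __name__ == "__main__":
    print(conda_install_command(cuda_version="12.2"))
```

Pass `cuda_version` only in the no-visible-GPU case described above; otherwise leave it unset so conda picks the variant itself.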
Nightly (less validated)
```bash
conda install -c conda-forge -c legate-nightly cupynumeric
```
Install: pip path
```bash
python -m venv .venv
source .venv/bin/activate
pip install nvidia-cupynumeric
```
Verify
Smoke test (always run)
Run a self-contained script through the `legate` launcher; no repo checkout needed.
```bash
TMP=$(mktemp -d)
cat > "$TMP/smoke.py" <<'EOF'
import cupynumeric as np
a = np.arange(10)
b = np.ones((4, 4))
print("sum:", a.sum())           # expect 45
print("matmul:", (b @ b).sum())  # expect 64.0
EOF
legate "$TMP/smoke.py"
rm -rf "$TMP"
```
Expect `sum: 45` and `matmul: 64.0`. If `legate` is missing, the env is not activated; see Troubleshooting.
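The two expected values follow from simple arithmetic, which the pure-Python sketch below reproduces without cuPyNumeric. It is only a worked check of the numbers, not part of the smoke test.

```python
# arange(10) holds 0..9, so its sum is 0 + 1 + ... + 9.
arange_sum = sum(range(10))

# b is a 4x4 matrix of ones, so every entry of b @ b is a dot product
# of four ones (4.0), and the result has 16 entries.
n = 4
ones = [[1.0] * n for _ in range(n)]
product = [
    [sum(ones[i][k] * ones[k][j] for k in range(n)) for j in range(n)]
    for i in range(n)
]
matmul_sum = sum(sum(row) for row in product)

print("sum:", arange_sum)      # sum: 45
print("matmul:", matmul_sum)   # matmul: 64.0
```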
GPU usage check (mandatory when a supported GPU is present)
A passing smoke test does not prove GPU usage — a CPU-variant install on a GPU box produces correct results too. Run both steps.
1. Force a GPU launch. `legate --gpus N` requests N GPUs; it fails fast if no GPU is visible or the CPU variant is installed.
```bash
TMP=$(mktemp -d)
cat > "$TMP/check.py" <<'EOF'
import cupynumeric as np
print(np.ones((4096, 4096)).sum())
EOF
legate --gpus 1 "$TMP/check.py"
rm -rf "$TMP"
```
Expect `16777216.0`. If you see `CUDA driver`, `libcudart`, or `no GPUs available`, the CPU variant is installed; reinstall with `CONDA_OVERRIDE_CUDA`.
2. Confirm the GPU was touched. Run a deadline-bounded matmul loop alongside `nvidia-smi`, all from one shell, so there is no second-terminal race:
```bash
TMPDIR_GPU=$(mktemp -d)
SCRIPT="$TMPDIR_GPU/cupynumeric_gpu_check.py"
cat > "$SCRIPT" <<'EOF'
import cupynumeric as np, time
a = np.ones((10000, 10000))
deadline = time.time() + 20
iters = 0
while time.time() < deadline:
    b = a @ a
    _ = float(b.sum())  # force sync so the matmul actually runs
    iters += 1
print("iters:", iters)
EOF
legate --gpus 1 "$SCRIPT" &
WORKLOAD=$!
sleep 5                  # buffer for Legate startup
for _ in $(seq 10); do   # 10 samples at 1s, enough to cover slow startup
  nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
  sleep 1
done
wait "$WORKLOAD"
rm -rf "$TMPDIR_GPU"
```
Expect `memory.used` in the GiB range across most samples and non-trivial `utilization.gpu` in several. If both stay at baseline across every sample, the GPU variant is not installed; check `conda list cupynumeric` for `*_gpu*` (not `*_cpu*`).
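The sampled `nvidia-smi` lines can also be judged by a script instead of by eye. This is a rough sketch: the thresholds (1024 MiB, 10 % utilization, "most" as more than half, "several" as at least two) are assumptions chosen for illustration, and it expects numeric CSV fields such as `87 %, 2310 MiB`. On a multi-GPU host each sample prints one line per GPU, so filter to the GPU in use first.

```python
def parse_sample(line):
    """Parse one 'utilization.gpu, memory.used' line, e.g. '87 %, 2310 MiB'."""
    util_text, mem_text = (field.strip() for field in line.split(","))
    util_percent = int(util_text.split()[0])
    mem_mib = int(mem_text.split()[0])
    return util_percent, mem_mib


def gpu_was_touched(lines, min_mem_mib=1024, min_util=10):
    """Memory high in most samples and utilization high in at least two."""
    samples = [parse_sample(line) for line in lines if line.strip()]
    if not samples:
        return False
    mem_hits = sum(mem >= min_mem_mib for _, mem in samples)
    util_hits = sum(util >= min_util for util, _ in samples)
    return mem_hits > len(samples) / 2 and util_hits >= 2


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    made_up = ["0 %, 450 MiB"] + ["95 %, 2300 MiB"] * 9
    print("GPU touched:", gpu_was_touched(made_up))
```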
Deeper recipes
See verification_examples.md for multi-GPU checks, CPU fallback, container, and troubleshooting.
Limitations
- Don't mix conda and pip in one env. Mixing overrides the first install and breaks at import. To switch, run `pip uninstall nvidia-cupynumeric` or `conda remove cupynumeric` first.
- Use the `legate` launcher for multi-GPU / multi-rank runs. Plain `python` runs single-process; launch with `legate --gpus 2 script.py` instead.
- Force the GPU variant on a CPU-only host with `CONDA_OVERRIDE_CUDA`. conda otherwise auto-selects the CPU or GPU variant from `nvidia-smi` at install time.
- Require Volta or newer. Pascal (GTX 10xx / P100) is unsupported.
- Verify `conda --version` ≥ 24.1. Older releases silently break variant selection.
- Treat multi-node / MPI / UCX as out of scope. Defer to https://docs.nvidia.com/legate/latest/networking-wheels.html and https://docs.nvidia.com/legate/latest/mpi-wrapper.html.
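Which variant conda actually selected can be read from the build string in `conda list cupynumeric`. The sketch below classifies that string by the `*_gpu*` / `*_cpu*` patterns; the helper names are hypothetical, and it assumes the usual `conda list` column order of name, version, build, channel.

```python
from fnmatch import fnmatch


def variant_from_build(build):
    """Classify a conda build string by the *_gpu* / *_cpu* patterns."""
    if fnmatch(build, "*_gpu*"):
        return "gpu"
    if fnmatch(build, "*_cpu*"):
        return "cpu"
    return "unknown"


def variant_from_conda_list(text, package="cupynumeric"):
    """Find `package` in `conda list` output and classify its build."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == package:
            return variant_from_build(fields[2])
    return "not installed"


if __name__ == "__main__":
    # Placeholder row: the version and build string here are invented.
    sample = "# Name  Version  Build  Channel\ncupynumeric  0.0.0  py_0_gpu  legate\n"
    print(variant_from_conda_list(sample))
```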
Troubleshooting
- `ModuleNotFoundError: No module named 'cupynumeric'` → Run `which python` and `pip list | grep cupynumeric` (or `conda list | grep cupynumeric`) from the same shell to find the env mismatch.
- `ImportError` mentioning CUDA / `libcudart` → Reinstall with `CONDA_OVERRIDE_CUDA="<your-cuda-version>"`; the CPU variant is on a GPU box, or CUDA versions are mismatched.
- `legate: command not found` → Activate the env, then run `which legate` to confirm.
- Slower than NumPy on a laptop → Expect this for small problems (Legate per-task overhead). See the cuPyNumeric FAQ.
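For the env-mismatch cases above, a short report run with the same `python` the user invokes shows which interpreter is active and whether the module and launcher are visible to it. This is a sketch with a hypothetical helper, `env_report`; it uses `importlib.util.find_spec`, which locates a top-level module without importing it, so a broken CUDA install does not raise here.

```python
import importlib.util
import shutil
import sys


def env_report(module="cupynumeric", launcher="legate"):
    """Collect where this interpreter, the module, and the launcher live."""
    spec = importlib.util.find_spec(module)
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        "module": spec.origin if spec is not None else None,
        "launcher": shutil.which(launcher),
    }


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value or 'NOT FOUND'}")
```

If `module` or `launcher` is reported as not found while the package was installed, the shell is most likely using a different env than the one the install went into.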
See also
- references/verification_examples.md — verification + troubleshooting recipes.
- Upstream docs: https://docs.nvidia.com/cupynumeric/latest/installation.html
- Legate requirements: https://docs.nvidia.com/legate/latest/installation.html