cupynumeric-install

cuPyNumeric Install (user)

Purpose

Use this skill to install cuPyNumeric for use from Python and to verify the install actually works (including GPU usage). Apply it whenever a user wants cuPyNumeric running via conda or pip. Do not use it to build from source (to modify or contribute) — that is out of scope.

Mandatory rules

Never run installs. Do not run `pip install`, `conda install`, or any installer. Print the command; let the user run it.
Always isolate. No installs into base conda, system Python, or shared global envs.
Detect before recommending. Read-only `--version` checks are fine.
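
The "always isolate" rule can be checked read-only before any command is recommended. The following is a minimal sketch, not part of the original skill: the helper name `isolation_problem` is hypothetical, and it assumes that a venv is detectable through `sys.prefix != sys.base_prefix` and that an active conda env is reported in the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys


def isolation_problem(prefix, base_prefix, environ):
    """Return a warning string if the interpreter looks un-isolated, else None.

    Hypothetical helper: it only reads interpreter paths and environment
    variables and never installs or modifies anything.
    """
    # Inside a venv, the interpreter prefix differs from the base prefix.
    if prefix != base_prefix:
        return None
    # conda exports CONDA_DEFAULT_ENV for the active env; "base" is not isolated.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env and conda_env != "base":
        return None
    if conda_env == "base":
        return "active conda env is 'base'; create a dedicated env first"
    return "no venv or conda env is active; create one first"


if __name__ == "__main__":
    problem = isolation_problem(sys.prefix, sys.base_prefix, os.environ)
    print(problem or "isolated")
```

A non-`None` result is a reason to print an env-creation command for the user first, rather than an install command.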

Prerequisites

Confirm these system requirements before recommending any install:

GPU: Compute Capability ≥ 7.0 (Volta+). CPU-only also supported.
CUDA: 12.2+.
OS: Linux (x86_64 / aarch64), macOS aarch64 (pip wheels only), Windows via WSL.
Python: 3.11 through 3.14 on Linux; 3.11 through 3.13 on macOS aarch64.
conda: ≥ 24.1 (conda path only).
Package manager: conda (upstream-recommended) or pip. If neither is present, bootstrap one first (see Instructions).
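
The OS and Python rows of the list above can be checked without installing anything. This is a rough sketch under stated assumptions: the function name `platform_problems` is hypothetical, the version ranges are copied from the list, and it assumes Python reports macOS aarch64 as system `Darwin` with machine `arm64` and reports WSL as `Linux`.

```python
import platform
import sys

# Supported Python (major, minor) ranges, copied from the prerequisites list.
SUPPORTED = {
    ("Linux", "x86_64"): ((3, 11), (3, 14)),
    ("Linux", "aarch64"): ((3, 11), (3, 14)),
    # Assumption: macOS aarch64 shows up as Darwin/arm64 in the platform module.
    ("Darwin", "arm64"): ((3, 11), (3, 13)),
}


def platform_problems(system, machine, py_version):
    """Return a list of human-readable problems; an empty list means OK."""
    bounds = SUPPORTED.get((system, machine))
    if bounds is None:
        return [f"unsupported platform: {system}/{machine}"]
    low, high = bounds
    current = tuple(py_version[:2])
    if not (low <= current <= high):
        return [
            f"Python {current[0]}.{current[1]} is outside "
            f"{low[0]}.{low[1]} through {high[0]}.{high[1]}"
        ]
    return []


if __name__ == "__main__":
    print(platform_problems(platform.system(), platform.machine(), sys.version_info))
```

GPU compute capability and CUDA version are not covered here; those still need `nvidia-smi` on the target host.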

Instructions

Follow these steps in order: confirm the prerequisites, ask the scoping questions, install via the chosen path, then verify.

Ask before installing

Package manager? Check `conda --version` and `pip --version`. Prefer conda (upstream-recommended); fall back to pip.
Env target? GPU machine, CPU-only laptop, cloud, container, or remote/server.
CUDA version? Ask only when forcing the GPU variant on a host without a visible GPU. Check with `nvidia-smi` / `nvcc --version`.
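
The package-manager question can be answered with read-only detection. The sketch below is illustrative only: `choose_package_manager` and `conda_version_ok` are hypothetical names, and the parser assumes `conda --version` prints a line of the form `conda 24.1.2`.

```python
import shutil


def choose_package_manager(which=shutil.which):
    """Prefer conda, fall back to pip, return None if neither is on PATH."""
    if which("conda"):
        return "conda"
    if which("pip"):
        return "pip"
    return None


def conda_version_ok(version_output, minimum=(24, 1)):
    """Check `conda --version` output (assumed form: 'conda 24.1.2')."""
    parts = version_output.strip().split()
    if len(parts) < 2:
        return False
    numbers = []
    for piece in parts[-1].split(".")[:2]:
        if not piece.isdigit():
            return False
        numbers.append(int(piece))
    # Tuple comparison handles major/minor ordering, e.g. (23, 11) < (24, 1).
    return tuple(numbers) >= minimum


if __name__ == "__main__":
    print("package manager:", choose_package_manager())
```

Passing `which` as a parameter keeps the function testable without touching the real PATH. A `None` result corresponds to the bootstrap case, where a package manager has to be installed first.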

Bootstrap — install a package manager first

If neither `conda` nor `pip` is available, install one. Provide the command and the docs link; do not run it, because `curl | bash` requires user trust.

Recommended: Miniforge (full conda, conda-forge default)

```bash
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh"
```

Docs: https://github.com/conda-forge/miniforge

Alternative: Python + pip

Install Python from your OS package manager (apt/dnf/brew) or https://www.python.org/downloads/. If pip is missing on an existing Python:

```bash
python -m ensurepip --upgrade
```

After installing, open a new shell so the binary is on PATH.

Install — conda path

```bash
conda create -n cupynumeric -c conda-forge -c legate cupynumeric
conda activate cupynumeric
```

Into an existing env:

```bash
conda install -c conda-forge -c legate cupynumeric
```

conda auto-selects the GPU vs CPU variant from whether `nvidia-smi` works at install time. To override that, see "Force the GPU variant".

Force the GPU variant

Set `CONDA_OVERRIDE_CUDA` only when no GPU is visible at install time (e.g. building a container for a GPU host). Use the runtime host's CUDA version:

```bash
CONDA_OVERRIDE_CUDA="12.2" conda install -c conda-forge -c legate cupynumeric
```

Nightly (less validated)

```bash
conda install -c conda-forge -c legate-nightly cupynumeric
```
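
Because the skill prints install commands instead of running them, the conda variants above can be produced by one small builder. This is a sketch, not the skill's own tooling: `conda_install_command` and its parameters are hypothetical, and it only assembles the channels, package name, and `CONDA_OVERRIDE_CUDA` variable that appear in this section. It emits the override without quotes, which the shell treats the same as the quoted form for a plain version string.

```python
import shlex


def conda_install_command(env_name=None, cuda_override=None, nightly=False):
    """Build (never execute) a conda command string for cuPyNumeric."""
    channel = "legate-nightly" if nightly else "legate"
    if env_name:
        # A new, isolated env, as in the default conda path.
        parts = ["conda", "create", "-n", env_name]
    else:
        # Install into the currently active env.
        parts = ["conda", "install"]
    parts += ["-c", "conda-forge", "-c", channel, "cupynumeric"]
    text = " ".join(shlex.quote(part) for part in parts)
    if cuda_override:
        # Only for hosts with no GPU visible at install time.
        text = f"CONDA_OVERRIDE_CUDA={shlex.quote(cuda_override)} {text}"
    return text


if __name__ == "__main__":
    # Print the command; the user decides whether to run it.
    print(conda_install_command(env_name="cupynumeric"))
    print(conda_install_command(cuda_override="12.2"))
```

Returning a string rather than calling a subprocess keeps the helper consistent with the "never run installs" rule.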

Install — pip path

```bash
python -m venv .venv
source .venv/bin/activate
pip install nvidia-cupynumeric
```

Verify

Smoke test (always run)

Run a self-contained script through the `legate` launcher; no repo checkout is needed.

```bash
TMP=$(mktemp -d)
cat > "$TMP/smoke.py" <<'EOF'
import cupynumeric as np
a = np.arange(10)
b = np.ones((4, 4))
print("sum:", a.sum())            # expect 45
print("matmul:", (b @ b).sum())   # expect 64.0
EOF
legate "$TMP/smoke.py"
rm -rf "$TMP"
```

Expect `sum: 45` and `matmul: 64.0`. If `legate` is missing, the env is not activated; see Troubleshooting.
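
If the user pastes the launcher output back, the two expected lines can be checked mechanically. The sketch below is an assumption-laden convenience, not part of the skill: `smoke_output_ok` is a hypothetical name, and it assumes any extra launcher log lines appear on their own lines.

```python
def smoke_output_ok(output):
    """Return True when both expected smoke-test lines appear in the output."""
    lines = [line.strip() for line in output.splitlines()]
    return "sum: 45" in lines and "matmul: 64.0" in lines


# Where the expected numbers come from, in plain Python:
# 0 + 1 + ... + 9 = 45, and a 4x4 product of ones has 16 entries of value 4.
assert sum(range(10)) == 45
assert 16 * 4.0 == 64.0

if __name__ == "__main__":
    print(smoke_output_ok("sum: 45\nmatmul: 64.0\n"))
```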

GPU usage check (mandatory when a supported GPU is present)

A passing smoke test does not prove GPU usage — a CPU-variant install on a GPU box produces correct results too. Run both steps.

1. Force a GPU launch. `legate --gpus N` requests N GPUs; it fails fast if no GPU is visible or the CPU variant is installed.

```bash
TMP=$(mktemp -d)
cat > "$TMP/check.py" <<'EOF'
import cupynumeric as np
print(np.ones((4096, 4096)).sum())
EOF
legate --gpus 1 "$TMP/check.py"
rm -rf "$TMP"
```

Expect `16777216.0` (4096 × 4096). If you see `CUDA driver`, `libcudart`, or `no GPUs available`, the CPU variant is most likely installed; reinstall with `CONDA_OVERRIDE_CUDA`.

2. Confirm the GPU was touched. Run a deadline-bounded matmul loop alongside `nvidia-smi`, all from one shell, so there is no second-terminal race:

```bash
TMPDIR_GPU=$(mktemp -d)
SCRIPT="$TMPDIR_GPU/cupynumeric_gpu_check.py"
cat > "$SCRIPT" <<'EOF'
import cupynumeric as np, time
a = np.ones((10000, 10000))
deadline = time.time() + 20
iters = 0
while time.time() < deadline:
    b = a @ a
    _ = float(b.sum())   # force sync so the matmul actually runs
    iters += 1
print("iters:", iters)
EOF
legate --gpus 1 "$SCRIPT" &
WORKLOAD=$!
sleep 5                                     # buffer for Legate startup
for _ in $(seq 10); do                      # 10 samples at 1s; covers slow startup
  nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
  sleep 1
done
wait "$WORKLOAD"
rm -rf "$TMPDIR_GPU"
```

Expect `memory.used` in the GiB range across most samples and non-trivial `utilization.gpu` in several. If both stay at baseline across every sample, the GPU variant is not installed; check `conda list cupynumeric` for `*_gpu` (not `*_cpu`).
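
The "most samples" and "several" criteria can be made concrete by parsing the sampled lines. The following is one possible sketch: the function names and the thresholds (1024 MiB, 10 % utilization, 3 busy samples) are assumptions chosen for illustration, and it assumes a single GPU whose samples look like `37 %, 1234 MiB`.

```python
def parse_sample(line):
    """Parse one 'utilization.gpu, memory.used' CSV line, e.g. '37 %, 1234 MiB'."""
    util_text, mem_text = [field.strip() for field in line.split(",")]
    util_percent = int(util_text.split()[0])
    mem_mib = int(mem_text.split()[0])
    return util_percent, mem_mib


def gpu_was_used(lines, min_mem_mib=1024, min_util=10, min_busy_samples=3):
    """Apply the 'GiB-range memory in most samples, busy in several' rule.

    All thresholds are illustrative assumptions, not values from the skill.
    """
    samples = [parse_sample(line) for line in lines if line.strip()]
    if not samples:
        return False
    high_mem = sum(1 for _, mem in samples if mem >= min_mem_mib)
    busy = sum(1 for util, _ in samples if util >= min_util)
    return high_mem > len(samples) / 2 and busy >= min_busy_samples


if __name__ == "__main__":
    captured = ["0 %, 300 MiB", "85 %, 2100 MiB", "92 %, 2100 MiB", "90 %, 2100 MiB"]
    print(gpu_was_used(captured))
```

On a multi-GPU host `nvidia-smi` prints one line per GPU per query, so the samples would need to be grouped by device before applying this rule.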

Deeper recipes

See verification_examples.md for multi-GPU checks, CPU fallback, container, and troubleshooting.

Limitations

Don't mix conda and pip in one env. Mixing overrides the first install and breaks at import. To switch, run `pip uninstall nvidia-cupynumeric` or `conda remove cupynumeric` first.
Use the `legate` launcher for multi-GPU / multi-rank runs. Plain `python` runs single-process; for example, `legate --gpus 2 script.py` launches the script on two GPUs.
Force the GPU variant on a CPU-only host with `CONDA_OVERRIDE_CUDA`. conda otherwise auto-selects the CPU or GPU variant from `nvidia-smi` at install time.
Require Volta or newer. Pascal (GTX 10xx / P100) is unsupported.
Verify that `conda --version` reports ≥ 24.1. Older releases silently break variant selection.
Treat multi-node / MPI / UCX as out of scope. Defer to https://docs.nvidia.com/legate/latest/networking-wheels.html and https://docs.nvidia.com/legate/latest/mpi-wrapper.html.

Troubleshooting

`ModuleNotFoundError: No module named 'cupynumeric'` → Run `which python` and `pip list | grep cupynumeric` (or `conda list | grep cupynumeric`) from the same shell to find the env mismatch.
`ImportError` mentioning CUDA / `libcudart` → Reinstall with `CONDA_OVERRIDE_CUDA="<your-cuda-version>"`; the CPU variant is on a GPU box, or CUDA versions are mismatched.
`legate: command not found` → Activate the env, then run `which legate` to confirm.
Slower than NumPy on a laptop → Expect this for small problems (Legate per-task overhead). See the cuPyNumeric FAQ.
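
The symptom-to-fix pairs above can be encoded as a small lookup for triaging pasted error text. This is only a sketch: `diagnose` and the `RULES` table are hypothetical, the matching is plain substring search, and the advice strings paraphrase the entries above.

```python
# (substring to look for, advice) pairs, checked in order.
RULES = [
    ("No module named 'cupynumeric'",
     "Run `which python` and `pip list | grep cupynumeric` (or the conda "
     "equivalent) from the same shell to find the env mismatch."),
    ("legate: command not found",
     "Activate the env, then run `which legate` to confirm."),
    ("libcudart",
     "Reinstall with CONDA_OVERRIDE_CUDA set to your CUDA version."),
    ("CUDA",
     "Reinstall with CONDA_OVERRIDE_CUDA set to your CUDA version."),
]


def diagnose(message):
    """Return the first matching advice string, or None if nothing matches."""
    for needle, advice in RULES:
        if needle in message:
            return advice
    return None


if __name__ == "__main__":
    print(diagnose("bash: legate: command not found"))
```

An unmatched message returns `None`, which is the cue to ask the user for the full traceback instead of guessing.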
