`legate.io.hdf5` reads and writes `.h5`/`.hdf5` files as distributed arrays. Use it instead of `with h5py.File(path, "r") as f: arr = f["dataset"][:]`, which loads the whole dataset into one process's host memory. For other formats, load `.npy` with `cupynumeric.load`, and `.npz` with `np.load` followed by `cn.asarray(...)`.

`h5py` is required. `legate/io/hdf5.py` imports it at load time, so `from legate.io.hdf5 import ...` raises `ModuleNotFoundError` when `h5py` is missing. Install it once:

```bash
conda install -c conda-forge h5py
```

| Function | Signature | Purpose |
|---|---|---|
| `to_file` | `to_file(array, path, dataset_name)` | Write a cuPyNumeric array / `LogicalArray` to an HDF5 dataset. |
| `from_file` | `from_file(path, dataset_name)` | Read one HDF5 dataset into a distributed array. |
| `from_file_batched` | `from_file_batched(path, dataset_name, chunk_size)` | Read a dataset in chunks. This chunks the file read, not the assembled array. |
All three functions live in `legate.io.hdf5`. `dataset_name` is the dataset's path inside the HDF5 file, such as `"/data"` or `"/group/x"`.
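The per-format loading choices (`.h5`/`.hdf5`, `.npy`, `.npz`) can be written down as a small lookup. This is a hypothetical helper, not an API of Legate or cuPyNumeric: `pick_loader` and `ROUTES` are illustrative names, and the helper returns the call to make as a string so it runs without Legate installed. `key` in the `.npz` route stands for whichever array name you need from the archive.

```python
from pathlib import Path

# Hypothetical helper (not part of Legate or cuPyNumeric): maps a file
# extension to the loading route described in this section and returns that
# route as a string, so it runs without Legate installed.
ROUTES = {
    ".h5": "legate.io.hdf5.from_file(path, dataset_name)",
    ".hdf5": "legate.io.hdf5.from_file(path, dataset_name)",
    ".npy": "cupynumeric.load(path)",
    ".npz": "cn.asarray(np.load(path)[key])",
}


def pick_loader(path: str) -> str:
    """Return the loading route for `path`, chosen by its case-insensitive suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in ROUTES:
        raise ValueError(f"no loading route for {suffix!r} files")
    return ROUTES[suffix]


print(pick_loader("results/run1.H5"))
```

A lookup table keeps the routing in one place; extending it to another format is one new entry.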
A minimal round trip (write, fence, read back, compare):

```python
import cupynumeric as cn
from legate.core import get_legate_runtime
from legate.io.hdf5 import from_file, to_file
a = cn.arange(64, dtype=cn.float32).reshape(8, 8)
to_file(a, path="/tmp/roundtrip.h5", dataset_name="/data")  # asynchronous write
get_legate_runtime().issue_execution_fence(block=True)  # wait until the write has landed
b = cn.asarray(from_file("/tmp/roundtrip.h5", dataset_name="/data"))  # LogicalArray -> cuPyNumeric
assert cn.array_equal(a, b)
```
Run `assets/hdf5_roundtrip.py` to verify (optional — not needed to answer).
`from_file_batched` yields `(chunk, offset)` pairs: each `chunk` is a `LogicalArray` holding one tile of the dataset, and `offset` is the tile's starting index in each dimension. `chunk_size` bounds how much is read at a time; copy each tile into a preallocated `out`:

```python
import h5py
import cupynumeric as cn
from legate.core import get_legate_runtime
from legate.io.hdf5 import from_file_batched

with h5py.File("big.h5", "r") as f:  # read shape/dtype without loading data
    shape, dtype = f["data"].shape, f["data"].dtype
out = cn.empty(shape, dtype=dtype)
# 2-D dataset: each offset unpacks to (row, column)
for chunk, (r0, c0) in from_file_batched("big.h5", "data", chunk_size=(4096, 4096)):
    out[r0:r0 + chunk.shape[0], c0:c0 + chunk.shape[1]] = cn.asarray(chunk)
get_legate_runtime().issue_execution_fence(block=True)
```

`from_file_batched` raises `ValueError` for a `chunk_size` it cannot apply to the dataset. Run `assets/hdf5_batched_read.py` to verify (optional; not needed to answer).
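To see how offsets and tile shapes fit together when assembling `out`, the sketch below enumerates a regular tiling in pure Python. It is an illustration under assumptions, not Legate's chunking code: `tile_plan` and `write_counts` are hypothetical names, tiles are assumed to start at the origin on a regular grid with smaller tiles at the far edges, and its `ValueError` checks are its own, not necessarily the ones `from_file_batched` performs. `write_counts` replays the `out[r0:r0 + h, c0:c0 + w] = ...` assignment on a 2-D grid to show that every cell is written exactly once.

```python
from itertools import product


def tile_plan(shape, chunk_size):
    """Yield (offset, tile_shape) pairs that tile an array of `shape`.

    Assumes a regular grid anchored at the origin; tiles on the far edge
    shrink when a dimension is not a multiple of its chunk size.
    """
    if len(chunk_size) != len(shape):
        raise ValueError("chunk_size needs one entry per dimension")
    if any(c <= 0 for c in chunk_size):
        raise ValueError("chunk sizes must be positive")
    starts = [range(0, n, c) for n, c in zip(shape, chunk_size)]
    for offset in product(*starts):
        tile = tuple(min(c, n - o) for o, c, n in zip(offset, chunk_size, shape))
        yield offset, tile


def write_counts(shape, chunk_size):
    """Replay the 2-D slice assignment into `out` and count writes per cell."""
    rows, cols = shape
    counts = [[0] * cols for _ in range(rows)]
    for (r0, c0), (h, w) in tile_plan(shape, chunk_size):
        for r in range(r0, r0 + h):
            for c in range(c0, c0 + w):
                counts[r][c] += 1
    return counts


for offset, tile in tile_plan((5, 4), (2, 3)):
    print(offset, tile)
```

Only the reads are split; `out` still holds the full array, which is why the table describes this as chunking the file read rather than the assembled array.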
- `to_file` accepts any `LogicalArrayLike`, i.e. an object implementing `__legate_data_interface__`, such as a cuPyNumeric array. A NumPy `np.array(...)` does not implement it; wrap it with `cn.asarray(...)` first.
- `from_file` and `from_file_batched` return a Legate `LogicalArray`; convert it with `cn.asarray(la)` before using cuPyNumeric operations on it.
- `to_file` is asynchronous. Call `get_legate_runtime().issue_execution_fence(block=True)` after it and before reading the file back, whether with `from_file` or with an external reader such as `h5py`.
- Run from outside the cuPyNumeric source tree (e.g. `cd /tmp`). Inside it, the local `cupynumeric/` directory shadows the installed package on `sys.path`, which fails with `ModuleNotFoundError: cupynumeric.install_info`.
- `to_file` and `from_file` take a filesystem `path`. If the name comes from `tempfile.mkstemp()`, pass the returned path, not the file descriptor; `to_file` overwrites a file that already exists at `path`.
- `path` must name a file, in the form `/path/to/file.h5`; `to_file` raises `ValueError` if it names a directory. `to_file` also raises `ValueError` for an unbound array (for example one from `create_array(dtype, ndim=n)` that was never filled), so pass a bound `LogicalArray`.

For GPU reads, set `LEGATE_IO_USE_VFD_GDS=1` before launching:

```bash
export LEGATE_IO_USE_VFD_GDS=1  # set before launching
```
- **Read into the GPU through the GDS VFD, not the default path.** The default (POSIX) VFD stages each GPU read through zero-copy memory (ZCMEM), of which Legate reserves only 128 MB — so a GPU read of an array larger than ~128 MB aborts. The GDS VFD removes that staging buffer.
- **Leave it unset when reading into host (CPU) memory** — the VFD GDS plugin is unnecessary there and only adds overhead.
- **Keep `=1` even without GPUDirect-capable storage** — cuFile falls back to compatibility mode automatically (set `export CUFILE_ALLOW_COMPAT_MODE=true` if it is not already on), and `=1` still avoids the ZCMEM abort.
- **Attribute it correctly:** the GDS VFD is the [nv-legate/vfd-gds](https://github.com/nv-legate/vfd-gds) plugin over NVIDIA [cuFile](https://developer.nvidia.com/gpudirect-storage), **not** KvikIO (KvikIO backs Legate's Zarr/tile I/O, not HDF5). Confirm it engaged by grepping the run log for `H5FD__gds_open: Successfully opened file w/GDS VFD`.
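The size threshold in the first bullet is simple arithmetic: element count times element size. The sketch below applies the bullets' advice to a given shape and dtype. It is illustrative only: `nbytes`, `gds_advice` and the `ITEMSIZE` table are hypothetical names, and treating the ~128 MB ZCMEM reservation as exactly 128 MiB is an assumption.

```python
import math

# Bytes per element for a few common dtypes (illustrative subset).
ITEMSIZE = {"float16": 2, "float32": 4, "float64": 8, "int32": 4, "int64": 8}

# Assumption: the ~128 MB ZCMEM reservation is taken to be 128 MiB.
ZCMEM_BYTES = 128 * 1024 * 1024


def nbytes(shape, dtype):
    """Size in bytes of a dense array with this shape and dtype."""
    return math.prod(shape) * ITEMSIZE[dtype]


def gds_advice(shape, dtype, target):
    """Apply the bullets above to one read; `target` is "gpu" or "cpu"."""
    if target == "cpu":
        return "leave LEGATE_IO_USE_VFD_GDS unset"
    if nbytes(shape, dtype) > ZCMEM_BYTES:
        return "set LEGATE_IO_USE_VFD_GDS=1 (the default VFD would abort)"
    return "set LEGATE_IO_USE_VFD_GDS=1"


# An 8192 x 8192 float32 array is 256 MiB, twice the assumed ZCMEM reservation.
print(nbytes((8192, 8192), "float32") // 2**20, "MiB")
print(gds_advice((8192, 8192), "float32", "gpu"))
```

Every GPU read gets `=1`, matching the bullets; the size check only tells you whether the default path would have aborted.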
| Symptom | Cause and fix |
|---|---|
| `ModuleNotFoundError` on `from legate.io.hdf5 import ...` | `h5py` is missing; run `conda install -c conda-forge h5py`. |
| File looks empty/truncated to `h5py` right after `to_file` | The asynchronous write hasn't landed; add `get_legate_runtime().issue_execution_fence(block=True)` before reading it outside Legate. |
| `ModuleNotFoundError: cupynumeric.install_info` | Running inside the source tree; `cd /tmp` (or anywhere outside it) and rerun. |
| Abort/crash reading a GPU array ≳128 MB | Default 128 MB ZCMEM staging buffer; set `LEGATE_IO_USE_VFD_GDS=1` when reading into the GPU. |
| `to_file` rejects a NumPy array, or `from_file` returns a `LogicalArray` | Expected; wrap it with `cn.asarray(...)`. |
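Several rows of the table can be caught before launch. The sketch below is a hypothetical pre-flight check (`preflight` is not a Legate API): it looks for `h5py` with `importlib.util.find_spec`, treats a `cupynumeric/` directory in the working directory as a sign of running inside the source tree (a heuristic, not an exact test of `sys.path` shadowing), and flags a GPU read without `LEGATE_IO_USE_VFD_GDS=1`. It cannot detect the asynchronous-write row; that still needs the execution fence.

```python
import importlib.util
import os
from pathlib import Path


def preflight(cwd=None, reading_into_gpu=False):
    """Return problems matching the troubleshooting table (empty list if none)."""
    problems = []
    if importlib.util.find_spec("h5py") is None:
        problems.append("h5py missing: conda install -c conda-forge h5py")
    here = Path(cwd) if cwd is not None else Path.cwd()
    # Heuristic: a cupynumeric/ directory here is taken to mean we are inside
    # the source tree, whose package would shadow the installed one.
    if (here / "cupynumeric").is_dir():
        problems.append("inside the cupynumeric source tree: cd /tmp first")
    if reading_into_gpu and os.environ.get("LEGATE_IO_USE_VFD_GDS") != "1":
        problems.append("GPU read without LEGATE_IO_USE_VFD_GDS=1")
    return problems


for problem in preflight():
    print("preflight:", problem)
```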
In short: import from `legate.io.hdf5` (not `legate.core.io.hdf5`), pass the dataset's in-file path as `dataset_name`, and set `LEGATE_IO_USE_VFD_GDS=1` for GPU reads.

To verify:

```bash
cd /tmp  # outside the cupynumeric source tree
conda install -c conda-forge h5py  # one-time, if not already present
LEGATE_CONFIG="--cpus 4" LEGATE_AUTO_CONFIG=0 python <skill>/assets/hdf5_roundtrip.py
LEGATE_CONFIG="--cpus 4" LEGATE_AUTO_CONFIG=0 python <skill>/assets/hdf5_batched_read.py
```

The scripts print `HDF5 ROUND TRIP OK` and `HDF5 BATCHED READ OK`. To exercise the GPU path, run them with `--gpus 1` and `LEGATE_IO_USE_VFD_GDS=1`.
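The same verification runs can be driven from Python. This is a sketch: `legate_env` and `run_check` are hypothetical helpers, the script path is whatever `<skill>/assets/...` resolves to on your system, and using `LEGATE_CONFIG="--gpus 1"` on its own for the GPU variant is an assumption about how to combine the flags above. `run_check` also assumes the scripts print their OK marker to standard output.

```python
import os
import subprocess
import sys


def legate_env(gpu=False):
    """Copy of the current environment with the verification settings applied."""
    env = dict(os.environ)
    env["LEGATE_AUTO_CONFIG"] = "0"
    if gpu:
        env["LEGATE_CONFIG"] = "--gpus 1"  # assumption: GPU-only config
        env["LEGATE_IO_USE_VFD_GDS"] = "1"
    else:
        env["LEGATE_CONFIG"] = "--cpus 4"
        env.pop("LEGATE_IO_USE_VFD_GDS", None)  # host reads: leave it unset
    return env


def run_check(script, marker, gpu=False):
    """Run one asset script from /tmp and report whether it printed `marker`."""
    result = subprocess.run(
        [sys.executable, script],
        cwd="/tmp",  # outside the cupynumeric source tree
        env=legate_env(gpu),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and marker in result.stdout


# Example (needs Legate and the asset scripts):
# run_check("<skill>/assets/hdf5_roundtrip.py", "HDF5 ROUND TRIP OK")
```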