cupynumeric-hdf5

cuPyNumeric HDF5 I/O

Purpose

Use `legate.io.hdf5` to read and write cuPyNumeric arrays as HDF5 files. Reach for it whenever a cuPyNumeric array must land in, or load from, an `.h5` / `.hdf5` file: every rank reads and writes its own tile in parallel, so never funnel a large array through a single process.

Answer inline. Treat the snippets and rules below as complete and verified: answer save / load / stream / fence / bridge questions directly, without opening the `assets/` scripts or reading the installed `legate` source. Reach for the assets only to run a verification.

Activate

Activate when the user asks about: saving a cuPyNumeric array to an `.h5` / `.hdf5` file, loading an HDF5 dataset into a cuPyNumeric array, reading a large HDF5 dataset in chunks, producing a single file for an HPC post-processing pipeline, or speeding up HDF5 disk I/O with GPUDirect Storage.

When NOT to use

Redirect these requests elsewhere instead of reaching for `legate.io.hdf5`:

- Route Parquet / Arrow / cuDF, raw-binary, or sharded / custom on-disk layouts to the cupynumeric-parallel-data-load skill, which owns cuPyNumeric's no-built-in-loader paths; `legate.io.hdf5` covers single-file HDF5 only.
- Answer pure array compute with cuPyNumeric ops (FFT, matmul, reductions, slicing, linear algebra); this skill covers disk I/O only.
- Send chunked or object-store (S3) output to a chunked format such as Zarr, not single-file HDF5.
- Load `.npz` or pickled archives with NumPy (`np.load`), then bridge with `cn.asarray(...)`: `legate.io.hdf5` reads HDF5 only, and `cupynumeric.load` reads single `.npy` files only.
- Use h5py directly for plain HDF5 reads with no cuPyNumeric/Legate: `with h5py.File(path, "r") as f: arr = f["dataset"][:]`.

Prerequisites

Install h5py before importing anything from `legate.io.hdf5`:

```bash
conda install -c conda-forge h5py        # required; legate/io/hdf5.py imports it at load
```

Expect `from legate.io.hdf5 import ...` to raise `ModuleNotFoundError` until you do, because the module imports `h5py` at load time. (h5py · conda-forge build)
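
To catch the missing dependency before the import fails, a pre-flight check along these lines may help. It is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of Legate or cuPyNumeric.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(["h5py"])
if missing:
    # Same install command as in the prerequisite above.
    print("install first: conda install -c conda-forge " + " ".join(missing))
```

Because `find_spec` only locates a top-level module without importing it, the check itself does not trigger the `ModuleNotFoundError`.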

API

| Function | Signature | Purpose |
| --- | --- | --- |
| `to_file` | `to_file(array, path, dataset_name)` | Write a cuPyNumeric array / `LogicalArray` to one HDF5 file as a virtual dataset (VDS); each rank writes its own tile. |
| `from_file` | `from_file(path, dataset_name) -> LogicalArray` | Read one HDF5 dataset into a distributed array. |
| `from_file_batched` | `from_file_batched(path, dataset_name, chunk_size) -> Iterator[(LogicalArray, offsets)]` | Read a dataset in chunks; this chunks the file read, not the assembled array. |

Import all three from `legate.io.hdf5`. Always pass `dataset_name` as the full path to a single array inside the file (e.g. `"/data"` or `"/group/x"`), never a group.

Examples

Round trip

```python
import cupynumeric as cn
from legate.core import get_legate_runtime
from legate.io.hdf5 import from_file, to_file

a = cn.arange(64, dtype=cn.float32).reshape(8, 8)

# Write: pass the cuPyNumeric ndarray straight in - no manual conversion.
to_file(array=a, path="out.h5", dataset_name="/data")
get_legate_runtime().issue_execution_fence(block=True)  # needed before any external reader

# Read: from_file returns a legate LogicalArray; cn.asarray bridges it back.
b = cn.asarray(from_file("out.h5", dataset_name="/data"))
assert cn.array_equal(a, b)
```

Run `assets/hdf5_roundtrip.py` to verify (optional; not needed to answer).

Read a large file in chunks

Use `from_file_batched` to read the source file in chunks instead of pulling it into host memory all at once. It yields one `LogicalArray` per chunk plus that chunk's offsets in the global shape. Expect clipped boundary chunks (an axis of length 5 with `chunk_size=2` yields 2, 2, 1), so place each chunk by its actual shape, not the requested `chunk_size`. Note that this chunks the file read, not the result: the assembled array (`out`) still has to fit in distributed memory:

```python
import h5py
import cupynumeric as cn
from legate.core import get_legate_runtime
from legate.io.hdf5 import from_file_batched

with h5py.File("big.h5", "r") as f:          # read shape/dtype without loading data
    shape, dtype = f["data"].shape, f["data"].dtype

out = cn.empty(shape, dtype=dtype)
for chunk, (r0, c0) in from_file_batched("big.h5", "data", chunk_size=(4096, 4096)):
    out[r0:r0 + chunk.shape[0], c0:c0 + chunk.shape[1]] = cn.asarray(chunk)
get_legate_runtime().issue_execution_fence(block=True)
```

Keep every `chunk_size` entry positive and give `chunk_size` exactly one entry per dataset dimension, or `from_file_batched` raises `ValueError`. Run `assets/hdf5_batched_read.py` to verify (optional).
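
The clipping and validation rules can be sketched in plain Python, with no Legate dependency. The helper name `chunk_grid` is hypothetical and is not the `legate.io.hdf5` implementation; it only mirrors the arithmetic described here (offsets stepping by `chunk_size`, boundary chunks clipped to the dataset extent, `ValueError` for a non-positive or wrong-length `chunk_size`). The iteration order it produces is an assumption and may differ from what `from_file_batched` yields.

```python
from itertools import product


def chunk_grid(shape, chunk_size):
    """Yield (offsets, chunk_shape) pairs covering `shape` in `chunk_size` steps.

    Hypothetical helper for illustration only.
    """
    if len(chunk_size) != len(shape):
        raise ValueError("chunk_size must have one entry per dataset dimension")
    if any(c <= 0 for c in chunk_size):
        raise ValueError("every chunk_size entry must be positive")

    # Start offsets along each axis: 0, c, 2c, ... below the axis length.
    starts = [range(0, n, c) for n, c in zip(shape, chunk_size)]
    for offsets in product(*starts):
        # Clip the last chunk on each axis to what is left of the dataset.
        extent = tuple(min(c, n - o) for o, c, n in zip(offsets, chunk_size, shape))
        yield offsets, extent


chunks = list(chunk_grid((5,), (2,)))
print([extent[0] for _, extent in chunks])  # [2, 2, 1]
```

Placing each chunk by `offsets` and its clipped `extent`, as the loop above does with `chunk.shape`, keeps the final partial chunk from writing past the edge of `out`.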

Instructions

- **Pass the cuPyNumeric ndarray directly to `to_file`.** It implements `__legate_data_interface__`, which `to_file` accepts as `LogicalArrayLike`. Skip any `np.array(...)` round-trip.
- **Bridge results back with `cn.asarray(...)`.** `from_file` and each `from_file_batched` chunk return a Legate `LogicalArray`; wrap it with `cn.asarray(la)` to get a cuPyNumeric ndarray (zero-copy, no host bounce).
- **Fence before any external reader.** Legate I/O is asynchronous: `to_file` only queues the write. Insert `get_legate_runtime().issue_execution_fence(block=True)` before h5py, a subprocess, or another tool opens the file. Skip the fence for a `from_file` issued later in the same Legate program, because the runtime preserves that ordering.
- **Run from outside the cuPyNumeric source tree** (e.g. `cd /tmp`). Python puts the cwd first on `sys.path`, so an in-tree `cupynumeric/` directory shadows the installed package (`ModuleNotFoundError: cupynumeric.install_info`).
- **Give every rank the same `path`.** The program runs on every rank (SPMD), so pass `to_file` / `from_file` an identical `path` on each; a per-rank `tempfile.mkstemp()` name breaks the collective I/O. When the program creates the file itself, write it with the collective `to_file`, not a per-rank `h5py` write.
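
A quick pre-flight check for the source-tree shadowing described above might look like the sketch below. The helper name `shadows_installed_package` is hypothetical; it only tests whether a directory contains a `cupynumeric/` subdirectory and does not inspect `sys.path` itself.

```python
import tempfile
from pathlib import Path


def shadows_installed_package(directory, package="cupynumeric"):
    """Return True if `directory` holds a subdirectory named `package`.

    Hypothetical check: such a subdirectory would be imported ahead of the
    installed package whenever `directory` comes first on sys.path.
    """
    return (Path(directory) / package).is_dir()


with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "cupynumeric").mkdir()       # mimic a source checkout
    print(shadows_installed_package(repo))      # True

with tempfile.TemporaryDirectory() as clean:
    print(shadows_installed_package(clean))     # False
```

Calling it on `Path.cwd()` at the top of a script gives an early, readable warning instead of the `cupynumeric.install_info` import error.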

`to_file` behavior to plan around

- Expect an HDF5 virtual dataset (VDS): each rank writes its own tile and the file presents them as one logical dataset.
- Treat `to_file` as destructive: it overwrites `path` if it already exists, so guard any file you must not clobber.
- Let `to_file` create missing parent directories; do not pre-create them.
- Give `path` a file name (`/path/to/file.h5`), never a directory; a directory raises `ValueError`.
- Pass a bound array (one with a known shape); `to_file` raises `ValueError` on an unbound array, meaning a Legate array created without a shape (e.g. `create_array(dtype, ndim=n)`) whose extent a producing task fills in later. cuPyNumeric ndarrays are always bound, even lazy/deferred ones, so this only affects raw `LogicalArray`s.
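
One way to guard a file that must not be clobbered is to validate the path before calling `to_file`. The sketch below is a hypothetical helper (`check_output_path` is not part of Legate) built on `pathlib`; it rejects directories up front and refuses an existing file unless overwriting is requested explicitly.

```python
import tempfile
from pathlib import Path


def check_output_path(path, overwrite=False):
    """Validate an output path before handing it to to_file.

    Hypothetical guard: rejects directories and, unless overwrite=True,
    refuses to replace an existing file.
    """
    p = Path(path)
    if p.is_dir():
        raise ValueError(f"{p} is a directory; pass a file name such as {p / 'data.h5'}")
    if p.exists() and not overwrite:
        raise FileExistsError(f"{p} already exists and to_file would overwrite it")
    return p


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results" / "data.h5"   # parent directory does not exist yet
    print(check_output_path(target).name)         # data.h5
```

The missing `results/` parent is deliberately accepted, since `to_file` creates parent directories itself.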

GPUDirect Storage (GDS)

Always set `LEGATE_IO_USE_VFD_GDS=1` for runs that read HDF5 into GPU memory, whether or not the cluster has GPUDirect-capable storage:

```bash
export LEGATE_IO_USE_VFD_GDS=1          # set before launching
```

or, with the legate driver:

```bash
legate --io-use-vfd-gds my_script.py
```


- **Read into the GPU through the GDS VFD, not the default path.** The default (POSIX) VFD stages each GPU read through zero-copy memory (ZCMEM), of which Legate reserves only 128 MB — so a GPU read of an array larger than ~128 MB aborts. The GDS VFD removes that staging buffer.
- **Leave it unset when reading into host (CPU) memory** — the VFD GDS plugin is unnecessary there and only adds overhead.
- **Keep `=1` even without GPUDirect-capable storage** — cuFile falls back to compatibility mode automatically (set `export CUFILE_ALLOW_COMPAT_MODE=true` if it is not already on), and `=1` still avoids the ZCMEM abort.
- **Attribute it correctly:** the GDS VFD is the [nv-legate/vfd-gds](https://github.com/nv-legate/vfd-gds) plugin over NVIDIA [cuFile](https://developer.nvidia.com/gpudirect-storage), **not** KvikIO (KvikIO backs Legate's Zarr/tile I/O, not HDF5). Confirm it engaged by grepping the run log for `H5FD__gds_open: Successfully opened file w/GDS VFD`.
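
The GPU/CPU rule in the bullets above can be summarized as a small decision helper. This is only a sketch: `exceeds_zcmem` and `gds_env` are hypothetical names, and reading the 128 MB figure as 128 MiB is an assumption.

```python
import math

ZCMEM_BYTES = 128 * 1024 * 1024  # assumption: the 128 MB staging buffer read as 128 MiB


def exceeds_zcmem(shape, itemsize):
    """True if an array of this shape and element size is larger than the staging buffer."""
    return math.prod(shape) * itemsize > ZCMEM_BYTES


def gds_env(target):
    """Environment overrides for a read into 'gpu' or 'cpu' memory (hypothetical helper)."""
    if target == "gpu":
        return {"LEGATE_IO_USE_VFD_GDS": "1"}   # always set for GPU reads
    if target == "cpu":
        return {}                               # leave it unset for host reads
    raise ValueError("target must be 'gpu' or 'cpu'")


print(exceeds_zcmem((8192, 8192), 4))   # True: 256 MiB of float32
print(exceeds_zcmem((4096, 4096), 4))   # False: 64 MiB of float32
print(gds_env("gpu"))                   # {'LEGATE_IO_USE_VFD_GDS': '1'}
```

The size check is informational only: the setting is recommended for every GPU read, and the check merely flags the reads that would abort on the default path.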

Troubleshooting

| Symptom | Cause and fix |
| --- | --- |
| `ModuleNotFoundError: No module named 'h5py'` on import | h5py is missing: `conda install -c conda-forge h5py`. |
| File looks empty/truncated to h5py right after `to_file` | The async write hasn't landed: add `get_legate_runtime().issue_execution_fence(block=True)` before the external read. |
| `ValueError` from `to_file` | `path` is a directory: pass a file path such as `results/data.h5`. |
| `ModuleNotFoundError: No module named 'cupynumeric.install_info'` | Running inside the source tree: `cd /tmp` (any directory outside the repo). |
| Abort/crash reading a GPU array ≳128 MB | Default 128 MB ZCMEM staging buffer: set `LEGATE_IO_USE_VFD_GDS=1` for GPU reads. |
| `from_file` returned `LogicalArray(...)` | Expected: wrap it with `cn.asarray(...)`. |

Limitations & version notes

- Import from `legate.io.hdf5` (Legate 26.01+); rewrite any `legate.core.io.hdf5` import left over from the 25.03 line (e.g. the 25.03 launch blog still shows the old path).
- Install h5py explicitly; it ships in no default cuPyNumeric env.
- Point `dataset_name` at a single array, never a group; traverse groups with h5py first to discover dataset paths.
- On GPU, always read with `LEGATE_IO_USE_VFD_GDS=1` (see GPUDirect Storage); the default path aborts on GPU arrays larger than the 128 MB ZCMEM buffer. Leave it unset for CPU reads.
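
When auditing an environment that may still be on the older release line, it can help to report which of the two import paths resolves. The sketch below uses only `importlib`; `resolve_hdf5_module` is a hypothetical helper, and the demo call uses stand-in module names so it runs without Legate installed.

```python
from importlib.util import find_spec

CANDIDATES = ("legate.io.hdf5", "legate.core.io.hdf5")  # current path first, legacy second


def resolve_hdf5_module(candidates=CANDIDATES):
    """Return the first importable module name from `candidates`, or None."""
    for name in candidates:
        try:
            if find_spec(name) is not None:
                return name
        except ImportError:
            # find_spec imports parent packages; a missing parent lands here.
            continue
    return None


# Stand-in names: the first does not exist, the second is in the standard library.
print(resolve_hdf5_module(("no_such_pkg_xyz.io", "json.decoder")))  # json.decoder
```

If the helper returns the legacy name, the import statement is the thing to rewrite; if it returns `None`, Legate itself is not importable from the current environment.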

Verify

```bash
cd /tmp                                  # outside the cupynumeric source tree
conda install -c conda-forge h5py        # one-time, if not already present
LEGATE_CONFIG="--cpus 4" LEGATE_AUTO_CONFIG=0 python <skill>/assets/hdf5_roundtrip.py
LEGATE_CONFIG="--cpus 4" LEGATE_AUTO_CONFIG=0 python <skill>/assets/hdf5_batched_read.py
```

Expect `HDF5 ROUND TRIP OK` and `HDF5 BATCHED READ OK`. Add `--gpus 1` (and `LEGATE_IO_USE_VFD_GDS=1`) to exercise the GPU / GDS path.
