numpy

NumPy Best Practices

NumPy is the fundamental package for scientific computing with Python. It provides N-dimensional array objects, vectorized math operations, broadcasting, linear algebra, Fourier transforms, and random number generation. This skill covers best practices for writing correct, efficient, and maintainable NumPy code.

Import Convention

Always import NumPy with the standard alias:

python

import numpy as np

Never use `from numpy import *`; it pollutes the namespace and makes code harder to read.

Array Creation

Choose the right creation function

Use case	Function
Known values	`np.array([1, 2, 3])`
Zeros	`np.zeros(shape)`
Ones	`np.ones(shape)`
Uninitialized (fill later)	`np.empty(shape)`
Integer range	`np.arange(start, stop, step)`
Evenly spaced floats	`np.linspace(start, stop, num)`
Identity matrix	`np.eye(n)`
Same shape and dtype as an existing array	`np.zeros_like(arr)`, `np.ones_like(arr)`
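
As a rough illustration of how a few of the functions in the table differ, the sketch below contrasts `np.arange` with `np.linspace` and shows `np.zeros_like` inheriting shape and dtype. The variable names are illustrative and not part of the original guide.

```python
import numpy as np

# arange excludes the stop value and is best suited to integer steps.
steps = np.arange(0, 10, 2)             # 0, 2, 4, 6, 8

# linspace includes both endpoints and takes the number of points directly.
grid = np.linspace(0.0, 1.0, num=5)     # 0.0, 0.25, 0.5, 0.75, 1.0

# zeros_like copies shape and dtype from an existing array.
template = np.ones((2, 3), dtype=np.float32)
blank = np.zeros_like(template)
```

For floating-point ranges, `np.linspace` is usually the safer choice because the number of points does not depend on rounding of the step.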

Specify dtype explicitly

Always specify `dtype` when the intended type differs from NumPy's default (`float64` for floats, and `int64` for integers on most 64-bit platforms):

python

# Explicit dtype avoids silent precision issues
weights = np.ones(1000, dtype=np.float32)   # saves memory for ML models
indices = np.arange(100, dtype=np.int32)    # sufficient range, half the memory of int64
flags = np.zeros(50, dtype=np.bool_)        # boolean array

Do not rely on implicit upcasting; declare the dtype the data actually needs.
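
To make the memory claim concrete, here is a small sketch comparing the default float dtype with an explicit `float32`. The array names are made up for the example.

```python
import numpy as np

default_weights = np.ones(1000)                    # float64 by default
compact_weights = np.ones(1000, dtype=np.float32)  # explicit 32-bit floats

# nbytes reports the size of the data buffer in bytes.
print(default_weights.dtype, default_weights.nbytes)   # float64 8000
print(compact_weights.dtype, compact_weights.nbytes)   # float32 4000
```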

Use `np.random.default_rng()` for random numbers

The legacy `np.random.*` functions (e.g., `np.random.rand`) rely on hidden global state; NumPy recommends the Generator API for new code:

python

# Correct: reproducible, modern API
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
integers = rng.integers(0, 10, size=50)

# Avoid: legacy API, global state
np.random.seed(42)
samples = np.random.randn(100, 3)

Pass `seed` to `default_rng` for reproducibility in tests and experiments.
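
A minimal sketch of what seeding buys in practice: two generators built from the same seed produce the same stream, and each keeps its own state. The names `rng_a`, `rng_b`, and `unseeded` are illustrative only.

```python
import numpy as np

# Two generators created from the same seed yield identical streams.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
draw_a = rng_a.normal(size=5)
draw_b = rng_b.normal(size=5)

# Each generator owns its state, so drawing from one does not disturb the others.
unseeded = np.random.default_rng()      # fresh entropy from the OS
_ = unseeded.integers(0, 10, size=3)
next_a = rng_a.normal(size=5)
next_b = rng_b.normal(size=5)
```

Passing the generator object into functions that need randomness, rather than reseeding globally, tends to keep experiments reproducible.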

Vectorization Over Loops

Replace Python loops with vectorized NumPy operations wherever possible. NumPy operations run in optimized, compiled C code, which is often orders of magnitude faster than an equivalent Python loop on large arrays.

python

# Avoid: Python loop
result = []
for x in data:
    result.append(x ** 2 + 2 * x + 1)
result = np.array(result)

# Correct: fully vectorized
result = data ** 2 + 2 * data + 1

Use `np.vectorize` only as a convenience wrapper for scalar functions. It does **not** improve performance, since it still calls the Python function once per element.
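
The three approaches discussed above can be compared on a tiny input. This is a sketch under the assumption of a hypothetical scalar helper `poly`; it only checks that the results agree and makes no timing claim.

```python
import numpy as np

def poly(x):
    """Scalar polynomial x^2 + 2x + 1 (hypothetical helper for illustration)."""
    return x ** 2 + 2 * x + 1

data = np.arange(5, dtype=np.float64)

looped = np.array([poly(x) for x in data])   # Python-level loop
wrapped = np.vectorize(poly)(data)           # still one Python call per element
vectorized = data ** 2 + 2 * data + 1        # whole-array arithmetic in compiled code
```

All three produce the same values; only the last avoids per-element Python overhead.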

Broadcasting Rules

Broadcasting allows operations on arrays of different shapes without materializing repeated copies of the smaller array. Prefer broadcasting to explicit `tile` or `repeat` calls.

Broadcasting rules (shapes are compared dimension by dimension, starting from the trailing dimension; a shape with fewer dimensions is treated as if padded with 1s on the left):

Dimensions are equal: compatible.
One dimension is 1: that dimension is stretched to match the other.
Otherwise: `ValueError`.

python

# Add a bias vector to each row of a matrix; broadcasting handles shape (3,) vs (4, 3)
matrix = np.zeros((4, 3))
bias = np.array([1.0, 2.0, 3.0])
result = matrix + bias            # shape (4, 3); bias is never tiled in memory

# Outer sum via newaxis
a = np.array([0.0, 10.0, 20.0])   # shape (3,)
b = np.array([1.0, 2.0, 3.0])     # shape (3,)
outer = a[:, np.newaxis] + b      # shape (3, 3)

Avoid broadcasting that produces very large intermediate arrays; in memory-constrained cases, process the data in chunks with an explicit loop.
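
The rules can be checked without allocating data by using `np.broadcast_shapes` (available in NumPy 1.20 and later). The sketch below also shows an incompatible pair and one way to fix it; the variable names are illustrative.

```python
import numpy as np

# np.broadcast_shapes applies the broadcasting rules to shapes alone.
print(np.broadcast_shapes((4, 3), (3,)))      # (4, 3)
print(np.broadcast_shapes((3, 1), (1, 5)))    # (3, 5)

# Trailing dimensions 3 and 4 are unequal and neither is 1, so this fails.
try:
    np.zeros((4, 3)) + np.zeros(4)
    failed = False
except ValueError:
    failed = True

# Inserting an axis makes the shapes (4, 3) and (4, 1) compatible.
shifted = np.zeros((4, 3)) + np.arange(4)[:, np.newaxis]
```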

Views vs Copies

Basic indexing (slices) returns a view — modifying it modifies the original:

python

x = np.arange(10)
y = x[2:5]     # view — shares memory
y[0] = 99      # also changes x[2]

Advanced indexing (integer arrays, boolean masks) returns a copy:

python

x = np.arange(10)
idx = [1, 3, 5]
y = x[idx]     # copy — independent of x

Check ownership with `arr.base`:

python

y.base is None    # True → copy
y.base is x       # True → view of x
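
As a complementary check, `np.shares_memory` asks directly whether two arrays overlap in memory, which can be easier to read than inspecting `.base`. A minimal sketch, with illustrative names:

```python
import numpy as np

x = np.arange(10)
view = x[2:5]            # basic slice
picked = x[[1, 3, 5]]    # advanced (integer-array) indexing

print(np.shares_memory(x, view))     # True
print(np.shares_memory(x, picked))   # False

view[0] = 99             # writes through to x[2]
picked[0] = -1           # leaves x untouched
```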

When to force a copy

Call `.copy()` explicitly when an independent array is needed:

python

backup = original.copy()

Prefer `.ravel()` (returns a view when possible) over `.flatten()` (always copies) when writes that propagate to the parent array are acceptable. `reshape(-1)` likewise returns a flat view when the memory layout permits; for non-contiguous arrays both fall back to a copy.
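
Whether flattening yields a view depends on memory layout. The following sketch, using a small illustrative array, shows a contiguous case where `ravel` returns a view and a transposed case where it has to copy.

```python
import numpy as np

original = np.arange(6).reshape(2, 3)   # C-contiguous

flat_view = original.ravel()            # contiguous input: a view
flat_copy = original.flatten()          # always a copy
flat_t = original.T.ravel()             # transposed input: ravel must copy

print(np.shares_memory(original, flat_view))   # True
print(np.shares_memory(original, flat_copy))   # False
print(np.shares_memory(original, flat_t))      # False
```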

Indexing and Selection

Boolean indexing for filtering

python

arr = np.array([1, -2, 3, -4, 5])
positive = arr[arr > 0]           # copy: [1, 3, 5]
arr[arr < 0] = 0                  # in-place modification via boolean mask

`np.where` for conditional selection

python

# Replace negatives with zero, keep positives
cleaned = np.where(arr > 0, arr, 0)
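
`np.where` has two calling forms, and a short self-contained sketch may help distinguish them. The array mirrors the one used in the boolean-indexing example; `negative_idx` is an illustrative name.

```python
import numpy as np

arr = np.array([1, -2, 3, -4, 5])

# Three-argument form: pick from the second or third argument elementwise.
cleaned = np.where(arr > 0, arr, 0)

# One-argument form: returns a tuple of index arrays where the condition holds.
(negative_idx,) = np.where(arr < 0)
```

Unlike `arr[arr < 0] = 0`, the three-argument form returns a new array and leaves `arr` unchanged.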

Avoid loops for aggregations

Use axis-aware aggregation functions instead of looping over rows or columns:

python

matrix = np.arange(12).reshape(3, 4)
row_sums = matrix.sum(axis=1)    # sum each row → shape (3,)
col_max  = matrix.max(axis=0)    # max each column → shape (4,)
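
Axis-aware aggregation combines naturally with broadcasting. As a sketch (names are illustrative), `keepdims=True` keeps the reduced axis so that the result can be applied back to the original matrix without reshaping:

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)

# keepdims=True leaves the reduced axis in place with length 1,
# so the result broadcasts back against the original matrix.
row_totals = matrix.sum(axis=1, keepdims=True)     # shape (3, 1)
row_fractions = matrix / row_totals                # each row now sums to 1
```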

Data Types and Precision

Choose the smallest sufficient dtype

Scenario	Recommended dtype
ML model weights	`np.float32`
High-precision scientific	`np.float64`
Small integer counts (<32 k)	`np.int16`
Large integer counts	`np.int32` or `np.int64`
Boolean flags	`np.bool_`
Complex numbers	`np.complex64` or `np.complex128`

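Use `arr.astype(np.float32, copy=False)` when the data may already have the target dtype: with `copy=False`, NumPy returns the input array itself if no conversion is needed, which avoids an unnecessary allocation. A cast to a different dtype still allocates a new array; `astype` never converts in place.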
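
The behaviour of `copy=False` can be observed with identity and memory checks. A minimal sketch with made-up array names:

```python
import numpy as np

already_f32 = np.ones(4, dtype=np.float32)
as_f64 = np.ones(4, dtype=np.float64)

same = already_f32.astype(np.float32, copy=False)   # dtype matches: no new array
fresh = already_f32.astype(np.float32)              # default copy=True: new array
converted = as_f64.astype(np.float32, copy=False)   # dtype differs: new array anyway
```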

Watch for integer overflow

NumPy integer array arithmetic wraps silently on overflow:

python

x = np.array([100], dtype=np.int8)   # int8 range is -128..127
x + 100    # array([-56], dtype=int8): silent overflow!

Cast to a wider type before operations that risk overflow.
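
One way to apply this advice is to widen one operand before the arithmetic, as in the sketch below. `np.iinfo` is used to look up the representable range; the variable names are illustrative.

```python
import numpy as np

x = np.array([100], dtype=np.int8)

wrapped = x + x                       # stays int8, so 200 wraps around to -56
widened = x.astype(np.int16) + x      # promoted to int16, which can hold 200

# np.iinfo reports the representable range of an integer dtype.
limits = np.iinfo(np.int8)
print(limits.min, limits.max)         # -128 127
```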

Saving and Loading Arrays

Format	Function	Use case
Single array (binary)	`np.save` / `np.load`	Fast, preserves dtype and shape
Multiple arrays	`np.savez` / `np.savez_compressed`	Archive multiple arrays
Text (CSV etc.)	`np.savetxt` / `np.loadtxt`	Human-readable interchange

python

# Save and reload with dtype and shape preserved
np.save("data.npy", arr)
arr_loaded = np.load("data.npy")

# Save several arrays
np.savez("dataset.npz", X=X_train, y=y_train)
npz = np.load("dataset.npz")
X_train = npz["X"]

Prefer `.npy`/`.npz` over text formats for large arrays; binary I/O is faster and lossless.
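
A self-contained round trip might look like the following sketch. It writes to a temporary directory so it leaves no files behind; the toy arrays and the file name are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

X_train = np.arange(6, dtype=np.float32).reshape(3, 2)
y_train = np.array([0, 1, 0], dtype=np.int8)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.npz")
    np.savez_compressed(path, X=X_train, y=y_train)

    # NpzFile loads members lazily, so close it (here via `with`) when done.
    with np.load(path) as npz:
        X_loaded = npz["X"]
        y_loaded = npz["y"]
```

The loaded arrays keep the original dtype and shape, which text formats such as CSV do not guarantee.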

Reshaping and Shape Manipulation

Use `-1` as a wildcard dimension; NumPy infers the correct size:

python

flat = arr.reshape(-1)        # flatten to 1-D (view when possible)
col  = arr.reshape(-1, 1)     # column vector
row  = arr.reshape(1, -1)     # row vector

Use `np.newaxis` (an alias for `None`) to insert a dimension for broadcasting:

python

a = np.array([1, 2, 3])       # shape (3,)
a_col = a[:, np.newaxis]      # shape (3, 1)
a_row = a[np.newaxis, :]      # shape (1, 3)
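
The inference behind `-1` and the equivalence of `np.newaxis` and `None` can be verified directly. This is a small sketch with illustrative names, including the failure case where the size cannot be inferred:

```python
import numpy as np

arr = np.arange(12)

# NumPy computes the -1 dimension from the total size: 12 / 4 = 3 rows.
grid = arr.reshape(-1, 4)

# np.newaxis is literally None, so both spellings index identically.
col_a = arr[:, np.newaxis]
col_b = arr[:, None]

# A size that does not divide evenly cannot be inferred.
try:
    arr.reshape(-1, 5)
    inferred = True
except ValueError:
    inferred = False
```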

Linear Algebra

Use `np.linalg` for matrix operations:

python

A = np.array([[1, 2], [3, 4]], dtype=np.float64)
B = np.array([[0, 1], [1, 0]], dtype=np.float64)   # example operand (not in the original)
b = np.array([5, 6], dtype=np.float64)             # example right-hand side (not in the original)

# Matrix multiplication: use the @ operator (Python 3.5+)
C = A @ B    # preferred over np.dot(A, B) for 2-D

# Common operations
vals, vecs = np.linalg.eig(A)
inv_A = np.linalg.inv(A)
rank = np.linalg.matrix_rank(A)
det = np.linalg.det(A)

# Solve Ax = b without computing the inverse (faster, more stable)
x = np.linalg.solve(A, b)    # preferred over inv(A) @ b

Avoid `np.matrix`; NumPy no longer recommends it, even for linear algebra, and it may be removed in a future release. Use 2-D `ndarray` with `@` instead.
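
To illustrate the `solve` recommendation, the sketch below solves a small system both ways and checks the result by substitution. The right-hand side `b` is an illustrative value, not taken from the original.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]], dtype=np.float64)
b = np.array([5, 6], dtype=np.float64)     # illustrative right-hand side

x_solve = np.linalg.solve(A, b)            # factorization-based solve, no explicit inverse
x_inv = np.linalg.inv(A) @ b               # same answer here, but more work

# Verify the solution by substituting it back into A x = b.
residual = A @ x_solve - b
```

On a well-conditioned 2x2 system the two approaches agree; the stability advantage of `solve` shows up mainly on larger or ill-conditioned matrices.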

Quick Reference

Task	Idiomatic code
Import	`import numpy as np`
Array from list	`np.array([1, 2, 3])`
Shape / ndim / size	`arr.shape` , `arr.ndim` , `arr.size`
Reshape	`arr.reshape(rows, -1)`
Flatten (view)	`arr.reshape(-1)` or `arr.ravel()`
Flatten (copy)	`arr.flatten()`
Transpose	`arr.T` or `arr.transpose()`
Boolean mask	`arr[arr > 0]`
Axis aggregation	`arr.sum(axis=0)`
Matrix multiply	`A @ B`
Copy	`arr.copy()`
Check view	`arr.base is not None`
Cast dtype	`arr.astype(np.float32, copy=False)`
Random (modern)	`np.random.default_rng(seed)`
Save / load	`np.save` / `np.load`

Additional Resources

Reference Files

For deeper guidance, consult:

references/performance-and-memory.md: vectorization patterns, memory layout, dtype selection, profiling, and avoiding common performance traps
references/array-operations.md: broadcasting in depth, advanced indexing, ufuncs, structured arrays, and I/O patterns
