numpy

NumPy Best Practices

NumPy is the fundamental package for scientific computing with Python. It provides N-dimensional array objects, vectorized math operations, broadcasting, linear algebra, Fourier transforms, and random number generation. This skill covers best practices for writing correct, efficient, and maintainable NumPy code.

Import Convention

Always import NumPy with the standard alias:
python
import numpy as np
Never use `from numpy import *`: it pollutes the namespace and makes code harder to read.

Array Creation

Choose the right creation function

| Use case | Function |
| --- | --- |
| Known values | `np.array([1, 2, 3])` |
| Zeros | `np.zeros(shape)` |
| Ones | `np.ones(shape)` |
| Uninitialized (fill later) | `np.empty(shape)` |
| Integer range | `np.arange(start, stop, step)` |
| Evenly spaced floats | `np.linspace(start, stop, num)` |
| Identity matrix | `np.eye(n)` |
| Like existing array | `np.zeros_like(arr)`, `np.ones_like(arr)` |
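
As a brief illustration of the creation functions listed above, the sketch below builds one array of each kind. The shapes, values, and variable names are arbitrary choices for the example and do not come from the original.

```python
import numpy as np

known = np.array([1, 2, 3])        # from known values
zeros = np.zeros((2, 3))           # 2x3 array filled with 0.0
buffer = np.empty(4)               # contents are arbitrary until written
buffer[:] = 7.0                    # always fill an np.empty array before reading it
steps = np.arange(0, 10, 2)        # integers 0, 2, 4, 6, 8 (stop is excluded)
grid = np.linspace(0.0, 1.0, 5)    # 5 floats from 0.0 to 1.0 (stop is included)
identity = np.eye(3)               # 3x3 identity matrix
like = np.zeros_like(known)        # same shape and dtype as `known`
```

Note the difference in endpoint handling: `np.arange` excludes `stop`, while `np.linspace` includes it by default.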

Specify dtype explicitly

Always specify `dtype` when the intended type differs from NumPy's default (`float64` for floats, `int64` for integers on most platforms):
python
# Explicit dtype avoids silent precision issues
weights = np.ones(1000, dtype=np.float32)   # saves memory for ML models
indices = np.arange(100, dtype=np.int32)    # sufficient range, half memory
flags = np.zeros(50, dtype=np.bool_)        # boolean array
Do not rely on implicit upcasting; declare the dtype the data actually needs.
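
To make the memory claim concrete, the following minimal sketch compares the byte size of a default `float64` array with an explicit `float32` array. The array length of 1000 mirrors the example above; the variable names are illustrative.

```python
import numpy as np

w64 = np.ones(1000)                      # default dtype: float64, 8 bytes per element
w32 = np.ones(1000, dtype=np.float32)    # explicit float32, 4 bytes per element

print(w64.dtype, w64.nbytes)   # float64 8000
print(w32.dtype, w32.nbytes)   # float32 4000
```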

Use `np.random.default_rng()` for random numbers

The legacy `np.random.*` functions (e.g., `np.random.rand`) remain available but are frozen; NumPy recommends the Generator API for new code:
python
# Correct: reproducible, modern API
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
integers = rng.integers(0, 10, size=50)
# Avoid: legacy API, global state
np.random.seed(42)
samples = np.random.randn(100, 3)
Pass `seed` to `default_rng` for reproducibility in tests and experiments.
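
One way to apply this in practice is to pass a `Generator` into the functions that need randomness instead of relying on global state. The sketch below assumes a hypothetical helper, `noisy_signal`, which is not part of the original; it only illustrates that equal seeds give equal results.

```python
import numpy as np

def noisy_signal(n, rng):
    """Return n samples of a sine wave plus Gaussian noise (hypothetical helper)."""
    t = np.linspace(0.0, 1.0, n)
    return np.sin(2 * np.pi * t) + rng.normal(scale=0.1, size=n)

# Same seed, same stream: the two runs are identical
run_a = noisy_signal(100, np.random.default_rng(seed=42))
run_b = noisy_signal(100, np.random.default_rng(seed=42))
# Different seed: a different stream
run_c = noisy_signal(100, np.random.default_rng(seed=7))
```

Because each call receives its own generator, tests can fix the seed without affecting random state elsewhere in the program.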

Vectorization Over Loops

Replace Python loops with vectorized NumPy operations wherever possible. NumPy operations execute in optimized C code, which often makes them orders of magnitude faster than an equivalent Python loop on large arrays.
python
# Avoid: Python loop
result = []
for x in data:
    result.append(x ** 2 + 2 * x + 1)
result = np.array(result)
# Correct: fully vectorized
result = data ** 2 + 2 * data + 1
Use `np.vectorize` only as a convenience wrapper for scalar functions; it does **not** improve performance since it still calls Python per element.
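
The three approaches discussed above can be compared side by side. This sketch uses a hypothetical function `poly` and a tiny input so the results are easy to check; it shows that all three agree on the values, while only the last avoids a per-element Python call.

```python
import numpy as np

data = np.arange(5, dtype=np.float64)

def poly(x):
    """Evaluate x^2 + 2x + 1 (hypothetical example function)."""
    return x ** 2 + 2 * x + 1

looped = np.array([poly(x) for x in data])   # explicit Python-level loop
wrapped = np.vectorize(poly)(data)           # convenience wrapper, still loops in Python
vectorized = poly(data)                      # whole-array arithmetic in compiled code
```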

Broadcasting Rules

Broadcasting allows operations on arrays of different shapes without materializing repeated copies of the smaller array. Apply broadcasting instead of explicit `tile` or `repeat` calls.
Broadcasting rules (shapes are compared dimension by dimension, starting from the trailing one; a shape with fewer dimensions is padded with leading 1s):
  1. Dimensions are equal: compatible.
  2. One dimension is 1: that dimension is stretched to match the other.
  3. Otherwise: `ValueError`.
python
# Add a bias vector to each row of a matrix: broadcasting handles shape (3,) vs (4, 3)
matrix = np.zeros((4, 3))
bias = np.array([1.0, 2.0, 3.0])
result = matrix + bias    # shape (4, 3); bias is never tiled in memory
# Outer sum via newaxis
a = np.array([0.0, 10.0, 20.0])    # shape (3,)
b = np.array([1.0, 2.0, 3.0])      # shape (3,)
outer = a[:, np.newaxis] + b       # shape (3, 3)
Avoid broadcasting that produces very large intermediate arrays; use an explicit loop for memory-constrained cases.
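
The rules can be checked without allocating any data by using `np.broadcast_shapes`. This sketch assumes NumPy 1.20 or later, where that function is available; the shapes are chosen only to exercise each rule.

```python
import numpy as np

# Rule 1: trailing dimensions are equal (3 == 3)
shape_a = np.broadcast_shapes((4, 3), (3,))      # (4, 3)

# Rule 2: every size-1 dimension is stretched
shape_b = np.broadcast_shapes((3, 1), (1, 5))    # (3, 5)

# Rule 3: trailing dimensions 3 and 4 are neither equal nor 1
try:
    np.broadcast_shapes((4, 3), (4,))
    failed = False
except ValueError:
    failed = True
```

Checking shapes this way is also a cheap guard against accidentally creating a very large intermediate result.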

Views vs Copies

Basic indexing (slices) returns a view — modifying it modifies the original:
python
x = np.arange(10)
y = x[2:5]     # view — shares memory
y[0] = 99      # also changes x[2]
Advanced indexing (integer arrays, boolean masks) returns a copy:
python
x = np.arange(10)
idx = [1, 3, 5]
y = x[idx]     # copy — independent of x
Check ownership with `arr.base`:
python
y.base is None    # True → copy
y.base is x       # True → view of x
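
As a complementary check, `np.shares_memory` tests directly whether two arrays overlap in memory, which can be easier to reason about than `arr.base`. The sketch below repeats the slice and integer-array cases from above; the variable names are illustrative.

```python
import numpy as np

x = np.arange(10)
view = x[2:5]           # basic slice: view
copy = x[[1, 3, 5]]     # integer-array indexing: copy

print(np.shares_memory(x, view))   # True
print(np.shares_memory(x, copy))   # False

view[0] = 99            # writes through to x[2]
copy[0] = -1            # leaves x untouched
```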

When to force a copy

Call `.copy()` explicitly when an independent array is needed:
python
backup = original.copy()
Use `.ravel()` (view when possible) over `.flatten()` (always copies) when write access to the parent is acceptable. `reshape(-1)` behaves like `.ravel()`: it returns a flat view when the memory layout allows it and a copy otherwise.
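
The difference between these flattening methods can be observed with `np.shares_memory`. In this sketch the array `grid` and its size are arbitrary; the transposed case is included to show when `.ravel()` has to fall back to a copy.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # C-contiguous 2x3 array

flat_view = grid.ravel()            # contiguous layout: returns a view
flat_copy = grid.flatten()          # always an independent copy
flat_t = grid.T.ravel()             # transposed layout: must copy to flatten in C order

print(np.shares_memory(grid, flat_view))   # True
print(np.shares_memory(grid, flat_copy))   # False
print(np.shares_memory(grid, flat_t))      # False
```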

Indexing and Selection

Boolean indexing for filtering

python
arr = np.array([1, -2, 3, -4, 5])
positive = arr[arr > 0]           # copy: [1, 3, 5]
arr[arr < 0] = 0                  # in-place modification via boolean mask

`np.where` for conditional selection

python
# Replace negatives with zero, keep positives
cleaned = np.where(arr > 0, arr, 0)
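
For reference, `np.where` has two forms: with three arguments it selects elementwise between two alternatives, and with one argument it returns the indices where the condition holds. The sketch below reuses the small array from the boolean indexing example; the names `signs` and `neg_idx` are illustrative.

```python
import numpy as np

arr = np.array([1, -2, 3, -4, 5])

cleaned = np.where(arr > 0, arr, 0)        # arr where True, 0 elsewhere
signs = np.where(arr > 0, "pos", "neg")    # both branches may be scalars
(neg_idx,) = np.where(arr < 0)             # one-argument form: tuple of index arrays
```

Unlike `arr[arr < 0] = 0`, the three-argument form returns a new array and leaves `arr` unchanged.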

Avoid loops for aggregations

Use axis-aware aggregation functions instead of looping over rows or columns:
python
matrix = np.arange(12).reshape(3, 4)
row_sums = matrix.sum(axis=1)    # sum each row → shape (3,)
col_max  = matrix.max(axis=0)    # max each column → shape (4,)
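
Axis-aware aggregations combine naturally with broadcasting when `keepdims=True` is passed, since the reduced axis is kept with length 1. The following sketch normalizes each row by its sum; the float dtype and the name `row_share` are choices made for this example.

```python
import numpy as np

matrix = np.arange(12, dtype=np.float64).reshape(3, 4)

row_sums = matrix.sum(axis=1, keepdims=True)   # shape (3, 1) instead of (3,)
row_share = matrix / row_sums                  # (3, 4) / (3, 1) broadcasts across each row
col_means = matrix.mean(axis=0)                # shape (4,)
```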

Data Types and Precision

Choose the smallest sufficient dtype

| Scenario | Recommended dtype |
| --- | --- |
| ML model weights | `np.float32` |
| High-precision scientific | `np.float64` |
| Small integer counts (up to 32,767) | `np.int16` |
| Large integer counts | `np.int32` or `np.int64` |
| Boolean flags | `np.bool_` |
| Complex numbers | `np.complex64` or `np.complex128` |

Use `arr.astype(np.float32, copy=False)` when the array may already have the target dtype: with `copy=False`, NumPy returns the input array itself if no conversion is needed, which avoids an unnecessary allocation. A new array is still allocated whenever the dtype actually changes.
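
The behaviour of `copy=False` can be verified with an identity check. This is a minimal sketch with illustrative variable names.

```python
import numpy as np

already_f32 = np.zeros(4, dtype=np.float32)
needs_cast = np.zeros(4, dtype=np.float64)

same = already_f32.astype(np.float32, copy=False)      # dtype matches: input is returned as-is
converted = needs_cast.astype(np.float32, copy=False)  # dtype differs: a new array is created

print(same is already_f32)       # True
print(converted is needs_cast)   # False
```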

Watch for integer overflow

NumPy integer arithmetic wraps silently:
python
x = np.array([100], dtype=np.int8)   # int8 range is -128..127
x + 100    # array([-56], dtype=int8): 200 wraps around silently
Cast to a wider type before operations that risk overflow.
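
A small sketch of the widening advice follows. It uses `np.iinfo` to look up the limit of the narrow type and compares the wrapped result with the widened one; the values in `counts` are arbitrary.

```python
import numpy as np

counts = np.array([100, 120], dtype=np.int8)

limit = np.iinfo(np.int8).max               # 127
wrapped = counts + counts                   # stays int8, so 200 and 240 wrap around
safe = counts.astype(np.int16) + counts     # widened first, so the sums fit
```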

Saving and Loading Arrays

| Format | Function | Use case |
| --- | --- | --- |
| Single array (binary) | `np.save` / `np.load` | Fast, preserves dtype and shape |
| Multiple arrays | `np.savez` / `np.savez_compressed` | Archive multiple arrays |
| Text (CSV etc.) | `np.savetxt` / `np.loadtxt` | Human-readable interchange |

python
# Save and reload with full metadata preserved
np.save("data.npy", arr)
arr_loaded = np.load("data.npy")
# Save several arrays
np.savez("dataset.npz", X=X_train, y=y_train)
npz = np.load("dataset.npz")
X_train = npz["X"]
Prefer `.npy`/`.npz` over text formats for large arrays: binary I/O is faster and lossless.
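
A self-contained round trip might look like the sketch below. The data is placeholder data, the file is written to a temporary directory so nothing is left behind, and `np.load` is used as a context manager so the archive is closed after reading.

```python
import tempfile
from pathlib import Path

import numpy as np

X_train = np.arange(6, dtype=np.float32).reshape(3, 2)   # placeholder features
y_train = np.array([0, 1, 0], dtype=np.int8)             # placeholder labels

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.npz"
    np.savez_compressed(path, X=X_train, y=y_train)
    # The loaded archive holds an open file; the with-block closes it
    with np.load(path) as npz:
        X_loaded = npz["X"]
        y_loaded = npz["y"]
```

The dtypes and shapes come back exactly as saved, which is the main advantage over a text round trip.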

Reshaping and Shape Manipulation

Use `-1` as a wildcard dimension; NumPy infers the correct size:
python
flat = arr.reshape(-1)        # flatten to 1-D (view when possible)
col  = arr.reshape(-1, 1)     # column vector
row  = arr.reshape(1, -1)     # row vector
Use `np.newaxis` (equivalent to `None`) to insert a dimension for broadcasting:
python
a = np.array([1, 2, 3])       # shape (3,)
a_col = a[:, np.newaxis]      # shape (3, 1)
a_row = a[np.newaxis, :]      # shape (1, 3)

Linear Algebra

Use `np.linalg` for matrix operations:
python
A = np.array([[1, 2], [3, 4]], dtype=np.float64)
B = np.array([[0, 1], [1, 0]], dtype=np.float64)   # example operand, chosen for illustration
b = np.array([5, 6], dtype=np.float64)             # example right-hand side, chosen for illustration
# Matrix multiplication: use the @ operator (Python 3.5+)
C = A @ B    # preferred over np.dot(A, B) for 2-D
# Common operations
vals, vecs = np.linalg.eig(A)
inv_A = np.linalg.inv(A)
rank = np.linalg.matrix_rank(A)
det = np.linalg.det(A)
# Solve Ax = b without computing the inverse (faster, more stable)
x = np.linalg.solve(A, b)    # preferred over inv(A) @ b
Avoid `np.matrix`: NumPy no longer recommends it and may remove it in a future release. Use 2-D `ndarray` with `@` instead.
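
To illustrate the `solve` recommendation, the sketch below solves a small system both ways and checks the residual. The right-hand side `b` is an arbitrary example; for this well-conditioned 2x2 matrix both approaches agree, and the benefit of `solve` shows up on larger or ill-conditioned systems.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]], dtype=np.float64)
b = np.array([5, 6], dtype=np.float64)    # arbitrary right-hand side for illustration

x = np.linalg.solve(A, b)                 # solves the system without forming inv(A)
x_inv = np.linalg.inv(A) @ b              # same answer here, with extra work

residual = A @ x - b                      # should be (numerically) zero
```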

Quick Reference

| Task | Idiomatic code |
| --- | --- |
| Import | `import numpy as np` |
| Array from list | `np.array([1, 2, 3])` |
| Shape / ndim / size | `arr.shape`, `arr.ndim`, `arr.size` |
| Reshape | `arr.reshape(rows, -1)` |
| Flatten (view) | `arr.reshape(-1)` or `arr.ravel()` |
| Flatten (copy) | `arr.flatten()` |
| Transpose | `arr.T` or `arr.transpose()` |
| Boolean mask | `arr[arr > 0]` |
| Axis aggregation | `arr.sum(axis=0)` |
| Matrix multiply | `A @ B` |
| Copy | `arr.copy()` |
| Check view | `arr.base is not None` |
| Cast dtype | `arr.astype(np.float32, copy=False)` |
| Random (modern) | `np.random.default_rng(seed)` |
| Save / load | `np.save` / `np.load` |

Additional Resources

Reference Files

For deeper guidance, consult:
  • references/performance-and-memory.md: vectorization patterns, memory layout, dtype selection, profiling, and avoiding common performance traps
  • references/array-operations.md: broadcasting in depth, advanced indexing, ufuncs, structured arrays, and I/O patterns