numpy
NumPy Best Practices
NumPy is the fundamental package for scientific computing with Python. It provides N-dimensional array objects, vectorized math operations, broadcasting, linear algebra, Fourier transforms, and random number generation. This skill covers best practices for writing correct, efficient, and maintainable NumPy code.
Import Convention
Always import NumPy with the standard alias:

```python
import numpy as np
```

Never use `from numpy import *`: it pollutes the namespace and makes code harder to read.
Array Creation
Choose the right creation function
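| Use case | Function |
|---|---|
| Known values | `np.array([...])` |
| Zeros | `np.zeros(shape)` |
| Ones | `np.ones(shape)` |
| Uninitialized (fill later) | `np.empty(shape)` |
| Integer range | `np.arange(start, stop, step)` |
| Evenly spaced floats | `np.linspace(start, stop, num)` |
| Identity matrix | `np.eye(n)` |
| Like existing array | `np.zeros_like(a)`, `np.ones_like(a)`, `np.empty_like(a)` |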
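As a quick illustration of the creation functions in the table above, the sketch below calls each one on small, arbitrary shapes. The variable names and sizes are placeholders chosen for this example, not part of the original guidance.

```python
import numpy as np

known = np.array([1.5, 2.5, 3.5])      # known values
zeros = np.zeros((2, 3))               # all zeros
ones = np.ones((2, 3))                 # all ones
scratch = np.empty((2, 3))             # uninitialized: contents are arbitrary
scratch[:] = 7.0                       # so fill it before reading
ints = np.arange(0, 10, 2)             # integer range, stop value excluded
grid = np.linspace(0.0, 1.0, num=5)    # 5 evenly spaced floats, endpoints included
eye = np.eye(3)                        # 3x3 identity matrix
like = np.zeros_like(known)            # same shape and dtype as `known`

print(ints.tolist())   # [0, 2, 4, 6, 8]
print(grid.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Note that `np.arange` excludes its stop value while `np.linspace` includes it by default, which is one reason `np.linspace` is the safer choice for float grids.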
Specify dtype explicitly
Always specify `dtype` when the intended type differs from NumPy's default (`float64` for floats, `int64` for integers on most 64-bit platforms):

```python
# Explicit dtype avoids silent precision issues
weights = np.ones(1000, dtype=np.float32)   # saves memory for ML models
indices = np.arange(100, dtype=np.int32)    # sufficient range, half memory
flags = np.zeros(50, dtype=np.bool_)        # boolean array
```

Do not rely on implicit upcasting: declare the dtype the data actually needs.
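To make the memory and precision trade-off concrete, here is a minimal sketch comparing the default float dtype with an explicit `float32`. The array size and the `1e-8` increment are arbitrary values picked for illustration.

```python
import numpy as np

default_weights = np.ones(1000)                    # float64 by default
small_weights = np.ones(1000, dtype=np.float32)    # explicit 32-bit floats

print(default_weights.dtype, default_weights.nbytes)   # float64 8000
print(small_weights.dtype, small_weights.nbytes)       # float32 4000

# float32 keeps roughly 7 significant decimal digits, so tiny increments can vanish
lost = np.float32(1.0) + np.float32(1e-8) == np.float32(1.0)
print(lost)   # True
```

Halving the memory is only a win when the reduced precision is acceptable for the data at hand.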
Use `np.random.default_rng()` for random numbers

The legacy `np.random.*` functions (e.g., `np.random.rand`) are kept for backward compatibility, but the Generator API is recommended for new code:

```python
# Correct: reproducible, modern API
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
integers = rng.integers(0, 10, size=50)

# Avoid: legacy API, global state
np.random.seed(42)
samples = np.random.randn(100, 3)
```

Pass `seed` to `default_rng` for reproducibility in tests and experiments.
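The reproducibility claim can be checked with a short sketch: two generators built from the same seed should produce identical draws. The names `rng_a`, `rng_b`, and the dice example are illustrative assumptions.

```python
import numpy as np

rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

# Same seed, same sequence of draws
draws_a = rng_a.normal(size=5)
draws_b = rng_b.normal(size=5)
print(np.array_equal(draws_a, draws_b))          # True

# integers() excludes the upper bound by default
rolls = rng_a.integers(1, 7, size=1000)          # simulated dice: values 1..6
print(rolls.min() >= 1 and rolls.max() <= 6)     # True
```

Because each `Generator` carries its own state, it can be passed into functions explicitly instead of relying on a hidden global seed.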
Vectorization Over Loops

Replace Python loops with vectorized NumPy operations wherever possible. NumPy operations execute in optimized C code, which is often orders of magnitude faster than an equivalent Python loop on large arrays.

```python
# Avoid: Python loop
result = []
for x in data:
    result.append(x ** 2 + 2 * x + 1)
result = np.array(result)

# Correct: fully vectorized
result = data ** 2 + 2 * data + 1
```

Use `np.vectorize` only as a convenience wrapper for scalar functions; it does **not** improve performance since it still calls Python per element.
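A minimal, self-contained sketch of the three approaches discussed above, assuming a small placeholder `data` array. It shows only that the results agree, not how fast each variant runs.

```python
import numpy as np

data = np.arange(5, dtype=np.float64)    # placeholder input: 0.0 .. 4.0

# Loop version: one Python-level iteration per element
looped = np.array([x ** 2 + 2 * x + 1 for x in data])

# Vectorized version: a single whole-array expression
vectorized = data ** 2 + 2 * data + 1

# np.vectorize: array-friendly interface, but still a Python call per element
poly = np.vectorize(lambda x: x ** 2 + 2 * x + 1)
wrapped = poly(data)

print(vectorized.tolist())                    # [1.0, 4.0, 9.0, 16.0, 25.0]
print(np.array_equal(looped, vectorized))     # True
print(np.array_equal(wrapped, vectorized))    # True
```

For actual timings, measure on realistically sized arrays (for example with `timeit`), since the gap depends on array size and the operation involved.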
Broadcasting Rules

Broadcasting allows operations on arrays of different shapes without copying data. Apply broadcasting instead of explicit `tile` or `repeat` calls.

Broadcasting rules (shapes are compared dimension by dimension, starting from the trailing dimension):
- Dimensions are equal: compatible.
- One dimension is 1: that dimension is stretched.
- Otherwise: `ValueError`.

```python
# Add a bias vector to each row of a matrix; broadcasting handles shape (3,) vs (4, 3)
matrix = np.zeros((4, 3))
bias = np.array([1.0, 2.0, 3.0])
result = matrix + bias          # shape (4, 3); bias is not tiled in memory

# Outer sum via newaxis (use * instead of + for the outer product)
a = np.array([0.0, 10.0, 20.0])   # shape (3,)
b = np.array([1.0, 2.0, 3.0])     # shape (3,)
outer = a[:, np.newaxis] + b      # shape (3, 3)
```

Avoid broadcasting that produces very large intermediate arrays; in memory-constrained cases, use an explicit loop over smaller pieces instead.
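The rules above can be explored without allocating any data. This sketch assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the `per_row` array is a made-up example of an incompatible shape.

```python
import numpy as np

# Shapes are aligned from the right: (4, 3) with (3,) behaves like (4, 3) with (1, 3)
print(np.broadcast_shapes((4, 3), (3,)))      # (4, 3)
print(np.broadcast_shapes((3, 1), (1, 5)))    # (3, 5)

matrix = np.zeros((4, 3))
per_row = np.array([1.0, 2.0, 3.0, 4.0])      # shape (4,): one value per row

try:
    matrix + per_row                          # trailing dims 3 vs 4: incompatible
except ValueError as exc:
    print("ValueError:", exc)

# Adding an axis gives shapes (4, 3) and (4, 1), which are compatible
shifted = matrix + per_row[:, np.newaxis]
print(shifted.shape)                          # (4, 3)
```

When a per-row operation fails with a shape error, inserting a trailing axis as shown is usually the fix.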
Views vs Copies

Basic indexing (slices) returns a view; modifying it modifies the original:

```python
x = np.arange(10)
y = x[2:5]      # view: shares memory
y[0] = 99       # also changes x[2]
```

Advanced indexing (integer arrays, boolean masks) returns a copy:

```python
x = np.arange(10)
idx = [1, 3, 5]
y = x[idx]      # copy: independent of x
```

Check ownership with `arr.base`:

```python
y.base is None   # True → copy
y.base is x      # True → view of x
```
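As a complementary check, `np.shares_memory` tests directly whether two arrays overlap in memory. The sketch below is one way to confirm the view and copy behaviour described above; the variable names are illustrative.

```python
import numpy as np

x = np.arange(10)

view = x[2:5]            # basic slice
fancy = x[[1, 3, 5]]     # integer-array (advanced) indexing
masked = x[x > 6]        # boolean-mask (advanced) indexing

print(np.shares_memory(x, view))      # True
print(np.shares_memory(x, fancy))     # False
print(np.shares_memory(x, masked))    # False

view[0] = 99             # writes through to x[2]
fancy[0] = -1            # does not touch x
print(x[:4].tolist())    # [0, 1, 99, 3]
```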
When to force a copy

Call `.copy()` explicitly when an independent array is needed:

```python
backup = original.copy()
```

Use `.ravel()` (view when possible) over `.flatten()` (always copies) when write access to the parent is acceptable. Use `reshape(-1)` as the most reliable way to get a flat view, keeping in mind that it also falls back to a copy when the memory layout does not allow a view (for example, after a transpose).
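The difference between the flattening methods can be observed with `np.shares_memory`. This is a small sketch on a made-up 2x3 array, not an exhaustive description of when NumPy copies.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)     # C-contiguous 2-D array

flat_view = grid.ravel()              # contiguous input: no copy needed
flat_copy = grid.flatten()            # always a fresh array
print(np.shares_memory(grid, flat_view))    # True
print(np.shares_memory(grid, flat_copy))    # False

# A transpose is not C-contiguous, so flattening it in C order has to copy
transposed = grid.T
print(np.shares_memory(grid, transposed.reshape(-1)))    # False
print(np.shares_memory(grid, transposed.ravel()))        # False
```

If code depends on writes propagating to the parent array, checking with `np.shares_memory` is safer than assuming a view.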
Indexing and Selection
Boolean indexing for filtering
```python
arr = np.array([1, -2, 3, -4, 5])
positive = arr[arr > 0]    # copy: [1, 3, 5]
arr[arr < 0] = 0           # in-place modification via boolean mask
```
`np.where` for conditional selection

```python
# Replace negatives with zero, keep positives
cleaned = np.where(arr > 0, arr, 0)
```
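Unlike assignment through a boolean mask, `np.where` builds a new array and leaves its input alone. The following sketch illustrates this with the same sample values used earlier; `flipped` is an extra example added here for illustration.

```python
import numpy as np

arr = np.array([1, -2, 3, -4, 5])

# Three-argument form: take from `arr` where the condition holds, else 0
cleaned = np.where(arr > 0, arr, 0)
print(cleaned.tolist())    # [1, 0, 3, 0, 5]
print(arr.tolist())        # [1, -2, 3, -4, 5]  (input is left unchanged)

# Both branches can be arrays; here negatives are replaced by their absolute value
flipped = np.where(arr < 0, -arr, arr)
print(flipped.tolist())    # [1, 2, 3, 4, 5]
```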
Avoid loops for aggregations

Use axis-aware aggregation functions instead of looping over rows or columns:

```python
matrix = np.arange(12).reshape(3, 4)
row_sums = matrix.sum(axis=1)    # sum each row → shape (3,)
col_max = matrix.max(axis=0)     # max each column → shape (4,)
```
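Here is the same example with its concrete values, plus one common follow-up pattern: `keepdims=True`, which keeps the reduced axis so the result broadcasts back against the original. The centering step is an illustrative addition, not part of the original text.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)

row_sums = matrix.sum(axis=1)
col_max = matrix.max(axis=0)
print(row_sums.tolist())    # [6, 22, 38]
print(col_max.tolist())     # [8, 9, 10, 11]

# keepdims=True keeps the reduced axis with length 1, so the result broadcasts back
row_means = matrix.mean(axis=1, keepdims=True)    # shape (3, 1)
centered = matrix - row_means                     # shape (3, 4)
print(centered[0].tolist())    # [-1.5, -0.5, 0.5, 1.5]
```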
Data Types and Precision
Choose the smallest sufficient dtype
| Scenario | Recommended dtype |
|---|---|
| ML model weights | `float32` |
| High-precision scientific | `float64` |
| Small integer counts (<32 k) | `int16` |
| Large integer counts | `int64` |
| Boolean flags | `bool_` |
| Complex numbers | `complex64` or `complex128` |
Use `arr.astype(np.float32, copy=False)` when the data may already be the right type: with `copy=False`, `astype` returns the input array itself when no conversion is needed, which avoids an unnecessary allocation. If the dtype differs, a new array is still created.
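The effect of `copy=False` can be verified with identity checks. In this minimal sketch the array names are placeholders; it shows that the input is returned only when its dtype already matches.

```python
import numpy as np

already_f32 = np.ones(4, dtype=np.float32)
as_f64 = np.ones(4)                               # float64

same = already_f32.astype(np.float32, copy=False)
converted = as_f64.astype(np.float32, copy=False)

print(same is already_f32)       # True: no conversion needed, input returned
print(converted is as_f64)       # False: dtype differs, new array allocated
print(already_f32.astype(np.float32) is already_f32)    # False: default copy=True
```

Since the result may alias the input, treat it as shared unless a copy was explicitly requested.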
Watch for integer overflow

NumPy integer array arithmetic wraps silently:

```python
x = np.array([100], dtype=np.int8)   # int8 range is -128..127
x + 100                              # array([-56], dtype=int8): silent overflow!
```

Cast to a wider type before operations that risk overflow.
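A short sketch of the wrap-around and of the widening fix, using made-up `int8` values. `np.iinfo` reports the representable range of an integer dtype.

```python
import numpy as np

counts = np.array([100, 120], dtype=np.int8)
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)    # -128 127

wrapped = counts + counts                  # stays int8, so 200 and 240 wrap around
print(wrapped.tolist())                    # [-56, -16]

widened = counts.astype(np.int16) * 2      # widen first, then compute
print(widened.tolist())                    # [200, 240]
```

Comparing the expected maximum against `np.iinfo(dtype).max` before choosing a narrow dtype is a cheap safeguard.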
Saving and Loading Arrays
| Format | Function | Use case |
|---|---|---|
| Single array (binary) | `np.save` / `np.load` | Fast, preserves dtype and shape |
| Multiple arrays | `np.savez` / `np.load` | Archive multiple arrays |
| Text (CSV etc.) | `np.savetxt` / `np.loadtxt` | Human-readable interchange |
```python
# Save and reload with dtype and shape preserved
np.save("data.npy", arr)
arr_loaded = np.load("data.npy")

# Save several arrays
np.savez("dataset.npz", X=X_train, y=y_train)
npz = np.load("dataset.npz")
X_train = npz["X"]
```

Prefer `.npy`/`.npz` over text formats for large arrays: binary I/O is faster and lossless.
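The round trip can be tried end to end in a temporary directory. This sketch uses placeholder arrays named `features` and `labels`; the file names mirror those used above.

```python
import tempfile
from pathlib import Path

import numpy as np

features = np.arange(6, dtype=np.float32).reshape(2, 3)    # placeholder data
labels = np.array([0, 1], dtype=np.int64)

with tempfile.TemporaryDirectory() as tmp:
    single_path = Path(tmp) / "data.npy"
    np.save(single_path, features)
    restored = np.load(single_path)

    archive_path = Path(tmp) / "dataset.npz"
    np.savez(archive_path, X=features, y=labels)
    # An .npz archive is read lazily and keeps the file open, so close it when done
    with np.load(archive_path) as npz:
        names = sorted(npz.files)
        X_restored = npz["X"]
        y_restored = npz["y"]

print(restored.dtype, restored.shape)    # float32 (2, 3)
print(names)                             # ['X', 'y']
```

Opening the archive in a `with` block ensures the underlying file handle is released once the arrays have been read.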
Reshaping and Shape Manipulation

Use `-1` as a wildcard dimension; NumPy infers the correct size:

```python
flat = arr.reshape(-1)       # flatten to 1-D (view when possible)
col = arr.reshape(-1, 1)     # column vector
row = arr.reshape(1, -1)     # row vector
```

Use `np.newaxis` (equivalent to `None`) to insert a dimension for broadcasting:

```python
a = np.array([1, 2, 3])      # shape (3,)
a_col = a[:, np.newaxis]     # shape (3, 1)
a_row = a[np.newaxis, :]     # shape (1, 3)
```
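To see how the `-1` wildcard is resolved, and what happens when no valid size exists, consider this small sketch on a 12-element placeholder array.

```python
import numpy as np

arr = np.arange(12)

print(arr.reshape(3, -1).shape)    # (3, 4): 12 / 3 = 4
print(arr.reshape(-1, 6).shape)    # (2, 6)

try:
    arr.reshape(5, -1)             # 12 is not divisible by 5
except ValueError as exc:
    print("ValueError:", exc)

# np.newaxis is literally None, so both spellings index the same way
print(np.newaxis is None)          # True
print(arr[:, None].shape)          # (12, 1)
```

Only one dimension can be `-1` in a single `reshape` call, since NumPy can infer at most one unknown size.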
Linear Algebra

Use `np.linalg` for matrix operations:

```python
A = np.array([[1, 2], [3, 4]], dtype=np.float64)
B = np.array([[0, 1], [1, 0]], dtype=np.float64)   # example second operand
b = np.array([5, 6], dtype=np.float64)             # example right-hand side

# Matrix multiplication: use the @ operator (Python 3.5+)
C = A @ B    # preferred over np.dot(A, B) for 2-D

# Common operations
vals, vecs = np.linalg.eig(A)
inv_A = np.linalg.inv(A)
rank = np.linalg.matrix_rank(A)
det = np.linalg.det(A)

# Solve Ax = b without computing the inverse (faster, more stable)
x = np.linalg.solve(A, b)    # preferred over inv(A) @ b
```

Avoid `np.matrix`: it is no longer recommended and may be removed in a future release. Use 2-D `ndarray` with `@` instead.
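As a sanity check on the `solve` recommendation, the sketch below solves a small system both ways and verifies the result against the right-hand side. The vector `b` is an illustrative value, not taken from the original.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])                   # illustrative right-hand side

x_solve = np.linalg.solve(A, b)            # factorizes A; no explicit inverse
x_inv = np.linalg.inv(A) @ b               # works here, but costs more in general

print(np.allclose(A @ x_solve, b))         # True
print(np.allclose(x_solve, x_inv))         # True
print(np.allclose(x_solve, [-4.0, 4.5]))   # True
```

On a well-conditioned 2x2 system the two answers agree; the accuracy advantage of `solve` tends to show up on larger or ill-conditioned matrices.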
Quick Reference
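| Task | Idiomatic code |
|---|---|
| Import | `import numpy as np` |
| Array from list | `np.array([1, 2, 3])` |
| Shape / ndim / size | `arr.shape`, `arr.ndim`, `arr.size` |
| Reshape | `arr.reshape(rows, cols)` |
| Flatten (view) | `arr.ravel()` or `arr.reshape(-1)` |
| Flatten (copy) | `arr.flatten()` |
| Transpose | `arr.T` |
| Boolean mask | `arr[arr > 0]` |
| Axis aggregation | `arr.sum(axis=0)` |
| Matrix multiply | `A @ B` |
| Copy | `arr.copy()` |
| Check view | `arr.base is not None` |
| Cast dtype | `arr.astype(np.float32)` |
| Random (modern) | `np.random.default_rng(seed)` |
| Save / load | `np.save` / `np.load` |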
Additional Resources
Reference Files
For deeper guidance, consult:
- `references/performance-and-memory.md`: Vectorization patterns, memory layout, dtype selection, profiling, and avoiding common performance traps
- `references/array-operations.md`: Broadcasting in depth, advanced indexing, ufuncs, structured arrays, and I/O patterns