numpy
This skill should be used when the user asks to "use NumPy", "write NumPy code", "optimize NumPy arrays", "vectorize with NumPy", or needs guidance on NumPy best practices, array operations, broadcasting, memory management, or scientific computing with Python.
NumPy Best Practices
NumPy is the fundamental package for scientific computing with Python. It provides N-dimensional array objects, vectorized math operations, broadcasting, linear algebra, Fourier transforms, and random number generation. This skill covers best practices for writing correct, efficient, and maintainable NumPy code.
Import Convention
Always import NumPy with the standard alias:
```python
import numpy as np
```
Never use `from numpy import *`; it pollutes the namespace and makes code harder to read.
Array Creation
Choose the right creation function
| Use case | Function |
|---|---|
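| Known values | `np.array([...])` |
| Zeros | `np.zeros(shape)` |
| Ones | `np.ones(shape)` |
| Uninitialized (fill later) | `np.empty(shape)` |
| Integer range | `np.arange(start, stop, step)` |
| Evenly spaced floats | `np.linspace(start, stop, num)` |
| Identity matrix | `np.eye(n)` |
| Like existing array | `np.zeros_like(a)`, `np.ones_like(a)`, `np.empty_like(a)` |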
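As a minimal sketch of the use cases in the table above, each creation function can be exercised as follows. The shapes and values are arbitrary and chosen only for illustration.

```python
import numpy as np

known = np.array([1.0, 2.0, 3.0])      # known values
zeros = np.zeros((2, 3))               # all zeros, float64 by default
ones = np.ones(4)                      # all ones
scratch = np.empty(5)                  # uninitialized: contents are arbitrary
scratch[:] = 7.0                       # fill before reading
steps = np.arange(0, 10, 2)            # integer range: 0, 2, 4, 6, 8
grid = np.linspace(0.0, 1.0, num=5)    # evenly spaced floats, endpoint included
eye = np.eye(3)                        # 3x3 identity matrix
like = np.zeros_like(zeros)            # same shape and dtype as an existing array
```

Note that `np.empty` only allocates memory, so its contents should never be read before they are assigned.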
Specify dtype explicitly
Always specify `dtype` when the intended type differs from NumPy's default (`float64` for floats, `int64` for integers on most platforms):
```python
# Explicit dtype avoids silent precision issues
weights = np.ones(1000, dtype=np.float32)   # saves memory for ML models
indices = np.arange(100, dtype=np.int32)    # sufficient range, half memory
flags = np.zeros(50, dtype=np.bool_)        # boolean array
```
Do not rely on implicit upcasting; declare the dtype the data actually needs.
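The trade-off can be made concrete with a small sketch. The array length is an arbitrary assumption; the point is only that `float32` halves the memory of `float64` at the cost of precision.

```python
import numpy as np

n = 1_000_000
as_f64 = np.ones(n)                      # default float64
as_f32 = np.ones(n, dtype=np.float32)    # explicit float32

print(as_f64.nbytes)   # 8000000 bytes
print(as_f32.nbytes)   # 4000000 bytes

# Precision cost: float32 keeps roughly 7 significant decimal digits,
# so 0.1 is rounded differently than in float64
print(np.float32(0.1) == np.float64(0.1))   # False
```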
Use np.random.default_rng() for random numbers
The legacy `np.random.*` functions (e.g., `np.random.rand`) are kept for backward compatibility and rely on hidden global state; prefer the Generator API for new code:
```python
import numpy as np

# Correct: reproducible, modern API
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
integers = rng.integers(0, 10, size=50)

# Avoid: legacy API, global state
np.random.seed(42)
samples = np.random.randn(100, 3)
```
Pass `seed` to `default_rng` for reproducibility in tests and experiments.
Vectorization Over Loops
Replace Python loops with vectorized NumPy operations wherever possible. NumPy operations execute in optimized C code, making them orders of magnitude faster.
```python
import numpy as np

data = np.arange(5, dtype=np.float64)  # example input

# Avoid: Python loop
result = []
for x in data:
    result.append(x ** 2 + 2 * x + 1)
result = np.array(result)

# Correct: fully vectorized
result = data ** 2 + 2 * data + 1
```
Use `np.vectorize` only as a convenience wrapper for scalar functions; it does not improve performance since it still calls Python per element.
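To illustrate the point about `np.vectorize`, here is a small sketch comparing it with a truly vectorized equivalent. The scalar function `clip_negative` is hypothetical and exists only for this example.

```python
import numpy as np

def clip_negative(x):
    """Hypothetical scalar function: works on one number at a time."""
    return x if x > 0 else 0.0

data = np.array([-1.5, 0.0, 2.0, -3.0, 4.5])

# Convenience wrapper: still one Python call per element
via_vectorize = np.vectorize(clip_negative)(data)

# Truly vectorized equivalent: the branch runs in compiled code
via_where = np.where(data > 0, data, 0.0)

print(np.array_equal(via_vectorize, via_where))   # True
```

Both produce the same values, but only the `np.where` version avoids per-element Python calls. `np.vectorize` also infers its output dtype from the first element it processes, which is why the sketch returns `0.0` rather than `0`.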
Broadcasting Rules
Broadcasting allows operations on arrays of different shapes without copying data. Apply broadcasting instead of explicit `tile` or `repeat` calls.
Broadcasting rules (shapes are compared dimension by dimension, starting from the trailing one; a missing leading dimension is treated as 1):
- Dimensions are equal: compatible.
- One dimension is 1: that dimension is stretched to match the other.
- Otherwise: `ValueError`.
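The rules above can be checked without allocating any data. This sketch assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the shapes are arbitrary examples.

```python
import numpy as np

# Shapes are aligned from the right; missing leading dimensions count as 1
print(np.broadcast_shapes((4, 3), (3,)))        # (4, 3)
print(np.broadcast_shapes((3, 1), (1, 5)))      # (3, 5)
print(np.broadcast_shapes((8, 1, 6), (7, 1)))   # (8, 7, 6)

# Incompatible trailing dimensions (3 vs 4) raise ValueError
try:
    np.zeros((4, 3)) + np.zeros(4)
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```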
```python
# Add a bias vector to each row of a matrix; broadcasting handles shape (3,) vs (4, 3)
matrix = np.zeros((4, 3))
bias = np.array([1.0, 2.0, 3.0])
result = matrix + bias  # shape (4, 3); bias is never tiled in memory

# Outer sum via newaxis
a = np.array([0.0, 10.0, 20.0])  # shape (3,)
b = np.array([1.0, 2.0, 3.0])    # shape (3,)
outer = a[:, np.newaxis] + b     # shape (3, 3)
```
Avoid broadcasting that produces very large intermediate arrays; in memory-constrained cases, process the data in chunks with an explicit loop.
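One way to keep intermediates small is to broadcast over chunks rather than the whole input. The following is a sketch under assumed sizes; the helper `min_sq_distance_chunked` and its `chunk_size` parameter are hypothetical names introduced for illustration.

```python
import numpy as np

def min_sq_distance_chunked(points, queries, chunk_size=256):
    """For each query, squared distance to its nearest point, computed in chunks.

    Hypothetical helper: a full broadcast would build an array of shape
    (len(queries), len(points), dim); chunking caps the first axis at chunk_size.
    """
    out = np.empty(len(queries), dtype=np.float64)
    for start in range(0, len(queries), chunk_size):
        block = queries[start:start + chunk_size]             # (c, dim)
        diff = block[:, np.newaxis, :] - points[np.newaxis]   # (c, n, dim)
        out[start:start + chunk_size] = (diff ** 2).sum(axis=-1).min(axis=1)
    return out

rng = np.random.default_rng(0)
points = rng.normal(size=(500, 3))
queries = rng.normal(size=(1000, 3))
nearest = min_sq_distance_chunked(points, queries, chunk_size=128)
print(nearest.shape)   # (1000,)
```

Each iteration is still fully vectorized; the Python loop only runs once per chunk, so the overhead stays small while peak memory is bounded.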
Views vs Copies
Basic indexing (slices) returns a view — modifying it modifies the original:
```python
x = np.arange(10)
y = x[2:5]   # view: shares memory with x
y[0] = 99    # also changes x[2]
```
Advanced indexing (integer arrays, boolean masks) returns a copy:
```python
x = np.arange(10)
idx = [1, 3, 5]
y = x[idx]   # copy: independent of x
```
Check ownership with `arr.base`:
```python
y.base is None   # True → y owns its data (a copy)
y.base is x      # True → y is a view of x
```
When to force a copy
Call `.copy()` explicitly when an independent array is needed:
```python
backup = original.copy()
```
Use `.ravel()` (view when possible) over `.flatten()` (always copies) when write access to the parent is acceptable. Like `.ravel()`, `reshape(-1)` returns a flat view whenever the memory layout allows it and copies otherwise.
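A short sketch of how to verify whether two arrays share memory, using `np.shares_memory`. The array shape is an arbitrary choice for illustration.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

flat_view = x.ravel()      # contiguous input, so this is a view
flat_copy = x.flatten()    # always a fresh copy
t_flat = x.T.ravel()       # transposed data is not C-contiguous, so ravel copies

print(np.shares_memory(x, flat_view))   # True
print(np.shares_memory(x, flat_copy))   # False
print(np.shares_memory(x, t_flat))      # False

flat_view[0] = 99          # writes through to x
print(x[0, 0])             # 99
```

The transposed case shows why "view when possible" matters: the same method can return either a view or a copy depending on memory layout.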
Indexing and Selection
Boolean indexing for filtering
```python
arr = np.array([1, -2, 3, -4, 5])
positive = arr[arr > 0]   # copy: [1, 3, 5]
arr[arr < 0] = 0          # in-place modification via boolean mask
```
np.where for conditional selection
```python
# Replace negatives with zero, keep positives
cleaned = np.where(arr > 0, arr, 0)
```
Avoid loops for aggregations
Use axis-aware aggregation functions instead of looping over rows or columns:
```python
matrix = np.arange(12).reshape(3, 4)
row_sums = matrix.sum(axis=1)   # sum each row → shape (3,)
col_max = matrix.max(axis=0)    # max each column → shape (4,)
```
Data Types and Precision
Choose the smallest sufficient dtype
| Scenario | Recommended dtype |
|---|---|
| ML model weights | `np.float32` |
| High-precision scientific | `np.float64` |
| Small integer counts (<32 k) | `np.int16` |
| Large integer counts | `np.int64` |
| Boolean flags | `np.bool_` |
| Complex numbers | `np.complex64` / `np.complex128` |
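Use `arr.astype(np.float32, copy=False)` when the data may already have the right type: with `copy=False`, NumPy returns the input array itself if no conversion is needed, which avoids an unnecessary allocation. A new array is still created whenever the dtype actually changes.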
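A minimal sketch of the `copy=False` behaviour; the array contents are arbitrary.

```python
import numpy as np

a32 = np.ones(4, dtype=np.float32)
a64 = np.ones(4, dtype=np.float64)

same = a32.astype(np.float32, copy=False)        # dtype already matches: no copy
converted = a64.astype(np.float32, copy=False)   # conversion needed: new array
default = a32.astype(np.float32)                 # copy=True by default

print(same is a32)                        # True
print(np.shares_memory(converted, a64))   # False
print(default is a32)                     # False
```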
Watch for integer overflow
NumPy integer arithmetic wraps silently:
```python
x = np.array([100], dtype=np.int8)   # int8 range is -128..127
x + 100   # array([-56], dtype=int8): silent overflow!
```
Cast to a wider type before operations that risk overflow.
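The widening step might look like the sketch below. The values and the choice of `int16` as the wider type are assumptions made for the example.

```python
import numpy as np

counts = np.array([100, 120], dtype=np.int8)

wrapped = counts + counts                 # stays int8 and wraps around
safe = counts.astype(np.int16) + counts   # widened first, so the sum fits

print(wrapped)   # [-56 -16]
print(safe)      # [200 240]

# np.iinfo reports the representable range of an integer dtype
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
```

Checking `np.iinfo` against the expected range of the data is a cheap way to pick a dtype that is small but still safe.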
Saving and Loading Arrays
| Format | Function | Use case |
|---|---|---|
| Single array (binary) | `np.save` / `np.load` | Fast, preserves dtype and shape |
| Multiple arrays | `np.savez` / `np.savez_compressed` | Archive multiple arrays |
| Text (CSV etc.) | `np.savetxt` / `np.loadtxt` | Human-readable interchange |
```python
# Save and reload with full metadata preserved
np.save("data.npy", arr)
arr_loaded = np.load("data.npy")

# Save several arrays
np.savez("dataset.npz", X=X_train, y=y_train)
npz = np.load("dataset.npz")
X_train = npz["X"]
```
Prefer `.npy`/`.npz` over text formats for large arrays; binary I/O is faster and lossless.
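A self-contained round-trip sketch, writing to a temporary directory so it leaves no files behind. The file name and array contents are illustrative assumptions, and `np.savez_compressed` is used here as the compressed variant of `np.savez`.

```python
import os
import tempfile

import numpy as np

X = np.arange(6, dtype=np.float32).reshape(2, 3)
y = np.array([0, 1], dtype=np.int64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.npz")   # hypothetical file name
    np.savez_compressed(path, X=X, y=y)

    # The object returned by np.load for .npz files works as a context
    # manager, which closes the underlying file on exit
    with np.load(path) as npz:
        X_loaded = npz["X"]
        y_loaded = npz["y"]

print(X_loaded.dtype, X_loaded.shape)   # float32 (2, 3)
```

Dtype and shape survive the round trip without any extra bookkeeping, which is the main advantage over text formats.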
Reshaping and Shape Manipulation
Use `-1` as a wildcard dimension; NumPy infers the correct size:
```python
flat = arr.reshape(-1)     # flatten to 1-D (view when possible)
col = arr.reshape(-1, 1)   # column vector
row = arr.reshape(1, -1)   # row vector
```
Use `np.newaxis` (equivalent to `None`) to insert a dimension for broadcasting:
```python
a = np.array([1, 2, 3])    # shape (3,)
a_col = a[:, np.newaxis]   # shape (3, 1)
a_row = a[np.newaxis, :]   # shape (1, 3)
```
Linear Algebra
Use `np.linalg` for matrix operations:
```python
A = np.array([[1, 2], [3, 4]], dtype=np.float64)
B = np.array([[0, 1], [1, 0]], dtype=np.float64)  # example operand
b = np.array([1.0, 2.0])                          # example right-hand side

# Matrix multiplication: use the @ operator (Python 3.5+)
C = A @ B   # preferred over np.dot(A, B) for 2-D

# Common operations
vals, vecs = np.linalg.eig(A)
inv_A = np.linalg.inv(A)
rank = np.linalg.matrix_rank(A)
det = np.linalg.det(A)

# Solve Ax = b without computing the inverse (faster, more stable)
x = np.linalg.solve(A, b)   # preferred over inv(A) @ b
```
Avoid `np.matrix`; it is no longer recommended and may be removed in a future release. Use a 2-D `ndarray` with `@` instead.
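As a small sanity check of the `solve` recommendation, the sketch below compares both approaches on a 2x2 system. The matrix and right-hand side are arbitrary example values.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])

x_solve = np.linalg.solve(A, b)   # factorizes A, never forms the inverse
x_inv = np.linalg.inv(A) @ b      # works, but does extra work

print(x_solve)                       # approximately [-4.   4.5]
print(np.allclose(A @ x_solve, b))   # True
print(np.allclose(x_solve, x_inv))   # True
```

On a well-conditioned system like this one the two results agree; the difference in accuracy and speed tends to show up on larger or ill-conditioned matrices.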
Quick Reference
| Task | Idiomatic code |
|---|---|
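| Import | `import numpy as np` |
| Array from list | `np.array([1, 2, 3])` |
| Shape / ndim / size | `arr.shape`, `arr.ndim`, `arr.size` |
| Reshape | `arr.reshape(-1, 1)` |
| Flatten (view) | `arr.ravel()` or `arr.reshape(-1)` |
| Flatten (copy) | `arr.flatten()` |
| Transpose | `arr.T` |
| Boolean mask | `arr[arr > 0]` |
| Axis aggregation | `arr.sum(axis=0)` |
| Matrix multiply | `A @ B` |
| Copy | `arr.copy()` |
| Check view | `arr.base is not None` |
| Cast dtype | `arr.astype(np.float32)` |
| Random (modern) | `np.random.default_rng(seed)` |
| Save / load | `np.save("data.npy", arr)` / `np.load("data.npy")` |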
Additional Resources
Reference Files
For deeper guidance, consult:
- `references/performance-and-memory.md`: Vectorization patterns, memory layout, dtype selection, profiling, and avoiding common performance traps
- `references/array-operations.md`: Broadcasting in depth, advanced indexing, ufuncs, structured arrays, and I/O patterns