numpy
NumPy - Numerical Python
The fundamental package for numerical computing in Python, providing multi-dimensional arrays and fast operations.
When to Use
- Working with multi-dimensional arrays and matrices
- Performing element-wise operations on arrays
- Linear algebra computations (matrix multiplication, eigenvalues, SVD)
- Random number generation and statistical distributions
- Fourier transforms and signal processing basics
- Mathematical operations (trigonometric, exponential, logarithmic)
- Broadcasting operations across different array shapes
- Vectorizing Python loops for performance
- Reading and writing numerical data to files
- Building numerical algorithms and simulations
- Serving as foundation for pandas, scikit-learn, SciPy
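The list above mentions Fourier transforms, which the rest of this guide does not demonstrate. A minimal sketch using `np.fft.rfft` and `np.fft.rfftfreq`; the 5 Hz sine wave and the 100 Hz sampling rate are made-up values chosen only for illustration:

```python
import numpy as np

fs = 100                                  # assumed sampling rate in Hz (illustrative)
t = np.arange(fs) / fs                    # one second of sample times
signal = np.sin(2 * np.pi * 5 * t)        # 5 Hz sine wave (illustrative input)

spectrum = np.fft.rfft(signal)                    # FFT of a real-valued signal
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)    # frequency of each bin in Hz

# The bin with the largest magnitude gives the dominant frequency
peak_freq = freqs[np.argmax(np.abs(spectrum))]
print(f"Dominant frequency: {peak_freq} Hz")
```

For anything beyond basic spectra (filter design, windowing, spectrograms), SciPy's signal-processing tools are probably the better fit.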
Reference Documentation
Official docs: https://numpy.org/doc/
Search patterns: `np.array`, `np.zeros`, `np.dot`, `np.linalg`, `np.random`, `np.broadcast`
Core Principles
Use NumPy For
| Task | Function | Example |
|---|---|---|
| Create arrays | | |
| Mathematical ops | | |
| Linear algebra | | |
| Statistics | | |
| Random numbers | | |
| Indexing | | |
| Broadcasting | Automatic | |
| Reshaping | | |
Do NOT Use For
- String manipulation (use built-in str or pandas)
- Complex data structures (use pandas DataFrame)
- Symbolic mathematics (use SymPy)
- Deep learning (use PyTorch, TensorFlow)
- Sparse matrices (use scipy.sparse)
Quick Reference
Installation
```bash
# pip
pip install numpy

# conda
conda install numpy

# Specific version
pip install numpy==1.26.0
```
Standard Imports
```python
import numpy as np

# Common submodules
from numpy import linalg as la
from numpy import random as rand
from numpy import fft

# Never import * (it can shadow built-ins such as sum and max)
# from numpy import *  # DON'T DO THIS!
```
Basic Pattern - Array Creation
```python
import numpy as np

# From list
arr = np.array([1, 2, 3, 4, 5])

# Zeros and ones
zeros = np.zeros((3, 4))
ones = np.ones((2, 3))

# Range
range_arr = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]

# Linspace
linspace_arr = np.linspace(0, 1, 5)  # [0, 0.25, 0.5, 0.75, 1]

print(f"Array: {arr}")
print(f"Shape: {arr.shape}")
print(f"Dtype: {arr.dtype}")
```
Basic Pattern - Array Operations
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise operations
c = a + b   # [5, 7, 9]
d = a * b   # [4, 10, 18]
e = a ** 2  # [1, 4, 9]

# Mathematical functions
f = np.sin(a)
g = np.exp(a)

print(f"Sum: {c}")
print(f"Product: {d}")
```
Basic Pattern - Linear Algebra
```python
import numpy as np

# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Dot product
C = np.dot(A, B)  # or A @ B

# Matrix inverse
A_inv = np.linalg.inv(A)

# Eigenvalues
eigenvalues, eigenvectors = np.linalg.eig(A)

print(f"Matrix product:\n{C}")
print(f"Eigenvalues: {eigenvalues}")
```
Critical Rules
✅ DO
- Use vectorization - Avoid Python loops, use array operations
- Specify dtype explicitly - For memory efficiency and precision control
- Use views when possible - Avoid unnecessary copies
- Broadcast properly - Understand broadcasting rules
- Check array shapes - Use `.shape` frequently
- Use axis parameter - For operations along specific dimensions
- Pre-allocate arrays - Don't grow arrays in loops
- Use appropriate dtypes - int32, float64, complex128, etc.
- Copy when needed - Use `.copy()` for independent arrays
- Use built-in functions - They're optimized in C
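Several of the rules above (explicit dtype, checking `.shape`, the `axis` parameter) can be sketched together. This is only an illustration; the 3x4 array is an arbitrary example input:

```python
import numpy as np

# Explicit dtype and shape (arbitrary example data)
data = np.arange(12, dtype=np.float32).reshape(3, 4)

print(data.shape)                 # check shapes before combining arrays

col_means = data.mean(axis=0)     # reduce over rows: one value per column
row_sums = data.sum(axis=1)       # reduce over columns: one value per row

# dtype controls memory use: float32 needs half the bytes of float64
as_f64 = data.astype(np.float64)
print(data.nbytes, as_f64.nbytes)
```

A quick way to remember `axis`: it names the dimension that disappears from the result.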
❌ DON'T
- Loop over arrays - Use vectorization instead
- Grow arrays dynamically - Pre-allocate instead
- Use Python lists for math - Convert to arrays first
- Ignore memory layout - C-contiguous vs Fortran-contiguous matters
- Mix dtypes carelessly - Know implicit type promotion rules
- Modify arrays during iteration - Can cause undefined behavior
- Use == for array comparison - Use `np.array_equal()` or `np.allclose()`
- Assume views vs copies - Check with the `.base` attribute
- Ignore NaN handling - Use `np.nanmean()`, `np.nanstd()`, etc.
- Use outdated APIs - Check for deprecated functions
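Two of the points above, NaN handling and implicit type promotion, are easy to check interactively. A small sketch with made-up values:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])   # illustrative data with a missing entry

plain_mean = np.mean(values)      # NaN propagates through ordinary reductions
nan_mean = np.nanmean(values)     # NaN-aware variant ignores the missing entry

# Implicit type promotion: int32 combined with float32 yields float64
promoted = np.result_type(np.int32, np.float32)

print(plain_mean, nan_mean, promoted)
```

`np.result_type` reports the dtype NumPy would pick for a combination without allocating any arrays, which makes it a cheap way to audit mixed-dtype expressions.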
Anti-Patterns (NEVER)
```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])  # sample input for the examples below

# ❌ BAD: Python loops
result = []
for i in range(len(arr)):
    result.append(arr[i] * 2)
result = np.array(result)

# ✅ GOOD: Vectorization
result = arr * 2

# ❌ BAD: Growing arrays
result = np.array([])
for i in range(1000):
    result = np.append(result, i)  # Very slow: copies the array on every call

# ✅ GOOD: Pre-allocate
result = np.zeros(1000)
for i in range(1000):
    result[i] = i

# Even better: Use arange
result = np.arange(1000)

arr1 = np.array([1.0, 2.0, 3.0])  # sample inputs for the comparison examples
arr2 = np.array([1.0, 2.0, 3.0])

# ❌ BAD: Comparing arrays with ==
# if arr1 == arr2:  # Ambiguous: raises ValueError for multi-element arrays
#     print("Equal")

# ✅ GOOD: Use appropriate comparison
if np.array_equal(arr1, arr2):
    print("Equal")

# Or for floating point
if np.allclose(arr1, arr2, rtol=1e-5):
    print("Close enough")

# ❌ BAD: Ignoring dtypes
arr = np.array([1, 2, 3])
arr[0] = 1.5  # Silently truncates to 1!

# ✅ GOOD: Explicit dtype
arr = np.array([1, 2, 3], dtype=float)
arr[0] = 1.5  # Now works correctly

# ❌ BAD: Unintentional modification
a = np.array([1, 2, 3])
b = a  # b is just a reference!
b[0] = 999  # Also modifies a!

# ✅ GOOD: Explicit copy
a = np.array([1, 2, 3])
b = a.copy()  # b is independent
b[0] = 999  # a is unchanged
```
Array Creation
Basic Array Creation
```python
import numpy as np

# From Python list
arr1 = np.array([1, 2, 3, 4, 5])

# From nested list (2D)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# Specify dtype
arr3 = np.array([1, 2, 3], dtype=np.float64)
arr4 = np.array([1, 2, 3], dtype=np.int32)

# From tuple
arr5 = np.array((1, 2, 3))

# Complex numbers
arr6 = np.array([1+2j, 3+4j])

print(f"1D array: {arr1}")
print(f"2D array:\n{arr2}")
print(f"Float array: {arr3}")
```
Special Array Creation
```python
import numpy as np

# Zeros
zeros = np.zeros((3, 4))  # 3x4 array of zeros

# Ones
ones = np.ones((2, 3, 4))  # 2x3x4 array of ones

# Empty (uninitialized)
empty = np.empty((2, 2))  # Faster but values are garbage

# Full (constant value)
full = np.full((3, 3), 7)  # 3x3 array filled with 7

# Identity matrix
identity = np.eye(4)  # 4x4 identity matrix

# Diagonal matrix
diag = np.diag([1, 2, 3, 4])

print(f"Zeros shape: {zeros.shape}")
print(f"Identity:\n{identity}")
```
Range-Based Creation
```python
import numpy as np

# Arange (like Python range)
a = np.arange(10)         # [0, 1, 2, ..., 9]
b = np.arange(2, 10)      # [2, 3, 4, ..., 9]
c = np.arange(0, 10, 2)   # [0, 2, 4, 6, 8]
d = np.arange(0, 1, 0.1)  # [0, 0.1, 0.2, ..., 0.9]

# Linspace (linearly spaced)
e = np.linspace(0, 1, 5)     # [0, 0.25, 0.5, 0.75, 1]
f = np.linspace(0, 10, 100)  # 100 points from 0 to 10

# Logspace (logarithmically spaced)
g = np.logspace(0, 2, 5)  # [1, 10^0.5, 10, 10^1.5, 100]

# Geomspace (geometrically spaced)
h = np.geomspace(1, 1000, 4)  # [1, 10, 100, 1000]

print(f"Arange: {a}")
print(f"Linspace: {e}")
```
Array Copies and Views
```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

# View (shares memory)
view = original[:]
view[0] = 999  # Modifies original!

# Copy (independent)
copy = original.copy()
copy[0] = 777  # Doesn't affect original

# Check if array is a view
print(f"Is view? {view.base is original}")
print(f"Is copy? {copy.base is None}")

# Some operations create views, some create copies
slice_view = original[1:3]  # View
boolean_copy = original[original > 2]  # Copy!
```
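Besides the `.base` attribute, `np.shares_memory` gives a direct answer to "is this a view or a copy?". A short sketch with an arbitrary example array:

```python
import numpy as np

original = np.arange(6)               # arbitrary example data

slice_view = original[1:4]            # basic slicing returns a view
fancy_copy = original[[1, 2, 3]]      # integer-array indexing returns a copy

view_shares = np.shares_memory(original, slice_view)
copy_shares = np.shares_memory(original, fancy_copy)

print(view_shares, copy_shares)
```

The two results hold the same values, but only the slice would propagate writes back to `original`.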
Array Indexing and Slicing
Basic Indexing
```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Single element
print(arr[0])   # 10
print(arr[-1])  # 50 (last element)

# Slicing
print(arr[1:4])  # [20, 30, 40]
print(arr[:3])   # [10, 20, 30]
print(arr[2:])   # [30, 40, 50]
print(arr[::2])  # [10, 30, 50] (every 2nd element)

# Negative indices
print(arr[-3:-1])  # [30, 40]

# Reverse
print(arr[::-1])  # [50, 40, 30, 20, 10]
```
Multi-Dimensional Indexing
```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Single element
print(arr[0, 0])    # 1
print(arr[1, 2])    # 6
print(arr[-1, -1])  # 9

# Row slicing
print(arr[0])     # [1, 2, 3] (first row)
print(arr[1, :])  # [4, 5, 6] (second row)

# Column slicing
print(arr[:, 0])  # [1, 4, 7] (first column)
print(arr[:, 1])  # [2, 5, 8] (second column)

# Sub-array
print(arr[0:2, 1:3])  # [[2, 3], [5, 6]]

# Every other element
print(arr[::2, ::2])  # [[1, 3], [7, 9]]
```
Boolean Indexing
```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Boolean condition
mask = arr > 5
print(mask)  # [False, False, False, False, False, True, True, True, True, True]

# Boolean indexing
filtered = arr[arr > 5]
print(filtered)  # [6, 7, 8, 9, 10]

# Multiple conditions (use & and |, not 'and' and 'or')
result = arr[(arr > 3) & (arr < 8)]
print(result)  # [4, 5, 6, 7]

# Or condition
result = arr[(arr < 3) | (arr > 8)]
print(result)  # [1, 2, 9, 10]

# Negation
result = arr[~(arr > 5)]
print(result)  # [1, 2, 3, 4, 5]
```
Fancy Indexing
```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Index with array of integers
indices = np.array([0, 2, 4])
result = arr[indices]
print(result)  # [10, 30, 50]

# 2D fancy indexing
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
rows = np.array([0, 2])
cols = np.array([1, 2])
result = arr2d[rows, cols]  # Elements at (0,1) and (2,2)
print(result)  # [2, 9]

# Combining boolean and fancy indexing
mask = arr > 25
indices_of_large = np.where(mask)[0]
print(indices_of_large)  # [2, 3, 4]
```
Array Operations
Element-wise Operations
```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Arithmetic operations
print(a + b)   # [6, 8, 10, 12]
print(a - b)   # [-4, -4, -4, -4]
print(a * b)   # [5, 12, 21, 32]
print(a / b)   # [0.2, 0.333..., 0.428..., 0.5]
print(a ** 2)  # [1, 4, 9, 16]
print(a // b)  # [0, 0, 0, 0] (floor division)
print(a % b)   # [1, 2, 3, 4] (modulo)

# With scalars
print(a + 10)  # [11, 12, 13, 14]
print(a * 2)   # [2, 4, 6, 8]
```
Mathematical Functions
```python
import numpy as np

x = np.array([0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])

# Trigonometric
sin_x = np.sin(x)
cos_x = np.cos(x)
tan_x = np.tan(x)

# Inverse trig
arcsin_x = np.arcsin([0, 0.5, 1])

# Exponential and logarithm
arr = np.array([1, 2, 3, 4])
exp_arr = np.exp(arr)
log_arr = np.log(arr)
log10_arr = np.log10(arr)

# Rounding
floats = np.array([1.2, 2.7, 3.5, 4.9])
print(np.round(floats))  # [1, 3, 4, 5]
print(np.floor(floats))  # [1, 2, 3, 4]
print(np.ceil(floats))   # [2, 3, 4, 5]

# Absolute value
print(np.abs([-1, -2, 3, -4]))  # [1, 2, 3, 4]
```
Aggregation Functions
```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Sum
print(np.sum(arr))          # 45 (all elements)
print(np.sum(arr, axis=0))  # [12, 15, 18] (column sums)
print(np.sum(arr, axis=1))  # [6, 15, 24] (row sums)

# Mean
print(np.mean(arr))  # 5.0

# Standard deviation
print(np.std(arr))  # ~2.58

# Min and max
print(np.min(arr))     # 1
print(np.max(arr))     # 9
print(np.argmin(arr))  # 0 (index of min)
print(np.argmax(arr))  # 8 (index of max)

# Median and percentiles
print(np.median(arr))          # 5.0
print(np.percentile(arr, 25))  # 3.0 (25th percentile)
```
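A reduction along an axis drops that axis, which can make the result awkward to combine with the original array. One common approach, sketched here on the same 3x3 example values, is `keepdims=True`:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]], dtype=float)

# keepdims=True keeps the reduced axis with length 1: shape (3, 1) instead of (3,)
row_sums = arr.sum(axis=1, keepdims=True)

# The (3, 1) result lines up with the (3, 3) input, so each row is
# divided by its own sum
normalized = arr / row_sums

print(row_sums.shape)
print(normalized.sum(axis=1))
```

Without `keepdims`, the `(3,)` result would be matched against columns rather than rows, which is a frequent source of silent mistakes.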
Broadcasting
Broadcasting Rules
```python
import numpy as np

# Scalar and array
arr = np.array([1, 2, 3, 4])
result = arr + 10  # Broadcast scalar to array shape
print(result)  # [11, 12, 13, 14]

# 1D and 2D
arr1d = np.array([1, 2, 3])
arr2d = np.array([[10], [20], [30]])
result = arr1d + arr2d
print(result)
# [[11, 12, 13],
#  [21, 22, 23],
#  [31, 32, 33]]

# Broadcasting example: standardization
data = np.random.randn(100, 3)      # 100 samples, 3 features
mean = np.mean(data, axis=0)        # Mean of each column
std = np.std(data, axis=0)          # Std of each column
standardized = (data - mean) / std  # Broadcasting!
```
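The examples above rely on the broadcasting rule without stating it: shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. A sketch that checks shapes without allocating data, assuming NumPy 1.20 or later for `np.broadcast_shapes`:

```python
import numpy as np

# Compatible: (100, 3) with (3,) -> trailing dimensions match
ok_shape = np.broadcast_shapes((100, 3), (3,))

# Compatible: (3, 1) with (1, 4) -> each length-1 axis is stretched
grid_shape = np.broadcast_shapes((3, 1), (1, 4))

# Incompatible: trailing dimensions are 3 and 100, and neither is 1
try:
    np.broadcast_shapes((100, 3), (100,))
    compatible = True
except ValueError:
    compatible = False

print(ok_shape, grid_shape, compatible)
```

The last case is the usual stumbling block: a per-row vector of length 100 has to be reshaped to `(100, 1)` before it can be combined with a `(100, 3)` array.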
Explicit Broadcasting
```python
import numpy as np

# Using broadcast_to
arr = np.array([1, 2, 3])
broadcasted = np.broadcast_to(arr, (4, 3))
print(broadcasted)
# [[1, 2, 3],
#  [1, 2, 3],
#  [1, 2, 3],
#  [1, 2, 3]]

# Using newaxis
arr1d = np.array([1, 2, 3])
col_vector = arr1d[:, np.newaxis]  # Shape (3, 1)
row_vector = arr1d[np.newaxis, :]  # Shape (1, 3)

# Outer product using broadcasting
outer = col_vector * row_vector
print(outer)
# [[1, 2, 3],
#  [2, 4, 6],
#  [3, 6, 9]]
```
Linear Algebra
Matrix Operations
```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = np.dot(A, B)  # Traditional
C = A @ B         # Modern syntax (Python 3.5+)

# Element-wise multiplication
D = A * B  # Not matrix multiplication!

# Matrix transpose
A_T = A.T

# Trace (sum of diagonal)
trace = np.trace(A)

# Matrix power
A_squared = np.linalg.matrix_power(A, 2)

print(f"Matrix product:\n{C}")
print(f"Transpose:\n{A_T}")
print(f"Trace: {trace}")
```
Solving Linear Systems
```python
import numpy as np

# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

# Solve for x
x = np.linalg.solve(A, b)
print(f"Solution: {x}")  # [2, 3]

# Verify solution
print(f"Verification: {np.allclose(A @ x, b)}")  # True

# Matrix inverse
A_inv = np.linalg.inv(A)
print(f"Inverse:\n{A_inv}")

# Determinant
det = np.linalg.det(A)
print(f"Determinant: {det}")
```
Eigenvalues and Eigenvectors
```python
import numpy as np

# Square matrix
A = np.array([[1, 2], [2, 1]])

# Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

# Verify: A @ v = λ * v (eigenvectors are the columns of the returned matrix)
for i in range(len(eigenvalues)):
    lam = eigenvalues[i]
    v = eigenvectors[:, i]
    left = A @ v
    right = lam * v
    print(f"Eigenvalue {i}: {np.allclose(left, right)}")
```
Singular Value Decomposition (SVD)
```python
import numpy as np

# Any matrix
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# SVD: A = U @ S @ Vt
U, s, Vt = np.linalg.svd(A)

# Reconstruct original matrix
S = np.zeros((2, 3))
S[:2, :2] = np.diag(s)
A_reconstructed = U @ S @ Vt

print(f"Original:\n{A}")
print(f"Reconstructed:\n{A_reconstructed}")
print(f"Close? {np.allclose(A, A_reconstructed)}")

# Singular values
print(f"Singular values: {s}")
```
Matrix Norms
```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# Frobenius norm (default)
norm_fro = np.linalg.norm(A)

# 1-norm (max column sum)
norm_1 = np.linalg.norm(A, ord=1)

# Infinity norm (max row sum)
norm_inf = np.linalg.norm(A, ord=np.inf)

# 2-norm (spectral norm)
norm_2 = np.linalg.norm(A, ord=2)

print(f"Frobenius: {norm_fro:.4f}")
print(f"1-norm: {norm_1:.4f}")
print(f"2-norm: {norm_2:.4f}")
print(f"inf-norm: {norm_inf:.4f}")
```
Random Number Generation
Basic Random Generation
```python
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Random floats [0, 1)
rand_uniform = np.random.rand(5)  # 1D array of 5 elements
rand_2d = np.random.rand(3, 4)    # 3x4 array

# Random integers
rand_int = np.random.randint(0, 10, size=5)  # [0, 10)
rand_int_2d = np.random.randint(0, 100, size=(3, 3))

# Random normal distribution
rand_normal = np.random.randn(1000)  # Mean=0, std=1
rand_normal_custom = np.random.normal(loc=5, scale=2, size=1000)

# Random choice
choices = np.random.choice(['a', 'b', 'c'], size=10)
weighted_choices = np.random.choice([1, 2, 3], size=100, p=[0.1, 0.3, 0.6])
```
Statistical Distributions
```python
import numpy as np

# Uniform distribution [low, high)
uniform = np.random.uniform(low=0, high=10, size=1000)

# Normal (Gaussian) distribution
normal = np.random.normal(loc=0, scale=1, size=1000)

# Exponential distribution
exponential = np.random.exponential(scale=2, size=1000)

# Binomial distribution
binomial = np.random.binomial(n=10, p=0.5, size=1000)

# Poisson distribution
poisson = np.random.poisson(lam=3, size=1000)

# Beta distribution
beta = np.random.beta(a=2, b=5, size=1000)

# Chi-squared distribution
chisq = np.random.chisquare(df=2, size=1000)
```
Modern Random Generator (numpy.random.Generator)
```python
import numpy as np

# Create generator
rng = np.random.default_rng(seed=42)

# Generate random numbers
rand = rng.random(size=10)
ints = rng.integers(low=0, high=100, size=10)
normal = rng.normal(loc=0, scale=1, size=10)

# Shuffle array in-place
arr = np.arange(10)
rng.shuffle(arr)

# Sample without replacement
sample = rng.choice(100, size=10, replace=False)

print(f"Random: {rand}")
print(f"Shuffled: {arr}")
```
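One practical consequence of the generator API is that reproducibility is tied to a generator object rather than to global state. A small sketch (the seed value 42 and the array sizes are arbitrary):

```python
import numpy as np

# Two generators built from the same seed produce the same stream
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draws_a = rng_a.normal(size=5)
draws_b = rng_b.normal(size=5)
same = np.array_equal(draws_a, draws_b)

# permutation returns a shuffled copy, unlike shuffle, which works in place
arr = np.arange(10)
shuffled = rng_a.permutation(arr)

print(same)
print(arr, shuffled)
```

Passing a generator into functions explicitly, rather than reseeding globally, tends to make simulations easier to reproduce and to test.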
Reshaping and Manipulation
Reshaping Arrays
```python
import numpy as np

# Original array
arr = np.arange(12)  # [0, 1, 2, ..., 11]

# Reshape
arr_2d = arr.reshape(3, 4)
arr_3d = arr.reshape(2, 2, 3)

# Automatic dimension calculation with -1
arr_auto = arr.reshape(3, -1)  # Automatically calculates 4

# Flatten to 1D
flat = arr_2d.flatten()  # Returns copy
flat = arr_2d.ravel()    # Returns view if possible

# Transpose
arr_t = arr_2d.T

print(f"Original shape: {arr.shape}")
print(f"2D shape: {arr_2d.shape}")
print(f"3D shape: {arr_3d.shape}")
```
Stacking and Splitting
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

# Vertical stacking (vstack)
vstacked = np.vstack([a, b, c])
print(vstacked)
# [[1, 2, 3],
#  [4, 5, 6],
#  [7, 8, 9]]

# Horizontal stacking (hstack)
hstacked = np.hstack([a, b, c])
print(hstacked)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Column stack
col_stacked = np.column_stack([a, b, c])

# Concatenate (more general)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
concat_axis0 = np.concatenate([arr1, arr2], axis=0)
concat_axis1 = np.concatenate([arr1, arr2], axis=1)

# Splitting
arr = np.arange(12)
split = np.split(arr, 3)  # Split into 3 equal parts
print(split)  # [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([8, 9, 10, 11])]
```
File I/O
Text Files
```python
import numpy as np

# Save to text file
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
np.savetxt('data.txt', data)
np.savetxt('data.csv', data, delimiter=',')
np.savetxt('data_formatted.txt', data, fmt='%.2f')

# Load from text file
loaded = np.loadtxt('data.txt')
loaded_csv = np.loadtxt('data.csv', delimiter=',')

# Skip header rows (data.txt above has no header, so this drops its first data row)
loaded_skip = np.loadtxt('data.txt', skiprows=1)

# Load specific columns
loaded_cols = np.loadtxt('data.csv', delimiter=',', usecols=(0, 2))
```
Binary Files (.npy, .npz)
```python
import numpy as np

# Save single array
arr = np.random.rand(100, 100)
np.save('array.npy', arr)

# Load single array
loaded = np.load('array.npy')

# Save multiple arrays (uncompressed .npz archive)
arr1 = np.random.rand(10, 10)
arr2 = np.random.rand(20, 20)
np.savez('arrays.npz', first=arr1, second=arr2)

# Load multiple arrays
loaded = np.load('arrays.npz')
loaded_arr1 = loaded['first']
loaded_arr2 = loaded['second']

# Compressed save
np.savez_compressed('arrays_compressed.npz', arr1=arr1, arr2=arr2)
```
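A round trip is an easy way to confirm that an archive holds what was intended. The sketch below writes to a temporary directory so it leaves no files behind; the array contents and the key names `first` and `second` are illustrative:

```python
import os
import tempfile

import numpy as np

arr1 = np.arange(6).reshape(2, 3)   # illustrative data
arr2 = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")
    np.savez_compressed(path, first=arr1, second=arr2)

    # Opening the archive as a context manager closes the file afterwards
    with np.load(path) as archive:
        names = sorted(archive.files)       # keys stored in the archive
        first_back = archive["first"]
        second_back = archive["second"]

round_trip_ok = np.array_equal(arr1, first_back) and np.allclose(arr2, second_back)
print(names, round_trip_ok)
```

The `.files` attribute lists the stored keys, which helps when loading an archive written by someone else.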
Advanced Techniques
Universal Functions (ufuncs)
```python
import numpy as np

# Ufuncs operate element-wise
arr = np.array([1, 2, 3, 4, 5])

# Built-in ufuncs
result = np.sqrt(arr)
result = np.exp(arr)
result = np.log(arr)

# Wrap a Python function so that it accepts arrays
def my_func(x):
    return x**2 + 2*x + 1

vectorized = np.vectorize(my_func)
result = vectorized(arr)

# Same wrapper in decorator form. np.vectorize is a convenience, not a
# true ufunc: it still calls the Python function once per element.
@np.vectorize
def better_func(x):
    return x**2 + 2*x + 1

# More efficient: plain array arithmetic, which runs in compiled ufunc loops
result = arr**2 + 2*arr + 1
```
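Built-in ufuncs also carry methods such as `reduce`, `accumulate`, and `outer`, and `np.frompyfunc` can turn a Python callable into a genuine `np.ufunc` object (with object-dtype output and no speedup). The sketch below illustrates both; the lambda is only a stand-in for the polynomial used above.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

total = np.add.reduce(arr)            # same as arr.sum()
running = np.add.accumulate(arr)      # same as np.cumsum(arr)
table = np.multiply.outer(arr, arr)   # 5 x 5 multiplication table

# frompyfunc(func, nin, nout) returns a real ufunc, but it still runs
# Python code per element and yields an object-dtype array.
py_ufunc = np.frompyfunc(lambda x: x**2 + 2*x + 1, 1, 1)
squares = py_ufunc(arr).astype(np.int64)

print(total)      # 15
print(running)    # [ 1  3  6 10 15]
print(squares)    # [ 4  9 16 25 36]
```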
Structured Arrays
```python
import numpy as np

# Define dtype
dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('weight', 'f8')])

# Create structured array
data = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 70.2),
    ('Charlie', 35, 82.1)
], dtype=dt)

# Access by field name
names = data['name']
ages = data['age']

# Sort by field
sorted_data = np.sort(data, order='age')

print(f"Names: {names}")
print(f"Sorted by age:\n{sorted_data}")
```
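Field access combines naturally with boolean masks. The following sketch reuses the same dtype to filter records, pick the record with the largest weight, and update one field in place; the threshold and the new weight are arbitrary illustrative values.

```python
import numpy as np

dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('weight', 'f8')])
people = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 70.2),
    ('Charlie', 35, 82.1),
], dtype=dt)

# Boolean mask built from one field selects whole records.
over_28 = people[people['age'] > 28]

# argmax on a field gives the index of the matching record.
heaviest = people[np.argmax(people['weight'])]

# people['weight'] is a view, so masked assignment edits the original array.
people['weight'][people['name'] == 'Bob'] = 71.0

print(over_28['name'])     # ['Bob' 'Charlie']
print(heaviest['name'])    # Charlie
```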
Memory Layout and Performance
```python
import numpy as np

# C-contiguous (row-major, default)
arr_c = np.array([[1, 2, 3], [4, 5, 6]], order='C')

# Fortran-contiguous (column-major)
arr_f = np.array([[1, 2, 3], [4, 5, 6]], order='F')

# Check memory layout
print(f"C-contiguous? {arr_c.flags['C_CONTIGUOUS']}")
print(f"F-contiguous? {arr_c.flags['F_CONTIGUOUS']}")

# Make contiguous
arr_made_c = np.ascontiguousarray(arr_f)
arr_made_f = np.asfortranarray(arr_c)

# Memory usage
print(f"Memory (bytes): {arr_c.nbytes}")
print(f"Item size: {arr_c.itemsize}")
```
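The difference between the two layouts is visible in the `strides` attribute, which gives the number of bytes to step along each axis. A short sketch, with the dtype fixed to `int64` (an assumption made here so the byte counts are the same on every platform):

```python
import numpy as np

# int64 is set explicitly so each element is 8 bytes everywhere.
arr_c = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64, order='C')
arr_f = np.asfortranarray(arr_c)

print(arr_c.strides)  # (24, 8): next row is 3 elements away, next column is 1
print(arr_f.strides)  # (8, 16): next row is 1 element away, next column is 2

# Transposing swaps the strides instead of moving data, so the result
# is an F-contiguous view of the same buffer.
transposed = arr_c.T
print(transposed.strides)                    # (8, 24)
print(transposed.flags['F_CONTIGUOUS'])      # True
print(np.shares_memory(arr_c, transposed))   # True
```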
Advanced Indexing with ix_
```python
import numpy as np

arr = np.arange(20).reshape(4, 5)

# Select specific rows and columns
rows = np.array([0, 2])
cols = np.array([1, 3, 4])

# ix_ creates open mesh
result = arr[np.ix_(rows, cols)]
print(result)
# [[ 1  3  4]
#  [11 13 14]]

# Equivalent to
result = arr[[0, 2]][:, [1, 3, 4]]
```
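The reason `np.ix_` is needed is that passing two index arrays directly pairs them element by element instead of forming a grid. A small sketch of the difference, using a shorter column list than above so the two arrays have equal length:

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)
rows = np.array([0, 2])
cols = np.array([1, 3])

# Paired indexing: picks arr[0, 1] and arr[2, 3] only.
paired = arr[rows, cols]
print(paired)             # [ 1 13]

# Grid indexing: every selected row crossed with every selected column.
block = arr[np.ix_(rows, cols)]
print(block)
# [[ 1  3]
#  [11 13]]

# ix_ just reshapes the index arrays so that they broadcast to a grid.
row_idx, col_idx = np.ix_(rows, cols)
print(row_idx.shape, col_idx.shape)   # (2, 1) (1, 2)

# The same grid written by hand with np.newaxis.
manual = arr[rows[:, np.newaxis], cols]
```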
Practical Workflows
Statistical Analysis
```python
import numpy as np

# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=100, scale=15, size=1000)

# Descriptive statistics
mean = np.mean(data)
median = np.median(data)
std = np.std(data)
var = np.var(data)

# Percentiles
q25, q50, q75 = np.percentile(data, [25, 50, 75])

# Histogram
counts, bins = np.histogram(data, bins=20)

# Correlation coefficient
data2 = data + np.random.normal(0, 5, size=1000)
corr = np.corrcoef(data, data2)[0, 1]

print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Std: {std:.2f}")
print(f"IQR: [{q25:.2f}, {q75:.2f}]")
print(f"Correlation: {corr:.3f}")
```
Monte Carlo Simulation
```python
import numpy as np

def estimate_pi(n_samples=1000000):
    """Estimate π using Monte Carlo method."""
    # Random points in [0, 1] x [0, 1]
    x = np.random.rand(n_samples)
    y = np.random.rand(n_samples)
    # Check if inside quarter circle
    inside = (x**2 + y**2) <= 1
    # Estimate π
    pi_estimate = 4 * np.sum(inside) / n_samples
    return pi_estimate

# Estimate π
pi_est = estimate_pi(10000000)
print(f"π estimate: {pi_est:.6f}")
print(f"Error: {abs(pi_est - np.pi):.6f}")
```
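A Monte Carlo estimate is more useful with an error bar. The sketch below is one possible variant, not the guide's own function: it uses `np.random.default_rng` with an arbitrary seed for reproducibility and reports the binomial standard error alongside the estimate. The function name `estimate_pi_with_error` is hypothetical.

```python
import numpy as np

def estimate_pi_with_error(n_samples, seed=0):
    """Return (estimate, standard error) for the quarter-circle method."""
    rng = np.random.default_rng(seed)      # seed value is arbitrary
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    p_hat = np.mean(x**2 + y**2 <= 1)      # fraction of points inside
    estimate = 4 * p_hat
    # Each point is a Bernoulli trial, so the standard error of 4 * p_hat
    # is 4 * sqrt(p_hat * (1 - p_hat) / n).
    std_err = 4 * np.sqrt(p_hat * (1 - p_hat) / n_samples)
    return estimate, std_err

pi_est, pi_err = estimate_pi_with_error(200_000)
print(f"π ≈ {pi_est:.4f} ± {pi_err:.4f}")
```

Because the error shrinks with the square root of the sample count, each extra decimal digit of accuracy costs roughly 100 times more samples.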
Polynomial Fitting
```python
import numpy as np

# Generate noisy data
x = np.linspace(0, 10, 50)
y_true = 2*x**2 + 3*x + 1
y_noisy = y_true + np.random.normal(0, 10, size=50)

# Fit polynomial (degree 2)
coeffs = np.polyfit(x, y_noisy, deg=2)
print(f"Coefficients: {coeffs}")  # Should be close to [2, 3, 1]

# Predict
y_pred = np.polyval(coeffs, x)

# Evaluate fit quality
residuals = y_noisy - y_pred
rmse = np.sqrt(np.mean(residuals**2))
print(f"RMSE: {rmse:.2f}")

# Create polynomial object
poly = np.poly1d(coeffs)
print(f"Polynomial: {poly}")
```
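NumPy also provides the `numpy.polynomial` package, whose `Polynomial.fit` plays a similar role to `np.polyfit` but stores coefficients from lowest to highest degree. The sketch below fits noise-free data (an assumption made here so the recovered coefficients are exact up to rounding) to show the ordering difference.

```python
import numpy as np
from numpy.polynomial import Polynomial

x = np.linspace(0, 10, 50)
y = 2*x**2 + 3*x + 1          # noise-free for a clear comparison

# Polynomial.fit works in a scaled domain internally; convert() maps the
# result back to the ordinary x variable.
fitted = Polynomial.fit(x, y, deg=2)
coef_low_to_high = fitted.convert().coef

# np.polyfit returns the same numbers in the opposite order.
coef_high_to_low = np.polyfit(x, y, deg=2)

print(coef_low_to_high)    # approximately [1. 3. 2.]
print(coef_high_to_low)    # approximately [2. 3. 1.]
```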
Image Processing Basics
```python
import numpy as np

# Create synthetic image (grayscale)
image = np.random.rand(100, 100)

# Apply transformations
# Rotate 90 degrees
rotated = np.rot90(image)

# Flip vertically
flipped_v = np.flipud(image)

# Flip horizontally
flipped_h = np.fliplr(image)

# Transpose
transposed = image.T

# Normalize to [0, 255]
normalized = ((image - image.min()) / (image.max() - image.min()) * 255).astype(np.uint8)

print(f"Original shape: {image.shape}")
print(f"Value range: [{image.min():.2f}, {image.max():.2f}]")
```
Distance Matrices
```python
import numpy as np

# Points in 2D
points = np.random.rand(100, 2)

# Pairwise distances (broadcasting)
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt(np.sum(diff**2, axis=2))

print(f"Distance matrix shape: {distances.shape}")
print(f"Max distance: {distances.max():.4f}")

# Find nearest neighbors
for i in range(5):
    # Exclude self (distance = 0)
    dists = distances[i].copy()
    dists[i] = np.inf
    nearest = np.argmin(dists)
    print(f"Point {i} nearest to point {nearest}, distance: {distances[i, nearest]:.4f}")
```
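The nearest-neighbor search can also be done for all points at once, without a Python loop. This is a sketch of that variant; it uses `np.random.default_rng` with an arbitrary seed instead of `np.random.rand` so that runs are repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed value is arbitrary
points = rng.random((100, 2))

# Pairwise distances via broadcasting: (100, 1, 2) - (1, 100, 2)
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt(np.sum(diff**2, axis=2))

# Put inf on the diagonal so a point is never its own nearest neighbor.
masked = distances.copy()
np.fill_diagonal(masked, np.inf)

nearest = masked.argmin(axis=1)                          # index per point
nearest_dist = masked[np.arange(len(points)), nearest]   # matching distance

print(nearest[:5])
print(nearest_dist[:5])
```

The full distance matrix needs O(n²) memory, so for large point sets a spatial index such as `scipy.spatial.cKDTree` is usually a better fit.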
Sliding Window Operations
```python
import numpy as np

def sliding_window_view(arr, window_size):
    """Create read-only sliding window views of a 1-D array."""
    shape = (arr.shape[0] - window_size + 1, window_size)
    strides = (arr.strides[0], arr.strides[0])
    # as_strided does no bounds checking; writeable=False guards against
    # accidental writes through the overlapping windows.
    return np.lib.stride_tricks.as_strided(
        arr, shape=shape, strides=strides, writeable=False)

# Time series data
data = np.random.rand(100)

# Create sliding windows
windows = sliding_window_view(data, window_size=10)

# Compute statistics for each window
window_means = np.mean(windows, axis=1)
window_stds = np.std(windows, axis=1)

print(f"Number of windows: {len(windows)}")
print(f"First window mean: {window_means[0]:.4f}")
```
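NumPy 1.20 and later ship a built-in `np.lib.stride_tricks.sliding_window_view` that performs the shape checks for you and returns a read-only view by default. A minimal sketch, assuming such a NumPy version is installed and using a small deterministic array so the window means are easy to verify:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(10, dtype=float)       # 0.0, 1.0, ..., 9.0

# One row per window position: 10 - 4 + 1 = 7 windows of length 4.
windows = sliding_window_view(data, window_shape=4)
means = windows.mean(axis=1)

print(windows.shape)              # (7, 4)
print(means)                      # [1.5 2.5 3.5 4.5 5.5 6.5 7.5]
print(windows.flags.writeable)    # False
```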
Performance Optimization
Vectorization Examples
```python
import numpy as np
import time

# Bad: Python loop
def sum_python_loop(arr):
    total = 0
    for x in arr:
        total += x**2
    return total

# Good: Vectorized
def sum_vectorized(arr):
    return np.sum(arr**2)

# Benchmark (perf_counter is a monotonic, high-resolution clock)
arr = np.random.rand(1000000)

start = time.perf_counter()
result1 = sum_python_loop(arr)
time_loop = time.perf_counter() - start

start = time.perf_counter()
result2 = sum_vectorized(arr)
time_vec = time.perf_counter() - start

print(f"Loop time: {time_loop:.4f}s")
print(f"Vectorized time: {time_vec:.4f}s")
print(f"Speedup: {time_loop/time_vec:.1f}x")
```
Memory-Efficient Operations
```python
import numpy as np

# Bad: Creates intermediate arrays
def inefficient(arr):
    temp1 = arr * 2
    temp2 = temp1 + 5
    temp3 = temp2 ** 2
    return temp3

# Good: In-place operations
def efficient(arr):
    result = arr.copy()
    result *= 2
    result += 5
    result **= 2
    return result

# Most readable: single expression. NumPy does not fuse these operations,
# so intermediates are still created, although it may reuse some
# temporary buffers internally.
def most_efficient(arr):
    return (arr * 2 + 5) ** 2
```
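Another way to control allocations is the `out=` argument that ufuncs accept, which writes results into a preallocated buffer. The sketch below computes the same `(arr * 2 + 5) ** 2` expression with a single reusable buffer; the name `buf` is illustrative.

```python
import numpy as np

arr = np.arange(5, dtype=np.float64)
buf = np.empty_like(arr)          # allocated once, reused for every step

np.multiply(arr, 2, out=buf)      # buf = arr * 2
np.add(buf, 5, out=buf)           # buf = buf + 5
np.square(buf, out=buf)           # buf = buf ** 2

print(buf)    # [ 25.  49.  81. 121. 169.]
print(arr)    # input is left untouched: [0. 1. 2. 3. 4.]
```

This pattern pays off mainly inside loops, where the same buffer can be reused across iterations.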
Using numexpr for Complex Expressions
```python
import numpy as np

# For very large arrays and complex expressions,
# numexpr can be faster (requires installation)

# Without numexpr
a = np.random.rand(10000000)
b = np.random.rand(10000000)
result = 2*a + 3*b**2 - np.sqrt(a)

# With numexpr (if installed)
import numexpr as ne
result = ne.evaluate('2*a + 3*b**2 - sqrt(a)')
```
Common Pitfalls and Solutions
NaN Handling
```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5, np.nan])

# Problem: Regular functions return NaN
mean = np.mean(arr)  # Returns nan

# Solution: Use nan-safe functions
mean = np.nanmean(arr)  # Returns 3.0
std = np.nanstd(arr)
sum_val = np.nansum(arr)

# Check for NaN
has_nan = np.isnan(arr).any()
where_nan = np.where(np.isnan(arr))[0]

# Remove NaN
arr_clean = arr[~np.isnan(arr)]

print(f"Mean (nan-safe): {mean}")
print(f"NaN positions: {where_nan}")
```
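Removing NaN values changes the array length, which is not always acceptable. Two common alternatives are sketched below: filling with the NaN-safe mean via `np.where`, and replacing with a constant via `np.nan_to_num`. The choice of fill value is an assumption for illustration; the right one depends on the data.

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5, np.nan])

# Replace each NaN with the mean of the valid entries (3.0 here).
filled = np.where(np.isnan(arr), np.nanmean(arr), arr)

# Replace each NaN with a fixed constant instead.
zeroed = np.nan_to_num(arr, nan=0.0)

print(filled)    # [1. 2. 3. 4. 5. 3.]
print(zeroed)    # [1. 2. 0. 4. 5. 0.]
```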
Integer Division Pitfall
```python
import numpy as np

# Problem: Division of integer arrays
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
result = a / b  # With Python 3, this is fine

# But be careful with older code or explicit int types
a_int = np.array([1, 2, 3], dtype=np.int32)
b_int = np.array([2, 2, 2], dtype=np.int32)

# In NumPy, / always gives float result
result_float = a_int / b_int  # [0.5, 1.0, 1.5]

# Use // for integer division
result_int = a_int // b_int  # [0, 1, 1]

print(f"Float division: {result_float}")
print(f"Integer division: {result_int}")
```
Array Equality
```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0])

# Problem: Can't use == directly as a condition
# if a == b:  # ValueError: the truth value of an array is ambiguous
#     ...

# Solution 1: Element-wise comparison
equal_elements = a == b  # Boolean array

# Solution 2: Check if all elements equal
all_equal = np.all(a == b)

# Solution 3: array_equal
array_equal = np.array_equal(a, b)

# Solution 4: For floating point, use allclose
c = a + 1e-10
close_enough = np.allclose(a, c, rtol=1e-5, atol=1e-8)

print(f"All equal: {all_equal}")
print(f"Arrays equal: {array_equal}")
print(f"Close enough: {close_enough}")
```
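NaN values add one more wrinkle, because NaN never compares equal to itself. The sketch below shows how `np.isclose` and `np.allclose` behave with and without `equal_nan=True`; the sample arrays are illustrative.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0 + 1e-9])

# Exact comparison fails: NaN != NaN, and the last elements differ slightly.
exact = np.array_equal(a, b)

# Tolerant comparison still treats the NaN pair as a mismatch by default.
per_element = np.isclose(a, b)
close_default = np.allclose(a, b)

# equal_nan=True treats NaNs in matching positions as equal.
close_nan_ok = np.allclose(a, b, equal_nan=True)

print(exact)            # False
print(per_element)      # [ True False  True]
print(close_default)    # False
print(close_nan_ok)     # True
```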
Memory Leaks with Views
```python
import numpy as np

# Problem: Large array kept in memory
large_array = np.random.rand(1000000, 100)
small_view = large_array[0:10]  # Just 10 rows

# large_array is kept in memory because small_view references it!
del large_array  # Doesn't free memory!

# Solution: Make a copy
large_array = np.random.rand(1000000, 100)
small_copy = large_array[0:10].copy()
del large_array  # Now memory is freed

# Check if it's a view
print(f"Is view? {small_view.base is not None}")
print(f"Is copy? {small_copy.base is None}")
```
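To see how much memory a view actually pins, compare its own `nbytes` with that of its `base`, and use `np.shares_memory` to confirm whether two arrays overlap. A small sketch with a deliberately modest array size (chosen here so it runs quickly):

```python
import numpy as np

big = np.zeros((1000, 100))       # 1000 * 100 * 8 bytes = 800,000 bytes
view = big[:10]
copy = big[:10].copy()

print(np.shares_memory(big, view))   # True: the view points into big
print(np.shares_memory(big, copy))   # False: the copy owns its data

# The view reports only its own 8,000 bytes, but it keeps all of
# big's 800,000 bytes alive through its base reference.
print(view.nbytes)         # 8000
print(view.base.nbytes)    # 800000
print(copy.base is None)   # True
```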
This comprehensive NumPy guide covers 50+ examples across all major array operations and numerical computing workflows!