numba
Numba - High-Performance Python with JIT
Numba is a just-in-time (JIT) compiler that makes numerical Python code run fast. You decorate a function, and Numba compiles it to machine code the first time it is called. It is particularly effective for code dominated by numerical loops and NumPy array manipulation.
When to Use
- NumPy's built-in vectorization isn't enough for your specific algorithm.
- You have complex nested loops that are slow in standard Python.
- You need to write custom "ufuncs" (universal functions) that operate element-wise on arrays.
- You are building high-performance physical simulations (Monte Carlo, N-body, grid-based solvers).
- You want to accelerate code on NVIDIA GPUs (CUDA).
- You want parallel code that uses all CPU cores without the overhead of multiprocessing.
Reference Documentation
Official docs: https://numba.pydata.org/numba-doc/latest/index.html
User Guide: https://numba.pydata.org/numba-doc/latest/user/index.html
Search patterns: @njit, @vectorize, prange, cuda.jit, numba.typed
Core Principles
nopython Mode (@njit)
This is the "gold standard" for Numba. In this mode, Numba compiles the function without going through the Python C-API, which gives maximum speed. If the code cannot be compiled this way (e.g., because it uses unsupported Python objects), Numba raises an error instead of silently running slowly.
Just-In-Time (JIT) Compilation
Compilation happens the first time you call the function with a given combination of argument types. The machine code is then kept in memory and reused for subsequent calls with the same types; calling with new argument types triggers another compilation.
Array-Oriented
Numba is designed to work with NumPy arrays. It understands their dtype, dimensionality, and memory layout, and can generate highly optimized loops over them.
Quick Reference
Installation
```bash
pip install numba
```
Standard Imports
```python
import numpy as np
from numba import njit, prange, vectorize, guvectorize, cuda
```
Basic Pattern - Accelerating a Loop
```python
import numpy as np
from numba import njit

# 1. Apply the @njit decorator (alias for @jit(nopython=True))
@njit
def sum_array(arr):
    res = 0.0
    # A plain Python loop like this would be slow; compiled, it runs at native speed
    for i in range(arr.shape[0]):
        res += arr[i]
    return res

# 2. Execute
data = np.random.random(1_000_000)
result = sum_array(data)  # First call compiles, then runs
```
Critical Rules
✅ DO
- Prefer @njit - Always use nopython=True (or its alias @njit). It ensures your code is actually running at machine speed.
- Use NumPy Arrays - Numba is optimized for NumPy. Avoid standard Python lists inside jitted functions.
- Enable Parallelism - Use @njit(parallel=True) and prange instead of range for automatic multi-threading.
- Cache Compiled Code - Use @njit(cache=True) to avoid recompilation every time you restart your script.
- Warm up - Remember that the first call is slow due to compilation. In timing benchmarks, always run the function once before measuring.
- Type Specifying (Optional) - You can provide signatures (e.g., (float64[:],)) so that compilation happens when the function is defined rather than on the very first call, but Numba usually infers types well on its own.
❌ DON'T
- Don't use Python Objects - Strings, dictionaries, and custom classes are slow or unsupported in nopython mode. Use numba.typed for specialized containers if needed.
- Don't JIT small functions - The overhead of calling a jitted function from Python can outweigh the gains for trivial operations.
- Don't use unsupported libraries - You cannot use pandas, matplotlib, or requests inside an @njit function.
- Don't modify global state - Jitted functions should be "pure" as much as possible for stability.
Anti-Patterns (NEVER)
```python
import numpy as np
import pandas as pd
from numba import njit, jit

# ❌ BAD: Using Pandas inside @njit (Unsupported)
@njit
def bad_func(df):
    return df['col'].sum()  # Raises a TypingError when called: Numba cannot type a DataFrame

# ✅ GOOD: Pass NumPy arrays instead
@njit
def good_func(arr):
    return arr.sum()

# ❌ BAD: Using @jit without nopython=True
@jit
def slow_func(x):  # Before Numba 0.59 this could silently fall back to "object mode" (slow)
    return x + 1

# ✅ GOOD: Always ensure nopython mode
@njit
def fast_func(x):
    return x + 1

# Illustrative helper and data (not in the original) so the loops below can run
@njit
def process_element(x):
    return x * 2.0

arr = np.arange(1000, dtype=np.float64)

# ❌ BAD: Manual loops in Python to call a JIT function
for i in range(1000):
    process_element(arr[i])  # Pays the Python-to-JIT call overhead 1000 times

# ✅ GOOD: Move the loop INSIDE the @njit function
@njit
def process_all(arr):
    for i in range(arr.shape[0]):
        process_element(arr[i])
```
Parallelism and Vectorization
Automatic Multi-threading
```python
from numba import njit, prange

@njit(parallel=True)
def parallel_sum(A):
    # Use prange for the loop that should be parallelized
    s = 0.0  # Float accumulator; Numba treats `s +=` in a prange loop as a reduction
    for i in prange(A.shape[0]):
        s += A[i]
    return s
```
Creating Fast ufuncs (@vectorize)
```python
import numpy as np
from numba import vectorize

# This creates a NumPy ufunc that supports broadcasting
@vectorize(['float64(float64, float64)'], target='parallel')
def fast_add(x, y):
    return x + y

# Now you can use it on massive arrays
arr1 = np.random.random(1_000_000)  # Example inputs
arr2 = np.random.random(1_000_000)
res = fast_add(arr1, arr2)
```
Working with Structs and Types
numba.typed for Non-Array Data
```python
from numba.typed import List, Dict
from numba import njit

@njit
def use_typed_list():
    lst = List()      # Typed list; the element type is inferred from the first append
    lst.append(1.0)
    return lst
```
GPU Acceleration (numba.cuda)
Writing CUDA Kernels
```python
import numpy as np
from numba import cuda

@cuda.jit
def my_kernel(io_array):
    # Compute this thread's absolute position in the 1D grid
    pos = cuda.grid(1)
    if pos < io_array.size:  # Guard against threads past the end of the array
        io_array[pos] *= 2

# Usage
data = np.ones(256)
threadsperblock = 32
blockspergrid = (data.size + (threadsperblock - 1)) // threadsperblock
my_kernel[blockspergrid, threadsperblock](data)
```
Practical Workflows
1. Fast Monte Carlo Simulation
```python
import random
from numba import njit, prange

@njit(parallel=True)
def monte_carlo_pi(nsamples):
    acc = 0
    for i in prange(nsamples):
        x = random.random()
        y = random.random()
        # Count the points that fall inside the unit quarter-circle
        if (x**2 + y**2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples
```
2. Custom Image Filter (Stencil)
```python
import numpy as np
from numba import njit

@njit
def apply_threshold(image, threshold):
    M, N = image.shape
    result = np.zeros_like(image)
    for i in range(M):
        for j in range(N):
            # Pixels above the threshold become 255; all others stay 0
            if image[i, j] > threshold:
                result[i, j] = 255
    return result
```
3. Solving a Physics Grid (Laplace Equation)
```python
from numba import njit

@njit
def solve_laplace(u, niters):
    M, N = u.shape
    for n in range(niters):
        # In-place sweep over interior points; boundary values stay fixed
        for i in range(1, M-1):
            for j in range(1, N-1):
                u[i, j] = 0.25 * (u[i+1, j] + u[i-1, j] + u[i, j+1] + u[i, j-1])
    return u
```
Performance Optimization
The inspect_types() method
Use this to check what Numba inferred for each variable: native types such as float64 mean the code was fully optimized, while pyobject entries mean it fell back to expensive Python objects. The function must have been called (and therefore compiled) at least once first.
```python
fast_func.inspect_types()  # Prints the source annotated with the inferred types
```
Avoid Array Allocation in Loops
Allocate output and scratch arrays once, outside the hot loop (or outside the function, passing them in as arguments), to avoid repeated memory management overhead.
```python
from numba import njit

# ✅ GOOD:
@njit
def compute_into(out_arr, in_arr):
    # The caller owns out_arr, so nothing is allocated here
    for i in range(in_arr.shape[0]):
        out_arr[i] = in_arr[i] * 2
```
Common Pitfalls and Solutions
The "Global Variable" problem
Numba captures the value of global variables at the time of compilation, which is the first call, not the point where the function is defined.
```python
from numba import njit

# ❌ Problem: Changing a global variable won't affect the jitted function
K = 10
@njit
def f(x): return x + K
f(1)  # First call compiles f and freezes K = 10; returns 11
K = 20
f(1)  # Result is still 11!

# ✅ Solution: Pass constants as arguments
@njit
def g(x, k): return x + k
g(1, 20)  # Returns 21
```
Object Mode Fallback
If Numba reports that a function was compiled in object mode (e.g., "Object mode is enabled"), that function will run slowly, at roughly interpreter speed.
```python
# ✅ Solution: Force nopython mode
@njit  # If this throws an error, fix the code instead of removing @njit
def fast_func(x):
    return x + 1
```
Random Seed in Parallel
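Using np.random with parallel=True requires care to ensure independent streams for each thread. The standard random.random() and np.random.random() calls inside Numba are thread-safe, and each thread gets its own independently seeded stream automatically. Numba keeps its own generator state, separate from the one used by the Python interpreter, so any explicit seeding has to happen inside jitted code.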
Best Practices
- Always use @njit - Never use @jit without nopython=True
- Pre-allocate arrays - Avoid creating arrays inside hot loops
- Use prange for parallelism - Enable automatic multi-threading with parallel=True and prange
- Cache compiled functions - Use cache=True to avoid recompilation
- Warm up functions - Call jitted functions once before benchmarking
- Pass NumPy arrays - Convert Python lists to NumPy arrays before calling jitted functions
- Avoid Python objects - Use numba.typed.List and numba.typed.Dict if you need containers
- Check compilation mode - Use inspect_types() to verify nopython mode
- Handle first-call overhead - Remember the first call compiles the function
- Use appropriate signatures - Optional but can speed up first compilation
Numba is the bridge that allows Python to compete with C++ and Fortran in the high-performance computing arena. It removes the "Python tax" from your loops, enabling rapid prototyping without sacrificing execution speed.