matlab-performance-optimizer
MATLAB Performance Optimizer
This skill provides comprehensive guidelines for optimizing MATLAB code performance. Apply vectorization techniques, memory optimization strategies, and profiling tools to make code faster and more efficient.
When to Use This Skill
- Optimizing slow or inefficient MATLAB code
- Converting loops to vectorized operations
- Reducing memory usage
- Improving algorithm performance
- The user mentions: slow, performance, optimize, speed up, efficient, memory
- Profiling code to find bottlenecks
- Parallelizing computations
Core Optimization Principles
1. Vectorization (Most Important)
Replace loops with vectorized operations whenever possible.
SLOW - Using loops:
matlab
% Slow approach
n = 1000000;
result = zeros(n, 1);
for i = 1:n
    result(i) = sin(i) * cos(i);
end
FAST - Vectorized:
matlab
% Fast approach
n = 1000000;
i = (1:n).';
result = sin(i) .* cos(i);
2. Preallocate Arrays
Preallocate arrays before filling them in a loop; growing an array one element at a time forces MATLAB to reallocate and copy it repeatedly.
SLOW - Growing arrays:
matlab
% Very slow - array grows each iteration
result = [];
for i = 1:10000
    result(end+1) = i^2;
end
FAST - Preallocated:
matlab
% Fast - preallocated array
n = 10000;
result = zeros(n, 1);
for i = 1:n
    result(i) = i^2;
end
3. Use Built-in Functions
MATLAB built-in functions are highly optimized.
SLOW - Manual implementation:
matlab
% Slow
sum_val = 0;
for i = 1:length(x)
    sum_val = sum_val + x(i);
end
FAST - Built-in function:
matlab
% Fast
sum_val = sum(x);
Vectorization Techniques
Element-wise Operations
Use .*, ./, and .^ for element-wise operations:
matlab
% Instead of this:
for i = 1:length(x)
    y(i) = x(i)^2 + 2*x(i) + 1;
end
% Do this:
y = x.^2 + 2*x + 1;
Logical Indexing
Replace conditional loops with logical indexing:
matlab
% Instead of this:
count = 0;
for i = 1:length(data)
    if data(i) > threshold
        count = count + 1;
        filtered(count) = data(i);
    end
end
filtered = filtered(1:count);
% Do this:
filtered = data(data > threshold);
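Logical masks can also combine several conditions and modify elements in place. The following is a minimal sketch; the sample vector and the bounds `lo` and `hi` are made up for illustration and do not come from the original text.

```matlab
% Illustrative data and bounds (assumed for this sketch)
data = [3 -1 7 12 -5 9 20];
lo = 0;
hi = 10;

% Combine conditions with the element-wise operator &
inRange = data(data >= lo & data <= hi);    % keep values within [lo, hi]

% Count matches without building the filtered array
numInRange = nnz(data >= lo & data <= hi);

% Modify only the selected elements: clamp negatives to zero
clamped = data;
clamped(clamped < 0) = 0;
```

Note that the mask uses `&` rather than `&&`; the short-circuit operators only accept scalar operands.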
Matrix Operations
Use matrix multiplication instead of nested loops:
matlab
% Instead of this:
C = zeros(size(A, 1), size(B, 2));
for i = 1:size(A, 1)
    for j = 1:size(B, 2)
        for k = 1:size(A, 2)
            C(i,j) = C(i,j) + A(i,k) * B(k,j);
        end
    end
end
% Do this:
C = A * B;
Cumulative Operations
Use cumsum, cumprod, cummax, and cummin:
matlab
% Instead of this:
running_sum = zeros(size(data));
running_sum(1) = data(1);
for i = 2:length(data)
    running_sum(i) = running_sum(i-1) + data(i);
end
% Do this:
running_sum = cumsum(data);
Memory Optimization
Use Appropriate Data Types
matlab
% Default double precision (8 bytes per element)
data = rand(1000, 1000); % 8 MB
% Use single precision when the reduced accuracy is acceptable (4 bytes)
data = rand(1000, 1000, 'single'); % 4 MB, allocated directly as single
% Use integers when applicable
indices = uint32(1:1000000); % 4 MB instead of 8 MB
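The sizes quoted in the comments can be checked with `whos`, which reports the bytes used by a variable. This is a small sketch with illustrative variable names (`d`, `s`, `u`); zeros are used only so that the result is deterministic.

```matlab
% Same array shape stored in three different numeric types
d = zeros(1000, 1000);             % double: 8 bytes per element
s = zeros(1000, 1000, 'single');   % single: 4 bytes per element
u = zeros(1000, 1000, 'uint8');    % uint8:  1 byte per element

% whos returns a struct whose bytes field holds the storage size
infoD = whos('d');
infoS = whos('s');
infoU = whos('u');
fprintf('double: %d bytes\nsingle: %d bytes\nuint8:  %d bytes\n', ...
    infoD.bytes, infoS.bytes, infoU.bytes);
```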
Sparse Matrices
For matrices with mostly zeros:
matlab
% Dense matrix (wastes memory)
A = zeros(10000, 10000);
A(1:100, 1:100) = rand(100); % 800 MB
% Sparse matrix (efficient)
A = sparse(10000, 10000);
A(1:100, 1:100) = rand(100); % Only stores non-zeros
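Indexed assignment into an existing sparse matrix can be slow because the internal storage may need to be rearranged on each insertion. A common alternative is to build the matrix in one call from row, column, and value triplets. The sketch below reproduces the same 100-by-100 block under that approach; the variable names are illustrative.

```matlab
% Build a 10000-by-10000 sparse matrix holding one dense 100-by-100 block
m = 10000;
blockSize = 100;
[rowIdx, colIdx] = ndgrid(1:blockSize, 1:blockSize);
vals = rand(blockSize);

% sparse(i, j, v, m, n) places v(k) at position (i(k), j(k)) of an m-by-n matrix
A = sparse(rowIdx(:), colIdx(:), vals(:), m, m);

info = whos('A');
fprintf('Stored entries: %d, memory: %.1f KB\n', nnz(A), info.bytes / 1024);
```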
Clear Unused Variables
matlab
% Process large data
largeData = loadData();
processedData = processData(largeData);
% Clear when no longer needed
clear largeData;
% Continue with processed data
results = analyze(processedData);
In-Place Operations
matlab
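% Reassigning to the same variable lets MATLAB reuse the buffer when it can
A = A + 5; % In-place when possible
% Assignment is copy-on-write: no data is copied at this point
B = A; % The actual copy is deferred until A or B is modified
B = A + 0; % Allocates a new array immediately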
Profiling and Benchmarking
Using the Profiler
matlab
% Profile code execution
profile on
myFunction(inputs);
profile viewer
profile off
The profiler shows:
- Time spent in each function
- Number of calls to each function
- Lines that take the most time
Timing Comparisons
matlab
% Time single execution
tic;
result = myFunction(data);
elapsedTime = toc;
% Benchmark with timeit (more accurate)
timeit(@() myFunction(data))
% Compare multiple approaches
time1 = timeit(@() approach1(data));
time2 = timeit(@() approach2(data));
fprintf('Approach 1: %.6f s\nApproach 2: %.6f s\n', time1, time2);
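As a concrete illustration of the comparison pattern, the sketch below times two interchangeable expressions with `timeit`. The matrices and the two anonymous functions are assumptions made for this example, not part of the original text. It checks that both candidates return the same result before timing them, since a faster but different answer is not an optimization.

```matlab
% Two hypothetical candidates that should produce the same result
A = rand(2000, 50);
B = rand(1, 50);
viaRepmat    = @() A - repmat(B, size(A, 1), 1);
viaExpansion = @() A - B;   % implicit expansion, R2016b+

% Confirm the candidates agree before comparing their speed
assert(isequal(viaRepmat(), viaExpansion()), 'Results differ');

% timeit runs each handle several times and returns the median time in seconds
tRepmat    = timeit(viaRepmat);
tExpansion = timeit(viaExpansion);
fprintf('repmat: %.6f s, implicit expansion: %.6f s\n', tRepmat, tExpansion);
```

The measured times depend on the machine and MATLAB release, so treat them as relative indicators rather than fixed numbers.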
Common Optimization Patterns
Pattern 1: Replace find with Logical Indexing
matlab
% SLOW
indices = find(x > 5);
y = x(indices);
% FAST
y = x(x > 5);
Pattern 2: Use Implicit Expansion Instead of repmat
matlab
% SLOW - repmat to match dimensions
A = rand(1000, 5);
B = rand(1, 5);
C = A - repmat(B, size(A, 1), 1);
% FAST - implicit expansion (R2016b+)
C = A - B;
Pattern 3: Avoid Repeated Calculations
matlab
% SLOW - recalculates each iteration
for i = 1:n
    result(i) = data(i) / sqrt(sum(data.^2));
end
% FAST - calculate once
norm_factor = sqrt(sum(data.^2));
for i = 1:n
    result(i) = data(i) / norm_factor;
end
% EVEN FASTER - vectorize
result = data / sqrt(sum(data.^2));
Pattern 4: Efficient String Operations
matlab
% SLOW - concatenating in loop
str = '';
for i = 1:1000
    str = [str, sprintf('Line %d\n', i)];
end
% FAST - cell array + join (no trailing newline, unlike the loop above)
lines = cell(1, 1000);
for i = 1:1000
    lines{i} = sprintf('Line %d', i);
end
str = strjoin(lines, '\n');
% FASTEST - vectorized sprintf (the format is reused for each element)
str = sprintf('Line %d\n', 1:1000);
Pattern 5: Use Table for Mixed Data Types
matlab
% Instead of separate arrays
names = cell(1000, 1);
ages = zeros(1000, 1);
scores = zeros(1000, 1);
% Use table
data = table(names, ages, scores);
% Better organization; in tight loops, indexing plain arrays is usually faster than indexing a table
Algorithm-Specific Optimizations
Convolution and Filtering
matlab
% Use built-in functions
filtered = conv(signal, kernel, 'same');
filtered = filter(b, a, signal);
% For 2D
filtered = conv2(image, kernel, 'same');
filtered = imfilter(image, kernel); % Image Processing Toolbox; computes correlation by default
% FFT-based for large kernels (zero-pad for linear convolution)
nfft = length(signal) + length(kernel) - 1;
filtered = ifft(fft(signal, nfft) .* fft(kernel, nfft)); % Full-length result (nfft samples)
Distance Calculations
matlab
% Instead of nested loops for pairwise distances
% SLOW
n = size(points, 1);
distances = zeros(n, n);
for i = 1:n
    for j = 1:n
        distances(i,j) = norm(points(i,:) - points(j,:));
    end
end
% FAST - vectorized (pdist2 requires Statistics and Machine Learning Toolbox)
distances = pdist2(points, points);
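When `pdist2` is not available, the same Euclidean distance matrix can be computed with base MATLAB. The sketch below is one possible approach, assuming R2016b or later for implicit expansion; the sample `points` matrix is made up for illustration. It expands the squared distance into norms and a single matrix product.

```matlab
% Illustrative input: 5 points in 3-D (assumed for this sketch)
points = rand(5, 3);

% ||p_i - p_j||^2 = ||p_i||^2 + ||p_j||^2 - 2 * (p_i . p_j)
sqNorms = sum(points.^2, 2);                              % n-by-1 squared norms
sqDist  = sqNorms + sqNorms.' - 2 * (points * points.');  % n-by-n, implicit expansion
sqDist  = max(sqDist, 0);                                 % clamp tiny negatives from round-off
distances = sqrt(sqDist);
```

Because of floating-point round-off, the diagonal may hold very small nonzero values instead of exact zeros, and the result needs an n-by-n array in memory.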
Sorting and Searching
matlab
% Presort for multiple searches
sortedData = sort(data);
% First index with sortedData >= value (the comparison scans the whole array, so this is linear, not a binary search)
idx = find(sortedData >= value, 1, 'first');
% Use ismember for set operations
[isPresent, locations] = ismember(searchValues, data);
% Use unique for removing duplicates
uniqueData = unique(data);
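The `find` call on sorted data still evaluates the comparison over the entire array, so its cost grows linearly with the array length. If many lookups are made against a large sorted array, an explicit binary search takes a logarithmic number of steps instead. The following is a minimal sketch of a lower-bound search; the sample array and query value are illustrative.

```matlab
% Illustrative sorted input and query (assumed for this sketch)
sortedData = [1 3 4 7 9 12 15];
value = 8;

% Lower-bound binary search: first index with sortedData(idx) >= value
lo = 1;
hi = numel(sortedData) + 1;    % hi = n + 1 stands for "no such element"
while lo < hi
    mid = floor((lo + hi) / 2);
    if sortedData(mid) < value
        lo = mid + 1;          % answer lies to the right of mid
    else
        hi = mid;              % mid is a candidate; keep searching left
    end
end
idx = lo;                      % equals n + 1 if every element is < value
```

For small arrays or a handful of lookups, the vectorized `find` is often fast enough and simpler to read.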
Parallel Computing
Simple Parallel Loops (parfor)
matlab
% Convert for to parfor for independent iterations
parfor i = 1:n
    results(i) = expensiveFunction(data(i));
end
Requirements for parfor:
- Iterations must be independent
- Loop range must be consecutive increasing integers
- Variables must be classified as loop, sliced, broadcast, reduction, or temporary
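The variable classes listed above can be seen in one small loop. This is a sketch with made-up data and a hypothetical `scale` factor; without Parallel Computing Toolbox the `parfor` loop simply runs serially, so the example still executes.

```matlab
n = 100;
data = rand(n, 1);      % sliced input: iteration i reads only data(i)
scale = 2.5;            % broadcast: read-only value sent to every worker
results = zeros(n, 1);  % sliced output: iteration i writes only results(i)
total = 0;              % reduction: accumulated across iterations with +

parfor i = 1:n          % i is the loop variable
    r = scale * data(i)^2;   % r is a temporary, private to each iteration
    results(i) = r;
    total = total + r;
end
```

The reduction on `total` works because addition is associative, so MATLAB is free to combine partial sums from different workers in any order; this also means the last few digits of `total` can differ slightly from a serial run.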
Parallel Array Operations
matlab
% Create parallel pool
parpool('local', 4); % 4 workers (the 'local' profile is named 'Processes' in R2022b and later)
% Use parfeval for asynchronous parallel execution
futures = parfeval(@expensiveFunction, 1, data); % 1 = number of outputs requested
result = fetchOutputs(futures); % Blocks until the result is ready
% GPU arrays for massive parallelization
gpuData = gpuArray(data);
result = arrayfun(@myFunction, gpuData);
result = gather(result); % Bring back to CPU
Advanced Optimizations
MEX Functions for Critical Sections
Convert performance-critical code to C/C++:
matlab
% Create MEX file for bottleneck function
% Write myFunction.c, then compile:
% mex myFunction.c
% Call like regular MATLAB function
result = myFunction(inputs);
Persistent Variables for Cached Results
matlab
function result = expensiveComputation(input)
    persistent cachedData cachedInput
    % Caches only the most recent input; isempty guards the first call
    if ~isempty(cachedInput) && isequal(input, cachedInput)
        % Return cached result
        result = cachedData;
        return;
    end
    % Compute and cache
    result = computeExpensiveOperation(input);
    cachedData = result;
    cachedInput = input;
end
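The persistent-variable pattern remembers a single input. Where several distinct inputs recur, the built-in `memoize` function (R2017a and later) offers a ready-made cache keyed on the input arguments. The sketch below uses a trivial anonymous function, `slowSquare`, as a hypothetical stand-in for an expensive computation.

```matlab
% slowSquare is a hypothetical stand-in for an expensive function
slowSquare = @(x) x.^2;

% memoize wraps a function handle with a cache keyed on the input arguments
cachedSquare = memoize(slowSquare);
cachedSquare.CacheSize = 20;     % keep results for up to 20 distinct inputs

first  = cachedSquare(12);       % computed, then stored in the cache
second = cachedSquare(12);       % same input: served from the cache
clearCache(cachedSquare);        % discard cached results when they go stale
```

Caching of either kind only pays off when the function is deterministic and free of side effects, since a cached call skips the function body entirely.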
JIT Acceleration Best Practices
MATLAB's JIT (Just-In-Time) compiler optimizes:
- Simple for-loops with scalar operations
- Functions without dynamic features
JIT-friendly code:
matlab
function result = jitFriendly(n)
    result = 0;
    for i = 1:n
        result = result + i;
    end
end
JIT-unfriendly code (avoid):
matlab
function result = jitUnfriendly(n)
    result = 0;
    for i = 1:n
        eval(['x' num2str(i) ' = i;']); % Dynamic code
    end
end
Performance Checklist
Before finalizing optimized code, verify:
- Loops are vectorized where possible
- Arrays are preallocated before loops
- Built-in functions used instead of manual implementations
- Logical indexing used instead of find + indexing
- Appropriate data types used (single vs double, integers)
- Sparse matrices used for sparse data
- Repeated calculations moved outside loops
- String concatenation uses efficient methods
- Code profiled to identify actual bottlenecks
- Matrix operations used instead of element-wise loops
- Parallel computing considered for independent operations
- Memory-intensive operations optimized
- Caching implemented for repeated expensive calls
Profiling Workflow
- Measure First: Profile before optimizing
matlab
profile on
myScript;
profile viewer
- Identify Bottlenecks: Focus on the functions that take the most time
- Optimize: Apply appropriate techniques
- Measure Again: Verify improvement
matlab
% Before
time_before = timeit(@() myFunction(data));
% After optimization
time_after = timeit(@() myFunctionOptimized(data));
fprintf('Speedup: %.2fx\n', time_before/time_after);
- Iterate: Repeat for remaining bottlenecks
Common Performance Pitfalls
Pitfall 1: Premature Optimization
- Profile first, optimize second
- Focus on actual bottlenecks, not assumptions
Pitfall 2: Over-vectorization
- Sometimes loops are clearer and fast enough
- Vectorized expressions can allocate large temporary arrays; a loop may need less memory
- Balance readability with performance
Pitfall 3: Ignoring Memory Access Patterns
matlab
% SLOW - inner loop over columns (row-major traversal in column-major MATLAB)
for i = 1:rows
    for j = 1:cols
        A(i,j) = process(i, j);
    end
end
% FAST - inner loop over rows (column-major traversal, contiguous memory)
for j = 1:cols
    for i = 1:rows
        A(i,j) = process(i, j);
    end
end
% FASTEST - vectorized (requires process to accept array inputs)
[I, J] = ndgrid(1:rows, 1:cols);
A = process(I, J);
Pitfall 4: Unnecessary Data Type Conversions
matlab
% SLOW - repeated conversions
for i = 1:n
    x = double(data(i));
    result(i) = sin(x);
end
% FAST - convert once
x = double(data);
result = sin(x);
Optimization Examples
Example 1: Image Processing
matlab
% SLOW
[rows, cols] = size(image);
output = zeros(rows, cols);
for i = 2:rows-1
    for j = 2:cols-1
        output(i,j) = mean(image(i-1:i+1, j-1:j+1), 'all');
    end
end
% FAST (interior pixels match the loop; border pixels become zero-padded averages instead of 0)
kernel = ones(3,3) / 9;
output = conv2(image, kernel, 'same');
Example 2: Statistical Analysis
matlab
% SLOW
n = size(data, 1);
means = zeros(n, 1);
for i = 1:n
    means(i) = mean(data(i, :));
end
% FAST
means = mean(data, 2);
Example 3: Time Series Processing
matlab
% SLOW
n = length(signal);
movingAvg = zeros(size(signal));
window = 10;
for i = window:n
    movingAvg(i) = mean(signal(i-window+1:i));
end
% FAST - trailing window: [window-1 past samples, 0 future samples]
% (the first window-1 outputs average a shorter window instead of staying 0)
movingAvg = movmean(signal, [window-1 0]);
Troubleshooting Performance
Issue: Code still slow after vectorization
- Solution: Profile to find new bottlenecks; consider algorithm complexity
Issue: Out of memory errors
- Solution: Use smaller data types, process in chunks, use sparse matrices
Issue: parfor slower than for loop
- Solution: Check if overhead outweighs benefits; ensure iterations are expensive enough
Issue: GPU computation slower than CPU
- Solution: Data transfer overhead may exceed computation time; reserve the GPU for large arrays and keep data on the device between operations
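The "process in chunks" suggestion for out-of-memory errors can be sketched as follows. The data, the chunk size, and the sum-of-squares computation are all assumptions chosen for illustration; in a real out-of-memory case each chunk would typically be read from disk (for example through `matfile` or a datastore) rather than sliced from an array already in memory.

```matlab
% Illustrative setup: sum of squares over a large vector, computed in chunks
n = 1e6;
chunkSize = 1e5;           % assumed chunk size; tune to the available memory
data = rand(n, 1);         % stands in for data too large to transform at once

total = 0;
for startIdx = 1:chunkSize:n
    stopIdx = min(startIdx + chunkSize - 1, n);   % last chunk may be shorter
    chunk = data(startIdx:stopIdx);               % only this slice is transformed
    total = total + sum(chunk.^2);                % temporary is chunkSize long, not n
end
```

Each iteration is still vectorized internally, so the loop adds only a small number of iterations while capping the size of the temporary arrays.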
Additional Resources
- Use profile viewer to analyze performance
- Use memory to check memory usage
- Use doc with: timeit, tic/toc, parfor, gpuArray, sparse
- Check MATLAB Performance and Memory documentation