Why is my MATLAB for-loop code faster than my vectorized version?

Question

I had always heard that vectorized code runs faster than for loops in MATLAB. However, when I tried vectorizing my MATLAB code it seemed to run slower.

I used tic and toc to measure the times. I changed only the implementation of a single function in my program. My vectorized version ran in 47.228801 seconds and my for-loop version ran in 16.962089 seconds.

Also, in my main program I used a large value for N, N = 1000000, and DataSet's size is 1 × 301. I ran each version several times on different data sets with the same size and N.

Why is the vectorized version so much slower, and how can I improve the speed further?

The "vectorized" version

function [RNGSet] = RNGAnal(N,DataSet)
%Creates a random number generated set of numbers to check accuracy overall
%   This function will produce random numbers and normalize a new Data set
%   that is derived from an old data set by multiply random numbers and
%   then dividing by N/2
randData = randint(N,length(DataSet));
tempData = repmat(DataSet,N,1);
RNGSet = randData .* tempData;
RNGSet = sum(RNGSet,1) / (N/2); % sum and normalize by the N
end

The "for-loop" version

function [RNGData] = RNGAnsys(N,Data)
%RNGAnsys This function produces statistical RNG data using a for loop
%   This function will produce RNGData that will be used to plot on another
%   plot that possesses the actual data
multData = zeros(N,length(Data));
for i = 1:length(Data)
    photAbs = randint(N,1); % Create N number of random 0's or 1's
    multData(:,i) = Data(i) * photAbs; % multiply each element in the molar data by the random numbers
end

sumData = sum(multData,1); % sum each individual energy level's data point
RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
end

Recommended answer

Vectorization

A first glance at the for-loop code shows that photAbs is a binary array, each column of which is scaled by the corresponding element of Data. This binary structure can be exploited for vectorization, as in the code here:

function RNGData = RNGAnsys_vect1(N,Data)

%// Get the 2D Matrix of random ones and zeros
photAbsAll = randint(N,numel(Data));

%// Take care of multData internally by summing along the columns of the
%// binary 2D matrix and then multiply each element of it with each scalar 
%// taken from Data by performing elementwise multiplication
sumData = Data.*sum(photAbsAll,1);

%// Divide by n, but account for 0.5 average by n/2
RNGData = (sumData./(N/2))'; %//'

return;
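
The rewrite above relies on the scaling being pulled out of the column sum: summing Data(i) times a column gives the same result as summing the column first and multiplying once afterwards. The following is a minimal sketch of that equivalence, using a small fixed binary matrix in place of random data. The values of N, Data and B are made up purely for illustration.

```matlab
% Hypothetical small inputs, chosen only for illustration
N = 6;
Data = [2 5 7];
B = [1 0 1; 0 0 1; 1 1 1; 0 1 0; 1 0 0; 1 1 1];  % fixed binary matrix, stands in for the random one

% Loop-style: scale every column, then sum along the columns
multData = zeros(N, numel(Data));
for i = 1:numel(Data)
    multData(:, i) = Data(i) * B(:, i);
end
sumLoop = sum(multData, 1);

% Vectorized: sum the binary columns first, then scale once per column
sumVect = Data .* sum(B, 1);

% Both approaches give the same row vector
assert(isequal(sumLoop, sumVect));
```

The vectorized form never builds the scaled N-by-numel(Data) matrix, so it does one multiplication per column instead of N.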

After profiling, the bottleneck appears to be the creation of the random binary array. Using a faster random binary array creator, as suggested in this smart solution, the function above can be optimized further:

function RNGData = RNGAnsys_vect2(N,Data)

%// Create a random binary array and sum along the columns on the fly to
%// save on any variable space that would be required otherwise. 
%// Also perform the elementwise multiplication as discussed before.
sumData = Data.*sum(rand(N,numel(Data))<0.5,1);

%// Divide by n, but account for 0.5 average by n/2
RNGData = (sumData./(N/2))'; %//'

return;
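
Memory use is probably a large part of the story here. The sketch below compares the storage of a binary array held as double with the logical array that rand(...) < 0.5 returns, and estimates the size of one full double matrix at the question's dimensions. The small N and the variable names are assumptions for illustration only.

```matlab
% Hypothetical small size so the example runs quickly
N = 1000;
L = 301;

B_logical = rand(N, L) < 0.5;   % comparison result is a logical array
B_double  = double(B_logical);  % same values stored as double

infoL = whos('B_logical');
infoD = whos('B_double');
bytesPerElemLogical = infoL.bytes / numel(B_logical);
bytesPerElemDouble  = infoD.bytes / numel(B_double);

% Size of one N-by-L double matrix at the question's dimensions
bytesFullSize = 1e6 * 301 * 8;  % 2408000000 bytes, roughly 2.4 GB
```

The question's vectorized function holds three such double matrices at once (randData, tempData and RNGSet), which likely explains much of its slowdown. Note also that randint may not be available in recent MATLAB releases, where randi is the usual replacement.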

The same fast binary random array creator can be used to optimize the original code as well, so that the later benchmark compares the optimized for-loop code and the vectorized code fairly. The optimized for-loop code is listed here:

function RNGData = RNGAnsys_opt1(N,Data)

multData = zeros(N,numel(Data));
for i = 1:numel(Data)

    %// Create N number of random 0's or 1's using a smart approach
    %// Then, multiply each element in the molar data by the random numbers
    multData(:,i) = Data(i) * (rand(N,1)<.5); 
end

sumData = sum(multData,1); % sum each individual energy level's data point
RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
return;
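
One detail worth checking in a loop body that scales a thresholded random vector: in MATLAB, `*` binds more tightly than `<`, so the comparison needs its own parentheses if the threshold is meant to be applied before the scaling. The sketch below uses fixed values in place of rand(N,1) so that the result is reproducible. The names r and d are hypothetical.

```matlab
r = [0.1; 0.4; 0.9];   % fixed values standing in for rand(3,1)
d = 2;                 % hypothetical scale factor, plays the role of Data(i)

a = d * r < .5;        % parsed as (d * r) < .5, gives a logical vector
b = d * (r < .5);      % threshold first, then scale, gives 0 or d
```

Here a is the logical vector [1; 0; 0], while b is the double vector [2; 2; 0], which is the scaled binary column the loop is meant to store.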

Benchmarking

Benchmarking code

N = 15000; %// Kept at this value because higher N's run out of memory.
           %// The dataset size matters more anyway, as it decides how well
           %// the vectorized code does against the for-loop code

DS_arr = [50 100 200 500 800 1500 5000]; %// Dataset sizes
timeall = zeros(2,numel(DS_arr));

for k1 = 1:numel(DS_arr)
    DS = DS_arr(k1);
    Data = rand(1,DS);

    f = @() RNGAnsys_opt1(N,Data);%// Optimized for-loop code
    timeall(1,k1) = timeit(f);
    clear f

    f = @() RNGAnsys_vect2(N,Data);%// Vectorized Code
    timeall(2,k1) = timeit(f);
    clear f
end

%// Display benchmark results
figure,hold on, grid on
plot(DS_arr,timeall(1,:),'-ro')
plot(DS_arr,timeall(2,:),'-kx')
legend('Optimized for-loop code','Vectorized code')
xlabel('Dataset size ->'),ylabel('Time(sec) ->')
avg_speedup = mean(timeall(1,:)./timeall(2,:))
title(['Average Speedup with vectorized code = ' num2str(avg_speedup) 'x'])
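
The benchmark uses timeit rather than a single tic/toc pair. timeit calls the function handle several times and returns the median of the measurements, which makes it less sensitive to warm-up effects and background noise. A minimal sketch of the two approaches side by side, with a made-up workload f:

```matlab
% Hypothetical workload: column sums of a random binary array
f = @() sum(rand(500, 50) < 0.5, 1);

% Single measurement with tic/toc, includes any first-call overhead
tic;
f();
tSingle = toc;

% Repeated measurement with timeit, returns the median run time in seconds
tRobust = timeit(f);
```

For functions that run for many seconds, as in the question, a single tic/toc reading is usually adequate. For short functions the two numbers can differ noticeably.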

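Results

[Figure: time in seconds versus dataset size for the optimized for-loop code and the vectorized code]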

Concluding remarks

Based on my experience with MATLAB so far, neither for-loops nor vectorized techniques suit every situation; which one is faster depends on the specific case.
