有关重组循环以提高代码效率的问题 [英] Questions regarding reorganizing loops for code's efficiency

查看:60
本文介绍了有关重组循环以提高代码效率的问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题出在这里

请考虑以下功能:

function A = plodding(N,d)
for ii = 1:N
   jj = 1;
   A(ii,jj) = randn;
   while abs(A(ii,jj)) < d
      jj = jj + 1;
      A(ii,jj) = randn;
   end
end

重写此功能可以消除使速度变慢的分配问题.调用新函数,巡航.在具有7.8 GB可用内存的Dell Latitude E6410上,消除分配问题可产生7的加速因子.

Rewrite this function to eliminate the allocation problem that is slowing it down. Call the new function, cruising. On a Dell Latitude E6410, with 7.8 gigabytes of usable memory, eliminating the allocation problem produces a speed-up factor of 7.

这是我的工作:

带有rng(0)的原始版本

The original version with rng(0)

function A = plodding(N,d)
rng(0); % To compare between the original and the modified version
for ii = 1:N
   jj = 1;
   A(ii,jj) = randn;
   while abs(A(ii,jj)) < d
      jj = jj + 1;
      A(ii,jj) = randn;
   end
end
end

修改后的版本

function A = cruising(N,d)
rng(0);
for jj = 1:N % Reorganize, so elems are added column-wise
   ii = 1;
   A(ii,jj) = randn;
   while abs(A(ii,jj)) < d
      ii = ii + 1;
      A(ii,jj) = randn;
   end
end
A = A'; % To get the matrix desired
end

但是当我测试时:

tic; A = plodding(5,4.5); toc
Elapsed time is 0.176091 seconds.

tic; A1 = cruising(5,4.5); toc;
Elapsed time is 39.285447 seconds.

B = A - A1; sum(B(:))
ans = 0

当然可以,A = A1

根据我从本课中学到的知识,我的逻辑应该是正确的,因为MATLAB按列存储元素.有人可以帮我吗?

Based on what I learned from the lesson, my logic should be right, because MATLAB stores elements column-wise. Could someone please help me????

推荐答案

分配

Matlab必须在何时何地重新分配动态增长的矩阵,才更有可能产生差异. 考虑以下输出:

Allocation

The differences are more than likely arising from when and how often Matlab has to reallocate the dynamically growing matrix. Consider the following output:

>> clear();n = 1E5;A = rand(n,1);tic;for k = 2:500; A(:,k) = rand(n,1);end;toc;whos();
Elapsed time is 1.294744 seconds.
  Name           Size                 Bytes  Class     Attributes

  A         100000x500            400000000  double              
  k              1x1                      8  double              
  n              1x1                      8  double              

>> clear();n = 1E5;A = rand(1,n);tic;for k = 2:500; A(k,:) = rand(1,n);end;toc;whos();
Elapsed time is 31.106700 seconds.
  Name        Size                    Bytes  Class     Attributes

  A         500x100000            400000000  double              
  k           1x1                         8  double              
  n           1x1                         8  double

尽管将数组A增长到相同的大小,但按列增长的速度却快了近25倍. 由于Matlab是一列主要语言,因此其数组的列在内存中始终是连续的. 因此,在第二个循环中,Matlab需要在每次迭代开始时将每个扩展列重新分配到内存的连续部分,这可能会产生巨大的开销. 而在第一个循环中,如果存在空间,Matlab可以在上一个列之后立即寻址新列,或者如果Matlab支持非连续数组(我不支持),则可以添加一个指针将前一列的末尾连接到新列对Matlab内部知识一无所知,但我倾向于这种方法.

Despite growing the array A to the same size, growing it column-wise is nearly 25 times faster. Since Matlab is a column-major language, the columns of its arrays are always contiguous in memory. So in the second loop, Matlab needs to reallocate every expanded column to a contiguous section of memory at the beginning of every iteration, which can have a huge overhead. Whereas, in the first loop, Matlab can address the new column right after the previous one, if space exists, or add a pointer connecting the end of the previous column to the new, if Matlab supports non-contiguous arrays (I don't know anything about Matlab internals, but I'd lean toward this method).

函数plodding按列(内部)增长,然后按行(外部)增长,而cruising函数按列(内部)增长,然后按列增长. 运行时间差异取决于这两种分配方法,还取决于终止分配所使用的条件(即阈值d).

The function plodding grows column-wise (inner) then row-wise (outer) while cruising grows row-wise (inner) then grows column-wise. The running time differences depend on these two allocation methodologies but also on the condition used to terminate the allocation (i.e., the threshold value d).


累积分布中计算得出接收小于或等于d的数字的概率正态分布的功能. 可以使用统计工具箱的 normcdf 进行计算,或者如果不这样做的话有工具箱,以下匿名功能:

The probability of receiving a number less than or equal to d is calculated from the cumulative distribution function of the normal distribution. This can be calculated using the Statistics Toolbox's normcdf or, if you do not have the toolbox, the following anonymous function:

normcdf = @(x,mu,sigma) 0.5*(1 + erf((x-mu)./(sigma*sqrt(2))));

接收2或更少的概率为normcdf(2,0,1),即97.7%,接收到大于2的值的概率为1减去概率:1 - normcdf(2,0,1)或2.28%. 概率可用于估计获得大于d的数字所需的数组中元素的数量. 例如,由于有2.28%的机会获得大于2的值,因此平均每两个2s产生100个元素的可能性就更大.

The probability of receiving a 2 or less is normcdf(2,0,1) which is 97.7%, and the probability of receiving a value higher than 2 is one minus the probability: 1 - normcdf(2,0,1) or 2.28%. The probabilities can be used to estimate the number of elements in an array needed to get a number greater than d. For example, since there is a 2.28% chance of getting a value greater than 2, an array of 100 elements will more than likely produce at about two 2s on average.

再举一个例子,获得大于5的数字的概率为0.000029%,这意味着需要大约100万个元素的数组才能平均获得5s. 请注意,我用来估算数组大小的公式是

For another example, the probability of getting a number greater than 5 is 0.000029%, and that means that an array of around 1 million elements is needed to get some 5s on average. Note that the formula I used to estimate the array size is

elemApprox = 10^(abs(log10(1 - normcdf(5,0,1))))

,然后将其除以概率的有效位数(在这种情况下为2.665),以粗略估算出大约为5的元素数量. 像这样进行近似估计并没有什么害处,因为我们正在处理随机数. 为了更加确定,您始终可以将猜测乘以2或3或更多.

and then I divide it by the significant digits of the probability (2.8665 in this case) to get a rough estimate for the approximate number of elements to get one 5. It doesn't hurt to approximate like this since we are dealing with random numbers. For more certainty, you can always multiply the guess by 2 or 3 or more.


现在让我们讨论一下您得到的结果.从评论中,您介绍了

Now let's discuss the results you got. From the comments, you presented

>> tic;A = plodding(10000,2);toc
Elapsed time is 9.289355 seconds.
>> tic;A1 = cruising(10000,2);toc;
Elapsed time is 0.078783 seconds.

如上所述,该阵列的尺寸为10,000 x 100(由于2的概率). 然后,调用plodding(10000,2)将为其第一次迭代生成大约100列,然后将数组逐行增长10,000次迭代. 如上所述,这在Matlab中是一个非常缓慢的过程,如果可能的话,最好避免这样做. 然后,调用cruising(10000,2)将为其第一次迭代生成大约100行,然后逐列增加数组以进行10,000次迭代. 如上所述,对于恒定行长,此过程比逐行增长快得多,但在某些情况下可能非常慢.

As discussed above, the array will have dimensions of 10,000 by about 100 (due to 2's probability). The call plodding(10000,2) will then generate about 100 columns for its first iteration, and then grow the array row-wise for 10,000 iterations. As discussed above, that is a very slow process in Matlab and best avoided if possible. The call cruising(10000,2) will then generate about 100 rows for its first iteration, and then grow the array column-wise for 10,000 iterations. As discussed above, this is a far faster process than row-wise growth for constant row length but can be extremely slow in certain cases.

您还发布了结果:

>> tic;A = plodding(5,5);toc
Elapsed time is 1.168961 seconds.
>> tic;A1 = cruising(5,5);toc;
% When I posted this thread, it's already more than 10 mins and MATLAB's still "busy"!

调用plodding(5,5)将为其第一次迭代生成大约1,000,000列,然后逐行增长数组以进行5次迭代. 因此,在每个成功的内部循环之后,Matlab必须将大约一百万个左右的5个64位列重新分配到连续的内存插槽中. 尽管不一定很快,但对于具有现代内存大小的世界来说,找到一百万个40字节的单独位置可能并不是最糟糕的事情. 调用cruising(5,5)将为其第一次迭代生成大约1,000,000行,然后按列增长数组以进行5次迭代. 这看起来比plodding调用更好,因为首先分配了行,然后添加了列;但是,如果后续迭代所需的行数大于当前行数,则必须在每次内部迭代之后重新分配数组的列,直到满足中断条件为止.

The call plodding(5,5) will generate about 1,000,000 columns for its first iteration, and then grow the array row-wise for 5 iterations. So after every successful inner-loop, Matlab must reallocate 1 million or-so 5 64-bit columns into contiguous memory slots. While not necessarily fast, finding a million separate locations of 40 bytes probably isn't the worst thing in the world with modern memory sizes. The call cruising(5,5) will generate about 1,000,000 rows for its first iteration, and then grow the array column-wise for 5 iterations. This appears better than the plodding call since the rows are being allocated first and then columns added; however, if the number of required rows for subsequent iterations is greater than the current row count, the array's columns must be reallocated after every inner-iteration until the break condition is met.

再深入一点,cruising(5,5)的第一次迭代(在我的计算机上)需要大约2.8秒来运行,并分配756,670行. 第二次迭代需要大约2.6秒来迭代分配的行,并且在等待60秒之后,第二次迭代仅前进了20,366行. 我会得出结论,重复添加行并使Matlab每列重新分配大约6MB的连续内存块,会使该功能非常慢.

Diving a little deeper, the first iteration (on my computer) of cruising(5,5) takes about 2.8 seconds to run and allocates 756,670 rows. The second iteration takes about 2.6 seconds to iterate over the allocated rows, and after waiting for 60 seconds past that point, the second iteration has only advanced 20,366 rows. I would conclude that the repeated addition of rows and making Matlab re-allocate about 6MB contiguous blocks of memory per-column makes the function extremely slow.


我对性能改善的建议是在进入循环之前分配A的行数. 对于plodding,这意味着在循环之前添加A(N,elemApprox) = 0;,对于cruising,这意味着在循环之前添加A(elemApprox,N) = 0;. elemApprox是在不增加数组的情况下一致地达到条件所需的大约行数/列数.

My recommendation for performance improvement would be to allocate the number of rows of A before entering the loop. For plodding, this means adding A(N,elemApprox) = 0; before the loop, and for cruising, this means adding A(elemApprox,N) = 0; before the loop. elemApprox is the approximate number of rows/columns needed to consistently achieve the condition without growing the array.

即使我将d = 5elemApprox估计早于1,000,000,事实证明,指定rng(0)的最大行数为3,151,807. 因此,为了使elemApprox的上限一致,我将上面的近似公式修改为

Even though I approximated elemApprox to be 1,000,000 earlier for d = 5, it turns out that maximum number of rows with rng(0) specified is 3,151,807. Therefore, for a consistent upper bound on elemApprox, I'd amend the above approximation formula to be

elemApprox = 10^(ceil(abs(log10(1 - normcdf(5,0,1)))));

,即10,000,000.如果内存有限,那有点陡峭,但是它将大大增加运行时间. 使用分配公式,大大改善了cruising的性能:

which is 10,000,000. That's a little steep if memory is limited, but it will greatly increase the running time. Using the allocation formula, the performance of cruising is greatly improved:

>> tic;plodding(5,5);toc;
Elapsed time is 0.576764 seconds.
>> tic;cruising(5,5);toc;
Elapsed time is 0.906496 seconds.

我不确定为什么plodding的效果会更好,但是至少cruising现在可以在合理的时间内完成.

I'm not sure why plodding is performing better, but at least cruising now completes in a reasonable amount of time.

我还要指出的是,仅将A(N,1) = 0添加到plodding并让Matlab知道会有多少行,这对于较小的d会带来较大的性能提升:

I'd also like to point out that adding just A(N,1) = 0 to plodding and letting Matlab know how many rows there will be results in a large performance gain for small d:

>> tic;plodding(1E4,2);toc; %   without A(N,1) = 0;
Elapsed time is 18.479698 seconds.
>> tic;plodding(1E4,2);toc; %   WITH A(N,1) = 0;
Elapsed time is 0.052307 seconds.

这篇关于有关重组循环以提高代码效率的问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆