Optimizing the value N used to split arrays into chunks for vectorized code so it runs the quickest
Problem description
I'm trying to optimize the value N, the number of chunks an array is split into for a vectorized calculation, so that the code runs the quickest on different machines. My test code is below.
#example use random values
clear all,
t=rand(1,556790);
inner_freq=rand(8193,6);
N=100; # use N chunks
nn = int32(linspace(1, length(t)+1, N+1))
aa_sig_combined=zeros(size(t));
total_time_so_far=0;
for ii=1:N
tic;
ind = nn(ii):nn(ii+1)-1;
aa_sig_combined(ind) = sum(diag(inner_freq(1:end-1,2)) * cos(2 .* pi .* inner_freq(1:end-1,1) * t(ind)) + repmat(inner_freq(1:end-1,3),[1 length(ind)]));
elapsed = toc  # time for this chunk (read toc once so both uses agree)
total_time_so_far = total_time_so_far + elapsed
end
fprintf('- Complete test in %4.4fsec or %4.4fmins\n',total_time_so_far,total_time_so_far/60);
With N = 100 this takes 162.7963 sec (2.7133 mins) to complete on a 16 GB i7 machine running Ubuntu.
Is there a way to find out what value N should be to get this to run the fastest on different machines?
PS: I'm running Octave 3.8.1 on a 16 GB i7 machine with Ubuntu 14.04, but the code will also have to run on a Raspberry Pi 2 with only 1 GB of RAM.
Recommended answer
This is the Matlab test script that I used to time each step. The return stops the script after the first iteration, since the remaining iterations look like they take about the same time.
%example use random values
clear all;
t=rand(1,556790);
inner_freq=rand(8193,6);
N=100; % use N chunks
nn = int32( linspace(1, length(t)+1, N+1) );
aa_sig_combined=zeros(size(t));
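A = inner_freq(1:end-1,1);        % frequency column (A was undefined in the original script)
D = diag(inner_freq(1:end-1,2));  % constant diagonal matrix, built once outside the loop
for ii=1:N
ind = nn(ii):nn(ii+1)-1;
tic;
cosPara = 2 * pi * A * t(ind);    % 8192 x length(ind) matrix of cos arguments
toc;
cosResult = cos( cosPara );
sumParaA = D * cosResult;
toc;                              % cumulative time after cos and the multiplication by D
sumParaB = repmat(inner_freq(1:end-1,3),[1 length(ind)]);
toc;
aa_sig_combined(ind) = sum( sumParaA + sumParaB );
toc;
return;                           % stop after the first chunk
end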
The output is shown below. Note that I have a slow computer.
Elapsed time is 0.156621 seconds.
Elapsed time is 17.384735 seconds.
Elapsed time is 17.922553 seconds.
Elapsed time is 18.452994 seconds.
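As you can see, the cos step is what takes so long. The toc values are cumulative, so about 17.2 of the 18.5 seconds fall between the first and second toc (an interval that also includes the multiplication by D). You are running cos on an 8192x5568 matrix (45,613,056 elements), so it makes sense that it takes this long.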
If you wish to improve performance, use parfor, as each iteration appears to be independent. Assuming you had 100 cores to run your 100 iterations, your script would be done in about 17 seconds plus the parfor overhead.
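A minimal sketch of what the parfor version might look like, under a few assumptions of mine: the sizes are shrunk so it finishes quickly, and the names A, D, offs and parts are introduced only for illustration. Each chunk is written to its own cell, because MATLAB's parfor does not accept a computed index range such as aa_sig_combined(ind) as an output slice.

```octave
% Reduced stand-in sizes; the original uses rand(1,556790) and rand(8193,6).
t = rand(1, 1000);
inner_freq = rand(65, 6);
N = 10;                                      % number of chunks
nn = round(linspace(1, length(t)+1, N+1));   % chunk boundaries

A    = inner_freq(1:end-1,1);                % frequencies
D    = diag(inner_freq(1:end-1,2));          % amplitudes, built once
offs = inner_freq(1:end-1,3);                % offsets

parts = cell(1, N);                          % one result cell per chunk
parfor ii = 1:N
  ind = nn(ii):nn(ii+1)-1;
  parts{ii} = sum(D * cos(2*pi*A*t(ind)) + repmat(offs, [1 length(ind)]));
end
aa_sig_combined = [parts{:}];                % stitch the chunks back together
```

In MATLAB this only runs in parallel with the Parallel Computing Toolbox. As far as I know, Octave accepts the parfor keyword but executes the loop serially, so on Octave this would not be expected to give a speedup by itself.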
For the cos calculation itself, you might want to check whether another method exists that computes cos faster, or with more parallelism, than the stock function.
Another minor optimization is the line below. It ensures that the diag function isn't called inside the loop, since the diagonal matrix is constant. You don't want an 8192x8192 diagonal matrix to be generated on every iteration! I simply build it once outside the loop, which gives a bit of a performance boost as well.
D = diag(inner_freq(1:end-1,2));
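Going one step further than the answer does, multiplying by a diagonal matrix is the same as scaling each row by the matching diagonal entry, so the matrix may not need to be built at all. The sketch below is my own illustration with small stand-in sizes and made-up variable names (amp, viaDiag, viaScale); it only checks that the two forms agree and makes no claim about which is faster on a given machine.

```octave
% Small stand-in sizes; the original has 8192 rows and about 5568 columns per chunk.
amp = rand(64, 1);             % plays the role of inner_freq(1:end-1,2)
cosResult = rand(64, 100);     % plays the role of cos(cosPara)

% Approach from the answer: build the diagonal matrix once, then multiply.
D = diag(amp);
viaDiag = D * cosResult;

% Alternative: scale row r of cosResult by amp(r), with no diagonal matrix.
viaScale = bsxfun(@times, amp, cosResult);

% Largest element-wise difference between the two results.
max_diff = max(abs(viaDiag(:) - viaScale(:)));
```

bsxfun is used here because it exists in both MATLAB and Octave 3.8; timing both variants on the target machine would show whether the change is worthwhile.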
Note that I didn't use the Matlab profiler because it didn't work for me, but you should use it in the future for code that is organized into functions.
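As for picking N on a particular machine, one empirical option is to time a few candidate values on a reduced problem and keep the fastest. The following is a sketch under assumptions of mine, not a method from the answer: the candidate list, the reduced sizes, and the names candidates, elapsed and best_N are all hypothetical.

```octave
% Reduced problem so the benchmark itself stays cheap (assumed sizes).
t = rand(1, 20000);
inner_freq = rand(129, 6);
A    = inner_freq(1:end-1,1);
D    = diag(inner_freq(1:end-1,2));
offs = inner_freq(1:end-1,3);

candidates = [1 2 5 10 20 50 100];     % hypothetical chunk counts to try
elapsed = zeros(size(candidates));

for k = 1:numel(candidates)
  N = candidates(k);
  nn = round(linspace(1, length(t)+1, N+1));
  out = zeros(size(t));
  tic;
  for ii = 1:N
    ind = nn(ii):nn(ii+1)-1;
    out(ind) = sum(D * cos(2*pi*A*t(ind)) + repmat(offs, [1 length(ind)]));
  end
  elapsed(k) = toc;                    % total time for this N
end

[best_time, best_k] = min(elapsed);    % fastest candidate
best_N = candidates(best_k);
fprintf('Fastest N = %d (%.4f sec)\n', best_N, best_time);
```

Memory is likely to matter as much as speed here. At full size with N = 100, each 8192x5568 temporary holds 45,613,056 doubles, roughly 365 MB, and the loop creates several of them, so a 1 GB Raspberry Pi would probably need a considerably larger N than the i7 machine.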