分段多项式求值的并行化 [英] Parallelization of Piecewise Polynomial Evaluation

查看:75
本文介绍了分段多项式求值的并行化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试评估大型分段多项式中的点,该多项式是从三次样条曲线获得的.这需要很长时间才能完成,我想加快速度.

I am trying to evaluate points in a large piecewise polynomial, which is obtained from a cubic-spline. This takes a long time to do and I would like to speed it up.

这样,我想用并行处理而不是顺序地计算分段多项式上的点.

As such, I would like to evaluate a points on a piecewise polynomial with parallel processes, rather than sequentially.

代码:

z = zeros(1e6, 1) ;    % preallocate some memory for speed
Y = rand(11220,161) ;  %some data, rand for generating a working example
X = 0 : 0.0125 : 2 ;   % vector of data sites
pp = spline(X, Y) ;    % get the piecewise polynomial form of the cubic spline. 

结果结构很大.

for t = 1 : 1e6  % big number
    hcurrent = ppval(pp,t); %evaluate the piecewise polynomial at t
    z(t) = sum(x(t:t+M-1).*hcurrent,1) ; % do some operation of the interpolated value. Most likely not relevant to this question.
end

不幸的是,使用矩阵形式并使用:

Unfortunately, with matrix form and using:

hcurrent = flipud(ppval(pp, 1: 1e6 ))

需要太多的内存来处理,因此无法完成.有没有一种方法可以批量处理此代码以加快速度?

requires too much memory to process, so cannot be done. Is there a way that I can batch process this code to speed it up?

推荐答案

对于标量第二个参数,如您的示例所示,您正在处理两个问题.首先,有大量的函数调用开销和冗余计算(例如,每次循环迭代都调用unmkpp(pp)).其次, ppval 被写成是通用的,因此并不完全向量化,并且可以执行很多您不需要的事情.

For scalar second arguments, as in your example, you're dealing with two issues. First, there's a good amount of function call overhead and redundant computation (e.g., unmkpp(pp) is called every loop iteration). Second, ppval is written to be general so it's not fully vectorized and does a lot of things that aren't necessary in your case.

下面是矢量化的代码,可以利用问题的某些结构(例如,t是大于0的整数),避免了函数调用的开销,将一些计算移出主for主循环(会花费一些额外的内存),并摆脱ppval内的for循环:

Below is vectorized code code that take advantage of some of the structure of your problem (e.g., t is an integer greater than 0), avoids function call overhead, move some calculations outside of your main for loop (at the cost of a bit of extra memory), and gets rid of a for loop inside of ppval:

n = 1e6;
z = zeros(n,1);
X = 0:0.0125:2;
Y = rand(11220,numel(X));
pp = spline(X,Y);

[b,c,l,k,dd] = unmkpp(pp);
T = 1:n;
idx = discretize(T,[-Inf b(2:l) Inf]); % Or: [~,idx] = histc(T,[-Inf b(2:l) Inf]);
x = bsxfun(@power,T-b(idx),(k-1:-1:0).').';

idx = dd*idx;
d = 1-dd:0;
for t = T
    hcurrent = sum(bsxfun(@times,c(idx(t)+d,:),x(t,:)),2);
    z(t) = ...;
end

对于n=1e6,结果代码花费了示例时间的〜34%.注意,由于矢量化,计算以不同的顺序执行.由于浮点数学运算的性质,这将导致ppval的输出与我的优化版本之间的细微差异.任何差异都应在eps(hcurrent)的数量级上.您仍然可以尝试使用parfor进一步加快计算速度(在四个已经运行的工作程序中,我的系统仅花费了代码原始时间的12%).

The resultant code takes ~34% of the time of your example for n=1e6. Note that because of the vectorization, calculations are performed in a different order. This will result in slight differences between outputs from ppval and my optimized version due to the nature of floating point math. Any differences should be on the order of a few times eps(hcurrent). You can still try using parfor to further speed up the calculation (with four already running workers, my system took just 12% of your code's original time).

我认为以上是概念证明.如果您的示例与您的实际代码和数据不太吻合,则我可能过度优化了上面的代码.在这种情况下,建议您创建自己的优化版本.您可以通过在命令窗口中键入edit ppval来查看ppval的代码开始.您可以通过查看问题的结构以及z向量中最终想要的内容来实现进一步的优化.

I consider the above a proof of concept. I may have over-optmized the code above if your example doesn't correspond well to your actual code and data. In that case, I suggest creating your own optimized version. You can start by looking at the code for ppval by typing edit ppval in your Command Window. You may be able to implement further optimizations by looking at the structure of your problem and what you ultimately want in your z vector.

在内部,ppval仍使用 histc ,已不推荐使用.我上面的代码使用 discretize 执行相同的任务,根据文档的建议.

这篇关于分段多项式求值的并行化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆