在MATLAB中优化/向量化Mahalanobis距离计算 [英] Optimize/ Vectorize Mahalanobis distance calculations in MATLAB

查看:72
本文介绍了在MATLAB中优化/向量化Mahalanobis距离计算的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下一段Matlab代码,该代码计算向量之间的马哈拉诺比斯距离以及具有多次迭代的矩阵.我正在尝试找到一种更快的方法来通过矢量化来完成此任务,但没有成功.

S.data=0+(20-0).*rand(15000,3);
S.a=0+(20-0).*rand(2500,3);

S.resultat=ones(length(S.data),length(S.a))*nan;
S.b=ones(length(S.a),3,length(S.a))*nan;

for i=1:length(S.data)
    for j=1:length(S.a)
         S.a2=S.a;
         S.a2(j,:)=S.data(i,:);
         S.b(:,:,j)=S.a2;
           if j==length(S.a)
              for k=1:length(S.a);
                   S.resultat(i,k)=mahal(S.a(k,:),S.b(:,:,k));
              end
           end    
    end   
end

我现在已经修改了代码,避免出现循环之一.但是它仍然很长.如果有人有一个主意,我将非常感激!

S.data=0+(20-0).*rand(15000,3);
S.a=0+(20-0).*rand(2500,3);

S.resultat=ones(length(S.data),length(S.a))*nan;
   for i=1:length(S.data)
       for j=1:length(S.a)
       S.a2=S.a;
       S.a2(j,:)=S.data(i,:);
       S.resultat(i,j)=mahal(S.a(j,:),S.a2);    
       end   
   end

解决方案

简介和解决方案代码

您可以替换使用 mahal 的最里面的循环 bit 矢量化的,因为它在mahal的循环缩短和被黑的版本中使用了一些预先计算的值(借助于bsxfun). /p>

基本上,您有一个2D数组,为了方便参考,我们将其称为A,而将其称为B则是一个3D数组.让输出存储在变量out中.因此,可以根据假定的变量名称提取最里面的代码片段.

原始循环代码

for k=1:size(A,1)
    out(k)=mahal(A(k,:),B(:,:,k));
end

所以,我要做的是侵入mahal.m并查找当输入为2D3D时可以向量化的部分.现在,mahal在其内部使用了qr,无法对其进行矢量化.因此,我们最终得到了 hacked 代码.

密码被破解

%// Pre-calculate certain values that could be avoided than using into loop
meanB = mean(B,1); %// mean of B along dim-1
B_meanB = bsxfun(@minus,B,meanB); %// B minus mean values of B
A_B_meanB = A' - reshape(meanB,size(B,2),[]); %//'# A minus B_meanB 

%// QR calculations in a for-loop starts until the output is obtained
for k = 1:size(A,1)
    [~,R] = qr(B_meanB(:,:,k),0);
    out2(k) = sum((R'\A_B_meanB(:,k)).^2)*(size(A,1)-1);
end

现在,要将这种hack解决方案扩展到问题代码,可以引入更多的调整来预先计算使用这些嵌套循环的更多值.

最终解决方案代码

A = S.a; %// Get data from S
[rx,cx] = size(A); %// Get size parameters
Atr = A'; %//'# Pre-calculate transpose of A

%// Pre-calculate replicated B and the indices to be modified at each iteration
B_rep = repmat(S.a,1,1,rx);
B_idx = bsxfun(@plus,[(0:cx-1)*rx + 1]',[0:rx-1]*(rx*cx+1)); %//'

out = zeros(size(S.data,1),rx); %// initialize output array
for i=1:length(S.data)

    B = B_rep;
    B(B_idx) = repmat(S.data(i,:)',1,rx); %//'
    meanB = mean(B,1); %// mean of B along dim-1

    B_meanB = bsxfun(@minus,B,meanB); %// B minus mean values of B
    A_B_meanB = Atr - reshape(meanB,3,[]); %// A minus B_meanB
    for jj = 1:rx
        [~,R] = qr(B_meanB(:,:,jj),0);
        out(i,jj) = sum((R'\A_B_meanB(:,jj)).^2)*(rx-1); %//'
    end

end
S.resultat = out;


基准化

这里是基准代码,用于将建议的解决方案与问题中列出的代码进行比较-

%// Random inputs
S.data=0+(20-0).*rand(1500,3); %(size 10x reduced for a quicker runtime test)
S.a=0+(20-0).*rand(250,3);

S.resultat=ones(length(S.data),length(S.a))*nan;
disp('----------------------------- With original code')
tic

S.b=ones(length(S.a),3,length(S.a))*nan;
for i=1:length(S.data)
    for j=1:length(S.a)
        S.a2=S.a;
        S.a2(j,:)=S.data(i,:);
        S.b(:,:,j)=S.a2;
        if j==length(S.a)
            for k=1:length(S.a);
                S.resultat(i,k)=mahal(S.a(k,:),S.b(:,:,k));
            end
        end
    end
end

toc, clear i j S.a2 k S.resultat

S.resultat=ones(length(S.data),length(S.a))*nan;
disp('----------------------------- With proposed solution code')
tic

[ ... Proposed solution code ...]

toc

运行时-

----------------------------- With original code
Elapsed time is 17.734394 seconds.
----------------------------- With proposed solution code
Elapsed time is 6.602860 seconds.

因此,通过提议的方法和一些调整,我们可能会加快 2.7x 的速度!

I have the following piece of Matlab code, which calculates Mahalanobis distances between a vector and a matrix with several iterations. I am trying to find a faster method to do this by vectorization but without success.

S.data=0+(20-0).*rand(15000,3);
S.a=0+(20-0).*rand(2500,3);

S.resultat=ones(length(S.data),length(S.a))*nan;
S.b=ones(length(S.a),3,length(S.a))*nan;

for i=1:length(S.data)
    for j=1:length(S.a)
         S.a2=S.a;
         S.a2(j,:)=S.data(i,:);
         S.b(:,:,j)=S.a2;
           if j==length(S.a)
              for k=1:length(S.a);
                   S.resultat(i,k)=mahal(S.a(k,:),S.b(:,:,k));
              end
           end    
    end   
end

I have now modified the code and avoid one of the loop. But it is still very long. If someone have an idea, I will be very greatful!

S.data=0+(20-0).*rand(15000,3);
S.a=0+(20-0).*rand(2500,3);

S.resultat=ones(length(S.data),length(S.a))*nan;
   for i=1:length(S.data)
       for j=1:length(S.a)
       S.a2=S.a;
       S.a2(j,:)=S.data(i,:);
       S.resultat(i,j)=mahal(S.a(j,:),S.a2);    
       end   
   end

解决方案

Introduction and solution code

You can replace the innermost loop that uses mahal with something that is a bit vectorized, as it uses some pre-calculated values (with the help of bsxfun) inside a loop-shortened and hacked version of mahal.

Basically you have a 2D array, let's call it A for easy reference and a 3D array, let's call it B. Let the output be stored be into a variable out. So, the innermost code snippet could be extracted and based on the assumed variable names.

Original loopy code

for k=1:size(A,1)
    out(k)=mahal(A(k,:),B(:,:,k));
end

So, what I did was to hack into mahal.m and look for portions that could be vectorized when the inputs are 2D and 3D. Now, mahal uses qr inside it, which could not be vectorized. Thus, we end up with a hacked code.

Hacked code

%// Pre-calculate certain values that could be avoided than using into loop
meanB = mean(B,1); %// mean of B along dim-1
B_meanB = bsxfun(@minus,B,meanB); %// B minus mean values of B
A_B_meanB = A' - reshape(meanB,size(B,2),[]); %//'# A minus B_meanB 

%// QR calculations in a for-loop starts until the output is obtained
for k = 1:size(A,1)
    [~,R] = qr(B_meanB(:,:,k),0);
    out2(k) = sum((R'\A_B_meanB(:,k)).^2)*(size(A,1)-1);
end

Now, to extend this hack solution to the problem code, one can introduce few more tweaks to pre-calculate more values being used those nested loops.

Final solution code

A = S.a; %// Get data from S
[rx,cx] = size(A); %// Get size parameters
Atr = A'; %//'# Pre-calculate transpose of A

%// Pre-calculate replicated B and the indices to be modified at each iteration
B_rep = repmat(S.a,1,1,rx);
B_idx = bsxfun(@plus,[(0:cx-1)*rx + 1]',[0:rx-1]*(rx*cx+1)); %//'

out = zeros(size(S.data,1),rx); %// initialize output array
for i=1:length(S.data)

    B = B_rep;
    B(B_idx) = repmat(S.data(i,:)',1,rx); %//'
    meanB = mean(B,1); %// mean of B along dim-1

    B_meanB = bsxfun(@minus,B,meanB); %// B minus mean values of B
    A_B_meanB = Atr - reshape(meanB,3,[]); %// A minus B_meanB
    for jj = 1:rx
        [~,R] = qr(B_meanB(:,:,jj),0);
        out(i,jj) = sum((R'\A_B_meanB(:,jj)).^2)*(rx-1); %//'
    end

end
S.resultat = out;


Benchmarking

Here's the benchmarking code to compare the proposed solution against the code listed in the problem -

%// Random inputs
S.data=0+(20-0).*rand(1500,3); %(size 10x reduced for a quicker runtime test)
S.a=0+(20-0).*rand(250,3);

S.resultat=ones(length(S.data),length(S.a))*nan;
disp('----------------------------- With original code')
tic

S.b=ones(length(S.a),3,length(S.a))*nan;
for i=1:length(S.data)
    for j=1:length(S.a)
        S.a2=S.a;
        S.a2(j,:)=S.data(i,:);
        S.b(:,:,j)=S.a2;
        if j==length(S.a)
            for k=1:length(S.a);
                S.resultat(i,k)=mahal(S.a(k,:),S.b(:,:,k));
            end
        end
    end
end

toc, clear i j S.a2 k S.resultat

S.resultat=ones(length(S.data),length(S.a))*nan;
disp('----------------------------- With proposed solution code')
tic

[ ... Proposed solution code ...]

toc

Runtimes -

----------------------------- With original code
Elapsed time is 17.734394 seconds.
----------------------------- With proposed solution code
Elapsed time is 6.602860 seconds.

Thus, we might get around 2.7x speedup with the proposed approach and some tweaks!

这篇关于在MATLAB中优化/向量化Mahalanobis距离计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆