Efficiently calculating weighted distance in MATLAB


Question

Several posts (http://stackoverflow.com/questions/23911670/efficiently-compute-pairwise-squared-euclidean-distance-in-matlab, http://stackoverflow.com/questions/25780633/calculating-euclidean-distance-of-pairs-of-3d-points-in-matlab) exist about efficiently calculating pairwise distances in MATLAB. These posts tend to concern quickly calculating Euclidean distance between large numbers of points.

I need to create a function which quickly calculates the pairwise differences between smaller numbers of points (typically fewer than 1000 pairs). Within the grander scheme of the program I am writing, this function will be executed many thousands of times, so even small gains in efficiency are important. The function needs to be flexible in two ways:


  1. On any given call, the distance metric can be either Euclidean or city-block.

  2. The dimensions of the data are weighted.
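One common way to write both requirements as a single formula is the weighted Minkowski form below. The symbols are introduced here for illustration and are not from the original post: a and b are two points with K features, w_k is the weight on feature k, and r selects the metric.

```latex
d_r(a, b) = \left( \sum_{k=1}^{K} w_k \, \lvert a_k - b_k \rvert^{r} \right)^{1/r},
\qquad r = 1 \text{ (city-block)}, \quad r = 2 \text{ (Euclidean)}
```

Under this definition the r-th root is applied once, to the weighted sum over features, not to each feature separately.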

As far as I can tell, no solution to this particular problem has been posted. The Statistics Toolbox offers pdist and pdist2, which accept many different distance functions, but not weighting. I have seen extensions of these functions that allow for weighting, but these extensions do not allow users to select different distance functions.

Ideally, I would like to avoid using functions from the Statistics Toolbox (I am not certain the user of the function will have access to that toolbox).

I have written two functions to accomplish this task. The first uses tricky calls to repmat and permute, and the second simply uses for-loops.

function [D] = pairdist1(A, B, wts, distancemetric)

% get some information about the data
    numA = size(A,1);
    numB = size(B,1);

    if strcmp(distancemetric,'cityblock')
        r=1;
    elseif strcmp(distancemetric,'euclidean')
        r=2;
    else error('Function only accepts "cityblock" and "euclidean" distance')
    end

%   format weights for multiplication
    wts = repmat(wts,[numA,1,numB]);

%   get featural differences between A and B pairs
    A = repmat(A,[1 1 numB]);
    B = repmat(permute(B,[3,2,1]),[numA,1,1]);
    differences = abs(A-B).^r;

%   weight the difference values before combining them
    differences = differences.*wts;

%   sum over features, then take the r-th root of the total
%   (taking the root before summing is not a Euclidean distance when r=2)
    D = permute(sum(differences,2),[1,3,2]).^(1/r);
end

function [D] = pairdist2(A, B, wts, distancemetric)

% get some information about the data
    numA = size(A,1);
    numB = size(B,1);

    if strcmp(distancemetric,'cityblock')
        r=1;
    elseif strcmp(distancemetric,'euclidean')
        r=2;
    else error('Function only accepts "cityblock" and "euclidean" distance')
    end

%   use for-loops to generate differences
    D = zeros(numA,numB);
    for i=1:numA
        for j=1:numB
%           raise to the power r (not 1/r), weight, sum, then take the root
            differences = abs(A(i,:) - B(j,:)).^r;
            differences = differences.*wts;
            D(i,j) = sum(differences,2).^(1/r);
        end
    end
end

Here is a performance test:

A = rand(10,3);
B = rand(80,3);
wts = [0.1 0.5 0.4];
distancemetric = 'cityblock';


tic
D1 = pairdist1(A,B,wts,distancemetric);
toc

tic
D2 = pairdist2(A,B,wts,distancemetric);
toc

Elapsed time is 0.000238 seconds.
Elapsed time is 0.005350 seconds.
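A single tic/toc pair at this time scale can be dominated by first-call overhead, so a measurement averaged over many repetitions is usually more trustworthy. The following is a minimal sketch of that idea, not the benchmark used in the post. The repetition count nreps and the anonymous function f are hypothetical names, and f is only a stand-in (a weighted city-block distance) for whichever function is being timed.

```matlab
% Assumed test data, matching the sizes used in the test above
A = rand(10,3);
B = rand(80,3);
wts = [0.1 0.5 0.4];

% Stand-in for the function under test: weighted city-block distance
f = @() permute(sum(bsxfun(@times, ...
        abs(bsxfun(@minus, A, permute(B,[3 2 1]))), wts), 2), [1 3 2]);

nreps = 1000;      % hypothetical repetition count
D = f();           % warm-up call, excluded from the timing

t0 = tic;
for k = 1:nreps
    D = f();
end
avgTime = toc(t0) / nreps;   % average seconds per call

fprintf('Average time per call: %.6f seconds\n', avgTime);
```

MATLAB also provides timeit, which automates this kind of repeated measurement for a function handle.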

It's clear that the repmat-and-permute version works much more quickly than the double-for-loop version, at least for smaller datasets. However, I also know that calls to repmat often slow things down. So I am wondering if anyone in the SO community has any advice to offer to improve the efficiency of either function!

@Luis Mendo offered a nice cleanup of the repmat-and-permute function using bsxfun. I compared his function with my original on datasets of varying size:

[Figure: comparison]

As the data become larger, the bsxfun version becomes the clear winner!

I have finished writing the function and it is available on GitHub [link]. I ended up finding a pretty good vectorized method for computing Euclidean distance [link], so I use that method in the Euclidean case, and I took @Divakar's advice for city-block. It is still not as fast as pdist2, but it is much faster than either of the approaches I laid out earlier in this post, and easily accepts weightings.
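The linked method is not reproduced here, so the following is only a sketch of one widely used vectorized approach, which may or may not be the one the author adopted. It relies on the identity sum_k w_k (a_k - b_k)^2 = sum_k w_k a_k^2 + sum_k w_k b_k^2 - 2 sum_k w_k a_k b_k, which turns the pairwise computation into a single matrix product. The example data and the variable names sqA, sqB, crossTerm and D2 are assumptions made for illustration.

```matlab
% Small illustrative inputs (assumed, not from the post)
A   = [0 0 0; 1 1 1];            % 2 points, 3 features
B   = [1 0 0; 0 2 0; 0 0 3];     % 3 points, 3 features
wts = [0.1 0.5 0.4];             % one weight per feature

% Weighted squared norms of every row
sqA = sum(bsxfun(@times, A.^2, wts), 2);     % numA x 1
sqB = sum(bsxfun(@times, B.^2, wts), 2);     % numB x 1

% Weighted inner products between all pairs of rows
crossTerm = bsxfun(@times, A, wts) * B.';    % numA x numB

% Squared weighted distances, then the square root
D2 = bsxfun(@plus, sqA, sqB.') - 2*crossTerm;
D  = sqrt(max(D2, 0));   % clamp tiny negatives caused by round-off
```

This never builds the numA-by-features-by-numB intermediate array. The cost is some loss of precision when two points are nearly identical, which is why the result is clamped at zero before the square root.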

Recommended answer

You can replace repmat with bsxfun. Doing so avoids explicit repetition, so it is more memory-efficient and probably faster:

function D = pairdist1(A, B, wts, distancemetric)

    if strcmp(distancemetric,'cityblock')
        r=1;
    elseif strcmp(distancemetric,'euclidean')
        r=2;
    else
        error('Function only accepts "cityblock" and "euclidean" distance')
    end

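    % weighted |A-B|^r for every pair, without explicit replication
    differences = abs(bsxfun(@minus, A, permute(B, [3 2 1]))).^r;
    differences = bsxfun(@times, differences, wts);
    % sum over features, then take the r-th root of the total
    D = permute(sum(differences,2),[1,3,2]).^(1/r);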

end
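In MATLAB R2016b and later, the arithmetic operators expand singleton dimensions implicitly, so the same computation can be written without bsxfun. The sketch below assumes such a release. The example data are made up for illustration, and the r-th root is taken after summing over features.

```matlab
% Small illustrative inputs (assumed, not from the post)
A   = [0 0 0; 1 1 1];            % 2 points, 3 features
B   = [1 0 0; 0 2 0; 0 0 3];     % 3 points, 3 features
wts = [0.1 0.5 0.4];             % one weight per feature
r   = 2;                         % 1 = city-block, 2 = Euclidean

% A is numA x K and permute(B,[3 2 1]) is 1 x K x numB, so the
% subtraction expands to a numA x K x numB array of differences
differences = abs(A - permute(B, [3 2 1])).^r;

% wts (1 x K) expands along the first and third dimensions
differences = differences .* wts;

% sum over features (dimension 2), reshape to numA x numB, take the root
D = permute(sum(differences, 2), [1 3 2]).^(1/r);
```

On older releases the operators raise a dimension-mismatch error instead, which is the case bsxfun covers.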
