如何向量化找到向量中最接近的点 [英] How to vectorize finding the closest point out of a vector

查看:86
本文介绍了如何向量化找到向量中最接近的点的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

BigList = rand(20, 3)
LittleList = rand(5, 3)

我想为大列表中的每一行找到小列表中的最接近"行,这是由欧几里得范数(即k = 3维中相应值之间的平方距离之和)定义的.

I'd like to find for each row in the big list the 'closest' row in the little list, as defined by the euclidean norm (i.e. sum of squared distances between the corresponding values in the k=3 dimension).

我可以看到如何使用两个循环来执行此操作,但是似乎应该有一种使用内置矩阵操作来执行此操作的更好方法.

I can see how to do this using two loops, but it seems like there ought to be a better way to do this using built in matrix operations.

推荐答案

方法1

有一个内置的MATLAB函数 pdist2 可以找到"Pairwise distance between two sets of observations".有了它,您可以计算欧几里德距离矩阵,然后沿着距离矩阵中的相应维度找到最小值的索引,该最小值代表littleList中每行bigList的最接近".

Approach #1

There is a built in MATLAB function pdist2 which finds "Pairwise distance between two sets of observations". With it, you can calculate the euclidean distance matrix and then find indices of minimum values along the appropriate dimension in the distance matrix that would represent the "closest" for each row of bigList in littleList.

这是单线-

[~,minIdx] = min(pdist2(bigList,littleList),[],2); %// minIdx is what you are after

方法2

如果您关心性能,这是一种利用 的方法> fast matrix multiplication in MATLAB 并且此处提供的大多数代码均来自此智能解决方案.

Approach #2

If you care about performance, here's a method that leverages fast matrix multiplication in MATLAB and most of the code presented here is taken from this smart solution.

dim = 3;
numA = size(bigList,1);
numB = size(littleList,1);

helpA = zeros(numA,3*dim);
helpB = zeros(numB,3*dim);
for idx = 1:dim
    helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*bigList(:,idx), bigList(:,idx).^2 ];
    helpB(:,3*idx-2:3*idx) = [littleList(:,idx).^2 ,    littleList(:,idx), ones(numB,1)];
end
[~,minIdx] = min(helpA * helpB',[],2); %//'# minIdx is what you are after

基准化

基准代码-

Benchmarking

Benchmarking Code -

N1 = 1750; N2 = 4*N1; %/ datasize
littleList = rand(N1, 3);
bigList = rand(N2, 3);

for k = 1:50000
    tic(); elapsed = toc(); %// Warm up tic/toc
end

disp('------------- With squeeze + bsxfun + permute based approach [LuisMendo]')
tic
d = squeeze(sum((bsxfun(@minus, bigList, permute(littleList, [3 2 1]))).^2, 2));
[~, ind] = min(d,[],2);
toc,  clear d ind

disp('------------- With double permutes + bsxfun based approach [Shai]')
tic
d = bsxfun( @minus, permute( bigList, [1 3 2] ), permute( littleList, [3 1 2] ) ); %//diff in third dimension
d = sum( d.^2, 3 ); %// sq euclidean distance
[~,minIdx] = min( d, [], 2 );
toc
clear d minIdx

disp('------------- With bsxfun + matrix-multiplication based approach [Shai]')
tic
nb = sum( bigList.^2, 2 ); %// norm of bigList's items
nl = sum( littleList.^2, 2 ); %// norm of littleList's items
d = bsxfun(@plus, nb, nl.' ) - 2 * bigList * littleList'; %// all the distances
[~,minIdx] = min(d,[],2);
toc, clear nb nl d minIdx

disp('------------- With matrix multiplication based approach  [Divakar]')
tic
dim = 3;
numA = size(bigList,1);
numB = size(littleList,1);

helpA = zeros(numA,3*dim);
helpB = zeros(numB,3*dim);
for idx = 1:dim
    helpA(:,3*idx-2:3*idx) = [ones(numA,1), -2*bigList(:,idx), bigList(:,idx).^2 ];
    helpB(:,3*idx-2:3*idx) = [littleList(:,idx).^2 ,    littleList(:,idx), ones(numB,1)];
end
[~,minIdx] = min(helpA * helpB',[],2);
toc, clear dim numA numB helpA helpB idx minIdx

disp('------------- With pdist2 based approach [Divakar]')
tic
[~,minIdx] = min(pdist2(bigList,littleList),[],2);
toc, clear minIdx

基准测试结果-

------------- With squeeze + bsxfun + permute based approach [LuisMendo]
Elapsed time is 0.718529 seconds.
------------- With double permutes + bsxfun based approach [Shai]
Elapsed time is 0.971690 seconds.
------------- With bsxfun + matrix-multiplication based approach [Shai]
Elapsed time is 0.328442 seconds.
------------- With matrix multiplication based approach  [Divakar]
Elapsed time is 0.159092 seconds.
------------- With pdist2 based approach [Divakar]
Elapsed time is 0.310850 seconds.

快速结论:Shai的第二种方法(bsxfun和矩阵乘法的组合)的运行时与基于pdist2的运行时非常接近,在这两者之间无法确定明确的获胜者两个.

Quick conclusions: The runtimes with Shai's second approach that was a combination of bsxfun and matrix multiplication were very close with the one based on pdist2 and no clear winner could be decided between those two.

这篇关于如何向量化找到向量中最接近的点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆