两套高维点:找到最近的邻居,另一组 [英] Two sets of high dimensional points: Find the nearest neighbour in the other set
问题描述
我有2套:A和B两套含高维穴数相同。如何找到最近的邻居集A在B组的每一个点?
I have 2 sets: A and B. Both sets contain the same number of high dimensional points. How do I find the nearest neighbour in Set A for every point in Set B?
我想过使用Voronoi图,但它似乎(根据维基百科),它是不适合的尺寸比2更高
I thought about using a Voronoi diagram but it seems (according to wikipedia) that it is not suitable for dimensions higher than 2.
有人建议可以给我一个方法吗?
Can someone suggest to me a method, please?
推荐答案
FLANN
如果你的数据不确实处在一个高维空间,那么你可以使用 FLANN 一>
If your data do really lie in a high dimensional space, then you could use FLANN.
这实际上是建立了一些旋转的KD树和查询(位)的每一棵树木,保持找到了最好的结果。它还旋转的数据集,以避免讨厌的案件。
It actually builds a number of rotated kd-trees and queries (a bit) every single tree, keeping the best results found. It also rotates the data-set to avoid nasty cases.
在刊物部分,你可以阅读更多关于它是如何工作的。
In the publications section you can read more on how it works.
在获得FLANN部分中,您还可以阅读使用手册。
In Getting FLANN section you can also read the manual.
不过,由于要执行最近邻的高维空间搜索(NNS),你需要接受时间和精确度(更多的时间来更精确地)之间的权衡。这就是为什么FLANN进行近似NNS(检查<一href="http://stackoverflow.com/questions/26237575/comparison-of-the-runtime-of-nearest-neighbor-queries-on-different-data-structur/26266337#26266337">this答案更多)。
However, since you wish to perform Nearest Neighbour Searching(NNS) in a high dimensional space, you need to accept the trade-off between time and accuracy (more time comes with more accuracy). That's why FLANN performs approximate NNS (check this answer for more).
LSH
作为替代方案,我建议LSH算法。这里是E²LSH,这实际上实现了LSH算法。该手册可以发现这里。
As an alternative, I would suggest the LSH algorithm. Here is E²LSH, which actually implements LSH algorithm. The manual can be found here.
该算法背后的想法是,我们希望摆邻近彼此点放置(具有高可能性)在同一桶中。然而,LSH致力于解决第r近邻问题。
The idea behind the algorithm is that we want the points that lie near to each other to be placed (with a high probability) in the same bucket. However, LSH is devoted in solving the R nearest neighbour problem.
由R-近邻的数据结构,笔者可能意味着给定一个查询点q,我们就可以回答这个问题:哪一个数据集的谎言点半径R内从Q。
By R-near neighbour data structure, the author probably means that given a query point q, we can answer this question: "Which points of the dataset lie inside radius R from q?".
不过,手册介绍如何LSH用于执行NN搜索。
However, the manual explains how can LSH be used to perform NN searching.
请注意,这种类型的问题是不是这个网站。我回答你,因为你是一个新的用户。接下来的时间确保你不要忘记这一点。 :)
Note that this type of questions are not for this site. I answered to you because you are a new user. Next time make sure you don't forget that. :)
这篇关于两套高维点:找到最近的邻居,另一组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!