如何查找在preFIX树NN-搜索最相似的位向量? [英] How to lookup the most similar bit vector in an prefix tree for NN-search?

查看:273
本文介绍了如何查找在preFIX树NN-搜索最相似的位向量?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图解决的问题,在这个问题的解释: <一href="http://stackoverflow.com/questions/17282026/finding-the-single-nearest-neighbor-using-a-$p$pfix-tree-in-o1">Finding用O型一preFIX树单近邻(1)?

The problem I'm trying to solve is explained in this question: Finding the single nearest neighbor using a Prefix tree in O(1)?

我的问题是关于在这个问题页建议的解决方案部分。在这一部分它提到,我们找到了最近的邻居,从每个preFIX树遍历从节点开始的树。那么查找是否存在密钥在preFIX树很简单,但得到的最相似的一个我不明白,在所有。如何做到这一点?

My question is regarding the Proposed solution section in that question page. In that section it's mentioned that we find the nearest neighbor from each prefix tree by traversing the tree starting from the node. Well looking up if a key exists in a prefix tree is straightforward but getting the most similar one I don't understand at all. How to accomplish this?

我想,如果有人可以解释这对我来说,如果不是以图形方式(这是preferred),那么至少有一些细节。

I wish if someone can explain this to me, if not graphically (which is preferred), then at least with some details.

编辑:

下面是本文的code。它是用Python编写的,不幸的是我从来没有与Python工作过。如果有人熟悉Python和可查找的code,看看他们是如何找到使用preFIX树最近的邻居。的https://github.com/kykamath/streaming_lsh/blob/master/streaming_lsh/nearest_neighbor_lsh.py

here's the code of the paper. It's written in python and unfortunately I never worked with python before. If anybody is familiar with python and can lookup the code to see how they find the nearest neighbors using prefix trees. https://github.com/kykamath/streaming_lsh/blob/master/streaming_lsh/nearest_neighbor_lsh.py

https://github.com/kykamath/streaming_lsh/blob/master/streaming_lsh/classes.py

推荐答案

首先知道他们遍历整个树。通过遍历整个树,他们肯定可以找到最相似的邻居。

First know they iterate over the entire tree. By iterating over the entire tree they are guaranteed to find the most similar neighbor.

要在一般情况下,使用DFS图的遍历的树更有效。请注意,因为它是树,就没有必要为访问了节点的着色方案。

To be more efficient in the average case use a DFS graph traversal for the tree. Notice that since it is a tree, there will be no need for a coloring scheme for visited nodes.

开始与最接近的对象为空,并在树的根。

Start with the closest object as null and at the root of the tree.

对于每个节点应遍历孩子的多少,他们添加到编辑距离的顺序和如果添加的编辑距离不会超过到最近物体的距离。例如,对于汉明距离,首先遍历子,将添加的O的整体距离,然后再把遍历子,将加1的总距离。但是,不要穿过进入1孩子,如果这将使编辑距离比目前的最近距离更大。

For each node you should traverse the children in the order of how much they add to the edit distance and only if the added edit distance would not be greater than the distance to the closest object. For example, with hamming distance, first traverse the child that would add "O" to the overall distance, then afterwards traverse the child that would add "1" to the overall distance. But do not traverse into the "1" child if that would make the edit distance greater than the current closest distance.

当你遇到了preFIX树中的字检查,看它是否有查询对象小于最接近的对象的距离,并把它分配给最接近的。

When you encounter a "word" within the prefix tree check to see if it has a distance to the query object that is less than the closest object and assign it to closest.

这篇关于如何查找在preFIX树NN-搜索最相似的位向量?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆