LSH中的非空桶 [英] Non-empty buckets in LSH

查看：147 发布时间：2020/7/24 7:39:14 locality-sensitive-hash

本文介绍了LSH中的非空桶的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在阅读有关 LSH 的调查，尤其是引用2.2.1部分的最后一段:

I'm reading this survey about LSH, in particular citing the last paragraph of section 2.2.1:

为了提高召回率，构建了L个哈希表，并且将各项位于L(L'，L'< L)哈希桶h_1(q)，··，h_L(q) 被作为q的附近项进行检索，以进行随机R-近邻搜索 (或随机c-近似R附近的邻居搜索). 为了保证精度，L个哈希码y_i中的每一个都需要是一个长代码，这意味着存储桶总数为太大而无法直接建立索引. 因此，只有非空桶是通过使用哈希码h_1的对流哈希来保留 (x).

To improve the recall, L hash tables are constructed, and the items lying in the L (L ′ , L ′ < L) hash buckets h_1 (q), · · · , h_L (q) are retrieved as near items of q for randomized R-near neighbor search (or randomized c- approximate R-near neighbor search). To guarantee the precision, each of the L hash codes, y_i , needs to be a long code, which means that the total number of the buckets is too large to index directly. Thus, only the nonempty buckets are retained by resorting to convectional hashing of the hash codes h_l (x).

我有3个问题:

粗体的句子对我来说还不清楚:重新排序为哈希码h_l (x)的常规哈希"是什么意思?
关于粗体语句，我不确定是否遇到了问题:我完全理解h_l(x)可能是一个很长的代码，因此可能的存储桶数量可能很大.例如，如果h_l(x)是二进制代码，而length是h_l(x)的长度，那么我们总共有L*2^length个可能的存储桶(因为我们使用了L哈希表)...对吗?
最后一个问题:一旦找到查询向量q属于哪个存储桶，为了找到最近的邻居，我们必须使用原始向量q和原始距离度量?例如，假设原始向量q在128维q=[1,0,12,...,14.3]^T中，并且在我们的应用程序中使用了欧几里德距离.现在假设在LSH中使用的哈希函数(为简单起见，假设L = 1)将此向量映射到20维y=[0100...11]^T的二进制空间，以便确定将哪个存储桶分配q.因此y具有与存储区B相同的索引，并且已经包含100个向量.现在，为了找到最近的邻居，我们必须使用欧氏距离将q与所有其他100128维向量进行比较.这是正确的吗?

The bold sentence is not clear to me: what does it mean by "resorting to convenctional hashing of the hash codes h_l (x)"?
Always about the bold sentence, I'm not sure that I got the problem: I totally understand that h_l(x) can be a long code and so the number of possible buckets can be huge. For example, if h_l(x) is a binary code and length is h_l(x)'s length, then we have in total L*2^length possible buckets (since we use L hash tables)...is that correct?
Last question: once we find which bucket the query vector q belongs to, in order to find the nearest neighbor we have to use the original vector q and the original distance metric? For example, let suppose that the original vector q is in 128 dimensions q=[1,0,12,...,14.3]^T and it uses the euclidean distance in our application. Now suppose that our hashing function (supposing that L=1 for simplicity) used in LSH maps this vector to a binary space in 20 dimensions y=[0100...11]^T in order to decide which bucket assign q to. So y has the same index of the bucket B, and which already contains 100 vectors. Now, in order to find the nearest neighbor, we have to compare q with all the others 100 128-dimensions vectors using euclidean distance. Is this correct?

LSH中的非空桶 [英] Non-empty buckets in LSH

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

LSH中的非空桶 [英] Non-empty buckets in LSH

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭