Minhash实施如何寻找哈希函数的排列 [英] Minhash implementation how to find hash functions for permutations

查看:466
本文介绍了Minhash实施如何寻找哈希函数的排列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个问题,实施minhashing。在纸面上,并从阅读我理解这个概念,但我的问题是置换绝招。相反置换套的矩阵和值建议的实施是:挑K(例如100)独立散列函数,然后算法说:

I have a problem implementing minhashing. On paper and from reading I understand the concept, but my problem is the permutation "trick". Instead of permuting the matrix of sets and values the suggestion for implementation is: "pick k (e.g. 100) independent hash functions" and then the algorithm says:

for each row r 
    for each column c 
        if c has 1 in row r 
           for each hash function h_i  do
            if h_i(r) is a smaller value than M (i, c) then
            M(i, c) := h_i(r)

在不同的小例子,教​​学他们只用两三个散列函数的(H = A * X + B模p)的形式。那是很容易找到,但如何做到在实践中,我怎么能找到这样的独立功能的100。

In different small examples and teaching book they only use two or three hash functions in the form of (h = a*x + b mod p). Thats easy to find, but how to do in practice, how can I find 100 of such independent functions.

在一个Java的例子这里有生成的哈希值只能从一个哈希函数,而不是多散列函数,独立于行索引。其中的区别? 我的问题是现在如何找到这些独立的散列函数,或者如果存在与只有一个散列函数算法如何在治疗这些值?一种方法

In a Java example here there are generated hash values only from one hash function instead of multi hash functions, independent of the row index. Where is the difference ? My question is now how to find these independent hash functions or if there is an approach with only one hash function how to treat these values in the algorithm ?

推荐答案

使用参数化哈希家庭如制表哈希函数(一种简单的方法的 http://en.wikipedia.org/wiki/Tabulation_hashing

One simple way is using a parametric hash family such as Tabulation hashing functions(http://en.wikipedia.org/wiki/Tabulation_hashing)

在这本书的例子(A * X + B模p)通过选择不同的组(A,B,P),你可以有不同的哈希函数。有独立的哈希函数的一种方法是选择(A,B,P)素/互质数和preferly大量

In the book's example (a*x+b mod p) by choosing different sets of (a, b, p) you can have different hash function. One way to have independent hash functions is to choose (a, b, p) prime/co-prime and preferly large numbers

这篇关于Minhash实施如何寻找哈希函数的排列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆