Good hash function for permutations?


Problem description


    I have got numbers in a specific range (usually from 0 to about 1000). An algorithm selects some numbers from this range (about 3 to 10 numbers). This selection is done quite often, and I need to check if a permutation of the chosen numbers has already been selected.

    E.g., if one step selects [1, 10, 3, 18] and another selects [10, 18, 3, 1], the second selection can be discarded because it is a permutation of the first.

    I need to do this check very fast. Right now I put all arrays in a hashmap and use a custom hash function that just sums up all the elements, so 1+10+3+18=32, and also 10+18+3+1=32. For equals I use a bitset to quickly check whether the same elements are in both sets (I do not need sorting when using the bitset, but it only works when the range of numbers is known and not too big).

    This works OK, but it can generate lots of collisions, so the equals() method is called quite often. I was wondering if there is a faster way to check for permutations?
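The approach described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the asker's actual code: the names `sum_hash`, `to_bitset` and `same_selection` are hypothetical, a plain integer stands in for the bitset, and it assumes a selection never contains the same number twice (a bitset cannot count duplicates).

```python
def sum_hash(values):
    """Order-independent but weak: many different selections share a sum."""
    return sum(values) & 0xFFFFFFFF


def to_bitset(values):
    """Pack a selection into one integer with bit v set for each value v."""
    bits = 0
    for v in values:
        bits |= 1 << v
    return bits


def same_selection(a, b):
    # Assumption: no repeated values inside a selection.
    return len(a) == len(b) and to_bitset(a) == to_bitset(b)


first = [1, 10, 3, 18]
permuted = [10, 18, 3, 1]
other = [2, 9, 3, 18]  # different numbers, same sum of 32

print(sum_hash(first) == sum_hash(permuted))  # True
print(sum_hash(first) == sum_hash(other))     # True, a collision
print(same_selection(first, permuted))        # True
print(same_selection(first, other))           # False
```

The third selection shows why the sum is a poor hash here: it lands in the same bucket as the first one, so the equality check has to run to tell them apart.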

    Are there any good hash functions for permutations?

    UPDATE

    I have done a little benchmark: generate all combinations of numbers in the range 0 to 6, and array length 1 to 9. There are 3003 possible permutations, and a good hash should generate close to this many different hashes (I use 32 bit numbers for the hash):

    • 41 different hashes for just adding (so there are lots of collisions)
    • 8 different hashes for XOR'ing values together
    • 286 different hashes for multiplying
    • 3003 different hashes for (R + 2e) and multiplying as abc has suggested (using 1779033703 for R)

    So abc's hash can be calculated very fast and is a lot better than all the rest. Thanks!
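The counts above can be reproduced with a short sketch like the one below. It rests on an assumption about the benchmark setup: both ranges are read as half-open, i.e. values 0 to 5 and lengths 0 to 8, because that reading yields exactly 3003 order-independent selections. The helper names are hypothetical, and Python integers are masked to 32 bits to mimic 32-bit ints.

```python
from functools import reduce
from itertools import combinations_with_replacement
from operator import xor

MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic
R = 1779033703     # the odd constant used in the question


def product_hash(values):
    """Multiply the factors (R + 2*e) modulo 2**32, then drop the low bit."""
    h = 1
    for e in values:
        h = (h * (R + 2 * e)) & MASK
    return h >> 1


def plain_product(values):
    """Multiply the raw values modulo 2**32."""
    h = 1
    for e in values:
        h = (h * e) & MASK
    return h


# Assumed setup: values 0..5, lengths 0..8, order ignored.
selections = [
    combo
    for length in range(0, 9)
    for combo in combinations_with_replacement(range(6), length)
]


def distinct(hash_fn):
    """Count how many different hash values hash_fn produces."""
    return len({hash_fn(s) for s in selections})


print(len(selections))                        # 3003
print(distinct(sum))                          # 41
print(distinct(lambda s: reduce(xor, s, 0)))  # 8
print(distinct(plain_product))                # 286
print(distinct(product_hash))                 # the question reports 3003
```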

    PS: I do not want to sort the values when I do not have to, because this would get too slow.

    Solution

    One potential candidate might be this. Fix an odd integer R. For each element e you want to hash, compute the factor (R + 2*e). Then compute the product of all these factors. Finally, divide the product by 2 to get the hash.

    The factor 2 in (R + 2e) guarantees that all factors are odd. A product of odd numbers is odd, even after truncation to 32 bits, so the product can never become 0. The division by 2 at the end is there because the product is always odd, so its lowest bit is always 1 and the division just removes that constant bit.

    E.g., I choose R = 1779033703. This is an arbitrary choice; some experiments should show whether a given R is good or bad. Assume your values are [1, 10, 3, 18]. The product (computed using 32-bit ints, i.e. modulo 2^32) is

    (R + 2) * (R + 20) * (R + 6) * (R + 36) = 108059677

    Hence the hash would be (using integer division)

    108059677/2 = 54029838.
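The recipe from this answer can be sketched in a few lines. This is only an illustration, not code from the answer: the function name `perm_hash` and the parameter `r` are hypothetical, and the mask emulates the wrap-around of 32-bit integers that the answer relies on.

```python
MASK = 0xFFFFFFFF  # keep intermediate results in 32 bits


def perm_hash(values, r=1779033703):
    """Order-independent hash: product of (r + 2*e) mod 2**32, low bit dropped."""
    if r % 2 == 0:
        raise ValueError("r must be odd so that every factor is odd")
    h = 1
    for e in values:
        h = (h * (r + 2 * e)) & MASK
    return h >> 1  # the product is always odd, so the low bit carries no information


# Multiplication is commutative, so any permutation hashes to the same value.
print(perm_hash([1, 10, 3, 18]) == perm_hash([10, 18, 3, 1]))  # True

# A selection with the same sum no longer collides.
print(perm_hash([1, 10, 3, 18]) == perm_hash([2, 9, 3, 18]))   # False
```

The hash only narrows the search down. Two selections with equal hashes still need the equality check, since distinct selections can collide.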
