哪些是在一个Int32或UInt32的散列位好方法呢? [英] What are good methods for hashing bits in an Int32 or UInt32?
问题描述
我有一个伪随机数发生器的实施方案,具体的乔治马尔萨利亚的异或移RNG。我的实现是在这里:
I have an implementation of a pseudo random number generator, specifically of George Marsaglia's XOR-Shift RNG. My implementation is here:
<一个href=\"http://sharpneat.svn.sourceforge.net/svnroot/sharpneat/trunk/SharpNeatV2/src/SharpNeatLib/Utility/FastRandom.cs\"相对=nofollow> FastRandom.cs
原来,第一随机样本是非常密切的种子,如果你看一看的重新初始化(INT种子)方法,该方法是相当明显的相关性。这不好。我提出的解决方案是混合了种子位如下:
It turns out that the first random sample is very closely correlated with the seed, which is fairly obvious if you take a look at the Reinitialise(int seed) method. This is bad. My proposed solution is to mix up the bits of the seed as follows:
_x = (uint)( (seed * 2147483647)
^ ((seed << 16 | seed >> 48) * 28111)
^ ((seed << 32 | seed >> 32) * 69001)
^ ((seed << 48 | seed >> 16) * 45083));
所以我显著由种子的位有四个质数相乘和异或回形成_x减弱任何相关性。我乘法之前也旋转种子的位,以确保幅度不一位混合起来在整个范围值的32位值。
So I have significantly weakened any correlation by multiplying the seed's bits with four primes and XORing back to form _x. I also rotate the seed's bits before multiplication to ensure that bits of varying magnitudes get mixed up across the full range of values for a 32 bit value.
四向旋转只是似乎很喜欢无所事事和每一个可能的转角(32)之间的一个很好的平衡。该素数是手指在空中 - 足够的幅度和位结构,在整个32位混杂起来的点点的$ P $垫'他们不管出发种子
The four-way rotation just seemed liked a nice balance between doing nothing and every possible rotation (32). The primes are 'finger in the air' - enough magnitude and bit structure to jumble up the bits and 'spread' them over the full 32 bits regardless of the starting seed.
我应该用更大的质数?有没有一个标准的办法解决这个问题,也许更正式的依据?我想以最小的CPU开销做到这一点。
Should I use bigger primes? Is there a standard approach to this problem, perhaps with a more formal basis? I am trying to do this with minimal CPU overhead.
感谢
===更新===
我决定使用一些素数,世界各地的32位分布较好的位。其结果是,我可以省略转变为乘法达到同样的效果(散列在整个范围内的32位位),所以我然后只添加四个产品,以得到最终的种子......
I decided to use some primes with set bits better distributed across all 32 bits. The result is that I can omit the shifts as the multiplications achieve the same effect (hashing bits across the full range of 32 bits), so I then just add the four products to give the final seed...
_x = (uint)( (seed * 1431655781)
+ (seed * 1183186591)
+ (seed * 622729787)
+ (seed * 338294347));
我可能逃脱较少的素数/ multiplciations。两个似乎太少了(我仍然可以看到的第一个样本模式),三看着好了,安全余量我做到了四人。
I could possibly get away with fewer primes/multiplciations. Two seemed too few (I could still see patterns in the first samples), three looked OK, so for a safety margin I made it four.
===更新2 ===
=== UPDATE 2 ===
供参考以上减少到功能上等同的:
FYI the above reduces to the functionally equivalent:
_x = seed * 3575866506U;
我起初并没有发现这一点,当我做我在想,如果在计算中不同阶段的溢出会导致不同的结果。我认为答案是否定的 - 两种计算总是给相同的答案
I didn't spot this initially and when I did I was wondering if overflowing at different stages in the calculation would cause a different result. I believe the answer is no - the two calculations always give the same answer.
推荐答案
据一些研究人员, CrapWow 一> Crap8 并的 Murmur3 是是最好的非加密散列算法可今天既快速,简便,统计学不错。
According to some researchers, CrapWow, Crap8 and Murmur3 are the best non-cryptographic hash algorithms available today that are both fast, simple and statistically good.
更多信息,请访问非加密散列函数动物园。
这篇关于哪些是在一个Int32或UInt32的散列位好方法呢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!