Why is the size 127 (prime) better than 128 for a hash-table?


Question

Suppose simple uniform hashing, that is, any given value is equally likely to hash into any of the slots of the table. Why is it better to use a table of size 127 rather than 128? I really don't understand what the problem is with powers of 2, or how it makes any difference at all.

When using the division method, we usually avoid certain values of m (table size). For example, m should not be a power of 2, since if m = 2^p , then h(k) is just the p lowest-order bits of k.

Let's suppose the possible elements are only between 1 and 10000 and I picked 128 as the table size. How can 127 be better? 128 is 2^7 (binary 10000000) and 127 is 01111111. What difference does this make? All numbers (when hashed) are still going to be the p lowest-order bits of k for 127 too. Did I get something wrong?

I'm looking for some examples, as I really can't understand why this is bad. Thanks a lot in advance!

PS: I am aware of: Hash table: why size should be prime?

Answer

All numbers (when hashed) are still going to be the p lowest-order bits of k for 127 too.

That is wrong (or I misunderstood). k % 127 depends on all bits of k, whereas k % 128 depends only on the 7 lowest bits.
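
To make the bit dependence concrete, here is a minimal Python sketch. The helper name bucket and the sample keys are assumptions chosen for illustration, not part of the original answer. The keys share the same 7 low bits (42) and differ only in their higher bits.

```python
def bucket(k, m):
    """Division-method hash: h(k) = k mod m."""
    return k % m

# Illustrative keys: identical 7 low bits (0b0101010 = 42), different high bits.
low = 0b0101010
keys = [low + (high << 7) for high in range(5)]  # 42, 170, 298, 426, 554

mod128 = [bucket(k, 128) for k in keys]  # [42, 42, 42, 42, 42]
mod127 = [bucket(k, 127) for k in keys]  # [42, 43, 44, 45, 46]

# For non-negative k, k % 128 is exactly a mask of the 7 lowest bits.
assert all(k % 128 == k & 0b1111111 for k in range(1, 10001))

print(mod128)
print(mod127)
```

With m = 128 the high bits are discarded, so all five keys collide. With m = 127 each increment of the high part moves the key to the next bucket, because 128 leaves remainder 1 when divided by 127.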


EDIT:

If you have a perfectly uniform distribution between 1 and 10,000, both k % 127 and k % 128 will turn it into an excellent smaller distribution. Every bucket will contain about 10,000 / 128 ≈ 78 (or 79) items.
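
A quick way to check the uniform case is to count bucket sizes directly. This is a sketch assuming every key from 1 to 10,000 occurs exactly once; bucket_counts is a hypothetical helper name.

```python
from collections import Counter

def bucket_counts(keys, m):
    """Number of keys landing in each of the m buckets under h(k) = k % m."""
    counts = Counter(k % m for k in keys)
    return [counts.get(b, 0) for b in range(m)]

uniform_keys = range(1, 10001)  # assumption: each key occurs exactly once
c127 = bucket_counts(uniform_keys, 127)
c128 = bucket_counts(uniform_keys, 128)

print(min(c127), max(c127))  # 78 79
print(min(c128), max(c128))  # 78 79
```

Both table sizes spread uniform keys almost perfectly evenly, so the choice between 127 and 128 does not matter for this input.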

If instead your distribution between 1 and 10,000 is biased because {x, 2x, 3x, ...} occur more often, then a prime size will give a much, much better distribution, as explained in this answer. (Unless x is exactly that prime size, or a multiple of it.)
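
The biased case can be sketched the same way. The choice x = 64 is an assumption picked to make the effect obvious, and load_summary is a hypothetical helper. The second half of the example illustrates the caveat in parentheses by using x = 127.

```python
from collections import Counter

def load_summary(keys, m):
    """Return (non-empty buckets, size of the fullest bucket) for h(k) = k % m."""
    counts = Counter(k % m for k in keys)
    return len(counts), max(counts.values())

# Assumed biased input: only multiples of x = 64 between 1 and 10,000.
multiples_of_64 = range(64, 10001, 64)            # 156 keys
summary_128 = load_summary(multiples_of_64, 128)  # (2, 78)
summary_127 = load_summary(multiples_of_64, 127)  # (127, 2)

# The caveat: if x equals the prime table size, the prime table is the bad one.
multiples_of_127 = range(127, 10001, 127)         # 78 keys
caveat_127 = load_summary(multiples_of_127, 127)  # (1, 78)
caveat_128 = load_summary(multiples_of_127, 128)  # (78, 1)

print(summary_128, summary_127)
print(caveat_127, caveat_128)
```

With multiples of 64, a table of size 128 uses only 2 of its buckets, while a table of size 127 uses all of them with at most 2 keys each. The roles flip when the keys are multiples of 127, which is why the guarantee only holds when x shares no factor with the table size.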

Thus, cutting off the high bits (using a size of 128) is no problem whatsoever if the distribution in the lower bits is good enough. But with real data and real, badly designed hash functions, you will need those high bits.
