Why only use primes for the hash function division method


Question



Hashing using the division method means h(k) = k mod m. I read that:

m should not be a power of 2. This is because if m = 2^p, h becomes just the p lowest-order bits of k. Usually we choose m to be a prime number not too close to a power of 2.

Could someone explain the "lowest-order bits" part with a small example? I thought all that (mod m) does is wrap the result around a range of size m. Somehow I can't see the issue if m is a power of 2.

Solution

All data in the computer is stored as binary data. A binary number is written in base-2.

If you hash data, you want to create a fingerprint that is easy to compare. Similar data that is not exactly the same as the original data should not produce the same fingerprint (hash).

Consider what happens if you use m = 2^p (p an integer >= 0). Every bit at position p or higher represents a multiple of 2^p (for example, 2^7 is a multiple of 2^4), so taking k mod 2^p reduces all of those bits to 0 and keeps only the p lowest-order bits. You cut off part of the data. This means that keys which differ only in the higher-order (left-most) bits of the binary number will produce the same hash.
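The claim that k mod 2^p is just the p lowest-order bits can be checked directly. The following is a minimal Python sketch; the helper name low_bits is made up for illustration, and it assumes non-negative integer keys:

```python
def low_bits(k: int, p: int) -> int:
    """Return the p lowest-order bits of k using a bit mask."""
    return k & ((1 << p) - 1)


# For non-negative k, reducing modulo 2^p and masking the p lowest bits
# give the same value, so the higher-order bits never reach the hash.
for p in range(0, 9):
    for k in range(0, 1000):
        assert k % (1 << p) == low_bits(k, p)
```

If the loop finishes without an assertion error, the two operations agreed for every key and every p that was tried.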

Example:

k:       1111111111010101
m:       0000000001000000 (2^6)
k mod m: 0000000000010101

Now do the same for this:

k:       0000000000010101
m:       0000000001000000 (2^6)
k mod m: 0000000000010101

Hey, that is the same hash! This is exactly the reason why a number far from 2^p (typically a prime) is chosen for m. This way the left-most bits do matter in calculating the hash, and it is far less likely that two similar pieces of data create identical hashes.
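The two keys from the example can be run through the division method with both kinds of modulus. This is only a sketch: the function name division_hash is made up for illustration, and 97 is an arbitrary prime picked because it sits roughly midway between 2^6 and 2^7, not a value taken from the question.

```python
def division_hash(k: int, m: int) -> int:
    """Division-method hash: h(k) = k mod m."""
    return k % m


k1 = 0b1111111111010101  # differs from k2 only in the ten highest bits
k2 = 0b0000000000010101

# m = 2^6: only the six lowest bits survive, so the two keys collide.
print(division_hash(k1, 64), division_hash(k2, 64))

# m = 97 (assumed prime, not close to a power of 2): the high bits
# now influence the result, so these two keys land in different slots.
print(division_hash(k1, 97), division_hash(k2, 97))
```

A prime modulus does not rule out collisions in general, since any two keys that differ by a multiple of m still collide; it only removes the systematic dependence on the low-order bits alone.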
