Python中的常规哈希函数 [英] General Hash Functions In Python

查看:90
本文介绍了Python中的常规哈希函数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

大家好,


如果有兴趣的话,我已经将各种哈希函数移植到python:


def RSHash(key) :

a = 378551

b = 63689

hash = 0

for i in range(len(key)) :

hash = hash * a + ord(key [i])

a = a * b

return(hash& 0x7FFFFFFF)

def JSHash(关键):

hash = 1315423911

for i in range(len(key)):

hash ^ =((hash<< 5)+ ord(key [i])+(hash> 2))

return(hash& 0x7FFFFFFF)

def PJWHash(关键):

BitsInUnsignedInt = 4 * 8

ThreeQuarters = long((BitsInUnsignedInt * 3)/ 4)

OneEighth = long(BitsInUnsignedInt / 8)

HighBits =(0xFFFFFFFF)<< (BitsInUnsignedInt - OneEighth)

hash = 0

test = 0

for i in range(len(key)):

hash =(hash<< OneEighth)+ ord(key [i])

test = hash& HighBits

如果测试!= 0:

hash =((hash ^(test> ThreeQuarters))&(~HighBits));

return(hash& 0x7FFFFFFF)

def ELFHash(key):

hash = 0

x = 0
$ b范围内的$ b(len(key)):

hash =(hash<< 4)+ ord(key [i])

x = hash& ; 0xF0000000

如果x!= 0:

hash ^ =(x> 24)

hash& = ~x

return(hash& 0x7FFFFFFF)

def BKDRHash(关键):

seed = 131#31 131 1313 13131 131313等..

hash = 0

for i in range(len(key)):

hash =(hash * seed)+ ord(key [i])

return(hash& 0x7FFFFFFF)

def SDBMHash(key):

hash = 0

for i in range (len(key)):

hash = ord(key [i])+(hash<< 6)+(hash<< 16) - hash;

return(hash& 0x7FFFFFFF)

def DJBHash(key):

hash = 5381

for i in range(len( key)):

hash =((hash<< 5)+ hash)+ ord(key [i])

return(hash& 0x7FFFFFFF)

def DEKHash(键):

hash = len(key);

for i in range(len(key)):

hash =((hash<< 5)^(hash> 27))^ ord(key [i])

return(hash& 0x7FFFFFFF)

def APHash(关键):

hash = 0

for i in range(len(key)):

if((i& 1)== 0):

hash ^ =((hash<< 7)^ ord(key [i])^(hash> 3))

else:

hash ^ =(〜((hash<< 11)^ ord(key [i])^(hash> 5)))

return(hash& 0x7FFFFFFF)

print RSHash(''abcdefghijklmnopqrstuvwxyz1234567890'')

print JSHash(''abcdefghijklmnopqrstuvwxyz1234567890'')

打印PJWHash(''abcdefghijklmnopqrstuvwxyz1234567890'')

print ELFHash(''abcdefghijklmnopqrstuvwxyz1234567890'')

print BKDRHash(''abcdefghijklmnopqrstuvwxyz1234567890'')

打印SDBMHash(''abcdefghijklmnopqrstuvwxyz1234567890'')

打印DJBHash(''abcdefghijklmnopqrstuvwxyz1234567890'')

打印DEKHash(''abcdefghijklmnopqrstuvwxyz1234567890'')

打印APHash(''abcdefghijklmnopqrstuvwxyz1234567890'')


Arash Partow

__________________________________________________ ______

成为一个知道他们不知道的人,

而不是一个不知道他们不知道的人,

认为他们知道关于所有事情的一切。
http://www.partow.net

解决方案

" Arash Partow" < pa **** @ gmail.comwrites:


如果有兴趣的话,我已经将各种哈希函数移植到python:


这些对于任何特定的互操作性目的是否有用?

否则我会说只有一个这样的哈希是充足的,或者甚至可能不是

麻烦,因为Python的内置dicts足以满足大多数要求散列的
应用程序,并且加密哈希值可用于你的b $ b非常关心哈希的统一性

输出。


for i in range(len(key)):

hash = hash * a + ord(key [i])

a = a * b



作为一个风格问题,我'我写这个:


for key in:

hash = hash * a + ord(c)

a * = b


和其他类似的循环类似。


Hi Pau l,


对于不同的数据类型,不同的哈希函数可以工作

更好/更差,更少或更多的冲突。


我相信人们有更多的选择,而且人们看到特定的事情可以做得更多,然后

他们就越容易出现用他们自己的

特定的有效解决方案。


这就是说,我相信至少有一个(很可能更多)

上面组中的哈希函数最常用的工作比标准默认哈希更好(比冲突更少)

可用于任何随机字符串集的内置字典。


请随意证明我错了:)


Arash Partow

__________________________________________________ ______

成为一个知道他们不知道的人,

而不是一个不知道他们不知道的人,

认为他们知道每一个人关于所有事情。
http://www.partow.net

Paul Rubin写道:


" Arash Partow" < pa **** @ gmail.comwrites:


如果有兴趣的话,我已经将各种哈希函数移植到python:


这些对于任何特定的互操作性目的是否有用?

否则我会说只有一个这样的哈希是充足的,或者甚至可能不是

麻烦,因为Python的内置dicts足以满足大多数要求散列的
应用程序,并且加密哈希值可用于你的b $ b非常关心哈希的统一性

输出。


for i in range(len(key)):

hash = hash * a + ord(key [i])

a = a * b



作为一个风格问题,我'我写这个:


for key in:

hash = hash * a + ord(c)

a * = b


和其他类似的循环类似。


" Arash Partow" < pa **** @ gmail.comwrites:


对于不同的数据类型,不同的哈希函数工作

更好/更差,更少或者更多的碰撞。



但是你没有说明哪些哈希最适合

什么样的数据。用户如何确定选择哪一个?b


Hi all,

I''ve ported various hash functions to python if anyone is interested:

def RSHash(key):
a = 378551
b = 63689
hash = 0
for i in range(len(key)):
hash = hash * a + ord(key[i])
a = a * b
return (hash & 0x7FFFFFFF)
def JSHash(key):
hash = 1315423911
for i in range(len(key)):
hash ^= ((hash << 5) + ord(key[i]) + (hash >2))
return (hash & 0x7FFFFFFF)
def PJWHash(key):
BitsInUnsignedInt = 4 * 8
ThreeQuarters = long((BitsInUnsignedInt * 3) / 4)
OneEighth = long(BitsInUnsignedInt / 8)
HighBits = (0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth)
hash = 0
test = 0

for i in range(len(key)):
hash = (hash << OneEighth) + ord(key[i])
test = hash & HighBits
if test != 0:
hash = (( hash ^ (test >ThreeQuarters)) & (~HighBits));
return (hash & 0x7FFFFFFF)
def ELFHash(key):
hash = 0
x = 0
for i in range(len(key)):
hash = (hash << 4) + ord(key[i])
x = hash & 0xF0000000
if x != 0:
hash ^= (x >24)
hash &= ~x
return (hash & 0x7FFFFFFF)
def BKDRHash(key):
seed = 131 # 31 131 1313 13131 131313 etc..
hash = 0
for i in range(len(key)):
hash = (hash * seed) + ord(key[i])
return (hash & 0x7FFFFFFF)
def SDBMHash(key):
hash = 0
for i in range(len(key)):
hash = ord(key[i]) + (hash << 6) + (hash << 16) - hash;
return (hash & 0x7FFFFFFF)
def DJBHash(key):
hash = 5381
for i in range(len(key)):
hash = ((hash << 5) + hash) + ord(key[i])
return (hash & 0x7FFFFFFF)
def DEKHash(key):
hash = len(key);
for i in range(len(key)):
hash = ((hash << 5) ^ (hash >27)) ^ ord(key[i])
return (hash & 0x7FFFFFFF)
def APHash(key):
hash = 0
for i in range(len(key)):
if ((i & 1) == 0):
hash ^= ((hash << 7) ^ ord(key[i]) ^ (hash >3))
else:
hash ^= (~((hash << 11) ^ ord(key[i]) ^ (hash >5)))
return (hash & 0x7FFFFFFF)
print RSHash (''abcdefghijklmnopqrstuvwxyz1234567890'')
print JSHash (''abcdefghijklmnopqrstuvwxyz1234567890'')
print PJWHash (''abcdefghijklmnopqrstuvwxyz1234567890'')
print ELFHash (''abcdefghijklmnopqrstuvwxyz1234567890'')
print BKDRHash(''abcdefghijklmnopqrstuvwxyz1234567890'')
print SDBMHash(''abcdefghijklmnopqrstuvwxyz1234567890'')
print DJBHash (''abcdefghijklmnopqrstuvwxyz1234567890'')
print DEKHash (''abcdefghijklmnopqrstuvwxyz1234567890'')
print APHash (''abcdefghijklmnopqrstuvwxyz1234567890'')

Arash Partow
__________________________________________________ ______
Be one who knows what they don''t know,
Instead of being one who knows not what they don''t know,
Thinking they know everything about all things.
http://www.partow.net

解决方案

"Arash Partow" <pa****@gmail.comwrites:

I''ve ported various hash functions to python if anyone is interested:

Are these useful for any particular interoperability purposes?
Otherwise I''d say just one such hash is plenty, or maybe don''t even
bother, since Python''s built-in dicts are sufficient for most
applications that call for hashing, and the cryptographic hashes are
available for when you really care about uniformity of the hash
output.

for i in range(len(key)):
hash = hash * a + ord(key[i])
a = a * b

As a stylistic issue, I''d write this:

for c in key:
hash = hash * a + ord(c)
a *= b

and similarly for the other similar loops.


Hi Paul,

For different data types different hash functions work
better/worse aka fewer or more collisions.

I believe the more choice people have and also the more
ways people see how a particular thing can be done, then
the easier it will be for them to come up with their own
specific efficient solutions.

That said, I believe at least one (most likely more) of
the hash functions in the group above will most always work
better (ala less collisions) than the standard default hash
available in the built-in dict for any random set of strings.

Please feel free to prove me wrong :)


Arash Partow
__________________________________________________ ______
Be one who knows what they don''t know,
Instead of being one who knows not what they don''t know,
Thinking they know everything about all things.
http://www.partow.net
Paul Rubin wrote:

"Arash Partow" <pa****@gmail.comwrites:

I''ve ported various hash functions to python if anyone is interested:


Are these useful for any particular interoperability purposes?
Otherwise I''d say just one such hash is plenty, or maybe don''t even
bother, since Python''s built-in dicts are sufficient for most
applications that call for hashing, and the cryptographic hashes are
available for when you really care about uniformity of the hash
output.

for i in range(len(key)):
hash = hash * a + ord(key[i])
a = a * b


As a stylistic issue, I''d write this:

for c in key:
hash = hash * a + ord(c)
a *= b

and similarly for the other similar loops.


"Arash Partow" <pa****@gmail.comwrites:

For different data types different hash functions work
better/worse aka fewer or more collisions.

But you give no indication of which of those hashes works best for
what kind of data. How is the user supposed to figure out which one
to choose?


这篇关于Python中的常规哈希函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆