What does hash do in Python?

Question

I saw a code example where the hash function is applied to a tuple, and it returns a negative integer. What does this function do? Google did not help. I found a page that explains how the hash is calculated, but it does not explain why we need this function.

Answer

A hash is a fixed-size integer that identifies a particular value. Values that compare equal must have the same hash, so for the same value you will get the same hash even if it is not the same object.

```python
>>> hash("Look at me!")
4343814758193556824
>>> f = "Look at me!"
>>> hash(f)
4343814758193556824
```
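The question asked about tuples, so here is a small sketch of the same property with tuples, assuming CPython 3. The names `a` and `b` are only illustrative. The exact number `hash()` returns is an implementation detail (and, for tuples containing strings, differs between runs), so the sketch only compares hashes with each other.

```python
# Two tuples built separately: equal values, but distinct objects.
a = (1, "two", 3.0)
b = tuple([1, "two", 3.0])

print(a == b)              # True: same value
print(a is b)              # False: two different objects
print(hash(a) == hash(b))  # True: equal values always hash equally

# A tuple's hash is derived from its elements' hashes and may well be
# negative; the concrete number varies between runs because "two" is a str.
print(hash(a))

# Numbers that compare equal also hash equally, even across types.
print(hash(1) == hash(1.0) == hash(True))  # True
```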

Hash values need to be evenly distributed to keep the number of hash collisions low. A hash collision is when two different values get the same hash. As a result, even a relatively small change to a value often produces a very different hash.

```python
>>> hash("Look at me!!")
6941904779894686356
```
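A minimal sketch of both points, assuming CPython: nearby strings get unrelated hashes, and collisions between different values can still occur. The `-1`/`-2` collision shown here is a CPython implementation detail, not a language guarantee.

```python
# Strings that differ by one character usually get unrelated hashes.
# (str hashes are randomized per run, so the printed numbers will vary.)
print(hash("Look at me!"))
print(hash("Look at me!!"))

# Collisions do happen. In CPython, -1 is reserved internally as an error
# marker, so the hash of the integer -1 is remapped to -2.
print(hash(-1), hash(-2))  # -2 -2
print(-1 == -2)            # False: different values, same hash
```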

These numbers are very useful because they make it possible to look up a value quickly in a large collection of values. Two examples of their use are Python's `set` and `dict`. With a `list`, checking whether a value is present with `if x in values:` means Python has to go through the whole list and compare `x` with each element of `values`, which can take a long time for a long list. A `set` keeps track of each element's hash, so when you write `if x in values:`, Python computes the hash of `x`, looks it up in an internal structure, and then only compares `x` with the values that have the same hash as `x`.
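To make the set description concrete, here is a deliberately simplified hash set. `ToyHashSet` is a hypothetical class written only for illustration; CPython's real `set` stores entries in a single table using open addressing and resizes it as it grows, which this sketch leaves out.

```python
class ToyHashSet:
    """Hypothetical, simplified hash set illustrating hash-based lookup.

    Uses a fixed number of bucket lists instead of CPython's open
    addressing, and never resizes.
    """

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, h):
        # Map the (possibly negative) hash to a bucket; % with a positive
        # modulus always gives a non-negative index in Python.
        return self._buckets[h % len(self._buckets)]

    def add(self, value):
        h = hash(value)  # raises TypeError for unhashable values such as lists
        bucket = self._bucket(h)
        for stored_h, item in bucket:
            if stored_h == h and item == value:
                return  # already present
        bucket.append((h, value))

    def __contains__(self, value):
        h = hash(value)
        # Only one bucket is scanned, and == is only evaluated for entries
        # whose stored hash equals h (the `and` short-circuits).
        return any(stored_h == h and item == value
                   for stored_h, item in self._bucket(h))


fruits = ToyHashSet()
for name in ["apple", "banana", "cherry", "apple"]:
    fruits.add(name)

print("banana" in fruits)  # True
print("durian" in fruits)  # False
```

The duplicate `"apple"` is stored only once, and a membership test touches one bucket rather than every element, which is the effect the paragraph above describes.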

The same technique is used for dictionary lookup. This makes lookups in a `set` or `dict` very fast, while lookups in a `list` are slow. It also means you can put unhashable objects in a `list`, but not in a `set` or as keys in a `dict`. The typical unhashable objects are mutable ones, meaning objects whose value you can change, such as `list`, `dict` and `set`. A mutable object should not be hashable, because its hash would then change over its lifetime, which would cause a lot of confusion: an object could end up stored under the wrong hash value in a dictionary.
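A short sketch of both points, assuming CPython 3. `BadKey` is a hypothetical class made up to show what goes wrong when a hash depends on mutable state; it is not a pattern to copy.

```python
# Mutable built-ins such as list are unhashable.
try:
    hash([1, 2, 3])
except TypeError as exc:
    print(exc)  # e.g. unhashable type: 'list'

# Immutable counterparts (tuple, frozenset) work as dict keys and set members.
point_names = {(0, 0): "origin"}
print(point_names[(0, 0)])                       # origin
print(frozenset({1, 2}) in {frozenset({1, 2})})  # True


class BadKey:
    """Hypothetical class whose hash depends on a mutable attribute (don't do this)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


key = BadKey(1)
d = {key: "stored"}
key.value = 2  # mutate the key after it was inserted

print(key in d)        # False: the dict looks where hash(2) would be stored
print(BadKey(1) in d)  # False: the hash matches, but the stored key now equals 2
print(len(d))          # 1: the entry still exists, it is just unreachable
```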

Note that the hash of a value only needs to stay the same within one run of Python. Since Python 3.3, the hashes of `str` and `bytes` values are in fact randomized by default and change with every new run of Python:

```console
$ /opt/python33/bin/python3
Python 3.3.2 (default, Jun 17 2013, 17:49:21)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash("foo")
1849024199686380661
>>>
$ /opt/python33/bin/python3
Python 3.3.2 (default, Jun 17 2013, 17:49:21)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash("foo")
-7416743951976404299
```

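This makes it harder to guess which hash value a given string will have. That is an important security feature for web applications and similar services that put untrusted input into dicts: if hashes were predictable, an attacker could send many keys with colliding hashes and make lookups very slow.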
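The randomization is controlled by the `PYTHONHASHSEED` environment variable. The sketch below assumes Python 3.7+ (for `capture_output`) and uses a hypothetical helper, `hash_in_fresh_interpreter`, that starts a new interpreter with a fixed seed to show that the same seed reproduces the same `str` hash. Fixing the seed is meant for debugging and reproducible tests; it gives up the protection described above.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Hypothetical helper: run hash(text) in a new Python process
    whose PYTHONHASHSEED is set to `seed`, and return the result."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)  # int() ignores the trailing newline


# Same seed, same str hash, even across separate runs.
print(hash_in_fresh_interpreter("foo", 42) == hash_in_fresh_interpreter("foo", 42))  # True

# Small integers are not randomized: their hash is the number itself.
print(hash(12345))  # 12345
```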

Hash values should therefore not be stored permanently. If you need a hash that stays the same across runs and machines, look at the more "serious" kind of hash, cryptographic hash functions (for example those in Python's `hashlib` module), which can be used to make verifiable checksums of files and similar data.
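A sketch using the standard `hashlib` module. `sha256_of_file` is a hypothetical helper name, and the 64 KiB chunk size is an arbitrary choice so that large files are not read into memory at once. Unlike `hash()`, the resulting hex digest is the same on every run and every machine, so it can be stored and compared later.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo: write a small temporary file and checksum it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")
    with open(path, "wb") as f:
        f.write(b"foo")
    file_digest = sha256_of_file(path)

print(file_digest)
print(file_digest == hashlib.sha256(b"foo").hexdigest())  # True
```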
