What does hash do in Python?


Question

I saw an example of code where the hash function is applied to a tuple, and it returns a negative integer. I wonder what this function does. Google does not help. I found a page that explains how the hash is calculated, but it does not explain why we need this function.

Solution

A hash is a fixed-size integer that identifies a particular value. Equal values must have the same hash, so for the same value you will get the same hash even if it's not the same object.

>>> hash("Look at me!")
4343814758193556824
>>> f = "Look at me!"
>>> hash(f)
4343814758193556824

Hash values need to be created in such a way that the resulting values are evenly distributed, to reduce the number of hash collisions you get. A hash collision happens when two different values have the same hash. As a consequence, relatively small changes often result in very different hashes:

>>> hash("Look at me!!")
6941904779894686356
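To make "fixed-size" and "collision" concrete: on CPython every hash fits in a signed integer of sys.hash_info.width bits, and integers are hashed modulo the prime sys.hash_info.modulus, so two different integers can collide. This sketch relies on those CPython details (described in the Python docs under "Hashing of numeric types") and says nothing about how strings are hashed.

```python
import sys

width = sys.hash_info.width          # 64 on typical 64-bit builds
lo, hi = -(2 ** (width - 1)), 2 ** (width - 1)

# Whatever the size of the input, the hash lands in the same fixed range.
samples = ["", "Look at me!", "x" * 1_000_000, (1, 2, 3), 10 ** 100]
for s in samples:
    assert lo <= hash(s) < hi

# A guaranteed collision: for non-negative ints, hash(n) == n % modulus.
p = sys.hash_info.modulus
print(hash(0), hash(p))   # 0 0 -> same hash
print(0 == p)             # False -> different values
```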

These numbers are very useful because they enable quick look-up of values in a large collection of values. Python's set and dict are examples of where they are used. In a list, if you want to check whether a value is in the list with "if x in values:", Python needs to go through the whole list and compare x with each value in the list values. For a long list this can take a long time. In a set, Python keeps track of each value's hash. When you write "if x in values:", Python computes the hash of x, uses it to look up a position in an internal structure, and then only compares x with the values that have the same hash as x, so on average a set lookup takes roughly the same time no matter how large the set is.
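The set lookup described above can be sketched with a toy hash table. This is a deliberately simplified model (separate chaining with a fixed number of buckets; the class ToyHashSet and its comparisons counter are hypothetical), not how CPython's set is implemented: the real one uses open addressing and grows its table, so the groups of candidates stay much smaller.

```python
class ToyHashSet:
    """Hypothetical, simplified hash set: hash first, then compare a few candidates."""

    def __init__(self, n_buckets=1024):
        self._buckets = [[] for _ in range(n_buckets)]
        self.comparisons = 0  # equality checks performed by lookups ("in")

    def _bucket(self, value):
        # The hash picks the bucket; % maps any (even negative) hash into range.
        return self._buckets[hash(value) % len(self._buckets)]

    def add(self, value):
        bucket = self._bucket(value)
        if value not in bucket:
            bucket.append(value)

    def __contains__(self, value):
        # Only the values in the same bucket are compared with ==.
        for candidate in self._bucket(value):
            self.comparisons += 1
            if candidate == value:
                return True
        return False


values = list(range(10_000))
toy = ToyHashSet()
for v in values:
    toy.add(v)

found = 9_999 in toy
print(found, toy.comparisons)  # True 10: only one small bucket was scanned
```

A plain list would compare x against all 10,000 elements before finding 9,999, the last one, because it has no hash to narrow the search.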

The same method is used for dictionary lookup. This makes lookup in a set or dict very fast, while lookup in a list is slow. It also means you can put non-hashable objects in a list, but not in a set or as keys in a dict. The typical non-hashable objects are mutable ones, i.e., objects you can change in place, such as list, dict and set. A mutable object should not be hashable, because its hash would then change over its lifetime, which would cause a lot of confusion: an object could end up stored under the wrong hash value in a dictionary.
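Both halves of that argument can be observed directly. Built-in mutable containers refuse to be hashed, and a hypothetical class (BadKey below) that breaks the rule by hashing mutable state shows the "wrong hash value" confusion: once the key changes, the dict can no longer find the entry. This is a sketch of what goes wrong on CPython, not a pattern to copy.

```python
# Mutable built-ins are not hashable ...
try:
    hash([1, 2])
except TypeError as exc:
    print("list:", exc)          # e.g. unhashable type: 'list'

# ... but immutable counterparts are, so they can be set members or dict keys.
ok = {(1, 2): "tuple key", frozenset({1, 2}): "frozenset key"}


class BadKey:
    """Hypothetical class whose hash depends on a mutable attribute (don't do this)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


key = BadKey(1)
d = {key: "stored"}
key.value = 2                    # mutate the key after it was inserted

print(key in d)                  # False: lookup uses hash(2), entry was filed under hash(1)
print(BadKey(1) in d)            # False: right hash, but the stored key no longer equals it
print(key in list(d))            # True: the object is still inside the dict
```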

Note that the hash of a value only needs to stay the same within one run of Python. Since Python 3.3, the hashes of str and bytes values in fact change with every new run of Python by default:

$ /opt/python33/bin/python3
Python 3.3.2 (default, Jun 17 2013, 17:49:21) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash("foo")
1849024199686380661
>>> 
$ /opt/python33/bin/python3
Python 3.3.2 (default, Jun 17 2013, 17:49:21) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash("foo")
-7416743951976404299

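This makes it harder to guess what hash value a certain string will have, which is an important security feature for web applications and similar services: an attacker who could predict hashes could send many keys with colliding hashes and make dictionary lookups very slow.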
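The per-run randomization can be reproduced from Python itself by starting fresh interpreters. This sketch assumes CPython 3.7 or later (for capture_output) and uses the real PYTHONHASHSEED environment variable: "random" keeps the default behaviour, while an integer string such as "0" fixes the seed ("0" disables randomization), which can help when debugging but gives up the protection described above. The helper name hash_in_fresh_interpreter is hypothetical.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a brand-new Python process.

    seed is passed via PYTHONHASHSEED: "random" or a decimal integer string.
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


random_runs = {hash_in_fresh_interpreter("foo", "random") for _ in range(3)}
fixed_runs = {hash_in_fresh_interpreter("foo", "0") for _ in range(3)}
print(random_runs)  # usually three different numbers
print(fixed_runs)   # a single number: a fixed seed makes str hashes repeatable
```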

Hash values should therefore not be stored permanently. If you need hash values that stay the same across runs, you can look at the more "serious" kind of hash, cryptographic hash functions, which can be used for making verifiable checksums of files and similar purposes.
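As a minimal sketch of that alternative, the standard hashlib module computes cryptographic digests such as SHA-256, whose values are the same in every run and on every machine, so they can be stored and compared later. The helper file_sha256, the file name and the chunk size are illustrative choices, not a prescribed API.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Unlike hash("foo"), this digest does not change between runs of Python.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"foo")
    checksum = file_sha256(path)

print(checksum)
```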
