为什么Python字典可以有多个具有相同哈希值的键? [英] Why can a Python dict have multiple keys with the same hash?

查看:434
本文介绍了为什么Python字典可以有多个具有相同哈希值的键?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图了解Python的hash功能.我创建了一个自定义类,其中所有实例都返回相同的哈希值.

I am trying to understand the Python hash function under the hood. I created a custom class where all instances return the same hash value.

class C:
    def __hash__(self):
        return 42

我只是假设任何时候以上类的一个实例都可以位于dict中,但实际上dict可以有多个具有相同散列的元素.

I just assumed that only one instance of the above class can be in a dict at any time, but in fact a dict can have multiple elements with the same hash.

c, d = C(), C()
x = {c: 'c', d: 'd'}
print(x)
# {<__main__.C object at 0x7f0824087b80>: 'c', <__main__.C object at 0x7f0823ae2d60>: 'd'}
# note that the dict has 2 elements

我做了一些实验,发现如果我重写__eq__方法,以使该类的所有实例都相等,那么dict仅允许一个实例.

I experimented a little more and found that if I override the __eq__ method such that all the instances of the class compare equal, then the dict only allows one instance.

class D:
    def __hash__(self):
        return 42
    def __eq__(self, other):
        return True

p, q = D(), D()
y = {p: 'p', q: 'q'}
print(y)
# {<__main__.D object at 0x7f0823a9af40>: 'q'}
# note that the dict only has 1 element

因此,我很想知道dict如何具有相同散列的多个元素.

So I am curious to know how a dict can have multiple elements with the same hash.

推荐答案

有关Python哈希的工作原理的详细说明,请参见我对的回答.提前返回比其他慢吗?

For a detailed description of how Python's hashing works see my answer to Why is early return slower than else?

基本上,它使用哈希在表中选择一个插槽.如果广告位中有一个值且哈希值匹配,它将比较项目以查看它们是否相等.

Basically it uses the hash to pick a slot in the table. If there is a value in the slot and the hash matches, it compares the items to see if they are equal.

如果哈希不匹配或项目不相等,则尝试另一个槽.有一个公式可以选择(我在引用的答案中对此进行了描述),它会逐渐提取哈希值的未使用部分;但是一旦将其全部用尽,它最终将在哈希表的所有插槽中工作.这样可以保证最终我们找到匹配的商品或空的广告位.当搜索发现一个空插槽时,它会插入值或放弃(取决于我们要添加还是获取值).

If the hash doesn't match or the items aren't equal, then it tries another slot. There's a formula to pick this (which I describe in the referenced answer), and it gradually pulls in unused parts of the hash value; but once it has used them all up, it will eventually work its way through all slots in the hash table. That guarantees eventually we either find a matching item or an empty slot. When the search finds an empty slot, it inserts the value or gives up (depending whether we are adding or getting a value).

要注意的重要一点是,没有列表或存储桶:只有一个具有特定数量的插槽的哈希表,每个哈希用于生成一系列候选插槽.

The important thing to note is that there are no lists or buckets: there is just a hash table with a particular number of slots, and each hash is used to generate a sequence of candidate slots.

这篇关于为什么Python字典可以有多个具有相同哈希值的键?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆