为什么 tuple(set([1, “a", “b", “c", “z", “f"])) == tuple(set([“a",";b"、“c"、“z"、“f"、1])) 85% 的时间启用哈希随机化? [英] Why does tuple(set([1,"a","b","c","z","f"])) == tuple(set(["a","b","c","z","f",1])) 85% of the time with hash randomization enabled?

查看:18
本文介绍了为什么 tuple(set([1, “a", “b", “c", “z", “f"])) == tuple(set([“a",";b"、“c"、“z"、“f"、1])) 85% 的时间启用哈希随机化?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

鉴于零比雷埃夫斯对另一个问题的回答,我们知道了

x = tuple(set([1, "a", "b", "c", "z", "f"]))
y = tuple(set(["a", "b", "c", "z", "f", 1]))
print(x == y)

使用 哈希随机化已启用.为什么是 85%?

Prints True about 85% of the time with hash randomization enabled. Why 85%?

推荐答案

我假设这个问题的任何读者都读过:

I'm going to assume any readers of this question to have read both:

我对 CPython 字典的解释.

首先要注意的是哈希随机化是在解释器启动时决定的.

两个集合中每个字母的哈希值都是相同的,所以唯一重要的是是否存在冲突(会影响顺序).

The hash of each letter will be the same for both sets, so the only thing that can matter is if there is a collision (where order will be affected).

通过第二个链接的推导,我们知道这些集合的后备数组从长度 8 开始:

By the deductions of that second link we know the backing array for these sets starts at length 8:

_ _ _ _ _ _ _ _

在第一种情况下,我们插入 1:

In the first case, we insert 1:

_ 1 _ _ _ _ _ _

然后插入其余部分:

α 1 ? ? ? ? ? ?

然后将其重新散列到大小为 32:

Then it is rehashed to size 32:

    1 can't collide with α as α is an even hash
  ↓ so 1 is inserted at slot 1 first
? 1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

在第二种情况下,我们插入其余部分:

In the second case, we insert the rest:

? β ? ? ? ? ? ?

然后尝试插入1:

    Try to insert 1 here, but will
  ↓ be rehashed if β exists
? β ? ? ? ? ? ?

然后会重新散列:

    Try to insert 1 here, but will
    be rehashed if β exists and has
  ↓ not rehashed somewhere else
? β ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

所以迭代顺序是否不同,完全取决于β是否存在.

So whether the iteration orders are different depends solely on whether β exists.

β 的概率是 5 个字母中的任何一个将散列到 1 模 8 散列到 1 模 32 的机会.

The chance of a β is the chance that any of the 5 letters will hash to 1 modulo 8 and hash to 1 modulo 32.

因为任何散列到 1 模 32 也散列到 1 模 8,我们想要找到 32 个插槽中,五个插槽之一位于插槽 1 的机会:

Since anything that hashes to 1 modulo 32 also hashes to 1 modulo 8, we want to find the chance that of the 32 slots, one of the five is in slot 1:

5 (number of letters) / 32 (number of slots)

5/32 是 0.15625,因此两个集合结构之间的顺序有 15.625% 的可能性¹不同.

5/32 is 0.15625, so there is a 15.625% chance¹ of the orders being different between the two set constructions.

一点也不奇怪,这正是零比雷埃夫斯所测量的.

Not very strangely at all, this is exactly what Zero Piraeus measured.

¹从技术上讲,即使这并不明显.由于重新散列,我们可以假装 5 个散列中的每一个都是唯一的,但是由于线性探测,实际上更有可能发生捆绑"结构……但是因为我们只查看单个插槽是否被占用,这并没有实际上并没有影响到我们.

这篇关于为什么 tuple(set([1, “a", “b", “c", “z", “f"])) == tuple(set([“a",";b"、“c"、“z"、“f"、1])) 85% 的时间启用哈希随机化?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆