为什么“键入d.keys()"在O(n)时间中完成而“键入d"在O(1)中完成? [英] Why does 'key in d.keys()' finish in O(n) time while 'key in d' finishes in O(1)?

查看:172
本文介绍了为什么“键入d.keys()"在O(n)时间中完成而“键入d"在O(1)中完成?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对这一个有后续问题.关于原始问题的评论者之一提到,他过去曾错误地使用如下语法来查看别人:

key in d.keys()

完成时间为O(n),而不是

key in d

在O(1)时间结束,没有意识到差异.直到今天(当我试图理解为什么我的代码运行如此缓慢之后,我偶然发现了最初的问题)时,我还是其中的一员.我试图使用Python 2.7.5验证注释的准确性,并且可以肯定的是,这是timeit的结果:

$ python -m timeit -s 'd=dict.fromkeys(range(100))' '1000 in d.keys()'
100000 loops, best of 3: 2.18 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(100))' '1000 in d'
10000000 loops, best of 3: 0.0449 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(1000))' '10000 in d.keys()'
100000 loops, best of 3: 17.9 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(1000))' '10000 in d'
10000000 loops, best of 3: 0.0453 usec per loop    

对于具有100个键的字典,速度差异大约为50倍(2.19微秒/0.0449微秒),对于具有1000个键的字典,对于显式搜索,速度差异约为400倍(17.9微秒/0.0453微秒).构造为使搜索关键字太大而无法在字典中找到.换句话说,就像评论者所说的,O(n)vs. O(1).

我的问题:为什么有什么不同?这两种语法实际上几乎是相同的!显然,在这两种情况下,Python必须做的非常不同,但是内部到底发生了什么,实际上引起了区分?

解决方案

d.keys()返回一个列表,该列表是dict键的副本,而不是视图.构造该列表需要O(n),查找也需要O(n),查找使用list.__contains__即迭代键.

另一方面,key in d本质上是调用

d.__contains__(key)

使用dict键上的O(1)哈希查找有效地实现了方法dict.__contains__.这恰好是dict数据结构的 raison d'être,这与使用d[key]访问字典时获得快速O(1)查找的原因相同.

总而言之,key in d.keys()在python 2中永远不合适.

注意:在python3中已更改,其中使用key in d.keys()或多或少等同于python 2的key in d.viewkeys().

I have a follow-up question to this one. One of the commenters on the original question mentions that he has seen people in the past mistakenly using syntax like:

key in d.keys()

which finishes in O(n) time, as opposed to

key in d

which finishes in O(1) time, not realizing the difference. Until today (when I stumbled upon the original question after trying to understand why my code was running so slowly), I was one of those people. I attempted to verify the accuracy of the comment using Python 2.7.5, and sure enough, here are the results of timeit:

$ python -m timeit -s 'd=dict.fromkeys(range(100))' '1000 in d.keys()'
100000 loops, best of 3: 2.18 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(100))' '1000 in d'
10000000 loops, best of 3: 0.0449 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(1000))' '10000 in d.keys()'
100000 loops, best of 3: 17.9 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(1000))' '10000 in d'
10000000 loops, best of 3: 0.0453 usec per loop    

There is a factor of roughly 50X difference in speed (2.19 usec / 0.0449 usec), for a dictionary with 100 keys, and 400X difference (17.9 usec / 0.0453 usec) for a dictionary with 1000 keys, for searches which are explicitly constructed so that the search key will be too large to be found in the dictionary. So in other words, O(n) vs. O(1), just as the commenter said.

My question: why is it different? The two syntaxes look practically identical! Clearly Python must be doing something very different under the hood between these two cases, but what exactly is going on internally that actually gives rise to the distinction?

解决方案

d.keys() returns a list which is a copy of the dict keys, not a view. Constructing that list takes O(n), as does the lookup, which uses list.__contains__ i.e. iterating the keys.

On the other hand, key in d essentially calls

d.__contains__(key)

The method dict.__contains__ is implemented efficiently with an O(1) hash lookup on the dict keys. This is precisely the raison d'être for the dict data structure, and it is the same reason you get a fast O(1) lookup when you access a dict with d[key].

In summary, key in d.keys() is never appropriate in python 2.

Note: in python3 this has changed, where using key in d.keys() would be more or less equivalent to key in d.viewkeys() for python 2.

这篇关于为什么“键入d.keys()"在O(n)时间中完成而“键入d"在O(1)中完成?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆