How efficient are Python's 'in' and 'not in' operators?
Question
I have a list of over 100000 values and I am iterating over these values and checking if each one is contained in another list of random values (of the same size).
I am doing this by using: if item[x] in randomList
How efficient is this? Does Python do some sort of hashing for each container, or is it internally doing a straight-up search of the other container to find the element I am looking for?
Also, if it does this search linearly, then does it create a dictionary of the randomList and do the lookup with that?
Recommended answer
The 'in' operator is implemented by the __contains__ magic method of the object it applies to, so its efficiency depends on that object. For instance, set, dict (which tests its keys) and frozenset use hash-based lookups, which take constant time on average, while list requires a linear search. However, range in Python 3.x (the successor to Python 2's xrange) has a __contains__ method that doesn't require a linear search for integers; instead it uses the start/stop/step information to compute the answer directly (e.g. 7 in range(4, 1000000) isn't done linearly).
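To connect this to the question: Python does not build a dictionary or any other hash table for a list behind the scenes, so if you want hash-based lookups you have to build the set yourself. The following is a minimal sketch of that idea, not a benchmark. The names items, random_list and random_set are hypothetical stand-ins for the two lists in the question, and the values are assumed to be hashable.

```python
import random

# Hypothetical data standing in for the two lists from the question.
random.seed(0)
items = [random.randrange(1_000_000) for _ in range(100_000)]
random_list = [random.randrange(1_000_000) for _ in range(100_000)]

# Linear: each `x in random_list` scans the list, so checking every
# item costs O(len(items) * len(random_list)) comparisons in total.
# found_slow = [x for x in items if x in random_list]

# Hash-based: build a set once (one pass over random_list), after
# which each membership test takes constant time on average.
random_set = set(random_list)
found = [x for x in items if x in random_set]

# range answers integer membership from start/stop/step arithmetic,
# without walking through its values.
in_range = 7 in range(4, 1_000_000)
odd_in_even_range = 7 in range(4, 1_000_000, 2)
```

Building the set costs one extra pass and some memory, which usually pays off as soon as more than a handful of lookups are made against the same collection.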
Custom classes are free to implement __contains__ however they see fit, but ideally they should document how it works if that is not obvious.
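As an illustration of a custom __contains__, here is a small sketch of a container that keeps its values sorted and answers membership with a binary search. The class name SortedValues is made up for this example, and it assumes the stored values can be ordered against each other.

```python
from bisect import bisect_left


class SortedValues:
    """Hypothetical container that keeps its values sorted.

    Membership tests use binary search, so `in` is O(log n)
    rather than the O(n) scan a plain list would perform.
    """

    def __init__(self, values):
        self._values = sorted(values)

    def __contains__(self, item):
        # Find the leftmost position where item could be inserted,
        # then check whether item is actually stored at that position.
        i = bisect_left(self._values, item)
        return i < len(self._values) and self._values[i] == item


values = SortedValues([9, 3, 7, 1])
has_seven = 7 in values        # calls values.__contains__(7)
lacks_four = 4 not in values   # `not in` negates the same call
```

The docstring states the lookup cost, which is the kind of documentation the answer recommends when the behaviour is not obvious from the type alone.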