Python-“输入"语句搜索对象列表慢 [英] Python - "in" statement search slow for list of objects
问题描述
我希望有人能解释为什么搜索对象引用列表比搜索普通列表要慢得多.这是使用python中的"in"关键字来搜索的,我认为它以"C编译器"的速度运行.我认为列表只是对象引用(指针)的数组,因此搜索应该非常快.这两个列表的内存大小均为412236字节.
I'm hoping someone can explain why searching a list of object references is so much slower than searching a normal list. This is using the python "in" keyword to search which I thought runs at "C compiler" speed. I thought a list is just an array of object references (pointers) so the search should be extremely fast. Both lists are exactly 412236 bytes in memory.
常规列表(搜索需要0.000秒):
Normal list (takes 0.000 seconds to search):
alist = ['a' for x in range(100000)]
if 'b' in alist:
print("Found")
对象引用列表(搜索需要0.469 !!秒):
List of object references (takes 0.469 !! seconds to search):
class Spam:
pass
spamlist = [Spam() for x in range(100000)]
if Spam() in spamlist:
print("Found")
因此,显然这与具有比新样式类更多开销的旧样式类有关.现在,只需让我所有的类都继承自对象"类,我的脚本就只能容纳400个对象,现在可以轻松地处理多达10000个对象.就在我以为我知道Python的时候!.
我以前读过关于新式与旧式的文章,但从未有人提到过,旧式类的速度可能比新式类慢100倍.在对象实例列表中搜索特定实例的最佳方法是什么?
1.继续使用"in"语句,但要确保所有类都是新样式.
2.使用"is"语句执行其他某种类型的搜索,例如:
So apparently this has something to do with old-style classes having way more overhead than new style classes. My script that was bogging down with only 400 objects can now easily handle up to 10000 objects simply by making all my classes inherit from the "object" class. Just when I thought I knew Python!.
I've read about new-style vs old-style before but it was never mentioned that old-style classes can be up to 100x slower than new style ones. What is the best way to search a list of object instances for a particular instance?
1. Keep using the "in" statement but make sure all classes are new style.
2. Perform some other type of search using the "is" statement like:
[obj for obj in spamlist if obj is target]
3.还有其他更Python化的方式吗?
3. Some other more Pythonic way?
推荐答案
这主要是由于老式类的特殊方法查找机制不同.
This is mostly due to the different special method lookup mechanics of old-style classes.
>>> timeit.timeit("Spam() in l", """
... # Old-style
... class Spam: pass
... l = [Spam() for i in xrange(100000)]""", number=10)
3.0454677856675403
>>> timeit.timeit("Spam() in l", """
... # New-style
... class Spam(object): pass
... l = [Spam() for i in xrange(100000)]""", number=10)
0.05137817007346257
>>> timeit.timeit("'a' in l", 'l = ["b" for i in xrange(100000)]', number=10)
0.03013876870841159
如您所见,Spam
继承自object
的版本运行速度要快得多,几乎与使用字符串的情况一样快.
As you can see, the version where Spam
inherits from object
runs much faster, almost as fast as the case with strings.
列表的in
运算符使用==
比较项目是否相等.定义==
以便按该顺序尝试对象的__eq__
方法,其__cmp__
方法和指针比较.
The in
operator for lists uses ==
to compare items for equality. ==
is defined to try the objects' __eq__
methods, their __cmp__
methods, and pointer comparison, in that order.
对于老式类,这是通过直接但缓慢的方式实现的. Python必须实际在每个实例的字典以及每个实例的类和超类的字典中查找__eq__
和__cmp__
方法.作为三路比较过程的一部分,__coerce__
也将被查找.当这些方法都不存在时,就好像要进行指针比较一样进行12次dict查找.除了dict查询外,还有很多其他开销,而且我实际上不确定该过程的哪些方面最耗时,但是可以说该过程比可能要花的钱还多.
For old-style classes, this is implemented in a straightforward but slow manner. Python has to actually look for the __eq__
and __cmp__
methods in each instance's dict and the dicts of each instance's class and superclasses. __coerce__
gets looked up too, as part of the 3-way compare process. When none of these methods actually exist, that's something like 12 dict lookups just to get to the pointer comparison. There's a bunch of other overhead besides the dict lookups, and I'm not actually sure which aspects of the process are the most time-consuming, but suffice it to say that the procedure is more expensive than it could be.
对于内置类型和新型类,情况会更好.首先,Python不会在实例的字典上寻找特殊的方法.这样可以节省一些dict查找并启用下一部分.其次,类型对象具有对应于Python级特殊方法的C级函数指针.当特殊方法在C中实现或不存在时,相应的函数指针允许Python完全跳过方法查找过程.这意味着在新型情况下,Python可以快速检测到它应该直接跳过指针比较.
For built-in types and new-style classes, things are better. First, Python doesn't look for special methods on the instance's dict. This saves some dict lookups and enables the next part. Second, type objects have C-level function pointers corresponding to the Python-level special methods. When a special method is implemented in C or doesn't exist, the corresponding function pointer allows Python to skip the method lookup procedure entirely. This means that in the new-style case, Python can quickly detect that it should skip straight to the pointer comparison.
关于应该做什么,我建议使用in
和新样式的类.如果发现此操作正在成为瓶颈,但是您需要旧式类以实现向后兼容性,则any(x is y for y in l)
的运行速度比x in l
快20倍:
As for what you should do, I'd recommend using in
and new-style classes. If you find that this operation is becoming a bottleneck, but you need old-style classes for backward compatibility, any(x is y for y in l)
runs about 20 times faster than x in l
:
>>> timeit.timeit('x in l', '''
... class Foo: pass
... x = Foo(); l = [Foo()] * 100000''', number=10)
2.8618816054721936
>>> timeit.timeit('any(x is y for y in l)', '''
... class Foo: pass
... x = Foo(); l = [Foo()] * 100000''', number=10)
0.12331640524583776
这篇关于Python-“输入"语句搜索对象列表慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!