Python-“输入"语句搜索对象列表慢 [英] Python - "in" statement search slow for list of objects

查看:62
本文介绍了Python-“输入"语句搜索对象列表慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我希望有人能解释为什么搜索对象引用列表比搜索普通列表要慢得多.这是使用python中的"in"关键字来搜索的,我认为它以"C编译器"的速度运行.我认为列表只是对象引用(指针)的数组,因此搜索应该非常快.这两个列表的内存大小均为412236字节.

I'm hoping someone can explain why searching a list of object references is so much slower than searching a normal list. This is using the python "in" keyword to search which I thought runs at "C compiler" speed. I thought a list is just an array of object references (pointers) so the search should be extremely fast. Both lists are exactly 412236 bytes in memory.

常规列表(搜索需要0.000秒):

Normal list (takes 0.000 seconds to search):

alist = ['a' for x in range(100000)]
if 'b' in alist:
    print("Found")

对象引用列表(搜索需要0.469 !!秒):

List of object references (takes 0.469 !! seconds to search):

class Spam:
    pass
spamlist = [Spam() for x in range(100000)]
if Spam() in spamlist:
    print("Found")


因此,显然这与具有比新样式类更多开销的旧样式类有关.现在,只需让我所有的类都继承自对象"类,我的脚本就只能容纳400个对象,现在可以轻松地处理多达10000个对象.就在我以为我知道Python的时候!.

我以前读过关于新式与旧式的文章,但从未有人提到过,旧式类的速度可能比新式类慢100倍.在对象实例列表中搜索特定实例的最佳方法是什么?
1.继续使用"in"语句,但要确保所有类都是新样式.
2.使用"is"语句执行其他某种类型的搜索,例如:


So apparently this has something to do with old-style classes having way more overhead than new style classes. My script that was bogging down with only 400 objects can now easily handle up to 10000 objects simply by making all my classes inherit from the "object" class. Just when I thought I knew Python!.

I've read about new-style vs old-style before but it was never mentioned that old-style classes can be up to 100x slower than new style ones. What is the best way to search a list of object instances for a particular instance?
1. Keep using the "in" statement but make sure all classes are new style.
2. Perform some other type of search using the "is" statement like:

[obj for obj in spamlist if obj is target]

3.还有其他更Python化的方式吗?

3. Some other more Pythonic way?

推荐答案

这主要是由于老式类的特殊方法查找机制不同.

This is mostly due to the different special method lookup mechanics of old-style classes.

>>> timeit.timeit("Spam() in l", """
... # Old-style
... class Spam: pass
... l = [Spam() for i in xrange(100000)]""", number=10)
3.0454677856675403
>>> timeit.timeit("Spam() in l", """
... # New-style
... class Spam(object): pass
... l = [Spam() for i in xrange(100000)]""", number=10)
0.05137817007346257
>>> timeit.timeit("'a' in l", 'l = ["b" for i in xrange(100000)]', number=10)
0.03013876870841159

如您所见,Spam继承自object的版本运行速度要快得多,几乎与使用字符串的情况一样快.

As you can see, the version where Spam inherits from object runs much faster, almost as fast as the case with strings.

列表的in运算符使用==比较项目是否相等.定义==以便按该顺序尝试对象的__eq__方法,其__cmp__方法和指针比较.

The in operator for lists uses == to compare items for equality. == is defined to try the objects' __eq__ methods, their __cmp__ methods, and pointer comparison, in that order.

对于老式类,这是通过直接但缓慢的方式实现的. Python必须实际在每个实例的字典以及每个实例的类和超类的字典中查找__eq____cmp__方法.作为三路比较过程的一部分,__coerce__也将被查找.当这些方法都不存在时,就好像要进行指针比较一样进行12次dict查找.除了dict查询外,还有很多其他开销,而且我实际上不确定该过程的哪些方面最耗时,但是可以说该过程比可能要花的钱还多.

For old-style classes, this is implemented in a straightforward but slow manner. Python has to actually look for the __eq__ and __cmp__ methods in each instance's dict and the dicts of each instance's class and superclasses. __coerce__ gets looked up too, as part of the 3-way compare process. When none of these methods actually exist, that's something like 12 dict lookups just to get to the pointer comparison. There's a bunch of other overhead besides the dict lookups, and I'm not actually sure which aspects of the process are the most time-consuming, but suffice it to say that the procedure is more expensive than it could be.

对于内置类型和新型类,情况会更好.首先,Python不会在实例的字典上寻找特殊的方法.这样可以节省一些dict查找并启用下一部分.其次,类型对象具有对应于Python级特殊方法的C级函数指针.当特殊方法在C中实现或不存在时,相应的函数指针允许Python完全跳过方法查找过程.这意味着在新型情况下,Python可以快速检测到它应该直接跳过指针比较.

For built-in types and new-style classes, things are better. First, Python doesn't look for special methods on the instance's dict. This saves some dict lookups and enables the next part. Second, type objects have C-level function pointers corresponding to the Python-level special methods. When a special method is implemented in C or doesn't exist, the corresponding function pointer allows Python to skip the method lookup procedure entirely. This means that in the new-style case, Python can quickly detect that it should skip straight to the pointer comparison.

关于应该做什么,我建议使用in和新样式的类.如果发现此操作正在成为瓶颈,但是您需要旧式类以实现向后兼容性,则any(x is y for y in l)的运行速度比x in l快20倍:

As for what you should do, I'd recommend using in and new-style classes. If you find that this operation is becoming a bottleneck, but you need old-style classes for backward compatibility, any(x is y for y in l) runs about 20 times faster than x in l:

>>> timeit.timeit('x in l', '''
... class Foo: pass
... x = Foo(); l = [Foo()] * 100000''', number=10)
2.8618816054721936
>>> timeit.timeit('any(x is y for y in l)', '''
... class Foo: pass
... x = Foo(); l = [Foo()] * 100000''', number=10)
0.12331640524583776

这篇关于Python-“输入"语句搜索对象列表慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆