Cython实现不比纯python快 [英] Cython implementation no faster than pure python

查看:113
本文介绍了Cython实现不比纯python快的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为练习,我编写了一个XOR双向链接列表

For an exercise I've written a XOR doubly-linked list

%%cython

from cpython.object cimport PyObject
from cpython.ref cimport Py_XINCREF, Py_XDECREF
from libc.stdint cimport uintptr_t

cdef class Node:
    cdef uintptr_t _prev_xor_next
    cdef object val

    def __init__(self, object val, uintptr_t prev_xor_next=0):
        self._prev_xor_next=prev_xor_next
        self.val=val

    @property
    def prev_xor_next(self):
        return self._prev_xor_next
    @prev_xor_next.setter
    def prev_xor_next(self, uintptr_t p):
        self._prev_xor_next=p

    def __repr__(self):
        return str(self.val)


cdef class CurrentNode(Node):
    cdef uintptr_t _node, _prev_ptr
    def __init__(self, uintptr_t node, uintptr_t prev_ptr=0):
        self._node = node
        self._prev_ptr= prev_ptr

    @property
    def val(self):
        return self.node.val
    @property
    def node(self):
        ret=<PyObject *> self._node
        return <Node> ret
    @property
    def prev_ptr(self):
        return self._prev_ptr

    cdef CurrentNode forward(self):
        if self.node.prev_xor_next!=self._prev_ptr:
            return CurrentNode(self.node.prev_xor_next^self._prev_ptr, self._node)

    cdef CurrentNode backward(self):
        if self._prev_ptr:
            pp=<PyObject*>self._prev_ptr
            return CurrentNode(self._prev_ptr, self._node^(<Node> pp).prev_xor_next)

    def __repr__(self):
        return str(self.node)

cdef class XORList:
    cdef PyObject* first
    cdef PyObject* last
    cdef int length

    def __init__(self):
        self.length=0
    @property
    def head(self):
        return (<Node> self.first)

    @property
    def tail(self):
        return (<Node> self.last)

    cdef append(self, object val):
        self.length+=1
        #empty list
        if not self.first:
            t=Node(val)
            tp=(<PyObject*> t)
            self.first=tp
            Py_XINCREF(tp)
            self.last=tp
            Py_XINCREF(tp)

        #not empty
        else:
            new_node=Node(val, <uintptr_t> self.last)
            new_ptr=<PyObject*> new_node
            cur_last=<Node>self.last
            cur_last.prev_xor_next=cur_last.prev_xor_next^(<uintptr_t> new_ptr)
            Py_XINCREF(new_ptr)
            self.last=new_ptr
            Py_XINCREF(new_ptr)

    cpdef reverse(self):
        temp=self.last
        self.last=self.first
        self.first=temp

    def __repr__(self):
        return str(list(iter_XORList(self)))
    def __len__(self):
        return self.length

def iter_XORList(l):
    head=<PyObject*>l.head
    cur=CurrentNode(<uintptr_t> head)
    while cur:
        yield cur
        cur=cur.forward()

import time

start=time.time()
cdef XORList l=XORList()
for i in range(100000):
    l.append(i)
print('time xor ', time.time()-start)

start=time.time()
l1=[]
for i in range(100000):
    l1.append(i)
print('time regular ', time.time()-start)

使用上面的内置列表,在cython链接列表上,我的性能持续下降约10倍.

using the builtin list above I consistently get ~10x worse performance on the cython linked list.

time xor  0.10768294334411621
time regular  0.010972023010253906

当我分析xorlist的循环时,我得到:

When I profile the loop for the xorlist I get:

         700003 function calls in 1.184 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.184    1.184 <string>:1(<module>)
        1    0.039    0.039    1.184    1.184 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:108(list_check)
   100000    0.025    0.000    0.025    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:11(__init__)
    99999    0.019    0.000    0.019    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:16(__get__)
    99999    0.018    0.000    0.018    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:19(__set__)
        1    0.000    0.000    0.000    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:60(__init__)
   100000    0.937    0.000    0.999    0.000 _cython_magic_14cf45d2116440f3df600718d58e4f95.pyx:70(append)
   100000    0.113    0.000    1.146    0.000 line_profiler.py:111(wrapper)
        1    0.000    0.000    1.184    1.184 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   100000    0.018    0.000    0.018    0.000 {method 'disable_by_count' of '_line_profiler.LineProfiler' objects}
   100000    0.015    0.000    0.015    0.000 {method 'enable_by_count' of '_line_profiler.LineProfiler' objects}

因此,忽略对append的调用,似乎大部分时间都花在了特殊方法上.

So, ignoring the calls to append it seems most of the time is spent in the special methods.

这使我想到了问题:

  1. 我该如何加快速度
  2. 我认为Cython中的扩展类型是通过结构实现的,是什么导致它们的初始化花费了如此长的时间

我还尝试了使用纯python对普通双向链表进行另一种自定义实现,并且它与cython xorlist的计时在我的机器上相差10%以内.

I also tried another custom implementation of an oridnary doubly-linked list in pure python and the timings of it and the cython xorlist are similar within 10% difference on my machine.

推荐答案

配置文件中的三个罪魁祸首是Node的__init__(在这里是不可避免的),而__get____set__prev_xor_next财产.我的观点是,您不希望使用prev_xor_next属性(或者如果您将其设置为只读的话),因为它使应该是在Python中可以访问的Cython内部.

The three culprits from your profiling look to be Node's __init__ (which is unavoidable here), and __get__ and __set__ for the prev_xor_next property. My view is that you don't want the prev_xor_next property (or if you do it should be read-only) since it makes what should be a Cython internal accessible in Python.

无论是否删除该属性,都在这里使用Cython,以便可以直接写入基础C属性_prev_xor_next.您可能需要在append的开头(以及可能在其他功能中)设置cdef Node cur_last,以确保Cython知道cur_last的类型-我认为它应该可以解决,但是如果您得到在运行时,这就是您需要做的.

Whether you delete the property or not, you are working in Cython here so you can write directly to the underlying C attribute _prev_xor_next. You may need to set cdef Node cur_last at the start of append (and maybe in other functions) to ensure that Cython knows the type of cur_last - I think it should be able to work it out but if you get AttributeErrors at runtime then this is what you need to do.

此更改使我的速度提高了30%(即,它的速度仍然比常规列表慢,但这是一个明显的改进).

This change gives me a 30% speed increase (i.e. it's still slower than a regular list, but it's a noticeable improvement).

我将概述一个更剧烈的更改,我可能应该在您关于此问题的第一个问题上提出该更改.这确实是一个模糊的大纲,因此没有为使其工作而作任何努力...

I'll outline a more drastic change that I possibly should have suggested on your first question about this problem. This really is a vague outline so no effort has been made to get it to work...

  • Node完全在您的XORList类内部:不应在Python中使用它,并且XORList中所有Nodes的生存期都直接与该列表绑定.因此,应该在销毁自己的XORList时破坏它们(或者如果列表缩小,等等),因此不必进行引用计数.因此Node应该是C结构而不是Python对象:

  • Node is entirely internal to your XORList class: it should not be used from Python and the lifetime of all the Nodes in an XORList is tied directly to the list. Therefore they should be destructed on the destruction of their owning XORList (or if the list shrinks, etc) and so do not need to be reference counted. Therefore Node should be a C struct rather than a Python object:

cdef struct Node:
    uintptr_t prev_xor_next
    PyObject* val

# with associated constructor- and destructor-like functions:
cdef Node* make_node(object val, uintptr_t prev_xor_next):
    cdef Node* n = <Node*>malloc(sizeof(Node))
    n.val = <PyObject*>val
    Py_XINCREF(n.val)
    n.prev_xor_next = prev_xor_next
    return n

cdef void destroy_node(Node* n):
    Py_XDECREF(n.val)
    free(n)

  • XORList需要一个__dealloc__函数,该函数循环遍历在每个Node上调用destroy_node的列表(在您的版本中也需要一个__dealloc__函数!)

  • XORList needs a __dealloc__ function that loops through the list calling destroy_node on each Node (it needs a __dealloc__ function anyway in your version too!)

    CurrentNode需要保留Cython类,因为这是您的迭代器"界面.它显然不再可以从Node继承.我将其更改为:

    CurrentNode needs to remain a Cython class, since this is your "iterator" interface. It can obviously no longer inherit from Node. I'd change it to:

    cdef class XORListIterator:
        cdef Node* current_node
        cdef XORList our_list
    

    our_list属性的作用是确保XORList至少与CurrentNode一样长-如果最终得到的XORList迭代器不再存在, current_node属性将无效. current_node不是XORListIterator的所有者,因此不需要析构函数.

    the point of the attribute our_list is to ensure that the XORList is kept alive at least as long as the CurrentNode - if you end up with an iterator for an XORList that no longer exists that the current_node attribute will be invalid. current_node is not owned by XORListIterator so no need for a destructor.

    我认为这种方案的危险在于,确保对XORList所做的任何更改都不会完全使任何现有的XORListIterators失效,直至崩溃.我怀疑这也是您当前版本的问题.

    The danger with this scheme I think is making sure that if any changes to the XORList don't completely invalidate any existing XORListIterators to the point where you get crashes. I suspect this would also be an issue with your current version.

    我怀疑内置的list仍然会保持竞争力,因为它是一种编写良好,高效的结构.请记住,list.append通常是一个简单的Py_INCREF,偶尔会进行数组重新分配和复制.您总是需要创建一个新的Python对象(Node)以及一些相关的引用计数.

    I suspect the built-in list will still remain competitive, since it is a well-written, efficient structure. Remember that list.append is usually a simple Py_INCREF, with an occasional array reallocation and copy. Yours always involves creation of a new Python object (the Node) as well as some associated reference counting.

    我的替代方案避免了很多引用计数(在计算时间和您必须考虑它"时间方面),因此我希望它会更紧密.它保留了每个append的内存分配较小的缺点,这对于链表结构是不可避免的.

    My alternative scheme avoids a lot of reference counting (both in terms of computational time and "you having to think about it" time), so I'd expect it to be much closer. It retain the disadvantage of a small memory allocation each append, which is unavoidable for a linked-list structure.

    附录:解决有关"Cython类的便利性"的评论.在我看来,使用Cython类和struct的两个优点是:

    Addendum: to address the comment about "the convenience of a Cython class". In my view the two advantages of using a Cython class vs a struct are:

    1. 您可以获得与结构相当接近的内容,但不必担心C指针,并且引用计数已得到处理.很明显,对于这个问题,您对指针做了奇怪的事情,并且必须显式地处理引用计数,因此我认为这不适用于您.
    2. 您可以从Python使用它-您不仅限于Cython.在这种情况下,我认为这完全是XORList的实现细节,不应向Python用户公开.
    1. You get something fairly close to a struct, but don't have to worry about C pointers and the reference counting is taken care of. It's pretty clear that for this problem you're doing odd things to pointers and having to handle reference counting explicitly, so I don't think this is applies to you.
    2. You can use it from Python - you aren't just restricted to Cython. In this case I think it's entirely an implementation detail of the XORList that shouldn't be exposed to Python users.

    因此,我认为专门使用Cython类的主要原因适用于您的问题. (当然,对于很多代码来说,优点确实适用!)

    Therefore I think the main reasons to use Cython classes specifically don't apply to your problem. (For a lot of code the advantages do apply, of course!)

    也许值得补充的是,构造Cython类可能是其较慢的功能之一-为了支持可能的继承,构造过程是间接的".您已经成功创建了一个基准,该基准几乎是所有构建的基准-我想这是一个稍微偏斜的基准,实际情况可能还不错.

    It's probably also worth adding that constructing Cython classes is probably one of their slower features - to support possible inheritance the construction process is quite "indirect". You've managed to create a benchmark that turns out to be almost all constructing - I'd guess it's a slightly skewed benchmark and the real case might not be that bad.

    这篇关于Cython实现不比纯python快的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

  • 查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆