Inlining a cdef method from a cdef class from another Cython package


Question

I have a Cython class which looks like this:

cdef class Cls:

    cdef func1(self):
        pass

If I use this class in another library, will I be able to inline func1, which is a class method? Or should I find a way around it (for example, by creating a function that takes a Cls pointer as an argument)?

Answer

There is bad news and good news: inlining isn't possible from the other module, but you don't have to pay the full price of a Python function call.

What is inlining? It is done by the C compiler: when the C compiler knows the definition of a function, it can decide to inline it, i.e. replace the call with the function's body. This has two advantages:


  1. You don't have to pay the overhead of calling the function.

  2. It makes further optimizations possible.
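In C, the language Cython compiles to, the effect of inlining can be sketched by writing the same loop twice: once with a call to doit() and once with the call replaced by doit()'s body, which is what the compiler may do by itself when it sees the definition. The function names `calc_sum_call` and `calc_sum_inlined` are made up for this sketch.

```c
typedef unsigned long long ull;

/* The body is visible here, so the compiler may substitute it at each
   call site ('inline' is only a hint, not a guarantee). */
static inline ull doit(ull a) {
    return a;
}

/* The loop as written: one call to doit() per iteration,
   unless the compiler decides to inline it. */
ull calc_sum_call(ull n) {
    ull res = 0;
    for (ull i = 0; i < n; i++)
        res += doit(i);
    return res;
}

/* The same loop after inlining by hand: no call overhead is left
   (advantage 1), and the body is now a plain 'res += i' that the
   optimizer can analyse further (advantage 2). */
ull calc_sum_inlined(ull n) {
    ull res = 0;
    for (ull i = 0; i < n; i++)
        res += i;
    return res;
}
```

Both functions compute the same sum; whether the compiler actually inlines `doit` in the first one is its own decision.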

For example, see:

%%cython -a
ctypedef unsigned long long ull
cdef ull doit(ull a):
    return a

def calc_sum_fun():
    cdef ull res=0
    cdef ull i
    for i in range(1000000000):#10**9
        res+=doit(i)
    return res

>>> %timeit calc_sum_fun()
53.4 ns ± 1.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

How was it possible to do 10^9 additions in 53 nanoseconds? It wasn't: the C compiler inlined the cdef function doit() and was then able to compute the result of the loop at compile time. At run time the program simply returns the precomputed result.
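Because doit(i) simply returns i, the loop computes 0 + 1 + ... + (n-1) = n(n-1)/2, so an optimizer that sees through the inlined call can replace the whole loop with one formula. Below is a C sketch of the two forms; the function names are hypothetical, and whether a particular compiler performs this transformation depends on the compiler and its optimization flags.

```c
typedef unsigned long long ull;

/* What the source asks for: n additions. */
ull sum_loop(ull n) {
    ull res = 0;
    for (ull i = 0; i < n; i++)
        res += i;
    return res;
}

/* What the loop can be reduced to: 0 + 1 + ... + (n-1) = n(n-1)/2.
   One of n and n-1 is even, so halve that factor first to keep the
   product small. Unsigned arithmetic wraps the same way in both
   functions, so they agree for every n. */
ull sum_closed_form(ull n) {
    return (n % 2 == 0) ? (n / 2) * (n - 1) : n * ((n - 1) / 2);
}
```

For n = 10^9, `sum_closed_form` returns 499999999500000000 without executing a single loop iteration.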

From this it is clear that the C compiler cannot inline a function from another module, because the function's definition is hidden from it in another C file (translation unit). For example:

#simple.pxd:
ctypedef unsigned long long ull
cdef ull doit(ull a)

#simple.pyx:
cdef ull doit(ull a):
    return a
def doit_slow(a):
    return a

And now access it from another Cython module:

%%cython
cimport simple
ctypedef unsigned long long ull
def calc_sum_fun():
    cdef ull res=0
    cdef ull i
    for i in range(10000000):#10**7
        res+=simple.doit(i)   # cdef function cimported from simple
    return res

This leads to the following timing:

>>> %timeit calc_sum_fun()
17.8 ms ± 208 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
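Why the cimporting module cannot inline doit() can be modelled in plain C. This is a simplified sketch, assuming that (roughly as Cython does for cdef functions declared in a .pxd) the calling module reaches doit() through a function pointer that is filled in when the other module is imported. `simple_doit_impl`, `simple_doit_ptr` and `import_simple` are hypothetical names, and both "modules" live in one file only so the sketch compiles on its own.

```c
typedef unsigned long long ull;

/* --- stands in for the other module (simple) -------------------------- */
static ull simple_doit_impl(ull a) {
    return a;
}

/* --- stands in for the cimporting module ------------------------------- */
/* The caller only holds a pointer; it is null until the import step runs. */
static ull (*simple_doit_ptr)(ull) = 0;

/* Hypothetical import step: in a real build the address comes from the
   other extension module at runtime, not from this translation unit. */
void import_simple(void) {
    simple_doit_ptr = simple_doit_impl;
}

ull calc_sum_fun(ull n) {
    ull res = 0;
    for (ull i = 0; i < n; i++)
        res += simple_doit_ptr(i);   /* indirect call: body not visible here */
    return res;
}
```

In a real build the two halves are separate extension modules, so the compiler translating `calc_sum_fun` never sees the body of `simple_doit_impl` and has to emit a genuine call on every iteration.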

Because inlining isn't possible, the loop really has to be executed... However, it is still much faster than a normal Python call, which we can see by replacing cdef doit() with def doit_slow():

%%cython
import simple              #import, not cimport

ctypedef unsigned long long ull
def calc_sum_fun_slow():
    cdef ull res=0
    cdef ull i
    for i in range(10000000):#10**7
        res+=simple.doit_slow(i)      #slow
    return res

The Python call is roughly 60 times slower (1.07 s vs. 17.8 ms):

>>> %timeit calc_sum_fun_slow()
1.07 s ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
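Dividing each measured time by the 10^7 loop iterations gives the approximate cost per iteration (call plus addition):

$$
\frac{17.8\ \text{ms}}{10^{7}} \approx 1.78\ \text{ns}, \qquad
\frac{1.07\ \text{s}}{10^{7}} = 107\ \text{ns}, \qquad
\frac{107\ \text{ns}}{1.78\ \text{ns}} \approx 60.
$$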

But you asked about class methods, not global functions. For class methods, inlining isn't possible even within the same module:

%%cython

ctypedef unsigned long long ull

cdef class A:
    cdef ull doit(self, ull a):
        return a

def calc_sum_class():
    cdef ull res=0
    cdef ull i
    cdef A a=A()
    for i in range(10000000):#10**7
        res+=a.doit(i)      
    return res

This leads to:

>>> %timeit calc_sum_class()
18.2 ms ± 264 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

which is basically the same as when the cdef class is defined in another module (and close to the 17.8 ms of the cimported cdef function above).

The reason for this behavior is the way a cdef class is built. It is much like a virtual class in C++: the object struct contains a pointer to something similar to a virtual table, called __pyx_vtab:

struct __pyx_obj_12simple_class_A {
  PyObject_HEAD
  struct __pyx_vtabstruct_12simple_class_A *__pyx_vtab;
};

in which the pointer to cdef doit() is stored:

struct __pyx_vtabstruct_12simple_class_A {
   __pyx_t_12simple_class_ull (*doit)(struct __pyx_obj_12simple_class_A *, __pyx_t_12simple_class_ull);
};

When we call a.doit(), we don't call the function directly but through this pointer:

((struct __pyx_vtabstruct_12simple_class_A *)__pyx_v_a->__pyx_vtab)->doit(__pyx_v_a, __pyx_v_i);

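which explains why the C compiler cannot inline the function doit(): at the call site it only sees a function pointer loaded from the object at runtime (a subclass could, for example, store an overriding doit() in its vtable), so it does not know which body to substitute.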
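A stripped-down C model of the structs above illustrates this. It is plain C without PyObject_HEAD or Cython's name mangling, and the names `A`, `A_vtab`, `call_doit`, `init_A`, `init_B` and the overriding subclass "B" are hypothetical. Which function runs depends on the vtable the object points to at runtime, so the call site has no fixed body to inline.

```c
typedef unsigned long long ull;

struct A;  /* forward declaration for the method signature */

/* Table of cdef methods, like __pyx_vtabstruct_..._A */
struct A_vtab {
    ull (*doit)(struct A *self, ull a);
};

/* Object layout, like __pyx_obj_..._A (PyObject_HEAD omitted) */
struct A {
    struct A_vtab *vtab;
};

static ull A_doit(struct A *self, ull a) { (void)self; return a; }
static ull B_doit(struct A *self, ull a) { (void)self; return 2 * a; }

/* One table per class; the subclass B overrides doit. */
static struct A_vtab A_vtable = { A_doit };
static struct A_vtab B_vtable = { B_doit };

/* The call site the compiler sees: load a pointer from the object and
   call it. Which body runs depends on obj->vtab at runtime. */
ull call_doit(struct A *obj, ull a) {
    return obj->vtab->doit(obj, a);
}

void init_A(struct A *obj) { obj->vtab = &A_vtable; }
void init_B(struct A *obj) { obj->vtab = &B_vtable; }
```

Cython's generated structs carry more fields, but the dispatch through `obj->vtab->doit` has the same shape as the generated call shown above.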
