Slow division in Cython


Problem description

To get fast division in Cython, I can use the compiler directive

@cython.cdivision(True)

This works, in that the resulting C code has no zero-division check. However, for some reason it actually makes my code slower. Here is an example:

import time

cimport cython
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True)
def example1(double[:] xi, double[:] a, double[:] b, int D):

    cdef int k
    cdef double[:] x = np.zeros(D)

    for k in range(D):
        x[k] = (xi[k] - a[k]) / (b[k] - a[k]) 

    return x

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def example2(double[:] xi, double[:] a, double[:] b, int D):

    cdef int k
    cdef double[:] x = np.zeros(D)

    for k in range(D):
        x[k] = (xi[k] - a[k]) / (b[k] - a[k]) 

    return x

def test_division():

    D = 10000
    x = np.random.rand(D)
    a = np.zeros(D)
    b = np.random.rand(D) + 1  # b - a is always >= 1, so no zero divisors

    # Time a single call with cdivision enabled
    tic = time.time()
    example1(x, a, b, D)
    toc = time.time()

    print('With c division: ' + str(toc - tic))

    # Time a single call with the default (checked) division
    tic = time.time()
    example2(x, a, b, D)
    toc = time.time()

    print('Without c division: ' + str(toc - tic))

This results in the output:

With c division: 0.000194787979126
Without c division: 0.000176906585693

Is there any reason why turning off zero-division checking could slow things down? (I know there are no zero divisors.)

Recommended answer

Firstly, you need to call the functions many (>1000) times and take the average of the time spent in each to get an accurate idea of how different they are. Calling each function once is not accurate enough.
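As a rough illustration of the repeated-timing advice, here is a minimal sketch using the standard-library timeit module. The functions normalise and per_call_time are hypothetical helpers written for this example, and normalise is a pure-Python stand-in for the Cython loop, not the original code.

```python
import timeit


def normalise(xi, a, b):
    """Pure-Python stand-in for the loop under test (hypothetical helper)."""
    return [(x - lo) / (hi - lo) for x, lo, hi in zip(xi, a, b)]


def per_call_time(func, args, repeats=5, calls=1000):
    """Time `repeats` batches of `calls` calls; return seconds per call for the fastest batch."""
    timer = timeit.Timer(lambda: func(*args))
    batch_times = timer.repeat(repeat=repeats, number=calls)
    # The fastest batch is the one least disturbed by other activity on the machine.
    return min(batch_times) / calls


xi = [0.5] * 100
a = [0.0] * 100
b = [2.0] * 100
per_call = per_call_time(normalise, (xi, a, b), repeats=3, calls=200)
print("seconds per call: %.3e" % per_call)
```

Taking the fastest of several batches is one common convention; averaging over all batches, as suggested above, works too as long as the number of calls is large.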

Secondly, the time spent in the function is affected by other things, not just the loop with the divisions. Calling a def function (i.e. a Python function) like this involves some overhead in passing and returning the arguments. Creating a numpy array inside the function also takes time, so any difference between the loops in the two functions will be less obvious.

Finally, see here (https://github.com/cython/cython/wiki/enhancements-compilerdirectives): setting the c-division directive to False has a ~35% speed penalty. I think this is not enough to show up in your example, given the other overheads. I checked the C code output by Cython, and the code for example2 is clearly different and contains an additional zero-division check, but when I profile it, the difference in run time is negligible.
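To make concrete what the directive changes, here is a small pure-Python sketch of the two behaviours. It is only an illustration: checked_div, c_style_div and c_style_mod are hypothetical helpers, not Cython's generated code. The integer part goes slightly beyond the floating-point case discussed here and is included because Python and C integer division differ when the operands have opposite signs.

```python
def checked_div(x, y):
    """Roughly the extra work done with cdivision(False): test the divisor, then divide."""
    if y == 0.0:
        raise ZeroDivisionError("float division")
    return x / y


def c_style_div(a, b):
    """Integer quotient truncated toward zero, as C's `/` behaves for ints."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def c_style_mod(a, b):
    """Integer remainder with the sign of the dividend, as C's `%` behaves."""
    return a - b * c_style_div(a, b)


print(-7 // 2, c_style_div(-7, 2))  # Python floors, C truncates
print(-7 % 2, c_style_mod(-7, 2))   # the remainders differ in sign
print(checked_div(1.0, 4.0))
```

The per-iteration branch in checked_div is the kind of cost that cdivision(True) removes from the inner loop.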

To illustrate this, I ran the code below, where I've taken your code and turned the def functions into cdef functions, i.e. Cython functions rather than Python functions. This massively reduces the overhead of passing and returning arguments. I have also changed example1 and example2 to just calculate a sum over the values in the numpy arrays, rather than creating a new array and populating it. This means that almost all the time spent in each function is now in the loop, so it should be easier to see any differences. I have also run each function many times and made D bigger.

cimport cython
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True) 
@cython.profile(True)
cdef double example1(double[:] xi, double[:] a, double[:] b, int D):

    cdef int k
    cdef double theSum = 0.0

    for k in range(D):
        theSum += (xi[k] - a[k]) / (b[k] - a[k])

    return theSum

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.profile(True)
@cython.cdivision(False)
cdef double example2(double[:] xi, double[:] a, double[:] b, int D):

    cdef int k
    cdef double theSum = 0.0

    for k in range(D):
        theSum += (xi[k] - a[k]) / (b[k] - a[k])

    return theSum


def testExamples():
    D = 100000
    x = np.random.rand(D)
    a = np.zeros(D)
    b = np.random.rand(D) + 1

    # Call both versions many times so the profiler totals are meaningful
    for i in range(10000):
        example1(x, a, b, D)
        example2(x, a, b, D)

I ran this code through the profiler (python -m cProfile -s cumulative), and the relevant output is below:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10000    1.546    0.000    1.546    0.000 test.pyx:26(example2)
10000    0.002    0.000    0.002    0.000 test.pyx:11(example1)

This shows that example2 is much slower. If I turn c-division on in example2, the time spent is identical for example1 and example2, so the directive clearly has a significant effect.
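The profiler run described above can also be driven from inside Python. The following is a minimal sketch using the standard-library cProfile and pstats modules; slow_sum and run_many are hypothetical pure-Python stand-ins for the Cython functions, so the timings it prints say nothing about cdivision itself.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Stand-in workload: a loop dominated by floating-point divisions."""
    total = 0.0
    for k in range(1, n + 1):
        total += 1.0 / k
    return total


def run_many(calls=200, n=1000):
    """Call the workload repeatedly so its cumulative time stands out."""
    for _ in range(calls):
        slow_sum(n)


profiler = cProfile.Profile()
profiler.enable()
run_many()
profiler.disable()

# Collect the report in a string, sorted by cumulative time as with `-s cumulative`.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

For the Cython versions to appear in such a report, they need profiling enabled, which is what the @cython.profile(True) decorators in the code above are for.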
