Creating small arrays in Cython takes a humongous amount of time

Views: 165 | Posted: 2016/5/31 19:31:14 | Tags: python arrays performance numpy cython


Problem description

I was writing a new random number generator for NumPy that produces random numbers according to an arbitrary distribution when I came across this really weird behavior:

This is test.pyx:

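#cython: boundscheck=False
#cython: wraparound=False
import numpy as np
cimport numpy as np
cimport cython

# Initialise NumPy's C API; recommended whenever numpy is cimported.
np.import_array()

# Typed buffer arguments and no work at all: isolates the argument-unpacking cost.
def BareBones(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u, r):
    return u

# Untyped arguments plus a C loop of len(u) iterations.
def UntypedWithLoop(a, u, r):
    cdef int i, j = 0
    for i in range(u.shape[0]):
        j += i
    return u, j

# Same arguments and return type as np.searchsorted, but only one trivial
# operation per element.
def BSReplacement(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u):
    cdef np.ndarray[np.int_t, ndim=1] r = np.empty(u.shape[0], dtype=int)
    cdef int i, j = 0
    for i in range(u.shape[0]):
        j = i
    return r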

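setup.py:

from setuptools import setup
from Cython.Build import cythonize
import numpy
# numpy.get_include() supplies the NumPy C headers that "cimport numpy" needs.
setup(name="simple cython func", ext_modules=cythonize('test.pyx'),
      include_dirs=[numpy.get_include()])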

Profiling code:

#!/usr/bin/python
from __future__ import division

import subprocess
import timeit

#Compile the cython modules before importing them
subprocess.call(['python', 'setup.py', 'build_ext', '--inplace'])

sstr="""
import test
import numpy
u=numpy.random.random(10)
a=numpy.random.random(10)
a=numpy.cumsum(a)
a/=a[-1]
r=numpy.empty(10,int)
"""

print "binary search: creates an array[N] and performs N binary searches to fill it:\n",timeit.timeit('numpy.searchsorted(a,u)',sstr)
print "Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:\n",timeit.timeit('test.BSReplacement(a,u)',sstr)

print "barebones function doing nothing:",timeit.timeit('test.BareBones(a,u,r)',sstr)
print "Untyped inputs and doing N iterations:",timeit.timeit('test.UntypedWithLoop(a,u,r)',sstr)
print "time for just np.empty()",timeit.timeit('numpy.empty(10,int)',sstr)

The binary search takes O(len(u) * log(len(a))) time to execute, while the trivial Cython function takes O(len(u)). Both return a 1-D int array of length len(u).
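For context, calling `numpy.searchsorted(a, u)` with `a` a normalized cumulative sum (as built in `sstr` above) and `u` uniform draws is inverse-transform sampling of a discrete distribution: each element of `u` costs one binary search over `a`, which is where the O(len(u) * log(len(a))) bound comes from. The sketch below is a pure-Python illustration using the standard library's `bisect.bisect_left`, which matches searchsorted's default `side='left'`; the function name `sample_indices` is hypothetical and not part of the asker's code or of NumPy.

```python
from bisect import bisect_left
from itertools import accumulate


def sample_indices(weights, u):
    """Map uniform draws u in [0, 1) to indices distributed according to weights.

    Hypothetical helper: mirrors numpy.searchsorted(cdf, u) with side='left'.
    """
    # Cumulative distribution normalized so the last entry is 1.0,
    # like a = numpy.cumsum(a); a /= a[-1] in the profiling setup.
    cdf = list(accumulate(weights))
    total = cdf[-1]
    cdf = [c / total for c in cdf]
    # One O(log len(cdf)) binary search per draw: O(len(u) * log(len(cdf))) overall.
    return [bisect_left(cdf, x) for x in u]


# Weights 1, 1, 2 give the CDF [0.25, 0.5, 1.0].
print(sample_indices([1, 1, 2], [0.1, 0.3, 0.5, 0.9]))  # → [0, 1, 1, 2]
```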

However, even this trivial, computation-free implementation takes longer than the full binary search in NumPy, which is written in C: see PyArray_SearchSorted in https://github.com/numpy/numpy/blob/202e78d607515e0390cffb1898e11807f117b36a/numpy/core/src/multiarray/item_selection.c

The results are:

binary search: creates an array[N] and performs N binary searches to fill it:
1.15157485008
Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:
3.69442796707
barebones function doing nothing: 0.87496304512
Untyped inputs and doing N iterations: 0.244267940521
time for just np.empty() 1.0983929634
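`timeit.timeit(stmt, setup)` executes `stmt` `number=1000000` times by default, so each figure above is the total for a million calls, and the value in seconds equals the average time per call in microseconds. The sketch below makes that conversion explicit using only the numbers reported above (labels shortened). The final subtraction is a rough lower bound that assumes both Cython functions pay the same basic call overhead.

```python
# Totals in seconds for 1,000,000 calls each, as reported above (10-element inputs).
totals_s = {
    "np.searchsorted": 1.15157485008,
    "BSReplacement": 3.69442796707,
    "BareBones": 0.87496304512,
    "UntypedWithLoop": 0.244267940521,
    "np.empty": 1.0983929634,
}

NUMBER = 1_000_000  # timeit.timeit's default number of executions

# Seconds per call, scaled to microseconds.
per_call_us = {name: total / NUMBER * 1e6 for name, total in totals_s.items()}

for name, us in per_call_us.items():
    print(f"{name:16s} {us:.2f} us/call")

# BareBones only unpacks two np.ndarray[...] buffer arguments; UntypedWithLoop
# takes untyped arguments and also runs a loop. Their difference therefore
# underestimates the buffer-unpacking cost (assuming equal call overhead).
buffer_lower_bound_us = per_call_us["BareBones"] - per_call_us["UntypedWithLoop"]
print(f"buffer-argument overhead >= {buffer_lower_bound_us:.2f} us/call")
```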

Why does the np.empty() step take so much time, and what can I do to get an empty array that I can return?

The C function does all this AND runs a whole bunch of sanity checks AND uses a longer algorithm in its inner loop. (I removed all the logic except the loop itself from my example.)

Update

It turns out there are two distinct problems:

1. The np.empty(10) call alone has a ginormous overhead: it takes about as much time as searchsorted needs to create a new array AND perform 10 binary searches on it.
2. Just declaring the buffer syntax np.ndarray[...] also has a massive overhead, taking up MORE time than receiving the untyped variables AND iterating 50 times.

Results for 50 iterations:

binary search: 2.45336699486
Simple replacement: 3.71126317978
barebones function doing nothing: 0.924916028976
Untyped inputs and doing N iterations: 0.316384077072
time for just np.empty() 1.04949498177
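Comparing the two runs suggests that fixed per-call overhead, not per-element work, dominates at these sizes. The sketch below is a rough two-point estimate under two stated assumptions: that "50 iterations" means both `a` and `u` grew from 10 to 50 elements (the post does not say so explicitly), and that cost is linear in N, which ignores the log(len(a)) factor of the binary search. The helper `two_point_fit` is hypothetical.

```python
def two_point_fit(n1, t1, n2, t2):
    """Fit t = fixed + n * per_element through two (n, t) measurements."""
    per_element = (t2 - t1) / (n2 - n1)
    fixed = t1 - n1 * per_element
    return fixed, per_element


# np.searchsorted totals from the two runs: seconds per 1,000,000 calls,
# which is numerically the same as microseconds per call.
fixed_us, per_elem_us = two_point_fit(10, 1.15157485008, 50, 2.45336699486)
print(f"searchsorted: ~{fixed_us:.2f} us fixed + ~{per_elem_us * 1000:.1f} ns/element")
# prints: searchsorted: ~0.83 us fixed + ~32.5 ns/element

# BSReplacement and np.empty barely change between the runs, as a fixed cost would.
print(f"BSReplacement: {3.69442796707:.2f} us -> {3.71126317978:.2f} us")
print(f"np.empty:      {1.0983929634:.2f} us -> {1.04949498177:.2f} us")
```

Under these assumptions, the fixed part of searchsorted's cost at N=10 is more than twice its per-element part.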

Solution

There is a discussion of this on the Cython list that might have some useful suggestions: https://groups.google.com/forum/#!topic/cython-users/CwtU_jYADgM

Generally, though, I try to allocate small arrays outside of Cython, pass them in, and reuse them in subsequent calls to the method. I understand that this is not always an option.
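The allocate-once-and-reuse pattern can be sketched in plain Python. This is an illustration under assumptions, not the answerer's code: the hypothetical `fill_indices` writes into a caller-supplied output buffer (a standard-library `array.array`, standing in for a NumPy array allocated outside Cython) instead of allocating a new result on every call, so the allocation cost is paid once rather than per call.

```python
from array import array
from bisect import bisect_left


def fill_indices(cdf, u, out):
    """Write bisect_left(cdf, x) for each x in u into the preallocated out.

    Hypothetical helper: nothing is allocated per call; the caller owns out.
    """
    if len(out) < len(u):
        raise ValueError("output buffer is too small")
    for i, x in enumerate(u):
        out[i] = bisect_left(cdf, x)
    return out


cdf = [0.25, 0.5, 1.0]
out = array("l", [0]) * 4  # allocated once, outside the repeated calls

for batch in ([0.1, 0.3, 0.5, 0.9], [0.8, 0.2, 0.6, 0.0]):
    fill_indices(cdf, batch, out)  # reuses the same buffer on every call
    print(out.tolist())
```

In the Cython version, the equivalent change is to make the output array a parameter of BSReplacement, allocated once by the caller (for example with np.empty), instead of calling np.empty inside the function on every call.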
