How to call numpy/scipy C functions from Cython directly, without Python call overhead?


Question

I am trying to make calculations in Cython that rely heavily on some numpy/scipy mathematical functions like numpy.log. I noticed that if I call numpy/scipy functions repeatedly in a loop in Cython, there are huge overhead costs, e.g.:

import numpy as np
cimport numpy as np
np.import_array()
cimport cython

def myloop(int num_elts):
    cdef double value = 0
    cdef int n
    for n in range(num_elts):
        # call numpy function (a Python-level call on every iteration)
        value = np.log(2)
    return value

This is very expensive, presumably because np.log goes through Python rather than calling the numpy C function directly. If I replace that line with:

from libc.math cimport log
...
# calling libc function 'log'
value = log(2)

then it's much faster. However, when I try to pass a numpy array to libc.math.log:

cdef np.ndarray[long, ndim=1] foo = np.array([1, 2, 3])
log(foo)

I get this error:

TypeError: only length-1 arrays can be converted to Python scalars

My questions are:

  1. Is it possible to call the C function and pass it a numpy array? Or can it only be used on scalar values, which would require me to write a loop (e.g. if I wanted to apply it to the foo array above)?
  2. Is there an analogous way to call scipy functions from C directly, without Python overhead? How can I import scipy's C function library?

Concrete example: say you want to call many of scipy's or numpy's useful statistics functions (e.g. scipy.stats.*) on scalar values inside a for loop in Cython. It's crazy to reimplement all those functions in Cython, so their C versions have to be called. For example, consider all the functions related to pdf/cdf and sampling from various statistical distributions (e.g. see http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.pdf.html#scipy.stats.rv_continuous.pdf and http://www.johndcook.com/distributions_scipy.html). If you call these functions with Python overhead in a loop, it'll be prohibitively slow.

Thanks.

Answer

You cannot apply C functions such as log to numpy arrays, and numpy does not have a C function library that you can call from Cython.
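
A pure-Python analogue of question 1, as a sketch: Python's `math.log`, like `libc.math.log`, accepts only a single scalar, so passing a multi-element array raises the same kind of `TypeError` seen in the question. The two options are a loop that feeds the scalar function one element at a time, or one vectorized `np.log` call over the whole array. In Cython, the loop body would call `libc.math.log` on a typed `double` rather than `math.log`. The helper name `elementwise_log` is hypothetical.

```python
import math

import numpy as np


def elementwise_log(values):
    """Apply the scalar math.log to each element of a 1-D input, returning float64."""
    arr = np.asarray(values, dtype=np.float64)
    out = np.empty_like(arr)
    for i in range(arr.shape[0]):
        # math.log only understands scalars, so feed it one element at a time
        out[i] = math.log(arr[i])
    return out


foo = np.array([1, 2, 3])

try:
    math.log(foo)  # a scalar-only function cannot take a 3-element array
except TypeError as exc:
    print("TypeError:", exc)

print(elementwise_log(foo))  # per-element loop over scalars
print(np.log(foo))           # one vectorized call over the whole array
```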

Numpy functions are already optimized to be called on numpy arrays. Unless you have a very unusual use case, you're not going to see much benefit from reimplementing numpy functions as C functions. (It's possible that some functions in numpy are not implemented well, in which case consider submitting your improvements as patches.) However, you do bring up a good point.

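from libc.math cimport log
import numpy as np

# A: typed loop calling C's log directly; no Python calls inside the loop
def log_a(double[:] foo):
    cdef Py_ssize_t i, n = foo.shape[0]
    cdef double[:] r = np.empty(n)
    for i in range(n):
        r[i] = log(foo[i])
    return np.asarray(r)

# B: one vectorized call; numpy loops over the array in C
def log_b(foo):
    return np.log(foo)

# C: a Python-level np.log call for every element
def log_c(double[:] foo):
    cdef Py_ssize_t i, n = foo.shape[0]
    cdef double[:] r = np.empty(n)
    for i in range(n):
        r[i] = np.log(foo[i])
    return np.asarray(r)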

In general, A and B should have similar run times, but C should be avoided: it makes a separate Python-level call to np.log for every element and will be much slower.
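
A rough pure-Python way to see the B-versus-C gap, as a sketch: timings vary by machine, so only the agreement of the results is checked. Variant A cannot be reproduced faithfully here, because a Python-level loop over `math.log` still pays interpreter overhead on every iteration; only Cython, with typed loop variables and `libc.math.log`, removes it. The function names are hypothetical.

```python
import timeit

import numpy as np


def log_vectorized(foo):
    # B: a single np.log call; the loop runs inside numpy's compiled ufunc
    return np.log(foo)


def log_per_element(foo):
    # C: one Python-level np.log call per element, each paying ufunc dispatch
    r = np.empty(foo.shape[0])
    for i in range(foo.shape[0]):
        r[i] = np.log(foo[i])
    return r


foo = np.linspace(1.0, 100.0, 10_000)
assert np.allclose(log_vectorized(foo), log_per_element(foo))

t_b = timeit.timeit(lambda: log_vectorized(foo), number=10)
t_c = timeit.timeit(lambda: log_per_element(foo), number=10)
print(f"B (vectorized): {t_b:.4f} s   C (per element): {t_c:.4f} s")
```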

Update

Here is the code behind scipy.stats.norm.pdf (the generic rv_continuous.pdf method, as it looked in scipy at the time). As you can see, it's written in Python with numpy and scipy calls. There is no C version of this code; you have to call it "through Python". If this is what is holding you back, you'll need to reimplement it in C/Cython, but first I would spend some time carefully profiling the code to see if there is any lower-hanging fruit to go after.

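# Excerpt of rv_continuous.pdf from an older scipy release. asarray, zeros, shape,
# putmask, place and any are numpy functions imported at module level;
# argsreduce is a helper defined in the same scipy.stats module.
def pdf(self,x,*args,**kwds):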
    loc,scale=map(kwds.get,['loc','scale'])
    args, loc, scale = self._fix_loc_scale(args, loc, scale)
    x,loc,scale = map(asarray,(x,loc,scale))
    args = tuple(map(asarray,args))
    x = asarray((x-loc)*1.0/scale)
    cond0 = self._argcheck(*args) & (scale > 0)
    cond1 = (scale > 0) & (x >= self.a) & (x <= self.b)
    cond = cond0 & cond1
    output = zeros(shape(cond),'d')
    putmask(output,(1-cond0)+np.isnan(x),self.badvalue)
    if any(cond):
        goodargs = argsreduce(cond, *((x,)+args+(scale,)))
        scale, goodargs = goodargs[-1], goodargs[:-1]
        place(output,cond,self._pdf(*goodargs) / scale)
    if output.ndim == 0:
        return output[()]
    return output
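
If profiling does show the per-call overhead of `norm.pdf` dominating, the reimplementation can be small for simple distributions. A minimal sketch, assuming only the normal density is needed and inputs are already validated (`scale > 0`, no NaN handling, none of the argument checks above): in Cython the same body would be a `cdef double` function using `exp` and `sqrt` from `libc.math`. Here it is written in Python with the `math` module so it can be checked directly. The name `norm_pdf` is hypothetical.

```python
import math

# 1 / sqrt(2*pi), computed once instead of on every call
_INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)


def norm_pdf(x, loc=0.0, scale=1.0):
    """Normal density at scalar x; assumes scale > 0 and skips input validation."""
    z = (x - loc) / scale
    return _INV_SQRT_2PI * math.exp(-0.5 * z * z) / scale
```

For a quick check, `norm_pdf(0.3, loc=1.0, scale=2.0)` should agree with `scipy.stats.norm.pdf(0.3, loc=1.0, scale=2.0)` to within floating-point rounding, while skipping the argument handling, masking and array conversions the generic method performs on every call.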
