Passing NumPy arrays as arguments to numba.cfunc

Problem description

I have been fighting with an issue that I am having trouble wrapping my head around, and therefore don't quite know how to start solving it. My experience with C programming is very limited, and that is, I think, the reason I cannot make progress.

I have some function that uses numpy.interp and scipy.integrate.quad to carry out a certain integral. I use quad for the integration, and according to its documentation:

A Python function or method to integrate. If func takes many arguments, it is integrated along the axis corresponding to the first argument.

If the user desires improved integration performance, then f may be a scipy.LowLevelCallable with one of the signatures:

double func(double x)
double func(double x, void *user_data) 
double func(int n, double *xx) 
double func(int n, double *xx, void *user_data) 

The user_data is the data contained in the scipy.LowLevelCallable. In the call forms with xx, n is the length of the xx array which contains xx[0] == x and the rest of the items are numbers contained in the args argument of quad.

In addition, certain ctypes call signatures are supported for backward compatibility, but those should not be used in new code.
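
To make the double func(int n, double *xx) call form concrete, here is a minimal sketch (the toy integrand and the parameter values are made up for illustration): with args=(2.0, 3.0), quad packs the integration variable and both parameters into xx, so the callback sees n == 3, xx[0] == x, xx[1] == 2.0 and xx[2] == 3.0.

    from scipy import integrate, LowLevelCallable
    from numba import cfunc
    from numba.types import intc, float64, CPointer

    # Toy integrand f(x) = p1*x + p2, written against the (int n, double *xx) form
    @cfunc(float64(intc, CPointer(float64)))
    def toy_integrand(n, xx):
        return xx[1] * xx[0] + xx[2]  # xx[0] is x, xx[1] and xx[2] come from args

    # The integral of 2*x + 3 over [0, 1] is 4
    print(integrate.quad(LowLevelCallable(toy_integrand.ctypes), 0, 1, args=(2.0, 3.0)))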

I need to use scipy.LowLevelCallable objects to speed up my code, and I need to stick to one of the above signatures in my function design. Moreover, since I do not want to complicate the whole thing with C libraries and compilers, I want to solve this "on the fly" with the tools available from numba, in particular numba.cfunc, which allows me to bypass the Python C API.

I have been able to solve this for an integrand that takes as an input the integration variable and an arbitrary number of scalar parameters:

    from scipy import integrate, LowLevelCallable
    from numba import njit, cfunc
    from numba.types import intc, float64, CPointer


    def jit_integrand_function(integrand_function):
        jitted_function = njit(integrand_function)

        @cfunc(float64(intc, CPointer(float64)))
        def wrapped(n, xx):
            return jitted_function(xx[0], xx[1], xx[2], xx[3])
        return LowLevelCallable(wrapped.ctypes)

    @jit_integrand_function
    def regular_function(x1, x2, x3, x4):
        return x1 + x2 + x3 + x4

    def do_integrate_wo_arrays(a, b, c, lolim=0, hilim=1):
        return integrate.quad(regular_function, lolim, hilim, (a, b, c))

    >>> print(do_integrate_wo_arrays(1,2,3,lolim=2, hilim=10))
    (96.0, 1.0658141036401503e-12)
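
(As a quick sanity check: with args (1, 2, 3) the integrand is x + 1 + 2 + 3 = x + 6, and its integral from 2 to 10 is (50 + 60) - (2 + 12) = 96, matching the first element of the returned tuple.)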

This code works just fine. I am able to jit the integrand function and return the jitted function as a LowLevelCallable object. However, I actually need to pass two numpy arrays to my integrand, and the above construction breaks:

    import numpy as np
    from scipy import integrate, LowLevelCallable
    from numba import njit, cfunc
    from numba.types import intc, float64, CPointer


    def jit_integrand_function(integrand_function):
        jitted_function = njit(integrand_function)

        @cfunc(float64(intc, CPointer(float64)))
        def wrapped(n, xx):
            return jitted_function(xx[0], xx[1], xx[2], xx[3])
        return LowLevelCallable(wrapped.ctypes)

    @jit_integrand_function
    def function_using_arrays(x1, x2, array1, array2):
        res1 = np.interp(x1, array1[0], array1[1])
        res2 = np.interp(x2, array2[0], array2[1])

        return res1 + res2

    def do_integrate_w_arrays(a, lolim=0, hilim=1):
        foo = np.arange(20, dtype=np.float64).reshape(2, -1)
        bar = np.arange(60, dtype=np.float64).reshape(2, -1)

        return integrate.quad(function_using_arrays, lolim, hilim, (a, foo, bar))


    >>> print(do_integrate_w_arrays(3, lolim=2, hilim=10))
    Traceback (most recent call last):
      File "C:\ProgramData\Miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-63-69c0074d4936>", line 1, in <module>
        runfile('C:/Users/mosegui/Desktop/fos4x_pkg_develop/python-packages/fos4x_tec/fos4x_tec/test_scipy_numba.py', wdir='C:/Users/mosegui/Desktop/fos4x_pkg_develop/python-packages/fos4x_tec/fos4x_tec')
      File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.4\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
        pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
      File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "C:/Users/mosegui/Desktop/fos4x_pkg_develop/python-packages/fos4x_tec/fos4x_tec/test_scipy_numba.py", line 29, in <module>
        @jit_integrand_function
      File "C:/Users/mosegui/Desktop/fos4x_pkg_develop/python-packages/fos4x_tec/fos4x_tec/test_scipy_numba.py", line 13, in jit_integrand_function
        @cfunc(float64(intc, CPointer(float64)))
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\decorators.py", line 260, in wrapper
        res.compile()
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler_lock.py", line 32, in _acquire_compile_lock
        return func(*args, **kwargs)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\ccallback.py", line 69, in compile
        cres = self._compile_uncached()
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\ccallback.py", line 82, in _compile_uncached
        cres = self._compiler.compile(sig.args, sig.return_type)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\dispatcher.py", line 81, in compile
        raise retval
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\dispatcher.py", line 91, in _compile_cached
        retval = self._compile_core(args, return_type)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\dispatcher.py", line 109, in _compile_core
        pipeline_class=self.pipeline_class)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler.py", line 528, in compile_extra
        return pipeline.compile_extra(func)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler.py", line 326, in compile_extra
        return self._compile_bytecode()
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler.py", line 385, in _compile_bytecode
        return self._compile_core()
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler.py", line 365, in _compile_core
        raise e
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler.py", line 356, in _compile_core
        pm.run(self.state)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler_machinery.py", line 328, in run
        raise patched_exception
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler_machinery.py", line 319, in run
        self._runPass(idx, pass_inst, state)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler_lock.py", line 32, in _acquire_compile_lock
        return func(*args, **kwargs)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler_machinery.py", line 281, in _runPass
        mutated |= check(pss.run_pass, internal_state)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\compiler_machinery.py", line 268, in check
        mangled = func(compiler_state)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\typed_passes.py", line 94, in run_pass
        state.locals)
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\typed_passes.py", line 66, in type_inference_stage
        infer.propagate()
      File "C:\Users\mosegui\AppData\Roaming\Python\Python36\site-packages\numba\typeinfer.py", line 951, in propagate
        raise errors[0]
    numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    Failed in nopython mode pipeline (step: nopython frontend)
    Invalid use of Function(<built-in function getitem>) with argument(s) of type(s): (float64, Literal[int](0))
     * parameterized
    In definition 0:
        All templates rejected with literals.
    In definition 1:
        All templates rejected without literals.
    In definition 2:
        All templates rejected with literals.
    In definition 3:
        All templates rejected without literals.
    In definition 4:
        All templates rejected with literals.
    In definition 5:
        All templates rejected without literals.
    In definition 6:
        All templates rejected with literals.
    In definition 7:
        All templates rejected without literals.
    In definition 8:
        All templates rejected with literals.
    In definition 9:
        All templates rejected without literals.
    In definition 10:
        All templates rejected with literals.
    In definition 11:
        All templates rejected without literals.
    In definition 12:
        All templates rejected with literals.
    In definition 13:
        All templates rejected without literals.
    This error is usually caused by passing an argument of a type that is unsupported by the named function.
    [1] During: typing of intrinsic-call at C:/Users/mosegui/Desktop/fos4x_pkg_develop/python-packages/fos4x_tec/fos4x_tec/test_scipy_numba.py (32)
    [2] During: typing of static-get-item at C:/Users/mosegui/Desktop/fos4x_pkg_develop/python-packages/fos4x_tec/fos4x_tec/test_scipy_numba.py (32)
    File "test_scipy_numba.py", line 32:
    def diff_moment_edge(radius, alpha, chord_df, aerodyn_df):
        <source elided>
        # # calculate blade twist for radius
        # sensor_twist = np.arctan((2 * rated_wind_speed) / (3 * rated_rotor_speed * (sensor_radius / 30.0) * radius)) * (180.0 / np.pi)
        ^
    [1] During: resolving callee type: type(CPUDispatcher(<function function_using_arrays at 0x0000020C811827B8>))
    [2] During: typing of call at C:/Users/mosegui/Desktop/fos4x_pkg_develop/python-packages/fos4x_tec/fos4x_tec/test_scipy_numba.py (15)
    [3] During: resolving callee type: type(CPUDispatcher(<function function_using_arrays at 0x0000020C811827B8>))
    [4] During: typing of call at C:/Users/mosegui/Desktop/fos4x_pkg_develop/python-packages/fos4x_tec/fos4x_tec/test_scipy_numba.py (15)
    File "test_scipy_numba.py", line 15:
    def jit_integrand_function(integrand_function):
        <source elided>
        jitted_function = njit(integrand_function)
     ^

Well now, obviously this doesn't work, because in this example I have not modified the design of the decorator. But that is exactly the core of my question: I do not fully understand this situation and therefore don't know how to modify the cfunc arguments so that an array can be passed as a parameter while still complying with the scipy.integrate.quad signature requirements. In the numba documentation that introduces CPointers there is an example of how to pass an array to a numba.cfunc:

Native platform ABIs as used by C or C++ don’t have the notion of a shaped array as in Numpy. One common solution is to pass a raw data pointer and one or several size arguments (depending on dimensionality). Numba must provide a way to rebuild an array view of this data inside the callback.

    from numba import cfunc, carray
    from numba.types import float64, CPointer, void, intp

    # A callback with the C signature `void(double *, double *, size_t)`

    @cfunc(void(CPointer(float64), CPointer(float64), intp))
    def invert(in_ptr, out_ptr, n):
        in_ = carray(in_ptr, (n,))
        out = carray(out_ptr, (n,))
        for i in range(n):
        out[i] = 1 / in_[i]

I somehow understand that CPointer is used for building the array in C: in the signature of my decorator example, CPointer(float64) gathers all the floats passed as arguments and puts them into an array. However, I am still not able to put it together and see how I can use it to pass an array, rather than to make an array out of the collection of float arguments I pass.

EDIT:

The answer by @max9111 worked, in the sense that I was able to pass to scipy.integrate.quad a LowLevelCallable that improved the time efficiency of the calculation. This is very valuable, as it is now much clearer how the management of memory addresses works in C. Even though the concept of a structured array does not exist in native C, I can create a structured array in Python with data that C stores in a contiguous memory region and accesses/points to through a unique memory address. The mapping provided by the structured array allows the different components of that memory region to be identified.
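
As a small illustration of that point (the field names here are made up and unrelated to the real problem), a structured scalar is a single contiguous block of memory whose dtype acts as the map of what lives at which byte offset:

    import numpy as np

    rec_dtype = np.dtype([('a', np.float64), ('foo', np.float64, (2, 3))])
    rec = np.array((1.0, np.arange(6, dtype=np.float64).reshape(2, 3)), dtype=rec_dtype)

    print(rec.ctypes.data)     # the single base address a C callee would receive as void*
    print(rec_dtype.itemsize)  # total size of the contiguous block: 8 + 6*8 = 56 bytes
    print(rec_dtype.fields)    # field -> (dtype, byte offset); 'foo' starts at offset 8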

Even though the solution by @max9111 works and solves the question I posted originally, from the Python perspective this approach introduces a certain overhead that, under certain conditions, can cost more time than is gained by calling the scipy.integrate.quad integration via a LowLevelCallable.

In my real case I am using the integration as a step of a two-dimensional optimization problem. Each step of the optimization requires integrating twice, and the integral takes nine scalar parameters and two arrays. As long as I was not able to solve the integration via a LowLevelCallable, the only thing I could do to speed up the code was to simply njit the integrand function. That worked decently, even though the integration was still being fired via the Python API.

In my case, implementing @max9111's solution has greatly improved the efficiency of the integration itself (from about 0.0009 s per step to about 0.0005 s). Nevertheless, the steps of creating the structured array, C-unpacking the data, passing it to the jitted integrand and returning a LowLevelCallable have added an extra 0.3 s on average per iteration, thereby worsening my situation.

Here is some toy code showing how the LowLevelCallable approach becomes the worse option the longer an iterative process runs:

    import ctypes
    import timeit

    from tqdm import tqdm
    import numpy as np
    from scipy import integrate, LowLevelCallable
    import numba as nb
    from numba import types
    import matplotlib.pyplot as plt


    ##################################################
    # creating some sample data and parameters
    a = 3
    foo = np.arange(200, dtype=np.float64).reshape(2, -1)
    bar = np.arange(600, dtype=np.float64).reshape(2, -1)

    lim1 = 0
    lim2 = 1

    @nb.njit
    def function_using_arrays(x1, x2, array1, array2):
        res1 = np.interp(x1, array1[0], array1[1])
        res2 = np.interp(x2, array2[0], array2[1])

        return res1 + res2


    ##################################################
    # JIT INTEGRAND

    def do_integrate_w_arrays_jit(a, array1, array2, lolim=0, hilim=1):
        return integrate.quad(function_using_arrays, lolim, hilim, (a, array1, array2))

    def process_jit_integrand():
        do_integrate_w_arrays_jit(a, foo, bar, lolim=lim1, hilim=lim2)


    ##################################################
    # LOWLEV CALLABLE

    def create_jit_integrand_function(integrand_function,args,args_dtype):
        @nb.cfunc(types.float64(types.float64,types.CPointer(args_dtype)))
        def wrapped(x1,user_data_p):
            #Array of structs
            user_data = nb.carray(user_data_p, 1)

            #Extract the data
            x2=user_data[0].a
            array1=user_data[0].foo
            array2=user_data[0].bar

            return integrand_function(x1, x2, array1, array2)
        return wrapped


    def do_integrate_w_arrays_lowlev(func,args,lolim=0, hilim=1):
        integrand_func = LowLevelCallable(func.ctypes,user_data=args.ctypes.data_as(ctypes.c_void_p))
        return integrate.quad(integrand_func, lolim, hilim)


    def process_lowlev_callable():
        args_dtype = types.Record.make_c_struct([
            ('a', types.float64),
            ('foo', types.NestedArray(dtype=types.float64, shape=foo.shape)),
            ('bar', types.NestedArray(dtype=types.float64, shape=bar.shape)),])

        args=np.array((a, foo, bar), dtype=args_dtype)

        func = create_jit_integrand_function(function_using_arrays,args,args_dtype)

        do_integrate_w_arrays_lowlev(func, args, lolim=0, hilim=1)


    ##################################################

    repetitions = range(100)

    jit_integrand_delays = [timeit.timeit(stmt=process_jit_integrand, number=repetition) for repetition in tqdm(repetitions)]
    lowlev_callable_delays = [timeit.timeit(stmt=process_lowlev_callable, number=repetition) for repetition in tqdm(repetitions)]

    fig, ax = plt.subplots()
    ax.plot(repetitions, jit_integrand_delays, label="jit_integrand")
    ax.plot(repetitions, lowlev_callable_delays, label="lowlev_callable")
    ax.set_xlabel('number of repetitions')
    ax.set_ylabel('calculation time (s)')
    ax.set_title("Comparison calculation time")
    plt.tight_layout()
    plt.legend()
    plt.savefig(f'calculation_time_comparison_{repetitions[-1]}_reps.png')

Here the two options (only jitting the integrand vs. @max9111's solution) are compared. In a modified version of @max9111's solution, I have permanently jitted the integrand function (function_using_arrays) and removed that step from create_jit_integrand_function, which reduces the "overhead" time by a neat 20%. Moreover, also for the sake of speed, I have removed the jit_with_dummy_data function and included its functionality in the body of process_lowlev_callable, basically to avoid an unnecessary function call. The following shows the calculation time of both solutions for a series of up to 100 cycles:

As you can see, if you are in an iterative process, the time saved in each single calculation (30+ %!!) does not pay off the overhead carried by the couple of extra functions that you need to implement for building the LowLevelCallable (functions that are themselves called iteratively and run over the Python C API).

Bottom line: this solution is very good for reducing the calculation time of one single very heavy integral, but just jitting the integrand seems to be the better option when solving average integrals within an iterative process, since the extra functions required by the LowLevelCallable, which need to be called as often as the integration itself, take their toll.

Anyway, thank you very much. Even though this solution will not work for me, I learned valuable things and I consider my question solved.

EDIT 2:

I misunderstood parts of @max9111's solution and the role played by the function create_jit_integrand_function, and I was wrongly compiling the LowLevelCallable in each step of my optimization (which I do not need to do because, even though the parameters and arrays passed to the integral change each iteration, their shape, and therefore that of the C struct, remains constant).

Here is the refactored version of the code from the EDIT above, which now makes sense:


    import ctypes
    import timeit

    from tqdm import tqdm
    import numpy as np
    from scipy import integrate, LowLevelCallable
    import numba as nb
    from numba import types
    import matplotlib.pyplot as plt


    ##################################################
    # creating some sample data and parameters
    a = 3
    foo = np.arange(200, dtype=np.float64).reshape(2, -1)
    bar = np.arange(600, dtype=np.float64).reshape(2, -1)

    lim1 = 0
    lim2 = 1


    def function_using_arrays(x1, x2, array1, array2):
        res1 = np.interp(x1, array1[0], array1[1])
        res2 = np.interp(x2, array2[0], array2[1])

        return res1 + res2


    ##################################################
    # JIT INTEGRAND

    def do_integrate_w_arrays_jit(a, array1, array2, lolim=0, hilim=1):
        return integrate.quad(nb.njit(function_using_arrays), lolim, hilim, (a, array1, array2))

    def process_jit_integrand():
        do_integrate_w_arrays_jit(a, foo, bar, lolim=lim1, hilim=lim2)


    ##################################################
    # LOWLEV CALLABLE

    def create_jit_integrand_function(integrand_function, args_dtype):
        jitted_function = nb.njit(integrand_function)

        @nb.cfunc(types.float64(types.float64,types.CPointer(args_dtype)))
        def wrapped(x1,user_data_p):
            #Array of structs
            user_data = nb.carray(user_data_p, 1)

            #Extract the data
            x2=user_data[0].a
            array1=user_data[0].foo
            array2=user_data[0].bar

            return jitted_function(x1, x2, array1, array2)
        return wrapped


    def do_integrate_w_arrays_lowlev(func,args,lolim=0, hilim=1):
        integrand_func=LowLevelCallable(func.ctypes,user_data=args.ctypes.data_as(ctypes.c_void_p))
        return integrate.quad(integrand_func, lolim, hilim)


    def process_lowlev_callable():
        do_integrate_w_arrays_lowlev(func, np.array((a, foo, bar), dtype=args_dtype), lolim=0, hilim=1)


    ##################################################

    repetitions = range(100)

    jit_integrand_delays = [timeit.timeit(stmt=process_jit_integrand, number=repetition) for repetition in tqdm(repetitions)]


    args_dtype = types.Record.make_c_struct([
            ('a', types.float64),
            ('foo', types.NestedArray(dtype=types.float64, shape=foo.shape)),
            ('bar', types.NestedArray(dtype=types.float64, shape=bar.shape)),])
    func = create_jit_integrand_function(function_using_arrays, args_dtype)


    lowlev_callable_delays = [timeit.timeit(stmt=process_lowlev_callable, number=repetition) for repetition in tqdm(repetitions)]

    fig, ax = plt.subplots()
    ax.plot(repetitions, jit_integrand_delays, label="jit_integrand")
    ax.plot(repetitions, lowlev_callable_delays, label="lowlev_callable")
    ax.set_xlabel('number of repetitions')
    ax.set_ylabel('calculation time (s)')
    ax.set_title("Comparison calculation time")
    plt.tight_layout()
    plt.legend()
    plt.savefig(f'calculation_time_comparison_{repetitions[-1]}_reps.png')
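
As a usage sketch of this configuration (the loop and the changing scalar a_step are purely illustrative; all other names come from the code above), an optimization would only rebuild the cheap struct array each step and reuse the compiled func:

    for step in range(5):
        a_step = float(step)  # pretend the scalar parameter changes each optimization step
        step_args = np.array((a_step, foo, bar), dtype=args_dtype)  # same shapes, so no recompilation
        result, abserr = do_integrate_w_arrays_lowlev(func, step_args, lolim=lim1, hilim=lim2)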

In this configuration, the building of the LowLevelCallable (which indeed costs a bit of time) only has to be carried out once, and the overall process is orders of magnitude faster:

And a close-up of the lowlev_callable:

Answer (by @max9111)

You can use the user_data input to pass arrays

As I understood the documentation of scipy.integrate.quad, it isn't possible to pass arrays with the args parameter when using a scipy.LowLevelCallable, but you can pass arbitrary user_data.
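
Before the full solution, here is a stripped-down sketch of that user_data mechanism with a single scalar parameter (the struct field name c and the toy integrand are made up; the pattern mirrors the code below):

import ctypes
import numpy as np
import numba as nb
from numba import types
from scipy import integrate, LowLevelCallable

# A one-field C struct carrying the single parameter c
args_dtype = types.Record.make_c_struct([('c', types.float64)])

@nb.cfunc(types.float64(types.float64, types.CPointer(args_dtype)))
def integrand(x, user_data_p):
    user_data = nb.carray(user_data_p, 1)  # 1-element view of the struct behind the pointer
    return user_data[0].c * x              # f(x) = c*x

args = np.array((2.5,), dtype=args_dtype)
llc = LowLevelCallable(integrand.ctypes, user_data=args.ctypes.data_as(ctypes.c_void_p))
print(integrate.quad(llc, 0.0, 1.0))  # integral of 2.5*x over [0, 1] -> about (1.25, ...)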

In the following example I used this signature.

double func(double x, void *user_data)

Edit: arbitrary array shapes without recompilation

Using this answer it is also possible to compile the function once for arbitrary array shapes (only the number of dimensions is fixed).

import numpy as np
import numba as nb
from numba import types
from scipy import integrate, LowLevelCallable
import ctypes

#Void Pointer from Int64
@nb.extending.intrinsic
def address_as_void_pointer(typingctx, src):
    """ returns a void pointer from a given memory address """
    from numba import types, cgutils
    sig = types.voidptr(src)

    def codegen(cgctx, builder, sig, args):
        return builder.inttoptr(args[0], cgutils.voidptr_t)
    return sig, codegen

def create_jit_integrand_function(integrand_function,args_dtype):
    jitted_function = nb.njit(integrand_function)

    #double func(double x, void *user_data)
    @nb.cfunc(types.float64(types.float64,types.CPointer(args_dtype)))
    def wrapped(x1,user_data_p):
        #Array of structs
        user_data = nb.carray(user_data_p, 1)

        #Extract the data
        x2=user_data[0].a
        array1=nb.carray(address_as_void_pointer(user_data[0].foo_p),(user_data[0].foo_s1,user_data[0].foo_s2),dtype=np.float64)
        array2=nb.carray(address_as_void_pointer(user_data[0].bar_p),(user_data[0].bar_s1,user_data[0].bar_s2),dtype=np.float64)

        return jitted_function(x1, x2, array1, array2)
    return wrapped

def function_using_arrays(x1, x2, array1, array2):
    res1 = np.interp(x1, array1[0], array1[1])
    res2 = np.interp(x2, array2[0], array2[1])

    return res1 + res2

def do_integrate_w_arrays(func,args,lolim=0, hilim=1):
    integrand_func=LowLevelCallable(func.ctypes,user_data=args.ctypes.data_as(ctypes.c_void_p))
    return integrate.quad(integrand_func, lolim, hilim)

#Define the datatype of the struct array
#Pointers are not allowed, therefore we use int64
args_dtype = types.Record.make_c_struct([
    ('a', types.float64),
    ('foo_p',  types.int64),
    ('foo_s1', types.int64),
    ('foo_s2', types.int64),
    ('bar_p',  types.int64),
    ('bar_s1', types.int64),
    ('bar_s2', types.int64),])

#creating some sample data
#The arrays must be c-contigous
#To ensure that you can use np.ascontiguousarray

a=3
foo = np.ascontiguousarray(np.arange(200, dtype=np.float64).reshape(2, -1))
bar = np.ascontiguousarray(np.arange(600, dtype=np.float64).reshape(2, -1))


args=np.array((a,foo.ctypes.data,foo.shape[0],foo.shape[1],
                 bar.ctypes.data,bar.shape[0],bar.shape[1]),dtype=args_dtype)

#compile the integration function (array-shapes are fixed)
#There is only a structured array like args allowed
func=create_jit_integrand_function(function_using_arrays,args_dtype)


print(do_integrate_w_arrays(func,args, lolim=0, hilim=1))
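
As a hypothetical follow-up (foo2 and bar2 are made-up arrays), the same compiled func can be reused with differently shaped 2-D, C-contiguous arrays without recompiling:

foo2 = np.ascontiguousarray(np.arange(100, dtype=np.float64).reshape(2, -1))
bar2 = np.ascontiguousarray(np.arange(300, dtype=np.float64).reshape(2, -1))

args2 = np.array((a, foo2.ctypes.data, foo2.shape[0], foo2.shape[1],
                  bar2.ctypes.data, bar2.shape[0], bar2.shape[1]), dtype=args_dtype)

print(do_integrate_w_arrays(func, args2, lolim=0, hilim=1))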

Old version

As I am passing a structured array, a recompilation is needed if the array shapes or datatypes change. This isn't a limitation of the API itself. There must be an easier way to do this (maybe using tuples?).

Implementation

import numpy as np
import numba as nb
from numba import types
from scipy import integrate, LowLevelCallable
import ctypes

def create_jit_integrand_function(integrand_function,args,args_dtype):
    jitted_function = nb.njit(integrand_function)

    @nb.cfunc(types.float64(types.float64,types.CPointer(args_dtype)))
    def wrapped(x1,user_data_p):
        #Array of structs
        user_data = nb.carray(user_data_p, 1)

        #Extract the data
        x2=user_data[0].a
        array1=user_data[0].foo
        array2=user_data[0].bar

        return jitted_function(x1, x2, array1, array2)
    return wrapped

def function_using_arrays(x1, x2, array1, array2):
    res1 = np.interp(x1, array1[0], array1[1])
    res2 = np.interp(x2, array2[0], array2[1])

    return res1 + res2

def jit_with_dummy_data(args,args_dtype):
    func=create_jit_integrand_function(function_using_arrays,args,args_dtype)
    return func

def do_integrate_w_arrays(func,args,lolim=0, hilim=1):
    integrand_func=LowLevelCallable(func.ctypes,user_data=args.ctypes.data_as(ctypes.c_void_p))
    return integrate.quad(integrand_func, lolim, hilim)

Using the implementation

#creating some sample data
a=3
foo = np.arange(200, dtype=np.float64).reshape(2, -1)
bar = np.arange(600, dtype=np.float64).reshape(2, -1)

args_dtype = types.Record.make_c_struct([
    ('a', types.float64),
    ('foo', types.NestedArray(dtype=types.float64, shape=foo.shape)),
    ('bar', types.NestedArray(dtype=types.float64, shape=bar.shape)),])

args=np.array((a,foo,bar),dtype=args_dtype)

#compile the integration function (array-shapes are fixed)
#There is only a structured array like args allowed
func=jit_with_dummy_data(args,args_dtype)


print(do_integrate_w_arrays(func,args, lolim=0, hilim=1))
