Passing list of numpy arrays to C using cython

Question

I have a list, list_of_arrays, of 3D numpy arrays that I want to pass to a C function with the prototype

double my_func_c(double **data, int **shape, int n_arrays)

such that

data[i]  : pointer to the numpy array values in list_of_arrays[i]
shape[i] : pointer to the shape of the array in list_of_arrays[i] e.g. [2,3,4]

How can I call my_func_c using a cython interface function?

My first idea was to do something like the code below (which works), but I feel there is a better way that uses only numpy arrays, without calling malloc and free.

# my_func_c.pyx

import numpy as np
cimport numpy as np
cimport cython
from libc.stdlib cimport malloc, free

cdef extern from "my_func.c":
    double my_func_c(double **data, int **shape, int n_arrays)

def my_func(list list_of_arrays):
    cdef int n_arrays  = len(list_of_arrays)
    cdef double **data = <double **> malloc(n_arrays*sizeof(double *))
    cdef int **shape   = <int **> malloc(n_arrays*sizeof(int *))
    cdef double x;

    cdef np.ndarray[double, ndim=3, mode="c"] temp

    for i in range(n_arrays):
        temp = list_of_arrays[i]
        data[i]  = &temp[0,0,0]
        shape[i] = <int *> malloc(3*sizeof(int))
        for j in range(3):
            shape[i][j] = list_of_arrays[i].shape[j]

    x = my_func_c(data, shape, n_arrays)

    # Free memory
    for i in range(n_arrays):
        free(shape[i])
    free(data)
    free(shape)

    return x

N.B.

To see a working example we can use a very simple function that calculates the product of all the elements of all the arrays in our list.

# my_func.c

double my_func_c(double **data, int **shape, int n_arrays) {
    int array_idx, i0, i1, i2;

    double prod = 1.0;

    // Loop over all arrays
    for (array_idx=0; array_idx<n_arrays; array_idx++) {
        for (i0=0; i0<shape[array_idx][0]; i0++) {
            for (i1=0; i1<shape[array_idx][1]; i1++) {
                for (i2=0; i2<shape[array_idx][2]; i2++) {
                    prod = prod*data[array_idx][i0*shape[array_idx][1]*shape[array_idx][2] + i1*shape[array_idx][2] + i2];
                }
            }
        }
    }

    return prod;
}
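The flat index used in my_func.c assumes each array is stored in C (row-major) order, which is what mode="c" in the Cython declaration enforces. As a rough sanity check, and not part of the original question, the following sketch compares that index formula against numpy's own indexing. The helper name flat_index is hypothetical and only used for illustration.

```python
import numpy as np


def flat_index(i0, i1, i2, shape):
    """Row-major (C-order) offset of element [i0, i1, i2], as computed in my_func.c."""
    return i0 * shape[1] * shape[2] + i1 * shape[2] + i2


a = np.arange(4 * 5 * 2, dtype=np.float64).reshape(4, 5, 2)
flat = a.ravel(order="C")  # the values in C order, as the C code would walk them

# Every [i0, i1, i2] element must be found at the computed flat offset.
matches = all(
    flat[flat_index(i0, i1, i2, a.shape)] == a[i0, i1, i2]
    for i0 in range(a.shape[0])
    for i1 in range(a.shape[1])
    for i2 in range(a.shape[2])
)
print(matches)

# A Fortran-ordered copy holds the same values but is not C-contiguous in memory,
# so a raw pointer to its data would not match the formula above.
fortran_is_c_contiguous = np.asfortranarray(a).flags["C_CONTIGUOUS"]
print(fortran_is_c_contiguous)
```

If an array in the list might not be C-contiguous, passing it through np.ascontiguousarray before taking its address is one way to satisfy this assumption.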

Create a setup.py file,

# setup.py

from setuptools import setup  # distutils was removed from the standard library in Python 3.12
from Cython.Build import cythonize
import numpy as np

setup(
    name='my_func',
    ext_modules = cythonize("my_func_c.pyx"),
    include_dirs=[np.get_include()]
    )

compile with

python3 setup.py build_ext --inplace

Finally we can run a simple test

# test.py

import numpy as np
from my_func_c import my_func

a = [1+np.random.rand(3,1,2), 1+np.random.rand(4,5,2), 1+np.random.rand(1,2,3)]

print('Numpy product: {}'.format(np.prod([i.prod() for i in a])))
print('my_func product: {}'.format(my_func(a)))

using

python3 test.py

Recommended answer

One alternative would be to let numpy manage your memory for you. You can do this by using numpy arrays of np.uintp, which is an unsigned integer type with the same size as a pointer.

Unfortunately, this does require some type-casting (between "pointer-sized int" and pointers), which is a good way of hiding logic errors, so I'm not 100% happy with it.

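def my_func(list list_of_arrays):
    cdef int n_arrays  = len(list_of_arrays)
    # np.empty allocates n_arrays pointer-sized slots; np.array((n_arrays,), ...)
    # would instead build a one-element array holding the value n_arrays
    cdef np.uintp_t[::1] data = np.empty((n_arrays,), dtype=np.uintp)
    cdef np.uintp_t[::1] shape = np.empty((n_arrays,), dtype=np.uintp)
    cdef double x

    cdef np.ndarray[double, ndim=3, mode="c"] temp

    for i in range(n_arrays):
        temp = list_of_arrays[i]
        data[i]  = <np.uintp_t>&temp[0,0,0]
        shape[i] = <np.uintp_t>&(temp.shape[0])

    # assumes my_func_c is declared with np.intp_t **shape (see the note on int vs np.intp_t below)
    x = my_func_c(<double**>(&data[0]), <np.intp_t**>&shape[0], n_arrays)

    return x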

(I should point out that I've only confirmed that it compiles and have not tested it further, but the basic idea should be OK.)
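The pointer-table idea can also be tried from plain Python, without compiling anything. The sketch below is only an illustration, not the answer's Cython code: it stores each array's address in a numpy array of np.uintp, then uses ctypes to reinterpret that table as a double ** and read values back through it. The variable names are made up for the example, and it assumes every array is C-contiguous float64.

```python
import ctypes

import numpy as np

arrays = [
    np.arange(6, dtype=np.float64).reshape(1, 2, 3),
    np.ones((2, 1, 2), dtype=np.float64),
]

# One pointer-sized slot per array; numpy owns and frees this buffer.
data = np.empty(len(arrays), dtype=np.uintp)
for i, arr in enumerate(arrays):
    assert arr.flags["C_CONTIGUOUS"]
    data[i] = arr.ctypes.data  # address of the first element of arrays[i]

# Reinterpret the table of addresses as a C `double **`.
double_pp = ctypes.POINTER(ctypes.POINTER(ctypes.c_double))
table = data.ctypes.data_as(double_pp)

# Read through the double indirection, as the C function would.
first_values = [table[i][0] for i in range(len(arrays))]
last_of_first = table[0][5]
print(first_values, last_of_first)
```

The addresses are only valid while the original arrays are alive, so the list (or some other reference to each array) has to outlive the C call.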

The way you've done it is probably a pretty sensible way. One slight simplification to your original code that should work is to point shape[i] directly at the array's own shape storage

shape[i] = &(temp.shape[0])

instead of malloc and copy. This requires shape to be declared as np.intp_t ** rather than int ** (see the note on integer sizes below). I'd also recommend putting the frees in a finally block to ensure they get run.

@ead has helpfully pointed out that the numpy shape is stored as np.intp_t - i.e. a signed integer big enough to fit a pointer in, which is usually 64 bits - while int is usually 32 bits. Therefore, to pass the shape without copying you'd need to change your C API. Casting makes that mistake harder to spot ("a good way of hiding logic errors").
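To make the size mismatch concrete, here is a small sketch (an illustration, not from the original answer) that compares the two integer sizes and shows what the shape looks like when the same bytes are read as C ints, which is effectively what an int * cast does. The results are platform-dependent, so no particular output is assumed in the code.

```python
import ctypes

import numpy as np

a = np.zeros((2, 3, 4))

intp_size = np.dtype(np.intp).itemsize  # bytes per numpy shape entry
int_size = ctypes.sizeof(ctypes.c_int)  # bytes per C int on this platform
print(intp_size, int_size)

# The shape as numpy stores it: one np.intp per dimension.
shape_as_intp = np.array(a.shape, dtype=np.intp)

# The same bytes reinterpreted as C ints, with no conversion or copy.
shape_as_int = shape_as_intp.view(np.intc)
print(shape_as_intp.tolist(), shape_as_int.tolist())
```

On a typical 64-bit little-endian machine the second list would be expected to come out as [2, 0, 3, 0, 4, 0], so C code indexing shape[i][1] would read 0 instead of 3. Where the two sizes happen to be equal, both lists are identical and the bug stays hidden.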
