Accessing NumPy record array columns in Cython


Question

I'm a relatively experienced Python programmer, but I haven't written any C in a very long time and am trying to understand Cython. I'm trying to write a Cython function that operates on a column of a NumPy recarray.

The code I have so far is below.

recarray_func.pyx:

import numpy as np
cimport numpy as np

cdef packed struct rec_cell0:
  np.float32_t f0
  np.int64_t i0, i1, i2

def sum(np.ndarray[rec_cell0, ndim=1] recarray):
    cdef Py_ssize_t i
    cdef rec_cell0 *cell
    cdef np.float32_t running_sum = 0

    for i in range(recarray.shape[0]):
        cell = &recarray[i]
        running_sum += cell.f0
    return running_sum

At the interpreter prompt:

import numpy as np
import recarray_func

array = np.recarray((100,), names=['f0', 'i0', 'i1', 'i2'],
                    formats=['f4', 'i8', 'i8', 'i8'])
recarray_func.sum(array)

This simply sums the f0 column of the recarray. It compiles and runs without a problem.
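One caveat: np.recarray allocates uninitialized memory, so the sum above is arbitrary unless the column is filled first. The following is a minimal sketch of a reference check in plain NumPy, assuming the same field layout as in the question. It does not call the compiled module; it only produces a value that the Cython result could be compared against.

```python
import numpy as np

# Same layout as the recarray in the question: one float32 and three int64 fields.
array = np.recarray((100,), names=['f0', 'i0', 'i1', 'i2'],
                    formats=['f4', 'i8', 'i8', 'i8'])

# The default (unaligned) dtype has no padding, which is what `packed struct`
# expects: 4 bytes for f0 plus 3 * 8 bytes for the integers.
assert array.dtype.itemsize == 28

# Fill the column with known values (0, 1, ..., 99) so the result is predictable.
array['f0'] = np.arange(100, dtype=np.float32)

# Reference value computed by NumPy alone, accumulated in float32
# like running_sum in the Cython function.
expected = array['f0'].sum(dtype=np.float32)
print(expected)
```

With the column filled this way, recarray_func.sum(array) would be expected to return the same value as expected.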

My question is: how would I modify this so that it can operate on any column? In the example above, the column to sum is hard-coded and accessed through dot notation. Is it possible to change the function so that the column to sum is passed in as a parameter?

Recommended answer

I believe this should be possible using Cython's memoryviews. Something along these lines should work (code not tested):

import numpy as np
cimport numpy as np

cdef packed struct rec_cell0:
  np.float32_t f0
  np.int64_t i0, i1, i2

def sum(rec_cell0[:] recview):
    cdef Py_ssize_t i
    cdef np.float32_t running_sum = 0

    for i in range(recview.shape[0]):
        running_sum += recview[i].f0
    return running_sum
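The sketch above still reads f0 by name. One possible way to choose the column at call time, which is not part of the original answer, is to select the field on the Python side: indexing a record array by field name yields a strided 1-D view without copying. The helper column_view below is hypothetical and only illustrates that step. On the Cython side this would presumably need a function typed for the element type of the chosen column (for example an np.int64_t[:] or np.float32_t[:] memoryview) rather than the struct type.

```python
import numpy as np

def column_view(recarray, name):
    """Return a no-copy 1-D view of the field `name` (hypothetical helper)."""
    if name not in recarray.dtype.names:
        raise KeyError(name)
    return recarray[name]

array = np.recarray((5,), names=['f0', 'i0', 'i1', 'i2'],
                    formats=['f4', 'i8', 'i8', 'i8'])
array['f0'] = [1.0, 2.0, 3.0, 4.0, 5.0]
array['i1'] = [10, 20, 30, 40, 50]

# The column name is now an ordinary runtime parameter.
col = column_view(array, 'i1')

# The view shares memory with the record array and steps over whole records,
# so its stride equals the record size rather than the element size.
print(col.strides, np.shares_memory(col, array))

total = col.sum()
print(total)
```

Because the view is strided, it would not satisfy a contiguous [::1] memoryview signature without first being copied.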

Speed can probably be increased by ensuring that the record array you pass to Cython is contiguous. On the Python (calling) side, you can use np.require to obtain a contiguous array, and the function signature should change to rec_cell0[::1] recview to indicate that the array can be assumed to be contiguous. As always, once the code has been tested, turning off the boundscheck, wraparound and nonecheck compiler directives in Cython will likely improve speed further.
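As a minimal sketch of the calling side, assuming the same field layout as in the question, np.require can be asked for a C-contiguous array. The sliced input here is only an illustration of how a non-contiguous record array might arise.

```python
import numpy as np

array = np.recarray((10,), names=['f0', 'i0', 'i1', 'i2'],
                    formats=['f4', 'i8', 'i8', 'i8'])
array['f0'] = np.arange(10, dtype=np.float32)

# Taking every second record gives a strided, non-contiguous view.
strided = array[::2]
print(strided.flags['C_CONTIGUOUS'])

# np.require returns its input unchanged when the requirements are already
# met, and otherwise returns a copy that satisfies them.
contiguous = np.require(strided, requirements=['C'])
print(contiguous.flags['C_CONTIGUOUS'])
```

An array obtained this way should be acceptable to a function declared with rec_cell0[::1] recview, whereas the strided view would be rejected by that signature.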
