Passing/Returning Cython Memoryviews vs NumPy Arrays

Question

I am writing Python code to accelerate a region properties function for labeled objects in a binary image. The following code will calculate the number of border pixels of a labeled object in a binary image given the indices of the object. The main() function will cycle through all labeled objects in a binary image 'mask' and calculate the number of border pixels for each one.

I am wondering what the best way is to pass or return my variables in this Cython code. The variables are either NumPy arrays or typed Memoryviews. I've experimented with passing/returning the variables in the different formats, but cannot deduce which way is best or most efficient. I am new to Cython, so Memoryviews are still fairly abstract to me, and whether there is a difference between the two methods remains a mystery. The images I am working with contain 100,000+ labeled objects, so operations such as these need to be fairly efficient.

Summary:

When/should I pass/return my variables as typed Memoryviews rather than NumPy arrays for very repetitive computations? Is there a way that is best or are they exactly the same?

%%cython --annotate

import numpy as np
import cython
cimport numpy as np

DTYPE = np.intp
ctypedef np.intp_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
def erode(DTYPE_t [:,:] img):

    # Image dimensions
    cdef int height, width, local_min
    height = img.shape[0]
    width = img.shape[1]

    # Padded Array
    padded_np = np.zeros((height+2, width+2), dtype = DTYPE)
    cdef DTYPE_t[:,:] padded = padded_np
    padded[1:height+1,1:width+1] = img

    # Eroded image
    eroded_np = np.zeros((height,width),dtype=DTYPE)
    cdef DTYPE_t[:,:] eroded = eroded_np

    cdef DTYPE_t i,j
    for i in range(height):
        for j in range(width):
            local_min = min(padded[i+1,j+1], padded[i,j+1], padded[i+1,j],padded[i+1,j+2],padded[i+2,j+1])
            eroded[i,j] = local_min
    return eroded_np


@cython.boundscheck(False)
@cython.wraparound(False)
def border_image(slice_np):

    # Memoryview of slice_np
    cdef DTYPE_t [:,:] slice = slice_np

    # Image dimensions
    cdef Py_ssize_t ymax, xmax, y, x
    ymax = slice.shape[0]
    xmax = slice.shape[1]

    # Erode image
    eroded_image_np = erode(slice_np)
    cdef DTYPE_t[:,:] eroded_image = eroded_image_np

    # Border image
    border_image_np = np.zeros((ymax,xmax),dtype=DTYPE)
    cdef DTYPE_t[:,:] border_image = border_image_np
    for y in range(ymax):
        for x in range(xmax):
            border_image[y,x] = slice[y,x]-eroded_image[y,x]
    return border_image_np.sum()


@cython.boundscheck(False)
@cython.wraparound(False)
def main(DTYPE_t[:,:] mask, int numobjects, Py_ssize_t[:,:] indices):

    # Memoryview of boundary pixels
    boundary_pixels_np = np.zeros(numobjects,dtype=DTYPE)
    cdef DTYPE_t[:] boundary_pixels = boundary_pixels_np

    # Loop through each object
    cdef Py_ssize_t y_from, y_to, x_from, x_to, i
    cdef DTYPE_t[:,:] slice
    for i in range(numobjects):
        y_from = indices[i,0]
        y_to = indices[i,1]
        x_from = indices[i,2]
        x_to = indices[i,3]
        slice = mask[y_from:y_to, x_from:x_to]
        boundary_pixels[i] = border_image(slice)

    return boundary_pixels_np
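
For reference, here is a small pure-Python sketch of what the code above computes for a single object: a plus-shaped (4-neighbour) erosion with zero padding, followed by a count of the pixels that the erosion removed. It uses nested lists instead of NumPy so it can be run anywhere. The name count_border_pixels is made up for this illustration and does not appear in the code above.

```python
def erode(img):
    """Plus-shaped (4-neighbour) erosion with zero padding, mirroring erode() above."""
    height, width = len(img), len(img[0])
    # Zero-padded copy so that pixels on the edge see a 0 neighbour.
    padded = [[0] * (width + 2) for _ in range(height + 2)]
    for i in range(height):
        for j in range(width):
            padded[i + 1][j + 1] = img[i][j]
    return [
        [
            min(padded[i + 1][j + 1], padded[i][j + 1], padded[i + 1][j],
                padded[i + 1][j + 2], padded[i + 2][j + 1])
            for j in range(width)
        ]
        for i in range(height)
    ]


def count_border_pixels(img):
    """Number of pixels that are set in img but removed by the erosion."""
    eroded = erode(img)
    return sum(
        img[i][j] - eroded[i][j]
        for i in range(len(img))
        for j in range(len(img[0]))
    )


square = [[1] * 3 for _ in range(3)]
print(count_border_pixels(square))  # 8: only the centre pixel survives erosion
```

A tiny case like this can serve as a sanity check when comparing the NumPy-array and Memoryview variants of the Cython functions.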


Recommended answer

Memoryviews are a more recent addition to Cython, designed to be an improvement compared to the original np.ndarray syntax. For this reason they're slightly preferred. It usually doesn't make too much difference which you use though. Here are a few notes:

For speed it makes very little difference - my experience is that memoryviews as function parameters are marginally slower, but it's hardly worth worrying about.

Memoryviews are designed to work with any type that has Python's buffer interface (for example the standard library array module). Typing as np.ndarray only works with numpy arrays. In principle memoryviews can support an even wider range of memory layouts, which can make interfacing with C code easier (in practice I've never actually seen this be useful).
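
To make the buffer-interface point concrete, here is a minimal pure-Python sketch. It uses Python's built-in memoryview, which is a different object from a Cython typed memoryview, but both rely on the same buffer protocol, so it shows the kind of objects a typed memoryview argument can accept. The helper name sum_buffer is hypothetical.

```python
from array import array


def sum_buffer(buf):
    """Sum the items of any object that exposes the buffer protocol."""
    view = memoryview(buf)  # no copy; raises TypeError if buf has no buffer
    return sum(view)


values = array("d", [1.0, 2.0, 3.0])
view = memoryview(values)
view[0] = 10.0  # writes through to the underlying array

print(values[0])                                # 10.0
print(sum_buffer(values))                       # 15.0
print(sum_buffer(bytearray(b"\x01\x02\x03")))   # 6
```

The same function handles an array.array and a bytearray without knowing either type, which is the generality that typing as np.ndarray gives up.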

When returning an array from Cython to Python code, the user will probably be happier with a numpy array than with a memoryview. If you're working with memoryviews you can do either:

return np.asarray(mview)
return mview.base



Ease of compiling

If you're using np.ndarray you have to set the include directory with np.get_include() in your setup.py file. You don't have to do this with memoryviews, which often means you can skip setup.py and just use the cythonize command line command or pyximport for simpler projects.
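
As a rough sketch of the np.ndarray case, a setup.py along these lines passes the NumPy include directory to the compiler. The module name region_props and the file region_props.pyx are placeholders invented for this example, not names from the question.

```python
# setup.py -- build sketch; "region_props" is a hypothetical module name
import numpy as np
from Cython.Build import cythonize
from setuptools import Extension, setup

extensions = [
    Extension(
        "region_props",
        sources=["region_props.pyx"],
        # Needed when the .pyx file does `cimport numpy`
        include_dirs=[np.get_include()],
    )
]

setup(ext_modules=cythonize(extensions))
```

This would typically be built with `python setup.py build_ext --inplace`. If the module only uses memoryviews and never does `cimport numpy`, something like `cythonize -i region_props.pyx` should be enough, with no setup.py at all.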

Parallelization is the big advantage of memoryviews compared to numpy arrays (if you want to use it). Taking a slice of a memoryview does not require the global interpreter lock (GIL), but taking a slice of a numpy array does. This means that the following code outline can work in parallel with a memoryview:

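import numpy as np
from cython.parallel import prange

cdef void somefunc(double[:] x) nogil:
    # implementation goes here
    pass

# Identifiers cannot start with a digit, so the array is named arr_2d
cdef double[:,:] arr_2d = np.array(...)
cdef Py_ssize_t i
for i in prange(arr_2d.shape[0], nogil=True):
    somefunc(arr_2d[i,:])  # slicing a memoryview does not need the GIL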

If you aren't using Cython's parallel functionality this doesn't apply.

You can use memoryviews as attributes of cdef classes but not np.ndarrays. You can (of course) use numpy arrays as untyped object attributes instead.
