Allocate intermediate multidimensional arrays in Cython without acquiring the GIL

Problem description

I'm trying to use Cython to parallelize an expensive operation which involves generating intermediate multidimensional arrays.

The following very simplified code illustrates the sort of thing I'm trying to do:

import numpy as np
cimport cython
cimport numpy as np
from cython.parallel cimport prange
from libc.stdlib cimport malloc, free


@cython.boundscheck(False)
@cython.wraparound(False)
def embarrasingly_parallel_example(char[:, :] A):

    cdef unsigned int m = A.shape[0]
    cdef unsigned int n = A.shape[1]
    cdef np.ndarray[np.float64_t, ndim = 2] out = np.empty((m, m), np.float64)
    cdef unsigned int ii, jj
    cdef double[:, :] tmp

    for ii in prange(m, nogil=True):
        for jj in range(m):

            # allocate a temporary array to hold the result of
            # expensive_function_1
            tmp_carray = <double * > malloc((n ** 2) * sizeof(double))

            # a 2D typed memoryview onto tmp_carray
            tmp = <double[:n, :n] > tmp_carray

            # shove the intermediate result in tmp
            expensive_function_1(A[ii, :], A[jj, :], tmp)

            # get the final (scalar) output for this ii, jj
            out[ii, jj] = expensive_function_2(tmp)

            # free the intermediate array
            free(tmp_carray)

    return out


# some silly examples - the actual operation I'm performing is a lot more
# involved
# ------------------------------------------------------------------------
@cython.boundscheck(False)
@cython.wraparound(False)
cdef void expensive_function_1(char[:] x, char[:] y, double[:, :] tmp):

    cdef unsigned int m = tmp.shape[0]
    cdef unsigned int n = x.shape[0]
    cdef unsigned int ii, jj

    for ii in range(m):
        for jj in range(m):
            tmp[ii, jj] = 0
            for kk in range(n):
                tmp[ii, jj] += (x[kk] + y[kk]) * (ii - jj)


@cython.boundscheck(False)
@cython.wraparound(False)
cdef double expensive_function_2(double[:, :] tmp):

    cdef unsigned int m = tmp.shape[0]
    cdef unsigned int ii, jj
    cdef double result = 0

    for ii in range(m):
        for jj in range(m):
            result += tmp[ii, jj]

    return result

There seem to be at least two reasons why this fails to compile:

  1. Based on the output of cython -a, the creation of the typed memory view here:

cdef double[:, :] tmp = <double[:n, :n] > tmp_carray

seems to involve Python API calls, and I therefore can't release the GIL to allow the outer loop to run in parallel.

I was under the impression that typed memory views were not Python objects, and therefore a child process ought to be able to create them without first acquiring the GIL. Is this the case?

2. Even if I replace prange(m, nogil=True) with a normal range(m), Cython still doesn't seem to like the presence of a cdef within the inner loop:

    Error compiling Cython file:
    ------------------------------------------------------------
    ...
                # allocate a temporary array to hold the result of
                # expensive_function_1
                tmp_carray = <double*> malloc((n ** 2) * sizeof(double))

                # a 2D typed memoryview onto tmp_carray
                cdef double[:, :] tmp = <double[:n, :n]> tmp_carray
                    ^
    ------------------------------------------------------------

    parallel_allocate.pyx:26:17: cdef statement not allowed here

Update

It turns out that the second problem was easily solved by moving

 cdef double[:, :] tmp

outside of the for loop, and just assigning

 tmp = <double[:n, :n] > tmp_carray

within the loop. I still don't fully understand why this is necessary, though.
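
A likely explanation, offered as an assumption rather than something stated in the thread: Cython accepts `cdef` declarations only at function (or module) scope, not inside nested blocks such as loops, because the declared C variable belongs to the whole function. The sketch below uses a hypothetical function `scale_rows` to show the accepted pattern of declaring once at the top and assigning inside the loop.

```cython
def scale_rows(double[:, :] data, double factor):
    # All C variables are declared up front, at function scope.
    cdef Py_ssize_t i, j
    cdef double[:] row

    for i in range(data.shape[0]):
        # Assignment inside the loop is fine; writing
        # "cdef double[:] row = data[i, :]" here would trigger
        # "cdef statement not allowed here".
        row = data[i, :]
        for j in range(row.shape[0]):
            row[j] = row[j] * factor
```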

Now if I try to use prange I hit the following compilation error:

Error compiling Cython file:
------------------------------------------------------------
...
            # allocate a temporary array to hold the result of
            # expensive_function_1
            tmp_carray = <double*> malloc((n ** 2) * sizeof(double))

            # a 2D typed memoryview onto tmp_carray
            tmp = <double[:n, :n]> tmp_carray
               ^
------------------------------------------------------------

parallel_allocate.pyx:28:16: Memoryview slices can only be shared in parallel sections

Recommended answer

Disclaimer: Everything here is to be taken with a grain of salt. I'm guessing more than knowing. You should certainly ask the question on Cython-User. They are always friendly and fast to answer.

I agree that Cython's documentation is not very clear:

[...] memoryviews often do not need the GIL:

cpdef int sum3d(int[:, :, :] arr) nogil: ...

In particular, you do not need the GIL for memoryview indexing, slicing or transposing. Memoryviews require the GIL for the copy methods (C and Fortran contiguous copies), or when the dtype is object and an object element is read or written.

I think this means that passing a memory view parameter, or using it for slicing or transposing doesn't need Python GIL. However, creating a memoryview or copying one needs the GIL.

Another argument supporting this is that a Cython function can return a memory view to Python:

from cython.view cimport array as cvarray
import numpy as np

def bla():
    narr = np.arange(27, dtype=np.dtype("i")).reshape((3, 3, 3))
    cdef int [:, :, :] narr_view = narr
    return narr_view

gives:

>>> import hello
>>> hello.bla()
<MemoryView of 'ndarray' at 0x1b03380>

which means that the memoryview is allocated in Python's GC-managed memory and thus needs the GIL to be created. So you can't create a memoryview in a nogil section.
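
If that reading is right, one possible workaround is to create a single scratch memoryview while the GIL is still held and only slice it inside the parallel loop. This is a minimal sketch, not the asker's code: `fill_and_sum` and `per_thread_scratch` are hypothetical names, the helper is a stand-in for the two expensive functions, and the module is assumed to be compiled and linked with OpenMP enabled (for example `-fopenmp` with GCC).

```cython
# cython: boundscheck=False, wraparound=False
import numpy as np
cimport openmp
from cython.parallel cimport prange, threadid


cdef double fill_and_sum(double[:, :] tmp, double scale) nogil:
    # Hypothetical stand-in: fill the scratch array, then reduce it.
    cdef Py_ssize_t i, j
    cdef double total = 0
    for i in range(tmp.shape[0]):
        for j in range(tmp.shape[1]):
            tmp[i, j] = scale * (i + j)
            total += tmp[i, j]
    return total


def per_thread_scratch(Py_ssize_t m, Py_ssize_t n):
    cdef int n_threads = openmp.omp_get_max_threads()
    # Created with the GIL held: one (n, n) slab per OpenMP thread.
    cdef double[:, :, ::1] scratch = np.empty((n_threads, n, n),
                                              dtype=np.float64)
    cdef double[::1] out = np.empty(m, dtype=np.float64)
    cdef Py_ssize_t ii

    for ii in prange(m, nogil=True):
        # Slicing an existing memoryview does not require the GIL;
        # each thread only ever touches its own slab.
        out[ii] = fill_and_sum(scratch[threadid()], ii)

    return np.asarray(out)
```

The cost of this design is memory for `n_threads` scratch slabs up front, in exchange for no allocation at all inside the loop.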

Now, concerning the error message:

Memoryview slices can only be shared in parallel sections

I think you should read it as "You can't have a thread-private memoryview slice. It must be a thread-shared memoryview slice."
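
Under that interpretation, a thread-private buffer would have to be a plain C pointer rather than a memoryview. The sketch below follows the thread-local buffer pattern from Cython's parallelism documentation. The names `fill_and_sum_ptr` and `per_thread_malloc` are hypothetical, and the helper indexes the flat buffer by hand as `tmp[i * n + j]`.

```cython
# cython: boundscheck=False, wraparound=False
import numpy as np
from cython.parallel cimport parallel, prange
from libc.stdlib cimport malloc, free, abort


cdef double fill_and_sum_ptr(double *tmp, Py_ssize_t n, double scale) nogil:
    # Hypothetical stand-in working on a flat, row-major n x n buffer.
    cdef Py_ssize_t i, j
    cdef double total = 0
    for i in range(n):
        for j in range(n):
            tmp[i * n + j] = scale * (i + j)
            total += tmp[i * n + j]
    return total


def per_thread_malloc(Py_ssize_t m, Py_ssize_t n):
    cdef double[::1] out = np.empty(m, dtype=np.float64)
    cdef double *tmp
    cdef Py_ssize_t ii

    with nogil, parallel():
        # Assigned inside the parallel block, so each thread gets
        # its own private pointer and its own allocation.
        tmp = <double *> malloc(n * n * sizeof(double))
        if tmp is NULL:
            abort()

        for ii in prange(m):
            out[ii] = fill_and_sum_ptr(tmp, n, ii)

        free(tmp)

    return np.asarray(out)
```

Here the buffer is allocated once per thread rather than once per iteration, which also avoids the repeated `malloc`/`free` calls in the inner loop of the original example.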
