cython openmp single, barrier
Problem description
I'm trying to use OpenMP in Cython. I need to do two things:
i) use the #pragma omp single{} scope in my Cython code.
ii) use #pragma omp barrier.
Does anyone know how to do this in Cython?
Here are more details. I have a nogil cdef function my_func(), which I call in an OpenMP for-loop:
from cython.parallel cimport prange
cimport openmp

cdef int i
with nogil:
    for i in prange(10, schedule='static', num_threads=10):
        my_func(i)
Inside my_func I need to place a barrier to wait for all threads to catch up, then execute a time-consuming operation in only one of the threads, with the GIL acquired, and then release the barrier so that all threads resume simultaneously.
cdef int my_func(...) nogil:
    ...
    # put a barrier until all threads catch up, e.g. #pragma omp barrier
    with gil:
        # execute time consuming operation in one thread only, e.g. pragma omp single{}
    # remove barrier after the above single thread has finished and continue the operation over all threads in parallel, e.g. #pragma omp barrier
    ...
Recommended answer
Cython has some support for OpenMP, but if OpenMP pragmas are used extensively, it is probably easier to write the code in C and wrap the result with Cython.
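To make the "write it in C" route concrete, here is a minimal sketch of the pattern from the question: parallel work, a point where all threads wait, a step executed by one thread only, then parallel work again. The names barrier_single_demo and expensive_step are hypothetical and not part of the question. As far as I understand the OpenMP nesting rules, barrier and single may not sit inside the body of a worksharing loop (including functions called from it), which is why the sketch splits the work into phases inside one parallel region instead of placing a barrier inside the loop body. The GIL handling is left out here.

```c
/* Hypothetical stand-in for the time-consuming step that only one thread
   should execute: it sums the partial results into *shared_total. */
static void expensive_step(int *shared_total, const int *partial, int n)
{
    int k;
    *shared_total = 0;
    for (k = 0; k < n; ++k)
        *shared_total += partial[k];
}

/* Fills partial[0..n-1] in parallel, sums it in exactly one thread, then
   scales every entry by that sum in parallel. Returns the sum.
   Without OpenMP the pragmas are ignored and the code runs serially. */
int barrier_single_demo(int *partial, int n)
{
    int total = 0;
    #pragma omp parallel
    {
        int i;

        /* Phase 1: iterations are split across the threads. The implicit
           barrier at the end of "omp for" makes all threads wait here. */
        #pragma omp for schedule(static)
        for (i = 0; i < n; ++i)
            partial[i] = i + 1;

        /* Phase 2: one thread runs the block; the others wait at the
           implicit barrier at the end of "omp single". */
        #pragma omp single
        {
            expensive_step(&total, partial, n);
        }

        /* Phase 3: all threads resume and can read the shared result. */
        #pragma omp for schedule(static)
        for (i = 0; i < n; ++i)
            partial[i] *= total;
    }
    return total;
}
```

A function like this can be compiled with -fopenmp and declared in Cython via cdef extern, so that none of the pragmas have to pass through Cython's code generator.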
As an alternative, you could use verbatim C code and tricks with defines to bring some of this functionality to Cython. Using pragmas inside defines isn't straightforward, though (_Pragma is the C99 solution, while MSVC does its own thing, as always, with __pragma). Here are some examples as a proof of concept for Linux/gcc:
cdef extern from *:
    """
    #define START_OMP_PARALLEL_PRAGMA() _Pragma("omp parallel") {
    #define END_OMP_PRAGMA() }
    #define START_OMP_SINGLE_PRAGMA() _Pragma("omp single") {
    #define START_OMP_CRITICAL_PRAGMA() _Pragma("omp critical") {
    """
    void START_OMP_PARALLEL_PRAGMA() nogil
    void END_OMP_PRAGMA() nogil
    void START_OMP_SINGLE_PRAGMA() nogil
    void START_OMP_CRITICAL_PRAGMA() nogil
We make Cython believe that START_OMP_PARALLEL_PRAGMA() and co. are nogil functions, so it puts them into the generated C code, where they get picked up by the preprocessor.
We have to use the syntax
#pragma omp single
{
    //do_something
}
instead of
#pragma omp single
do_something
because of the way Cython generates C code.
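The need for braces can be seen by writing the macros in plain C in roughly the shape Cython emits them. This is a sketch based on my reading of the generated code, not a statement about Cython internals: each "call" becomes a statement ending in a semicolon. Without the opening brace inside the macro, the pragma would apply to that empty statement instead of to the code that follows. The function name single_runs_once is hypothetical.

```c
/* Same macro definitions as in the verbatim block above (C99 _Pragma). */
#define START_OMP_PARALLEL_PRAGMA() _Pragma("omp parallel") {
#define START_OMP_SINGLE_PRAGMA()   _Pragma("omp single") {
#define END_OMP_PRAGMA()            }

/* Each macro "call" is followed by ';', as in generated code. After
   preprocessing this reads:
     #pragma omp parallel
     { ; 
       #pragma omp single
       { ; a += 1; } ;
     } ;
   so the braces, not the stray ';', delimit the structured blocks. */
int single_runs_once(void)
{
    int a = 0;
    START_OMP_PARALLEL_PRAGMA();
    START_OMP_SINGLE_PRAGMA();
    a += 1;              /* executed by exactly one thread */
    END_OMP_PRAGMA();    /* closes single */
    END_OMP_PRAGMA();    /* closes parallel */
    return a;
}
```

The function returns 1 regardless of the number of threads, because only one thread enters the single block.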
Usage could look as follows (I'm avoiding cython.parallel.parallel here, as it does too much magic for this simple example):
%%cython -c=-fopenmp --link-args=-fopenmp
cdef extern from *:  # as listed above
    ...

def test_omp():
    cdef int a=0
    cdef int b=0
    with nogil:
        START_OMP_PARALLEL_PRAGMA()
        START_OMP_SINGLE_PRAGMA()
        a+=1
        END_OMP_PRAGMA()  # SINGLE
        START_OMP_CRITICAL_PRAGMA()
        b+=1
        END_OMP_PRAGMA()  # CRITICAL
        END_OMP_PRAGMA()  # PARALLEL
    print(a,b)
Calling test_omp prints "1 2" on my machine with 2 threads, as expected (one could change the number of threads using openmp.omp_set_num_threads(10)).
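For comparison, the same single/critical counting can be sketched directly in C. The function name count_single_and_critical and its parameters are hypothetical. The OpenMP runtime calls are guarded by _OPENMP so the code also builds without -fopenmp, in which case it runs with a single thread. The runtime may grant fewer threads than requested, so the sketch reports the actual team size rather than assuming it.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Counts how often a "single" block and a "critical" block are executed.
   On return: *a is 1, *b equals the number of threads in the team, and
   *team_size holds that number (1 when built without OpenMP). */
void count_single_and_critical(int requested_threads,
                               int *a, int *b, int *team_size)
{
    int single_hits = 0;
    int critical_hits = 0;
    int team = 1;

#ifdef _OPENMP
    if (requested_threads > 0)
        omp_set_num_threads(requested_threads);
#else
    (void)requested_threads;   /* unused without OpenMP */
#endif

    #pragma omp parallel
    {
        /* Entered by exactly one thread of the team. */
        #pragma omp single
        {
            single_hits += 1;
#ifdef _OPENMP
            team = omp_get_num_threads();
#endif
        }

        /* Entered by every thread, one at a time. */
        #pragma omp critical
        {
            critical_hits += 1;
        }
    }

    *a = single_hits;
    *b = critical_hits;
    *team_size = team;
}
```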
However, the macro approach is still very brittle: some of the error checking done by Cython can lead to invalid code (Cython uses goto for control flow, and it is not possible to jump out of an OpenMP block). Something like this happens in your example:
cimport numpy as np
import numpy as np

def test_omp2():
    cdef np.int_t[:] a=np.zeros(1,dtype=int)
    START_OMP_SINGLE_PRAGMA()
    a[0]+=1
    END_OMP_PRAGMA()
    print(a)
Because of bounds checking, Cython will produce:
START_OMP_SINGLE_PRAGMA();
...
//check bounds:
if (unlikely(__pyx_t_6 != -1)) {
    __Pyx_RaiseBufferIndexError(__pyx_t_6);
    __PYX_ERR(0, 30, __pyx_L1_error) // HERE WE GO A GOTO!
}
...
END_OMP_PRAGMA();
In this special case, setting boundscheck to False, i.e.
cimport cython

@cython.boundscheck(False)
def test_omp2():
    ...
would solve the issue for the above example, but probably not in general.
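In hand-written C, the usual way around the goto restriction is to record the error inside the OpenMP block and branch only after the block has closed. The following is a minimal sketch of that idea with a manual bounds check; the function name checked_increment and its signature are hypothetical.

```c
#include <stddef.h>

/* Adds 1 to data[index] inside an "omp single" block.
   Returns 0 on success and -1 if index is out of bounds. */
int checked_increment(long *data, size_t length, size_t index)
{
    int error = 0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            if (index >= length)
                error = 1;          /* record the failure, do not jump out */
            else
                data[index] += 1;
        }
    }

    /* Branching is allowed again once the structured blocks are closed. */
    if (error)
        return -1;
    return 0;
}
```

Cython's generated error handling cannot easily be reshaped like this, which is one more argument for keeping the OpenMP part in C.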
Once again: using OpenMP in C (and wrapping the functionality with Cython) is a more enjoyable experience.
As a side note: Python threads (the ones governed by the GIL) and OpenMP threads are different and know nothing about each other. The above example would also work (compile and run) correctly without releasing the GIL: OpenMP threads do not care about the GIL, and as there are no Python objects involved, nothing can go wrong. Thus I have added nogil to the wrapped "functions", so they can also be used in nogil blocks.
However, when code gets more complicated, it becomes less obvious that variables shared between different Python threads aren't accessed (above all because those accesses could happen in the generated C code, which isn't apparent from the Cython code). In that case it might be wiser not to release the GIL while using OpenMP.