Cython OpenMP single, barrier

Question

I'm trying to use OpenMP in Cython. I need to do two things:

i) use the #pragma omp single {} scope in my Cython code.

ii) use #pragma omp barrier

Does anyone know how to do this in Cython?

Here are more details. I have a nogil cdef function my_func() which I call in an OpenMP for-loop:

from cython.parallel cimport prange
cimport openmp

cdef int i

with nogil:
    for i in prange(10,schedule='static', num_threads=10):
        my_func(i)

Inside my_func I need to place a barrier to wait for all threads to catch up, then execute a time-consuming operation in only one of the threads, with the GIL acquired, and then release the barrier so that all threads resume simultaneously.

cdef int my_func(...) nogil:

    ...

    # put a barrier until all threads catch up, e.g. #pragma omp barrier

    with gil:
        # execute time consuming operation in one thread only, e.g. pragma omp single{}

    # remove barrier after the above single thread has finished and continue the operation over all threads in parallel, e.g. #pragma omp barrier

    ...



Recommended answer

Cython has some support for OpenMP, but if OpenMP pragmas are used extensively, it is probably easier to write the code in C and wrap the result with Cython.

As an alternative, you could use verbatim C code and tricks with defines to bring some of the functionality to Cython. Using pragmas in defines isn't straightforward, though (_Pragma is the C99 solution; MSVC does its own thing, as always, with __pragma). Here are some examples as a proof of concept for Linux/gcc:

cdef extern from *:
    """
    #define START_OMP_PARALLEL_PRAGMA() _Pragma("omp parallel") {
    #define END_OMP_PRAGMA() }
    #define START_OMP_SINGLE_PRAGMA() _Pragma("omp single") {
    #define START_OMP_CRITICAL_PRAGMA() _Pragma("omp critical") {   
    """
    void START_OMP_PARALLEL_PRAGMA() nogil
    void END_OMP_PRAGMA() nogil
    void START_OMP_SINGLE_PRAGMA() nogil
    void START_OMP_CRITICAL_PRAGMA() nogil

We make Cython believe that START_OMP_PARALLEL_PRAGMA() and co. are nogil functions, so it puts them into the C code, where they get picked up by the preprocessor.

We have to use the syntax

#pragma omp single{
   //do_something
}

instead of

#pragma omp single
do_something

because of the way Cython generates C code.
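To see why the braces matter, it can help to look at the macros in plain C. The sketch below is an illustration, not Cython's actual output: it assumes Cython emits each wrapped "function call" as a statement ending in `;`, so without the opening brace the pragma would presumably attach to that empty statement instead of the code that follows. The `OMP_PRAGMA` helper and the function name `count_single_and_critical` are hypothetical, and the MSVC branch is an untested assumption.

```c
#include <assert.h>
#include <stddef.h>

/* _Pragma is the C99 operator form; __pragma is the MSVC keyword.
   The MSVC branch is an untested assumption. */
#ifdef _MSC_VER
#define OMP_PRAGMA(x) __pragma(x)
#else
#define OMP_PRAGMA(x) _Pragma(#x)
#endif

/* Each START macro opens a brace, each END macro closes one. */
#define START_OMP_PARALLEL_PRAGMA() OMP_PRAGMA(omp parallel) {
#define START_OMP_SINGLE_PRAGMA()   OMP_PRAGMA(omp single) {
#define START_OMP_CRITICAL_PRAGMA() OMP_PRAGMA(omp critical) {
#define END_OMP_PRAGMA()            }

/* Hypothetical C counterpart of test_omp: *a is incremented by one
   thread only, *b once per thread (under a critical section). */
void count_single_and_critical(int *a, int *b)
{
    assert(a != NULL && b != NULL);
    *a = 0;
    *b = 0;
    /* The trailing semicolons mimic the assumed Cython output; after
       expansion each one is just an empty statement inside a block. */
    START_OMP_PARALLEL_PRAGMA();
    START_OMP_SINGLE_PRAGMA();
    *a += 1;
    END_OMP_PRAGMA();   /* single */
    START_OMP_CRITICAL_PRAGMA();
    *b += 1;
    END_OMP_PRAGMA();   /* critical */
    END_OMP_PRAGMA();   /* parallel */
}
```

Compiled without OpenMP support, the pragmas are ignored and the function simply runs serially.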

The usage could look as follows (I'm avoiding cython.parallel.parallel here, as it does too much magic for this simple example):

%%cython -c=-fopenmp --link-args=-fopenmp
cdef extern from *:# as listed above
    ...

def test_omp():
    cdef int a=0
    cdef int b=0  
    with nogil:
        START_OMP_PARALLEL_PRAGMA()
        START_OMP_SINGLE_PRAGMA()
        a+=1
        END_OMP_PRAGMA()
        START_OMP_CRITICAL_PRAGMA()
        b+=1
        END_OMP_PRAGMA() # CRITICAL
        END_OMP_PRAGMA() # PARALLEL
    print(a,b)

Calling test_omp prints "1 2" on my machine with 2 threads, as expected (one could change the number of threads using openmp.omp_set_num_threads(10)).

However, the above is still very brittle: some error checking by Cython can lead to invalid code (Cython uses goto for control flow, and it is not possible to jump out of an OpenMP block). Something like this happens in your example:

cimport numpy as np
import numpy as np
def test_omp2():
    cdef np.int_t[:] a=np.zeros(1,dtype=int)

    START_OMP_SINGLE_PRAGMA()
    a[0]+=1
    END_OMP_PRAGMA()

    print(a)

Because of bounds checking, Cython will produce:

START_OMP_SINGLE_PRAGMA();
...
//check bounds:
if (unlikely(__pyx_t_6 != -1)) {
    __Pyx_RaiseBufferIndexError(__pyx_t_6);
    __PYX_ERR(0, 30, __pyx_L1_error)  // HERE WE GO A GOTO!
}
...
END_OMP_PRAGMA();

In this special case, setting boundscheck to False, i.e.

cimport cython
@cython.boundscheck(False) 
def test_omp2():
   ...

would solve the issue for the above example, but probably not in general.
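In hand-written C, the usual way around the "no jumping out of an OpenMP block" restriction is to record the error inside the block and branch only after the block has closed. The following is a minimal sketch of that idea, not what Cython generates; the function name `increment_checked` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Adds 1 to data[idx] in exactly one thread.
   Returns 0 on success, -1 if idx is out of bounds. */
int increment_checked(long *data, long len, long idx)
{
    int err = 0;
    assert(data != NULL);

    #pragma omp parallel shared(err)
    {
        #pragma omp single
        {
            if (idx < 0 || idx >= len) {
                /* Record the problem; no goto/return out of the block. */
                err = 1;
            } else {
                data[idx] += 1;
            }
        }
    }

    /* Outside the parallel region, ordinary control flow is fine again. */
    return err ? -1 : 0;
}
```

The bounds check still happens, but the only exit from the structured block is its closing brace.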

Once again: using OpenMP in C (and wrapping the functionality with Cython) is a more enjoyable experience.
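For the pattern from the question (barrier, one thread doing the expensive step, then everybody continues), a C version could look roughly like this. It is a sketch under assumptions: `run_two_phases` and `expensive_step` are hypothetical names, and the two "phases" are placeholder arithmetic. As far as I know, OpenMP does not allow a barrier inside a worksharing loop such as `omp for`, so the iterations are distributed by hand inside a plain parallel region. Acquiring the GIL for the single step is left out.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the time-consuming operation. */
static int expensive_step(const int *partial, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += partial[i];
    return total;
}

/* Phase 1 fills partial[], one thread combines it, phase 2 fills out[]. */
int run_two_phases(int n, int *partial, int *out)
{
    int total = 0;
    assert(n > 0 && partial != NULL && out != NULL);

    #pragma omp parallel shared(total)
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        int tid = 0;       /* serial fallback when built without OpenMP */
        int nthreads = 1;
#endif
        /* Phase 1: each thread handles every nthreads-th iteration. */
        for (int i = tid; i < n; i += nthreads)
            partial[i] = i + 1;

        /* Wait until all threads have finished phase 1. */
        #pragma omp barrier

        /* One thread runs this; the others wait at the implicit
           barrier at the end of the single block. */
        #pragma omp single
        {
            total = expensive_step(partial, n);
        }

        /* Phase 2: all threads resume and may read total. */
        for (int i = tid; i < n; i += nthreads)
            out[i] = total + i;
    }
    return total;
}
```

From Cython, such a function could be declared in a cdef extern block as nogil and called like any other C function.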

As a side note: Python threads (the ones governed by the GIL) and OpenMP threads are different and know nothing about each other. The above example would also work (compile and run) correctly without releasing the GIL. OpenMP threads do not care about the GIL, and as there are no Python objects involved, nothing can go wrong. I added nogil to the wrapped "functions" so that they can also be used in nogil blocks.

However, when the code gets more complicated, it becomes less obvious that variables shared between different Python threads aren't accessed (above all because those accesses could happen in the generated C code, which isn't clear from the Cython code). In that case it might be wiser not to release the GIL while using OpenMP.
