Calling multithreaded MKL from an OpenMP parallel region


Question

I have code with the following structure:

#pragma omp parallel
{
    #pragma omp for nowait
    for (...) {
        // first for loop
    }

    #pragma omp for nowait
    for (...) {
        // second for loop
    }

    #pragma omp barrier

    // not sure which to use here: #pragma omp single / critical / atomic?
    dgemm_(....);

    #pragma omp for
    for (...) {
        // yet another for loop
    }
}

For dgemm_, I link with multithreaded MKL. I want MKL to use all 8 available threads. What is the best way to do so?

Recommended answer

This is a case of nested parallelism. MKL supports it, but it only works if your executable is built using the Intel C/C++ compiler (or is otherwise linked so that it shares MKL's OpenMP runtime). The reason for that restriction is that MKL uses Intel's OpenMP runtime, and different OpenMP runtimes do not play well with each other when loaded into the same process.

Once that is sorted out, you should enable nested parallelism by setting OMP_NESTED to TRUE and disable MKL's detection of nested parallelism (which would otherwise make it run single-threaded inside a parallel region) by setting MKL_DYNAMIC to FALSE. If the data to be processed with dgemm_ is shared, then you have to invoke dgemm_ from within a single construct, so that exactly one thread makes the call while the others wait at the implicit barrier at the end of the construct. If each thread processes its own private data, then you don't need any synchronisation constructs, but using multithreaded MKL won't give you any benefit either. Therefore I would assume that your case is the former.

To summarise:

#pragma omp single
dgemm_(...);

and run with:

$ MKL_DYNAMIC=FALSE MKL_NUM_THREADS=8 OMP_NUM_THREADS=8 OMP_NESTED=TRUE ./exe
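To show where the single construct sits in the structure from the question, here is a minimal sketch. It deliberately does not call MKL: `matmul_threaded` is a hypothetical stand-in for `dgemm_` that opens its own parallel region, and `pipeline` and the fill values are made up for illustration. `omp_set_max_active_levels(2)` is used here as the API equivalent of enabling nesting. The `_OPENMP` guard only lets the file compile serially as well.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for dgemm_: C = A * B for n x n row-major matrices.
   It opens its own parallel region, as a threaded BLAS would. */
static void matmul_threaded(int n, const double *a, const double *b, double *c)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Fill a and b in parallel, multiply them once, then halve c in parallel. */
void pipeline(int n, double *a, double *b, double *c)
{
#ifdef _OPENMP
    omp_set_max_active_levels(2);   /* allow one nested parallel level */
#endif
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n * n; i++)
            a[i] = 1.0;

        #pragma omp for nowait
        for (int i = 0; i < n * n; i++)
            b[i] = 2.0;

        /* single has no barrier on entry, so wait until a and b are complete */
        #pragma omp barrier

        /* exactly one thread makes the call; the rest wait at the
           implicit barrier at the end of single */
        #pragma omp single
        matmul_threaded(n, a, b, c);

        #pragma omp for
        for (int i = 0; i < n * n; i++)
            c[i] *= 0.5;
    }
}
```

With real MKL, the call inside the single construct would be `dgemm_(...)` with the arguments left elided as in the question. Using critical instead would make every thread perform the multiplication in turn, and atomic does not apply to a function call, which is why single is the fitting choice for shared data.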

You could also set the parameters with the appropriate calls:

mkl_set_dynamic(0);
mkl_set_num_threads(8);
omp_set_nested(1);

#pragma omp parallel num_threads(8) ...
{
   ...
}

However, I would prefer to use environment variables instead.
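If you do go the programmatic route, note that `omp_set_nested` and `OMP_NESTED` were deprecated in OpenMP 5.0 in favour of the max-active-levels setting. A possible way to wrap the calls is sketched below. The function name `configure_nested` and the `USE_MKL` build flag are assumptions made for this example, not part of MKL or OpenMP; the MKL calls are compiled only when that flag is defined.

```c
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef USE_MKL            /* hypothetical build flag for this sketch */
#include <mkl.h>
#endif

/* Apply the settings from the answer through API calls.
   Returns the resulting maximum number of active parallel levels,
   or 0 when compiled without OpenMP. */
int configure_nested(int blas_threads)
{
#ifdef USE_MKL
    mkl_set_dynamic(0);                 /* same effect as MKL_DYNAMIC=FALSE */
    mkl_set_num_threads(blas_threads);  /* same effect as MKL_NUM_THREADS */
#else
    (void)blas_threads;                 /* unused without MKL */
#endif

#ifdef _OPENMP
    /* replaces the deprecated omp_set_nested(1) */
    omp_set_max_active_levels(2);
    return omp_get_max_active_levels();
#else
    return 0;
#endif
}
```

Calling `configure_nested(8)` once, before the first parallel region, would mirror the environment-variable settings above. The returned value is a quick way to check whether the runtime actually allows a nested level.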
