Reusable private dynamically allocated arrays in OpenMP

Question

I am using OpenMP and MPI to parallelize some matrix operations in C. Some of the functions operating on the matrix are written in Fortran, and they require a buffer array to be passed in, which is used only internally by the function. Currently I allocate a buffer in each parallel region, similar to the code below.

int i = 0;
int n = 1024; // Actually this is read from command line
double **a = createNbyNMat(n);
#pragma omp parallel
{
    double *buf;
    buf = malloc(sizeof(double)*n);
#pragma omp for
    for (i=0; i < n; i++)
    {
        fortranFunc1_(a[i], &n, buf);
    }
    free(buf);
}

// Serial code and moving data around in the matrix a using MPI

#pragma omp parallel
{
    double *buf;
    buf = malloc(sizeof(double)*n);
#pragma omp for
    for (i=0; i < n; i++)
    {
        fortranFunc2_(a[i], &n, buf);
    }
    free(buf);
}

// and repeat a few more times.

I know reallocating the buffers can be avoided with an approach like the code below, but I was curious whether there is an easier way or some built-in OpenMP functionality for handling this. It would also be nice to be able to compile the code, without a lot of compiler directives, whether or not OpenMP is present on the system we are compiling for.

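#include <omp.h> // omp_get_max_threads(), omp_get_thread_num()

// One scratch buffer per thread, allocated once up front
int num_openmp_threads = omp_get_max_threads();
double **buf;
buf = malloc(sizeof(double*) * num_openmp_threads);
int i = 0;
for (i = 0; i < num_openmp_threads; ++i)
{
    buf[i] = malloc(sizeof(double) * n);
}

// skip ahead (the loop below sits inside a parallel region)

#pragma omp for
for (i=0; i < n; i++)
{
    // Each thread indexes its own buffer by its thread number
    fortranFunc1_(a[i], &n, buf[omp_get_thread_num()]);
}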
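A self-contained sketch of this per-thread buffer approach follows, assuming an n-by-n matrix stored as an array of row pointers. `reverse_row` is a hypothetical C stand-in for the Fortran routine (it uses `buf` only as scratch space, taking `n` by pointer as a Fortran routine would), and `process_rows` is likewise an illustrative name. The `#ifdef _OPENMP` guard relies on the standard `_OPENMP` macro that OpenMP-enabled compilers define; it supplies serial fallbacks so the same file builds with or without OpenMP.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks: without OpenMP there is exactly one thread, number 0. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical stand-in for fortranFunc1_: reverses a row of length *n,
   using buf only as scratch space. */
static void reverse_row(double *row, const int *n, double *buf)
{
    for (int j = 0; j < *n; j++)
        buf[j] = row[*n - 1 - j];
    for (int j = 0; j < *n; j++)
        row[j] = buf[j];
}

/* Allocates one scratch buffer per possible thread once, reuses them in
   `passes` separate parallel regions, then frees them.
   Returns 0 on success and -1 if an allocation fails. */
static int process_rows(double **a, int n, int passes)
{
    int nbufs = omp_get_max_threads();
    double **buf = malloc((size_t)nbufs * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (int t = 0; t < nbufs; t++) {
        buf[t] = malloc((size_t)n * sizeof **buf);
        if (buf[t] == NULL) {
            while (t-- > 0)           /* release the buffers already allocated */
                free(buf[t]);
            free(buf);
            return -1;
        }
    }

    for (int p = 0; p < passes; p++) {
        /* num_threads caps the team so every thread number stays below nbufs. */
        #pragma omp parallel for num_threads(nbufs)
        for (int i = 0; i < n; i++) {
            int tid = omp_get_thread_num();
            assert(tid < nbufs);
            reverse_row(a[i], &n, buf[tid]);
        }
    }

    for (int t = 0; t < nbufs; t++)
        free(buf[t]);
    free(buf);
    return 0;
}
```

Capping each region with `num_threads(nbufs)` keeps thread numbers inside the buffer array even if the default thread count is changed (for example with `omp_set_num_threads`) after the buffers were allocated.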

Recommended answer

This can be done with thread-private variables, which persist across subsequent parallel regions:

void func(...)
{
   static double *buf;
   #pragma omp threadprivate(buf)

   #pragma omp parallel num_threads(nth)
   {
       buf = malloc(n * sizeof(double));
       ...
   }

   #pragma omp parallel num_threads(nth)
   {
       // Access buf here - it is still allocated
   }

   #pragma omp parallel num_threads(nth)
   {
       // Free the memory in the last parallel region
       free(buf);
   }
}
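Below is a compilable sketch of the threadprivate approach, under the same assumptions as the question: a square matrix held as row pointers, with a hypothetical `reverse_row` standing in for the Fortran routines and `process` as an illustrative driver name. It disables dynamic team sizing with `omp_set_dynamic(0)`, passes the same `num_threads(nth)` to every region, and counts allocation failures with a `reduction` rather than ignoring them.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the Fortran routines: uses buf only as scratch. */
static void reverse_row(double *row, const int *n, double *buf)
{
    for (int j = 0; j < *n; j++)
        buf[j] = row[*n - 1 - j];
    for (int j = 0; j < *n; j++)
        row[j] = buf[j];
}

/* Runs `passes` parallel regions over the rows of a, reusing one
   thread-private scratch buffer per thread. Returns 0 on success. */
static int process(double **a, int n, int nth, int passes)
{
    static double *buf;            /* each thread gets its own copy */
    #pragma omp threadprivate(buf)
    int failed = 0;

#ifdef _OPENMP
    omp_set_dynamic(0);            /* team sizes must not vary between regions */
#endif

    /* First region: every thread allocates its own buffer. */
    #pragma omp parallel num_threads(nth) reduction(+:failed)
    {
        buf = malloc((size_t)n * sizeof *buf);
        failed += (buf == NULL);
    }

    /* Middle regions: same team size, so each thread still sees its buffer. */
    for (int p = 0; p < passes && !failed; p++) {
        #pragma omp parallel num_threads(nth)
        {
            assert(buf != NULL);
            #pragma omp for
            for (int i = 0; i < n; i++)
                reverse_row(a[i], &n, buf);
        }
    }

    /* Last region: the same threads release their buffers. */
    #pragma omp parallel num_threads(nth)
    {
        free(buf);                 /* free(NULL) is a no-op */
        buf = NULL;
    }
    return failed ? -1 : 0;
}
```

Note that `omp_set_dynamic(0)` changes a program-wide setting; setting the environment variable `OMP_DYNAMIC=false` is an alternative that leaves the code itself untouched.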

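There are several key points to notice here. First, the number of threads that allocate buf should match the number of threads that deallocate it. Also, if a parallel region in between executes with a larger team, the extra threads will find buf unallocated. In fact, the OpenMP specification only guarantees that threadprivate values persist between two parallel regions if, among other conditions, neither region is nested, both run with the same number of threads, and dynamic adjustment of the team size is disabled. Therefore it is advisable to disable the dynamic team size feature of OpenMP (with omp_set_dynamic(0) or OMP_DYNAMIC=false) and to fix the number of threads for each parallel region, for example with the num_threads clause as shown above.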

Second, local variables can be made thread-private only if they are static. Therefore, this method is not suitable for recursive functions: all recursive invocations on the same thread would share a single copy of buf.
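A tiny illustration of why recursion is a problem, using hypothetical function names: every activation of a function on a given thread shares that thread's single static (or threadprivate) copy, so an inner recursive call overwrites what the outer call stored, whereas an automatic variable gets a fresh copy per call.

```c
#include <assert.h>

/* Static and thread-private: one copy per thread, shared by every
   recursive activation on that thread. */
static int last_seen_static(int depth)
{
    static int scratch;
    #pragma omp threadprivate(scratch)
    assert(depth >= 0);
    scratch = depth;
    if (depth > 0)
        last_seen_static(depth - 1);   /* the inner call overwrites scratch */
    return scratch;                    /* no longer this call's depth */
}

/* Automatic variable: each activation has its own copy. */
static int last_seen_auto(int depth)
{
    int scratch = depth;
    assert(depth >= 0);
    if (depth > 0)
        last_seen_auto(depth - 1);
    return scratch;
}
```

For example, `last_seen_static(3)` returns 0 because the deepest call wrote last, while `last_seen_auto(3)` returns 3.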

The code should compile and work as expected even if OpenMP support is disabled: the pragmas are then ignored and buf is simply an ordinary static variable.
