OpenMP code runs slower with more than one thread, can't figure out why

Views: 167 Posted: 2016/8/24 13:09:37 Tags: c performance openmp

Problem description

I have a problem: the following code runs slower with OpenMP:

chunk = nx/nthreads;
int i, j;
for(int t = 0; t < n; t++){
     #pragma omp parallel for default(shared) private(i, j) schedule(static,chunk) 
     for(i = 1; i < nx/2+1; i++){
        for(j = 1; j < nx-1; j++){
            T_c[i][j] =0.25*(T_p[i-1][j] +T_p[i+1][j]+T_p[i][j-1]+T_p[i][j+1]);
            T_c[nx-i+1][j] = T_c[i][j];
        }
    }
    copyT(T_p, T_c, nx);
}
print2file(T_c, nx, file);

The problem is that when I run it with more than one thread, the computation time becomes much longer.

Recommended answer

First, your parallel region is restarted on each iteration of the outer loop, which adds a huge overhead.

Second, half of the threads would just sit there doing nothing, since your chunk size is twice as big as it should be: it is nx/nthreads, while the number of iterations of the parallel loop is nx/2, hence there are (nx/2)/(nx/nthreads) = nthreads/2 chunks in total. Besides, what you have tried to achieve merely replicates the behaviour of schedule(static).
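The chunk arithmetic above can be checked with a small helper. This is only a sketch: the function names count_chunks and busy_threads are made up for illustration, and it relies on the fact that schedule(static, chunk) hands out chunks to threads round-robin, so a thread can only be busy if there is a chunk for it.

```c
#include <assert.h>

/* Number of chunks a loop of `iterations` iterations is cut into when
   each chunk holds `chunk` iterations (the last chunk may be shorter). */
int count_chunks(int iterations, int chunk)
{
    assert(iterations >= 0 && chunk > 0);
    return (iterations + chunk - 1) / chunk;
}

/* Number of threads that receive at least one chunk. Chunks are dealt
   round-robin, so this is the smaller of the chunk count and the
   team size. */
int busy_threads(int iterations, int chunk, int nthreads)
{
    int chunks = count_chunks(iterations, chunk);
    return chunks < nthreads ? chunks : nthreads;
}
```

For example, assuming nx = 1024 and nthreads = 8, the parallel loop has nx/2 = 512 iterations and the chunk size is nx/nthreads = 128, so busy_threads(512, 128, 8) gives 4: only half of the team does any work. With a chunk size of 64, all 8 threads get one chunk each.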

#pragma omp parallel
for (int t = 0; t < n; t++) {
   #pragma omp for schedule(static) 
   for (int i = 1; i < nx/2+1; i++) {
      for (int j = 1; j < nx-1; j++) {
         T_c[i][j] = 0.25*(T_p[i-1][j]+T_p[i+1][j]+T_p[i][j-1]+T_p[i][j+1]);
         T_c[nx-i-1][j] = T_c[i][j];
      }
   }
   #pragma omp single
   copyT(T_p, T_c, nx);
}
print2file(T_c, nx, file);

If you modify copyT to share its work among the threads as well (with a worksharing omp for, since it is called from inside the parallel region), then the single construct should be removed. You do not need default(shared), as this is the default. You do not need to declare the loop variable of a parallel loop private - even if this variable comes from an outer scope (and hence would otherwise be shared in the region), OpenMP automatically makes it private. Simply declare all loop variables in the loop controls and it works automagically with the default sharing rules applied.
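A work-shared copy routine might look like the sketch below. The original copyT is not shown, so everything here is an assumption: the name copyT_parallel is hypothetical, the grids are assumed to be tables of nx row pointers with nx doubles per row, and the copy is assumed to go from T_c into T_p, as the call order in the time loop suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work-shared variant of copyT. Assumes T_p and T_c are
   arrays of nx row pointers, each row holding nx doubles, and that the
   copy direction is T_c (current) -> T_p (previous). */
void copyT_parallel(double **T_p, double **T_c, int nx)
{
    assert(T_p != NULL && T_c != NULL && nx >= 0);

    /* Orphaned worksharing loop: when called by every thread of an
       enclosing parallel region, the rows are divided among the team;
       when called serially (or compiled without OpenMP) it is an
       ordinary loop. The implicit barrier at the end keeps any thread
       from starting the next time step before the copy is complete. */
    #pragma omp for schedule(static)
    for (int i = 0; i < nx; i++) {
        for (int j = 0; j < nx; j++) {
            T_p[i][j] = T_c[i][j];
        }
    }
}
```

With a routine like this, every thread in the team has to call it, which is why the single construct in front of the call would have to go. Both loop variables are declared in the loop controls, so no private clause is needed.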

Second and a half, there is (probably) an error in your inner loop. The second assignment statement should read:

T_c[nx-i-1][j] = T_c[i][j];

(or T_c[nx-i][j] if you do not keep a halo on the lower side); otherwise, when i equals 1, you would be accessing T_c[nx][...], which is outside the bounds of T_c.
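The index arithmetic can be checked in isolation. The helpers below are illustrative only (mirror_row and row_in_bounds are made-up names) and assume a grid with rows 0 .. nx-1 whose first and last rows are halo rows.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirror of row i in a grid with rows 0 .. nx-1 whose first and last
   rows are halo (boundary) rows: row 1 maps to row nx-2. */
int mirror_row(int nx, int i)
{
    assert(nx > 0);
    return nx - i - 1;
}

/* True when `row` is a valid index into an array of nx rows. */
bool row_in_bounds(int nx, int row)
{
    return row >= 0 && row < nx;
}
```

Taking nx = 8 as an example, mirror_row(8, 1) is 6, the last interior row, whereas the original expression nx-i+1 gives 8 for i = 1, and row_in_bounds(8, 8) is false.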

Third, a general hint: instead of copying one array into another, use pointers to those arrays and just swap the two pointers at the end of each iteration.
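A minimal sketch of such a swap, assuming each grid is addressed through a double** row-pointer table (the actual types of T_p and T_c are not shown in the question, and swap_grids is a hypothetical helper name):

```c
#include <assert.h>
#include <stddef.h>

/* Exchange the roles of the "previous" and "current" grids in constant
   time, without touching any grid element. Assumes each grid is
   addressed through a double** row-pointer table. */
void swap_grids(double ***prev, double ***curr)
{
    assert(prev != NULL && curr != NULL);
    double **tmp = *prev;
    *prev = *curr;
    *curr = tmp;
}
```

In the time loop this would be called as swap_grids(&T_p, &T_c) in place of copyT. Since the two pointers are shared, inside a parallel region the swap has to be done by one thread only, for instance under a single construct. Two things to keep in mind: after the swap the newest values are reachable through T_p, so that is the grid to write out after the last iteration, and both grids need their boundary values initialised, because the copy no longer carries them over.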
