How to use an orphaned for loop in OpenMP?

Question

SOLVED: see EDIT 2 below.

I am trying to parallelise an algorithm which does some operation on a matrix (lets call it blurring for simplicity sake). Once this operation has been done, it finds the biggest change between the old and new matrix (max of absolute difference between old and new matrix on a per element basis). If this maximum difference is above some threshold, then do another iteration of the matrix operation.

So my main program has the following loop:

converged = 0;
for( i = 1; i <= iteration_limit; i++ ){
    max_diff = update( &data_grid );

    if( max_diff < tol ) {
        converged = 1;
        break;
    }
}

update( &data_grid ) then calls the actual implementation of the blurring algorithm. The blurring algorithm then iterates over the matrix, it is this loop that I am trying to parallelise:

for( i = 0; i < width; i++ ) {
    for( j = 0; j <= height; j++ ) {
        g->data[ update ][ i ][ j ] = 
        ONE_QUARTER * ( 
                     g->data[ update ][ i + 1 ][ j     ] +
                     g->data[ update ][ i - 1 ][ j     ] +
                     g->data[ update ][ i     ][ j + 1 ] +
                     g->data[ update ][ i     ][ j - 1 ]
                     );
        diff = fabs( g->data[ old ][ i ][ j ] - g->data[ update ][ i ][ j ] );
        maxdiff = maxdiff > diff ? maxdiff : diff;
    }
}

I could just stick a parallel region inside update(&data_grid), but that would mean threads would be created and destroyed on each iteration, which I am trying to avoid:

#pragma omp parallel for private(i, j, diff, maxdg) shared(width, height, update, g, dg, chunksize) default(none) schedule(static, chunksize)
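For reference, this is roughly what that per-call version might look like. This is only a hedged sketch: grid_update_once and grid_t are made-up names, the bare width, height, old, update and ONE_QUARTER are assumed to be the same globals/macros as in the snippet above, and the reads come from the old copy (Jacobi-style) so the i iterations are independent. The reduction(max:...) clause needs an OpenMP 3.1 or newer compiler; on older ones the max reduction has to be rolled by hand, as in the answer below.

/* Hedged sketch: a parallel region created (and torn down) inside every call.
 * Assumes <math.h> for fabs() and the same width/height/old/update/ONE_QUARTER
 * as in the question's snippet; grid_t is a hypothetical struct name. */
double grid_update_once( grid_t *g )
{
    double maxdiff = 0.0;
    int i, j;

    #pragma omp parallel for private(j) reduction(max:maxdiff) schedule(static)
    for( i = 1; i < width - 1; i++ ) {
        for( j = 1; j < height - 1; j++ ) {
            double diff;
            /* Jacobi-style: read the old plane, write the update plane */
            g->data[ update ][ i ][ j ] = ONE_QUARTER * (
                         g->data[ old ][ i + 1 ][ j     ] +
                         g->data[ old ][ i - 1 ][ j     ] +
                         g->data[ old ][ i     ][ j + 1 ] +
                         g->data[ old ][ i     ][ j - 1 ] );
            diff = fabs( g->data[ old ][ i ][ j ] - g->data[ update ][ i ][ j ] );
            if( diff > maxdiff ) maxdiff = diff;
        }
    }
    return maxdiff;   /* the threads are gone again by the time this returns */
}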

I have 2 copies of the grid and write the new answer in the "other one" on every iteration by switching old and update between 0 and 1.
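The flip itself is not shown above; a short sketch of how it might be done at the end of each outer iteration, assuming old and update are plain int indices (0 and 1) into g->data:

/* hypothetical: swap which plane is read ("old") and which is written ("update") */
int tmp = old;
old     = update;
update  = tmp;    /* or equivalently: old = update; update = 1 - update; */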

So I've made an orphaned omp for loop as per Jonathan Dursi's suggestion, but for some reason, can't seem to find the maximum value between threads...

Here is my "outer" code:

  converged = 0;

  #pragma omp parallel shared(i, max_iter, g, tol, maxdg, dg) private(converged) default(none)
  {
      for( i = 1; i <= 40; i++ ){

          maxdg = 0;

          dg = grid_update( &g );

          printf("[%d] dg from a single thread: %f\n", omp_get_thread_num(), dg );


  #pragma omp critical
          {
              if (dg > maxdg) maxdg = dg;
          }

  #pragma omp barrier
  #pragma omp flush

          printf("[%d] maxdg: %f\n", omp_get_thread_num(), maxdg);

          if( maxdg < tol ) {
              converged = 1;
              break;
          }
      }
  }

Result:

  [11] dg from a single thread: 0.000000
  [3] dg from a single thread: 0.000000
  [4] dg from a single thread: 0.000000
  [5] dg from a single thread: 0.000000
  [0] dg from a single thread: 0.166667
  [6] dg from a single thread: 0.000000
  [7] dg from a single thread: 0.000000
  [8] dg from a single thread: 0.000000
  [9] dg from a single thread: 0.000000
  [15] dg from a single thread: 0.000000
  [10] dg from a single thread: 0.000000
  [1] dg from a single thread: 0.166667
  [12] dg from a single thread: 0.000000
  [13] dg from a single thread: 0.000000
  [14] dg from a single thread: 0.000000
  [2] maxdg: 0.000000
  [3] maxdg: 0.000000
  [0] maxdg: 0.000000
  [8] maxdg: 0.000000
  [9] maxdg: 0.000000
  [4] maxdg: 0.000000
  [5] maxdg: 0.000000
  [6] maxdg: 0.000000
  [7] maxdg: 0.000000
  [1] maxdg: 0.000000
  [14] maxdg: 0.000000
  [11] maxdg: 0.000000
  [15] maxdg: 0.000000
  [10] maxdg: 0.000000
  [12] maxdg: 0.000000
  [13] maxdg: 0.000000

EDIT 2: I made some mistakes with the private/shared clauses and forgot a barrier. This is the correct code:

  #pragma omp parallel shared(max_iter, g, tol, maxdg) private(i, dg, converged) default(none)
  {
      for( i = 1; i <= max_iter; i++ ){

  #pragma omp barrier
          maxdg=0;
  /*#pragma omp flush */

          dg = grid_update( &g );

  #pragma omp critical
          {
              if (dg > maxdg) maxdg = dg;
          }

  #pragma omp barrier
  /*#pragma omp flush*/

          if( maxdg < tol ) {
              converged = 1;
              break;
          }
      }
  }

Answer

There's no problem with having the parallel section start in another routine before the for, certainly since OpenMP 3.0 (2008) and maybe since OpenMP 2.5. With gcc4.4:

outer.c:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void update(int n, int iter);

int main(int argc, char **argv) {
    int n=10;

    #pragma omp parallel num_threads(4) default(none) shared(n)
    for (int iter=0; iter<3; iter++)
    {
        #pragma omp single
        printf("---iteration %d---\n", iter);
        update(n, iter);
    }

    return 0;
}

inner.c:

#include <omp.h>
#include <stdio.h>

void update(int n, int iter) {
    int thread = omp_get_thread_num();

    #pragma omp for
    for  (int i=0;i<n;i++) {
        int newthread=omp_get_thread_num();
        printf("%3d: doing loop index %d.\n",newthread,i);
    }
}

Building:

$ make
gcc44 -g -fopenmp -std=c99   -c -o inner.o inner.c
gcc44 -g -fopenmp -std=c99   -c -o outer.o outer.c
gcc44 -o main outer.o inner.o -fopenmp -lgomp
$ ./main 
---iteration 0---
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
  3: doing loop index 9.
---iteration 1---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
  3: doing loop index 9.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
---iteration 2---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  3: doing loop index 9.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.

But as per @jdv-Jan de Vaan, I'd be very surprised if, in an up-to-date OpenMP implementation, this led to a significant performance improvement over having the parallel for in update, particularly if update is expensive enough.

BTW, there are issues with just putting a parallel for around the i-loop in the Gauss-Seidel routine in update; you can see that the i steps aren't independent, and that will lead to race conditions. You will need to do something like Red-Black or Jacobi iteration instead...
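For example, a Red-Black ordering splits the grid by the parity of i+j and does two half-sweeps; within one colour every update only reads cells of the other colour, so the i iterations become independent. A hedged sketch only, reusing the names from the question's snippet and assuming the sweep covers interior points with a one-cell boundary (the maxdiff bookkeeping is omitted for brevity):

int colour, i, j;   /* locals of the orphaned routine, so already thread-private */

for( colour = 0; colour < 2; colour++ ) {
    #pragma omp for    /* orphaned for: work is split by the enclosing parallel region */
    for( i = 1; i < width - 1; i++ ) {
        /* visit only the cells with (i + j) % 2 == colour */
        for( j = 1 + (i + colour + 1) % 2; j < height - 1; j += 2 ) {
            g->data[ update ][ i ][ j ] = ONE_QUARTER * (
                         g->data[ update ][ i + 1 ][ j     ] +
                         g->data[ update ][ i - 1 ][ j     ] +
                         g->data[ update ][ i     ][ j + 1 ] +
                         g->data[ update ][ i     ][ j - 1 ] );
        }
    }
    /* the implicit barrier at the end of the omp for keeps the two colours ordered */
}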

Update:

The code sample provided is for a G-S iteration, not Jacobi, but I'll just assume that's a typo.

If your question is actually about the reduce and not the orphaned for loop: yes, you sadly have to roll your own min/max reductions in OpenMP, but it's pretty straightforward, you just use the usual tricks.
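The usual trick: each thread keeps a private running max over its share of the loop, then folds it into the shared result inside a critical section. A minimal sketch with made-up names (n and value are not from the question):

double maxval = 0.0;                  /* shared result; assumes non-negative values,
                                         otherwise start both maxima from -DBL_MAX */

#pragma omp parallel shared(maxval)
{
    double locmax = 0.0;              /* private running max for this thread */

    #pragma omp for nowait
    for (int i=0; i<n; i++) {
        if (value[i] > locmax) locmax = value[i];
    }

    #pragma omp critical              /* fold the per-thread maxima together */
    {
        if (locmax > maxval) maxval = locmax;
    }
}
/* after the parallel region ends, maxval holds the global maximum */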

Update 2 -- yikes, locmax needs to be private, not shared.

outer.c:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int update(int n, int iter);

int main(int argc, char **argv) {
    int n=10;
    int max, locmax;

    max = -999;

    #pragma omp parallel num_threads(4) default(none) shared(n, max) private(locmax)
    for (int iter=0; iter<3; iter++)
    {
        #pragma omp single
            printf("---iteration %d---\n", iter);

        locmax = update(n, iter);

        #pragma omp critical
        {
            if (locmax > max) max=locmax;
        }

        #pragma omp barrier
        #pragma omp flush

        #pragma omp single
            printf("---iteration %d's max value = %d---\n", iter, max);
    }
    return 0;
}

inner.c:

#include <omp.h>
#include <stdio.h>

int update(int n, int iter) {
    int thread = omp_get_thread_num();
    int max = -999;

    #pragma omp for
    for  (int i=0;i<n;i++) {
        printf("%3d: doing loop index %d.\n",thread,i);
        if (i+iter>max) max = i+iter;
    }

    return max;
}

And building:

$ make
gcc44 -g -fopenmp -std=c99   -c -o inner.o inner.c
gcc44 -g -fopenmp -std=c99   -c -o outer.o outer.c
gcc44 -o main outer.o inner.o -fopenmp -lgomp
bash-3.2$ ./main 
---iteration 0---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
  3: doing loop index 9.
---iteration 0's max value = 9---
---iteration 1---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  3: doing loop index 9.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
---iteration 1's max value = 10---
---iteration 2---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
  3: doing loop index 9.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
---iteration 2's max value = 11---
