OpenMP/C++: number of elements in for-loop


Problem description

I am doing some very simple OpenMP tests in C++ and have run into a problem that is probably silly, but I can't figure out what's wrong. Consider the following MWE (minimal working example):

#include <iostream>
#include <ctime>
#include <vector>
#include <omp.h>

int main()
{

  int nthreads=1, threadid=0;
  clock_t tstart, tend;
  const int nx=10, ny=10, nz=10;
  int i, j, k;
  std::vector<std::vector<std::vector<long long int> > > arr_par;

  arr_par.resize(nx);
  for (i=0; i<nx; i++) {
    arr_par[i].resize(ny);
    for (j = 0; j<ny; j++) {
      arr_par[i][j].resize(nz);
    }
  }

  tstart = clock();
#pragma omp parallel default(shared) private(threadid)
  {
#ifdef _OPENMP
    nthreads = omp_get_num_threads();
    threadid = omp_get_thread_num();
#endif
#pragma omp master
    std::cout<<"OpenMP execution with "<<nthreads<<" threads"<<std::endl;
    // (C/C++ has no "end master": the master construct covers only the next statement)
#pragma omp barrier
#pragma omp critical
    {
      std::cout<<"Thread id: "<<threadid<<std::endl;
    }

#pragma omp for
    for (i=0; i<nx; i++) {
      for (j=0; j<ny; j++) {
        for (k=0; k<nz; k++) {
          arr_par[i][j][k] = i*j + k;
        }
      }
    }
  }
  tend = clock();
  std::cout<<"Elapsed time: "<<(tend - tstart)/double(CLOCKS_PER_SEC)<<" s"<<std::endl;

  return 0;
}

If nx, ny and nz are all equal to 10, the code runs smoothly. If I increase them to 20, I get a segfault. It runs without problems sequentially, or with OMP_NUM_THREADS=1, whatever the number of elements.

I compiled the damn thing with

g++ -std=c++0x -fopenmp -gstabs+ -O0 test.cpp -o test

using GCC 4.6.3.

Any thoughts would be appreciated!

Solution

You have a data race in your loop counters:

#pragma omp for
for (i=0; i<nx; i++) {
  for (j=0; j<ny; j++) {          // <--- data race
    for (k=0; k<nz; k++) {        // <--- data race
      arr_par[i][j][k] = i*j + k;
    }
  }
}

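Since neither j nor k is given the private data-sharing class, both are shared by all threads. When several threads test and increment the same counter at once, a thread can pass the j<ny (or k<nz) test and then use a value that another thread has meanwhile pushed past the limit, resulting in out-of-bounds access to arr_par. The chance of several threads updating j or k at the same time grows with the number of iterations, which is consistent with the code surviving at 10 elements per dimension and crashing at 20.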
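The failing schedule can be replayed deterministically on a single thread. The sketch below uses a hypothetical helper, replay_shared_j_race, that performs by hand the steps two threads might take on the shared counter j when both are on the last iteration of the j loop. It illustrates one possible interleaving; it is not a real multithreaded run.

```cpp
#include <cassert>

// Hypothetical helper: replays on one thread one possible schedule of two
// threads that share the loop counter j, as in the question's code.
// Both threads start on the last iteration of the j loop (j == ny - 1).
// Returns the index thread A ends up using in arr_par[i][j].
int replay_shared_j_race(int ny) {
    assert(ny > 0);
    int j = ny - 1;                // the single, shared counter
    const bool a_enters = j < ny;  // thread A: loop test passes, A enters the body
    j++;                           // thread B: finishes its body and runs j++
    // Thread A now evaluates arr_par[i][j] with the shared j,
    // which B has already pushed to ny (one past the last valid index).
    return a_enters ? j : -1;
}
```

Here replay_shared_j_race(20) returns 20, i.e. thread A would index arr_par[i][20] in a vector of size 20. With operator[] that is undefined behaviour, which in practice can surface as the segfault described in the question.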

The best way to handle such cases is to declare the loop variables in the for statements themselves; variables declared inside the parallel region are automatically private to each thread:

#pragma omp for
for (int i=0; i<nx; i++) {
  for (int j=0; j<ny; j++) {
    for (int k=0; k<nz; k++) {
      arr_par[i][j][k] = i*j + k;
    }
  }
}
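For reference, here is a sketch of the corrected fill as a self-contained function. The names fill_parallel and Grid3D are illustrative, not from the question; the grid is allocated with the std::vector size constructors instead of the resize loops, and #pragma omp parallel for combines the parallel region and the for construct into one directive. Without -fopenmp the pragma is ignored and the loops simply run sequentially.

```cpp
#include <cassert>
#include <vector>

using Grid3D = std::vector<std::vector<std::vector<long long>>>;

// Allocates an nx x ny x nz grid and fills it with i*j + k in parallel.
// The counters are declared in the for statements inside the parallel loop,
// so each thread works on its own i, j and k.
Grid3D fill_parallel(int nx, int ny, int nz) {
    assert(nx >= 0 && ny >= 0 && nz >= 0);
    Grid3D arr(nx, std::vector<std::vector<long long>>(
                       ny, std::vector<long long>(nz)));

    // Without -fopenmp this pragma is ignored and the loops run sequentially.
#pragma omp parallel for
    for (int i = 0; i < nx; i++) {
        for (int j = 0; j < ny; j++) {
            for (int k = 0; k < nz; k++) {
                arr[i][j][k] = static_cast<long long>(i) * j + k;
            }
        }
    }
    return arr;
}
```

Compiled with g++ -fopenmp, fill_parallel(20, 20, 20) should produce i*j + k in every element regardless of OMP_NUM_THREADS, since no counter is shared between threads and each thread writes only its own arr[i] slices.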

Another way is to add a private(j,k) clause to the directive that opens the parallel region:

#pragma omp parallel default(shared) private(threadid) private(j,k)

In your case it is not strictly necessary to make i private, since the iteration variable of a loop associated with an omp for construct is implicitly made private. Still, if i is used elsewhere in the parallel region, making it private as well may prevent other data races.
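The same fill can be written with the private(j,k) clause while keeping the question's style of declaring i, j and k at function scope. The name fill_with_private_clause is hypothetical, and the sketch assumes the caller has already resized arr to nx x ny x nz, as the question's setup loop does.

```cpp
#include <cassert>
#include <vector>

using Grid3D = std::vector<std::vector<std::vector<long long>>>;

// Hypothetical helper: fills a grid already resized to nx x ny x nz.
void fill_with_private_clause(Grid3D& arr, int nx, int ny, int nz) {
    assert(static_cast<int>(arr.size()) == nx);  // outer dimension must match
    int i, j, k;
    // private(j, k): each thread gets its own (uninitialized) j and k.
    // i needs no clause: the counter of the omp for loop is private anyway.
#pragma omp parallel default(shared) private(j, k)
    {
#pragma omp for
        for (i = 0; i < nx; i++) {
            for (j = 0; j < ny; j++) {
                for (k = 0; k < nz; k++) {
                    arr[i][j][k] = static_cast<long long>(i) * j + k;
                }
            }
        }
    }
}
```

Declaring the counters in the for statements is usually the less error-prone choice, because forgetting one name in the private list silently brings the race back.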

Also, don't use clock() to measure the run time of parallel applications: on most Unix OSes it returns the total CPU time consumed by all threads, not the elapsed wall-clock time. Use omp_get_wtime() instead.
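A sketch that times the same piece of work both ways. The helpers wall_seconds and time_both are hypothetical, and the std::chrono::steady_clock fallback for builds without OpenMP is an assumption of this sketch, not part of the answer.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns wall-clock seconds: omp_get_wtime() under OpenMP, otherwise
// std::chrono::steady_clock (fallback assumed for non-OpenMP builds).
double wall_seconds() {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
#endif
}

// Runs work() and reports both elapsed wall time and process CPU time.
// clock() sums the CPU time of every thread in the process, so with several
// busy threads cpu can exceed wall.
template <typename F>
void time_both(F work, double& wall, double& cpu) {
    const double w0 = wall_seconds();
    const std::clock_t c0 = std::clock();
    assert(c0 != static_cast<std::clock_t>(-1) && "clock() unavailable");
    work();
    cpu = static_cast<double>(std::clock() - c0) / CLOCKS_PER_SEC;
    wall = wall_seconds() - w0;
}
```

When comparing a parallel run against the sequential one, wall is the figure that reflects the speedup; cpu mostly shows how much total work the threads did.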
