OpenMP/C++: number of elements in for-loop
Problem description
I am doing some very simple tests with OpenMP in C++ and I have run into a problem that is probably silly, but I can't figure out what's wrong. In the following MWE:
#include <iostream>
#include <ctime>
#include <vector>
#include <omp.h>

int main()
{
    int nthreads=1, threadid=0;
    clock_t tstart, tend;
    const int nx=10, ny=10, nz=10;
    int i, j, k;
    std::vector<std::vector<std::vector<long long int> > > arr_par;

    arr_par.resize(nx);
    for (i=0; i<nx; i++) {
        arr_par[i].resize(ny);
        for (j = 0; j<ny; j++) {
            arr_par[i][j].resize(nz);
        }
    }

    tstart = clock();
    #pragma omp parallel default(shared) private(threadid)
    {
        #ifdef _OPENMP
        nthreads = omp_get_num_threads();
        threadid = omp_get_thread_num();
        #endif
        #pragma omp master
        std::cout<<"OpenMP execution with "<<nthreads<<" threads"<<std::endl;
        #pragma omp end master
        #pragma omp barrier
        #pragma omp critical
        {
            std::cout<<"Thread id: "<<threadid<<std::endl;
        }
        #pragma omp for
        for (i=0; i<nx; i++) {
            for (j=0; j<ny; j++) {
                for (k=0; k<nz; k++) {
                    arr_par[i][j][k] = i*j + k;
                }
            }
        }
    }
    tend = clock();
    std::cout<<"Elapsed time: "<<(tend - tstart)/double(CLOCKS_PER_SEC)<<" s"<<std::endl;
    return 0;
}
If nx, ny and nz are equal to 10, the code runs smoothly. If I increase these numbers to 20, I get a segfault. It runs without problems sequentially or with OMP_NUM_THREADS=1, whatever the number of elements.
I compiled the damn thing with
g++ -std=c++0x -fopenmp -gstabs+ -O0 test.cpp -o test
using GCC 4.6.3.
Any thought would be appreciated!
Solution
You have a data race in your loop counters:
#pragma omp for
for (i=0; i<nx; i++) {
    for (j=0; j<ny; j++) {           // <--- data race
        for (k=0; k<nz; k++) {       // <--- data race
            arr_par[i][j][k] = i*j + k;
        }
    }
}
Since neither j nor k is given the private data-sharing attribute, both are shared between all threads. Their values might exceed the corresponding limits when several threads try to increase them at once, resulting in out-of-bounds access to arr_par. The chance of several threads increasing j or k at the same time grows with the number of iterations.
The best way to treat those cases is to simply declare the loop variables inside the loop statement itself:
#pragma omp for
for (int i=0; i<nx; i++) {
    for (int j=0; j<ny; j++) {
        for (int k=0; k<nz; k++) {
            arr_par[i][j][k] = i*j + k;
        }
    }
}
The other way is to add the private(j,k) clause to the head of the parallel region:
#pragma omp parallel default(shared) private(threadid) private(j,k)
It is not strictly necessary to make i private in your case since the loop variables of parallel loops are implicitly made private. Still, if i is used somewhere else in the code, it might make sense to make it private to prevent other data races.
Also, don't use clock() to measure the time for parallel applications since on most Unix OSes it returns the total CPU time for all threads. Use omp_get_wtime() instead.