gcc openmp线程重用 [英] gcc openmp thread reuse
问题描述
线程团队
被实例化。这意味着,每当内部循环开始时,线程的创建都会重复。 为了使线程可以重用,使用更大的并行部分(来控制生命周期),并具体控制外部/内部循环的并行性,如下所示:
$ b
也许明显的是,并不是说内部循环的并行化是一个很好的性能指标,但这可能是这个工作的重点,我不会批评这个问题。
$ b $ $ p $ #include< stdio.h>
#include< stdlib.h>
#include<数学.h>
#include< omp.h>
#define numThread 2
int main(int argc,char * argv []){
int ser [29],结束,i,j,a,限制,als;
als =的atoi(argv的[1]);
limit = atoi(argv [2]);
#pragma omp parallel num_threads(numThread)
{
#pragma omp single
for(i = 2; i
ser [a] = 1;
;
int prev = ser [a-1]; ((prev> i)||(a == 1)){
end = sqrt(prev);
int sum = 0; //添加这个
#pragma omp parallel for reduction(+:sum)//添加这个
(j = 2; j <= end; j ++) {
if(prev%j == 0){
sum + = j;
sum + = prev / j;
ser [a] = sum + 1; // added this
}
}
if(ser [als] == i ){
printf(%d,i); (j = 1; j
;
}
printf(\\\
);
}
}
}
}
I am using gcc's implementation of openmp to try to parallelize a program. Basically the assignment is to add omp pragmas to obtain speedup on a program that finds amicable numbers.
The original serial program was given(shown below except for the 3 lines I added with comments at the end). We have to parallize first just the outer loop, then just the inner loop. The outer loop was easy and I get close to ideal speedup for a given number of processors. For the inner loop, I get much worse performance than the original serial program. Basically what I am trying to do is a reduction on the sum variable.
Looking at the cpu usage, I am only using ~30% per core. What could be causing this? Is the program continually making new threads everytime it hits the omp parallel for clause? Is there just so much more overhead in doing a barrier for the reduction? Or could it be memory access issue(eg cache thrashing)? From what I read with most implementations of openmp threads get reused overtime(eg pooled), so I am not so sure the first problem is what is wrong.
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include <omp.h>
#define numThread 2
int main(int argc, char* argv[]) {
int ser[29], end, i, j, a, limit, als;
als = atoi(argv[1]);
limit = atoi(argv[2]);
for (i = 2; i < limit; i++) {
ser[0] = i;
for (a = 1; a <= als; a++) {
ser[a] = 1;
int prev = ser[a-1];
if ((prev > i) || (a == 1)) {
end = sqrt(prev);
int sum = 0;//added this
#pragma omp parallel for reduction(+:sum) num_threads(numThread)//added this
for (j = 2; j <= end; j++) {
if (prev % j == 0) {
sum += j;
sum += prev / j;
}
}
ser[a] = sum + 1;//added this
}
}
if (ser[als] == i) {
printf("%d", i);
for (j = 1; j < als; j++) {
printf(", %d", ser[j]);
}
printf("\n");
}
}
}
OpenMP thread teams
are instantiated on entering the parallel section. This means, indeed, that the thread creation is repeated every time the inner loop is starting.
To enable reuse of threads, use a larger parallel section (to control the lifetime of the team) and specificly control the parallellism for the outer/inner loops, like so:
Execution time for test.exe 1 1000000
has gone down from 43s to 22s using this fix (and the number of threads reflects the numThreads defined value + 1
PS Perhaps stating the obvious, it would not appear that parallelizing the inner loop is a sound performance measure. But that is likely the whole point to this exercise, and I won't critique the question for that.
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include <omp.h>
#define numThread 2
int main(int argc, char* argv[]) {
int ser[29], end, i, j, a, limit, als;
als = atoi(argv[1]);
limit = atoi(argv[2]);
#pragma omp parallel num_threads(numThread)
{
#pragma omp single
for (i = 2; i < limit; i++) {
ser[0] = i;
for (a = 1; a <= als; a++) {
ser[a] = 1;
int prev = ser[a-1];
if ((prev > i) || (a == 1)) {
end = sqrt(prev);
int sum = 0;//added this
#pragma omp parallel for reduction(+:sum) //added this
for (j = 2; j <= end; j++) {
if (prev % j == 0) {
sum += j;
sum += prev / j;
}
}
ser[a] = sum + 1;//added this
}
}
if (ser[als] == i) {
printf("%d", i);
for (j = 1; j < als; j++) {
printf(", %d", ser[j]);
}
printf("\n");
}
}
}
}
这篇关于gcc openmp线程重用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!