OpenMP parallel for loop (max reduction) runs slower than the serial code


Question

I am new to OpenMP. I thought that using a max reduction clause to find the largest element of an array was a reasonable idea, but in practice the parallel for loop ran much slower than the serial one.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>

int main() {
    double sta, end;
    int bsize = 46000;
    int q = bsize;
    double max_val = 0;  /* same type as the buffer elements */
    double *buffer = (double*)malloc(bsize * sizeof(double));
    srand(time(NULL));
    for (int i = 0; i < q; i++)
        buffer[i] = rand() % 10000;

    /* parallel version: each thread keeps a private maximum, combined at the end */
    sta = omp_get_wtime();
    #pragma omp parallel for reduction(max : max_val)
    for (int i = 0; i < q; i++)
    {
        max_val = max_val > buffer[i] ? max_val : buffer[i];
    }
    end = omp_get_wtime();
    printf("parallel maximum time %f\n", end - sta);

    /* serial version: reset the maximum so the loop starts from the same state */
    max_val = 0;
    sta = omp_get_wtime();
    for (int i = 0; i < q; i++)
    {
        max_val = max_val > buffer[i] ? max_val : buffer[i];
    }
    end = omp_get_wtime();
    printf("serial maximum time   %f\n", end - sta);

    free(buffer);
    return 0;
}

Compile command

gcc-7 kp_omp.cpp -o kp_omp -fopenmp

Execution result

./kp_omp 
parallel maximum time 0.000505
serial maximum time   0.000266

The CPU is an Intel Core i7-6700 with 8 logical cores (4 physical cores with Hyper-Threading).

Recommended answer

Whenever you parallelise a loop, OpenMP has to perform some extra work, for example creating the threads. That work adds overhead, which in turn means that for each loop there is a minimum number of iterations below which parallelising does not pay off.
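To get a feel for that overhead, the cost of simply entering and leaving a parallel region can be timed on its own. The following is a minimal sketch, not part of the original answer: `now_seconds` and `mean_region_cost` are hypothetical helper names, and the timer uses the C11 `timespec_get` function so that the code also builds without `-fopenmp` (in which case the pragma is ignored and the team size is 1).

```c
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: enters a nearly empty parallel region `repeats` times
   (assumes repeats >= 1) and returns the mean cost per entry in seconds.
   *threads_out receives the team size seen in the last region. */
double mean_region_cost(int repeats, int *threads_out) {
    int count = 0;
    double start = now_seconds();
    for (int r = 0; r < repeats; r++) {
        count = 0;
        /* Each thread adds 1, so almost all of the time is fork/join cost. */
        #pragma omp parallel reduction(+ : count)
        count += 1;
    }
    double elapsed = now_seconds() - start;
    *threads_out = count;
    return elapsed / repeats;
}
```

Comparing a call with `repeats = 1` against a later call with a larger `repeats` often shows the very first region costing more, because common runtimes create their thread pool on first use. That is an observation about typical implementations rather than a guarantee, so the numbers should be measured on the target machine.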

If I run your code, I get results similar to yours:

./kp_omp
parallel maximum time 0.000570
serial maximum time   0.000253

However, if I change the declaration of bsize to

int bsize = 100000;

I obtain

./kp_omp
parallel maximum time 0.000323
serial maximum time   0.000552

So the parallel version is now faster than the sequential one. Part of the challenge when you try to speed up code is understanding when parallelising pays off and when its overhead would cancel the expected performance gain.
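One way to act on this is the OpenMP `if` clause, which lets the loop run in parallel only when the input is large enough. The sketch below is an illustration under assumptions, not the answerer's code: `array_max` is a hypothetical function name, and `PARALLEL_THRESHOLD` is a made-up cutoff loosely based on the two sizes measured above (46000 was slower in parallel, 100000 was faster). The real break-even point depends on the machine and must be measured.

```c
/* Hypothetical cutoff: below this many elements the loop stays serial.
   Tune it by measurement; it is not a universal constant. */
#define PARALLEL_THRESHOLD 100000

/* Returns the largest element of data[0..n-1]; assumes n >= 1. */
double array_max(const double *data, int n) {
    double max_val = data[0];
    /* The if clause makes the region run on one thread when the test fails,
       so small arrays avoid most of the parallel overhead. */
    #pragma omp parallel for reduction(max : max_val) if (n > PARALLEL_THRESHOLD)
    for (int i = 1; i < n; i++) {
        if (data[i] > max_val)
            max_val = data[i];
    }
    return max_val;
}
```

Starting from `data[0]` rather than 0 also keeps the result correct for arrays that contain only negative values.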
