Why is the OpenMP version slower?


Question

I am experimenting with OpenMP and wrote some code to check its performance. On a single 4-core Intel CPU running Kubuntu 11.04, the following program is around 20 times slower when compiled with OpenMP than when compiled without it. Why?

I compiled it with: g++ -g -O2 -funroll-loops -fomit-frame-pointer -march=native -fopenmp

#include <math.h>
#include <iostream>

using namespace std;

int main ()
{
  long double i=0;
  long double k=0.7;

  #pragma omp parallel for reduction(+:i)
  for(int t=1; t<300000000; t++){       
    for(int n=1; n<16; n++){
      i=i+pow(k,n);
    }
  }

  cout << i<<"\t";
  return 0;
}


Recommended answer

The problem is that the variable k is treated as a shared variable. Inside the parallel region it is therefore accessed through shared storage, and the compiler apparently no longer treats it as a known constant when optimizing the pow(k,n) calls. A possible solution to avoid this is:

#include <math.h>
#include <iostream>

using namespace std;

int main ()
{
  long double i=0;

#pragma omp parallel for reduction(+:i)
  for(int t=1; t<300000000; t++){
    long double k=0.7;
    for(int n=1; n<16; n++){
      i=i+pow(k,n);
    }
  }

  cout << i<<"\t";
  return 0;
}

Following Martin Beckett's hint in the comments, instead of declaring k inside the loop, you can also declare k as const outside the loop.

Otherwise, ejd is correct: the problem here does not seem to be bad parallelization, but poor optimization of the code once it is parallelized. Keep in mind that gcc's OpenMP implementation is fairly young and far from optimal.
