OpenMP parallel spiking


Problem description

I'm using OpenMP in Visual Studio 2010 to speed up loops.

I wrote a very simple test to see the performance increase using OpenMP. I use omp parallel on an empty loop

int time_before = clock();

#pragma omp parallel for
for (int i = 0; i < 4; i++) {

}

int time_after = clock();

std::cout << "time elapsed: " << (time_after - time_before) << " milliseconds" << std::endl;

Without the omp pragma it consistently takes 0 milliseconds to complete (as expected), and with the pragma it usually takes 0 as well. The problem is that with the omp pragma it spikes occasionally, anywhere from 10 to 32 milliseconds. Every time I try parallel for with OpenMP I get these random spikes, so I tried this very basic test. Are the spikes an inherent part of OpenMP, or can they be avoided?

The parallel for gives me great speed boosts on some loops, but these random spikes are too big for me to be able to use it.

Answer

If "OpenMP parallel spiking", which I would call "parallel overhead", is a concern in your loop, it implies you probably don't have enough workload to parallelize. Parallelization yields a speedup only if the problem size is sufficient. You already showed an extreme example: no work at all in a parallelized loop. In such a case, you will see highly fluctuating times due to parallel overhead.

The parallel overhead in OpenMP's omp parallel for includes several factors:


  • First, omp parallel for is the sum of omp parallel and omp for.
  • The overhead of spawning or awakening threads (many OpenMP implementations don't create and destroy threads on every omp parallel; they typically maintain a thread pool and wake the workers, which is one reason the cost fluctuates from run to run).
  • Regarding omp for, the overhead of (a) dispatching workloads to worker threads and (b) scheduling (especially if dynamic scheduling is used).
  • The overhead of the implicit barrier at the end of omp parallel, unless nowait is specified.

FYI, in order to measure OpenMP's parallel overhead, the following would be more effective:

#include <ctime>  // for std::clock

double measureOverhead(int tripCount) {
  static const size_t TIMES = 10000;
  int sum = 0;

  std::clock_t startTime = std::clock();
  for (size_t k = 0; k < TIMES; ++k) {
    for (int i = 0; i < tripCount; ++i) {
      sum += i;
    }
  }
  std::clock_t elapsedTime = std::clock() - startTime;

  std::clock_t startTime2 = std::clock();
  for (size_t k = 0; k < TIMES; ++k) {
  #pragma omp parallel for private(sum) // We don't care about the correctness of sum.
                                        // Otherwise, use "reduction(+: sum)".
    for (int i = 0; i < tripCount; ++i) {
      sum += i;
    }
  }
  std::clock_t elapsedTime2 = std::clock() - startTime2;

  double parallelOverhead = double(elapsedTime2 - elapsedTime) / double(TIMES);
  return parallelOverhead;
}

Try running such small code many times, then take an average. Also, put at least a minimal workload in the loops. In the above code, parallelOverhead is an approximation of the overhead of OpenMP's omp parallel for construct.
