OpenMP parallel spiking
Question
I'm using OpenMP in Visual Studio 2010 to speed up loops.
I wrote a very simple test to see the performance increase using OpenMP. I use omp parallel for on an empty loop:
int time_before = clock();
#pragma omp parallel for
for (int i = 0; i < 4; i++) {
}
int time_after = clock();
std::cout << "time elapsed: " << (time_after - time_before) << " milliseconds" << std::endl;
Without the omp pragma it consistently takes 0 milliseconds to complete (as expected), and with the pragma it usually takes 0 as well. The problem is that with the omp pragma it spikes occasionally, anywhere from 10 to 32 milliseconds. Every time I try a parallel for with OpenMP I get these random spikes, so I tried this very basic test. Are the spikes an inherent part of OpenMP, or can they be avoided?
The parallel for gives me great speed boosts on some loops, but these random spikes are too big for me to be able to use it.
Answer
If "OpenMP parallel spiking", which I would call "parallel overhead", is a concern in your loop, it implies you probably don't have enough workload to parallelize. Parallelization yields a speedup only if you have a sufficient problem size. You already showed an extreme example: no work at all in the parallelized loop. In such a case, you will see highly fluctuating times due to parallel overhead.
The parallel overhead in OpenMP's omp parallel for includes several factors:
- First, the cost of omp parallel for is the sum of the costs of omp parallel and omp for.
- The overhead of spawning or waking threads. Many OpenMP implementations won't create/destroy threads on every omp parallel; they keep a thread pool and wake it as needed.
- Regarding omp for, the overhead of (a) dispatching workloads to worker threads and (b) scheduling (especially if dynamic scheduling is used).
- The overhead of the implicit barrier at the end of omp parallel, unless nowait is specified.
FYI, in order to measure OpenMP's parallel overhead, the following would be more effective:
#include <cstddef>
#include <ctime>

double measureOverhead(int tripCount) {
    static const size_t TIMES = 10000;
    int sum = 0;

    // Serial baseline
    clock_t startTime = clock();
    for (size_t k = 0; k < TIMES; ++k) {
        for (int i = 0; i < tripCount; ++i) {
            sum += i;
        }
    }
    clock_t elapsedTime = clock() - startTime;

    // The same loop, parallelized
    clock_t startTime2 = clock();
    for (size_t k = 0; k < TIMES; ++k) {
        #pragma omp parallel for private(sum) // We don't care about the correctness of sum.
                                              // Otherwise, use "reduction(+: sum)".
        for (int i = 0; i < tripCount; ++i) {
            sum += i;
        }
    }
    clock_t elapsedTime2 = clock() - startTime2;

    double parallelOverhead = double(elapsedTime2 - elapsedTime) / double(TIMES);
    return parallelOverhead;
}
Try running such small code many times, then take an average. Also, put at least a minimal workload in the loops. In the above code, parallelOverhead is the approximate overhead of OpenMP's omp parallel for construct.