OpenMP - create threads only once


Problem description

I am trying to write a simple application using OpenMP, but I have a problem with speedup. The application has one while loop. The body of this loop consists of some instructions that must be executed sequentially and one for loop, which I parallelize with #pragma omp parallel for. The for loop does not do much work, but it is executed very often.

I prepared two versions of the for loop and ran the application on 1, 2 and 4 cores:
version 1 (4 iterations in the for loop): 22 sec, 23 sec, 26 sec.
version 2 (100000 iterations in the for loop): 20 sec, 10 sec, 6 sec.

As you can see, when the for loop does not have much work, the run time on 2 and 4 cores is higher than on 1 core. I guess the reason is that #pragma omp parallel for creates new threads in each iteration of the while loop. So I would like to ask: is it possible to create the threads once (before the while loop) and still ensure that some of the work inside the while loop is done sequentially?

#include <omp.h>
#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int main(int argc, char* argv[])
{
    double sum = 0;
    while (true)
    {
        // ...
        // some work which should be done sequentially
        // ...

        #pragma omp parallel for num_threads(atoi(argv[1])) reduction(+:sum)
        for(int j=0; j<4; ++j)  // version 2: for(int j=0; j<100000; ++j)
        {
            double x = pow(j, 3.0);
            x = sqrt(x);
            x = sin(x);
            x = cos(x);
            x = tan(x);
            sum += x;

            double y = pow(j, 3.0);
            y = sqrt(y);
            y = sin(y);
            y = cos(y);
            y = tan(y);
            sum += y;

            double z = pow(j, 3.0);
            z = sqrt(z);
            z = sin(z);
            z = cos(z);
            z = tan(z);
            sum += z;
        }

        if (sum > 100000000)
        {
            break;
        }
    }
    return 0;
}


Recommended answer

You could move the parallel region outside of the while (true) loop and use the single directive to make the serial part of the code execute in one thread only. This will remove the overhead of the fork/join model. Also, OpenMP is not really useful on tight loops with a very small number of iterations (like your version 1). You are basically measuring the OpenMP overhead, since the work inside the loop is done really fast: even 100000 iterations with transcendental functions take less than a second on a current-generation CPU (at 2 GHz and roughly 100 cycles per FP instruction other than addition, it would take ~100 ms).
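As a rough illustration of the restructuring described above, the sketch below opens the parallel region once and keeps the while loop inside it. It is not the asker's final code: the function name run_hoisted, the limit parameter, and the max_outer safety cap are assumptions made for this example, and the loop body is shortened to a single term of the original computation. The serial work and the termination test each sit in a single block, whose implicit barrier keeps the team in step.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not from the original post. It runs the question's
// loop with the parallel region hoisted out of the while loop and returns
// the number of outer iterations executed. `max_outer` is an assumed
// safety cap that the original code does not have.
long run_hoisted(int inner_iters, double limit, long max_outer)
{
    assert(inner_iters >= 0 && max_outer > 0);

    double sum = 0.0;   // shared; combined by the reduction below
    long outer = 0;     // shared; written only inside a single block
    bool done = false;  // shared; written only inside a single block

    #pragma omp parallel   // the thread team is created once, here
    {
        while (!done)
        {
            // Sequential part: executed by exactly one thread. The implicit
            // barrier at the end of single holds the other threads back.
            #pragma omp single
            {
                ++outer;
            }

            // Work-sharing loop: splits the iterations among the threads
            // of the existing team instead of opening a new region.
            #pragma omp for reduction(+:sum)
            for (int j = 0; j < inner_iters; ++j)
            {
                double x = std::pow(j, 3.0);
                x = std::tan(std::cos(std::sin(std::sqrt(x))));
                sum += x;
            }
            // Implicit barrier: the reduced sum is now visible to everyone.

            // Termination test, again by one thread only, so that all
            // threads agree on whether to leave the while loop.
            #pragma omp single
            {
                done = (sum > limit) || (outer >= max_outer);
            }
        }
    }
    return outer;
}
```

Calling run_hoisted(4, 100000000, 1000000000L) would roughly mirror version 1 of the question. Note that each single block still costs a barrier per iteration, so with only 4 inner iterations the synchronization overhead may well continue to dominate; measuring is the only way to tell.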

That's why OpenMP provides the if(condition) clause, which can be used to selectively turn off the parallelisation for small loops:

#pragma omp parallel for ... if(loopcnt > 10000)
for (i = 0; i < loopcnt; i++)
   ...
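A complete version of the fragment above might look like the following sketch. The function name sum_terms and the shortened loop body are assumptions for illustration; the 10000 threshold is simply the value used in the answer and would need tuning by measurement on the target machine.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not from the original post: sums one term of the
// question's computation over loopcnt iterations.
double sum_terms(int loopcnt)
{
    assert(loopcnt >= 0);
    double sum = 0.0;

    // When the if clause evaluates to false, the region is executed by a
    // team of one thread, so small loops run serially. The threshold of
    // 10000 comes from the answer and is not a universal constant.
    #pragma omp parallel for reduction(+:sum) if(loopcnt > 10000)
    for (int i = 0; i < loopcnt; ++i)
    {
        double x = std::pow(i, 3.0);
        sum += std::tan(std::cos(std::sin(std::sqrt(x))));
    }
    return sum;
}
```

With this in place, sum_terms(4) stays serial while sum_terms(100000) is shared among the available threads, which matches the two versions in the question.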

It is also advisable to use schedule(static) for regular loops (that is, loops in which every iteration takes about the same time to compute).
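A minimal sketch of a regular loop with a static schedule is shown below. The function name compute_terms and the output vector are assumptions for this example. Every iteration performs the same sequence of calls, so dividing the iteration space into roughly equal chunks up front is a reasonable fit.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not from the original post: stores one term of the
// question's computation per index.
std::vector<double> compute_terms(int n)
{
    assert(n >= 0);
    std::vector<double> out(n);

    // schedule(static) with no chunk size divides the iterations into
    // approximately equal chunks, at most one per thread. Each iteration
    // writes a distinct element, so no synchronization is needed.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
    {
        double x = std::pow(i, 3.0);
        out[i] = std::tan(std::cos(std::sin(std::sqrt(x))));
    }
    return out;
}
```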
