Why is OpenMP outperforming threads?

Question

I've been calling this in OpenMP:

#pragma omp parallel for num_threads(totalThreads)
for(unsigned i=0; i<totalThreads; i++)
{
workOnTheseEdges(startIndex[i], endIndex[i]);
}

And this with C++11 std::thread (I believe those are just pthreads underneath):

vector<thread> threads;
for(unsigned i=0; i<totalThreads; i++)
{
threads.push_back(thread(workOnTheseEdges,startIndex[i], endIndex[i])); 
}
for (auto& thread : threads)
{
 thread.join();
}

However, the OpenMP implementation is about twice as fast. I would have expected the C++11 threads to be faster, since they are lower-level. Note: the code above is called not just once but probably 10,000 times in a loop, so maybe that has something to do with it?

Edit: for clarification, in practice I use either the OpenMP version or the C++11 version, not both. With the OpenMP code the run takes 45 seconds; with the C++11 code it takes 100 seconds.

Answer

Consider the following code. The OpenMP version runs in about 0 seconds, while the C++11 version takes 50 seconds. The gap is not because the function is doNothing, and it is not because the vector is declared inside the loop. The C++11 threads are created and then destroyed on every iteration, whereas OpenMP keeps a thread pool and reuses its threads. The standard does not require this, but Intel's and AMD's implementations do it.

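// Outer loop: each pass launches one round of 16 no-op tasks.
for(int j=1; j<100000; ++j)
{
    if(algorithmToRun == 1)
    {
        // C++11 version: 16 new threads are created and joined on every pass.
        vector<thread> threads;
        for(int i=0; i<16; i++)
        {
            threads.push_back(thread(doNothing));
        }
        for(auto& thread : threads) thread.join();
    }
    else if(algorithmToRun == 2)
    {
        // OpenMP version: the runtime can reuse its worker threads between passes.
        #pragma omp parallel for num_threads(16)
        for(unsigned i=0; i<16; i++)
        {
            doNothing();
        }
    }
}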
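
To make the thread-reuse point concrete, here is a minimal sketch of the same idea using only std::thread: the workers are created once and woken for each round, instead of being created and joined on every pass. The names PersistentWorkers, runRound, and runRounds are hypothetical and not from the original post, and this is only an illustration of the pattern, not how any OpenMP runtime is actually implemented.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (an assumption for illustration): a fixed set of worker
// threads that is created once and reused for every round of work.
class PersistentWorkers {
public:
    explicit PersistentWorkers(unsigned count) {
        for (unsigned id = 0; id < count; ++id) {
            workers_.emplace_back([this, id] { workerLoop(id); });
        }
    }

    ~PersistentWorkers() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& worker : workers_) worker.join();
    }

    // Runs task(id) once on every worker and blocks until all have finished.
    void runRound(const std::function<void(unsigned)>& task) {
        assert(task);  // an empty std::function cannot be called
        std::unique_lock<std::mutex> lock(mutex_);
        task_ = &task;
        pending_ = workers_.size();
        ++round_;  // a new round number is the wake-up signal
        wake_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void workerLoop(unsigned id) {
        unsigned long seenRound = 0;
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Sleep until a new round starts or the pool is shutting down.
            wake_.wait(lock, [&] { return stopping_ || round_ != seenRound; });
            if (stopping_) return;
            seenRound = round_;
            const std::function<void(unsigned)>* task = task_;
            lock.unlock();
            (*task)(id);  // run the work without holding the lock
            lock.lock();
            if (--pending_ == 0) done_.notify_one();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable done_;
    const std::function<void(unsigned)>* task_ = nullptr;
    std::size_t pending_ = 0;
    unsigned long round_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last so the members above exist first
};

// Hypothetical driver: runs `rounds` rounds and counts the calls per worker.
std::vector<int> runRounds(unsigned workerCount, int rounds) {
    std::vector<int> calls(workerCount, 0);
    PersistentWorkers pool(workerCount);  // threads are created once, here
    for (int j = 0; j < rounds; ++j) {
        pool.runRound([&](unsigned id) { ++calls[id]; });  // same threads each round
    }
    return calls;
}
```

Calling runRounds(16, 100000) mirrors the shape of the benchmark above, with 16 thread creations in total rather than 16 per pass. Whether that closes the gap with OpenMP on a given machine is something to measure rather than assume.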
