Why is OpenMP outperforming threads?
Problem description
I've been calling this in OpenMP:
#pragma omp parallel for num_threads(totalThreads)
for(unsigned i=0; i<totalThreads; i++)
{
workOnTheseEdges(startIndex[i], endIndex[i]);
}
And this in C++11 with std::thread (I believe these are implemented on top of pthreads):
vector<thread> threads;
for(unsigned i=0; i<totalThreads; i++)
{
threads.push_back(thread(workOnTheseEdges,startIndex[i], endIndex[i]));
}
for (auto& thread : threads)
{
thread.join();
}
But the OpenMP implementation is twice as fast! I would have expected the C++11 threads to be faster, since they are lower-level. Note: the code above is called not just once, but probably 10,000 times in a loop, so maybe that has something to do with it?
Edit: for clarification, in practice I use either the OpenMP or the C++11 version, not both. When I use the OpenMP code it takes 45 seconds; when I use the C++11 code it takes 100 seconds.
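To see how much of that 10,000-iteration loop is pure start-up cost, a micro-benchmark along these lines can time the spawn-and-join overhead in isolation (the helper name `spawn_join_cost_ms` is mine, not from the original code):

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Returns the wall-clock milliseconds spent spawning and joining `spawns`
// no-op threads -- the overhead the C++11 version pays on every single
// iteration of the outer loop, and which an OpenMP thread pool avoids.
long long spawn_join_cost_ms(int spawns) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    threads.reserve(spawns);
    for (int i = 0; i < spawns; ++i)
        threads.emplace_back([] {});   // no-op worker, like workOnTheseEdges with no work
    for (auto& t : threads) t.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

Multiplying the measured per-iteration cost by 10,000 gives a rough estimate of how much of the 100-second total is thread creation and destruction rather than actual work.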
Solution

Consider the following code. The OpenMP version runs in 0 seconds while the C++11 version runs in 50 seconds. This is not because the function is doNothing, and it's not because vector is declared inside the loop. As you can imagine, the C++11 threads are created and then destroyed on every iteration. OpenMP, on the other hand, actually implements thread pools. It's not in the standard, but it's in Intel's and AMD's implementations.
for(int j=1; j<100000; ++j)
{
if(algorithmToRun == 1)
{
vector<thread> threads;
for(int i=0; i<16; i++)
{
threads.push_back(thread(doNothing));
}
for(auto& thread : threads) thread.join();
}
else if(algorithmToRun == 2)
{
#pragma omp parallel for num_threads(16)
for(unsigned i=0; i<16; i++)
{
doNothing();
}
}
}
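If you want to keep plain std::thread but close the gap, the usual fix is to reuse the workers yourself. Below is a minimal fixed-size thread-pool sketch mirroring what the OpenMP runtime does internally: threads are created once and fed tasks through a queue. The names `SimplePool` and `run_demo` are illustrative, not from the original post, and a production pool would need more (exception handling, a way to wait for a batch, work stealing, etc.):

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size thread pool: workers are created once in the
// constructor and reused for every submitted task, instead of being
// spawned and joined on each iteration of the outer loop.
class SimplePool {
public:
    explicit SimplePool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue first
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty()) return;  // shut down only when idle
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // execute outside the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

// Demo: submit 1000 tiny tasks to 4 long-lived workers, then let the
// destructor drain the queue and join. Returns the number of tasks run.
int run_demo() {
    std::atomic<int> counter{0};
    {
        SimplePool pool(4);
        for (int j = 0; j < 1000; ++j)
            pool.submit([&counter] { ++counter; });
    }
    return counter.load();
}
```

With a pool like this, the 100,000-iteration benchmark above pays the thread-creation cost only once, which is exactly the advantage the OpenMP version has.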