Why is OpenMP outperforming threads?
Problem description
I've been calling this in OpenMP:
#pragma omp parallel for num_threads(totalThreads)
for(unsigned i=0; i<totalThreads; i++)
{
workOnTheseEdges(startIndex[i], endIndex[i]);
}
And this in C++11 with std::thread (I believe these are implemented on top of pthreads):
vector<thread> threads;
for(unsigned i=0; i<totalThreads; i++)
{
threads.push_back(thread(workOnTheseEdges,startIndex[i], endIndex[i]));
}
for (auto& thread : threads)
{
thread.join();
}
But the OpenMP implementation is twice as fast! I would have expected the C++11 threads to be faster, since they are lower-level. Note: the code above is called not just once, but probably 10,000 times in a loop, so maybe that has something to do with it?
Edit: for clarification, in practice I use either the OpenMP or the C++11 version, not both. When I use the OpenMP code it takes 45 seconds; when I use the C++11 code it takes 100 seconds.
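To see how much of that 10,000-iteration loop is pure start-up cost, a micro-benchmark along these lines can time the spawn-and-join overhead in isolation (the helper name `spawn_join_cost_ms` is mine, not from the original code):

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Returns the wall-clock milliseconds spent spawning and joining `spawns`
// no-op threads -- the overhead the C++11 version pays on every single
// iteration of the outer loop, and which an OpenMP thread pool avoids.
long long spawn_join_cost_ms(int spawns) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    threads.reserve(spawns);
    for (int i = 0; i < spawns; ++i)
        threads.emplace_back([] {});   // no-op worker, like workOnTheseEdges with no work
    for (auto& t : threads) t.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

Multiplying the measured per-iteration cost by 10,000 gives a rough estimate of how much of the 100-second total is thread creation and destruction rather than actual work.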
Solution

Consider the following code. The OpenMP version runs in 0 seconds while the C++11 version runs in 50 seconds. This is not because the function is doNothing, and it's not because vector is declared inside the loop. As you can imagine, the C++11 threads are created and then destroyed on every iteration. OpenMP, on the other hand, actually implements thread pools. It's not in the standard, but it's in Intel's and AMD's implementations.
for(int j=1; j<100000; ++j)
{
if(algorithmToRun == 1)
{
vector<thread> threads;
for(int i=0; i<16; i++)
{
threads.push_back(thread(doNothing));
}
for(auto& thread : threads) thread.join();
}
else if(algorithmToRun == 2)
{
#pragma omp parallel for num_threads(16)
for(unsigned i=0; i<16; i++)
{
doNothing();
}
}
}
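If you want to keep plain std::thread but close the gap, the usual fix is to reuse the workers yourself. Below is a minimal fixed-size thread-pool sketch mirroring what the OpenMP runtime does internally: threads are created once and fed tasks through a queue. The names `SimplePool` and `run_demo` are illustrative, not from the original post, and a production pool would need more (exception handling, a way to wait for a batch, work stealing, etc.):

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size thread pool: workers are created once in the
// constructor and reused for every submitted task, instead of being
// spawned and joined on each iteration of the outer loop.
class SimplePool {
public:
    explicit SimplePool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue first
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty()) return;  // shut down only when idle
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // execute outside the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

// Demo: submit 1000 tiny tasks to 4 long-lived workers, then let the
// destructor drain the queue and join. Returns the number of tasks run.
int run_demo() {
    std::atomic<int> counter{0};
    {
        SimplePool pool(4);
        for (int j = 0; j < 1000; ++j)
            pool.submit([&counter] { ++counter; });
    }
    return counter.load();
}
```

With a pool like this, the 100,000-iteration benchmark above pays the thread-creation cost only once, which is exactly the advantage the OpenMP version has.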