OpenMP: Huge performance differences between Visual C++ 2008 and 2010

Question

I'm running a camera acquisition program that performs processing on acquired images, and I'm using simple OpenMP directives for this processing. So basically I wait for an image from the camera, and then process it.

When migrating to VC2010, I see a very strange performance hog: under VC2010 my app takes nearly 100% CPU, while it takes only 10% under VC2008.

If I benchmark only the processing code, I get no difference between VC2010 and VC2008. The difference appears only when the acquisition functions are used.

I have reduced the code needed to reproduce the problem to a simple loop that does the following:

  for (int i=0; i<1000; ++i)
  {
    GetImage(buffer);//wait for image
    Copy2Array(buffer, my_array);

    long long sum = 0;//do some simple OpenMP parallel loop
    #pragma omp parallel for reduction(+:sum)
    for (int j=0; j<size; ++j)
      sum += my_array[j];
  }

This loop eats 5% of CPU with 2008, and 70% with 2010.
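For readers without the Matrox hardware, the shape of this loop can be approximated with a stand-in. This is a minimal sketch under stated assumptions, not the original code: `fake_get_image`, `parallel_sum` and `run_acquisition_loop` are hypothetical names, the camera wait is simulated with a sleep, and the C++11 `<thread>` and `<chrono>` headers are not available in VC2008 or VC2010 (there, `Sleep()` from `windows.h` would be the substitute).

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical stand-in for the proprietary GetImage(): blocks for `wait`
// to mimic waiting on the camera, then fills the buffer with ones.
void fake_get_image(std::vector<int>& buffer, std::chrono::milliseconds wait) {
    std::this_thread::sleep_for(wait);
    buffer.assign(buffer.size(), 1);
}

// Same reduction as in the question. If OpenMP is not enabled at compile
// time, the pragma is ignored and the loop runs serially.
long long parallel_sum(const std::vector<int>& values) {
    const int size = static_cast<int>(values.size());
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int j = 0; j < size; ++j)
        sum += values[j];
    return sum;
}

// Runs `frames` iterations of wait-then-process and returns the grand total.
long long run_acquisition_loop(int frames, int size,
                               std::chrono::milliseconds wait) {
    assert(frames >= 0 && size >= 0);
    std::vector<int> buffer(static_cast<std::size_t>(size));
    long long total = 0;
    for (int i = 0; i < frames; ++i) {
        fake_get_image(buffer, wait);   // long idle phase
        total += parallel_sum(buffer);  // short parallel phase
    }
    return total;
}
```

Calling `run_acquisition_loop(1000, 1 << 20, std::chrono::milliseconds(40))` with OpenMP enabled and watching CPU usage is one way to check whether a given runtime keeps its worker threads busy during the idle phase. The 40 ms wait is an arbitrary illustrative value.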

I've done some profiling, which shows that in 2010 most of the time is spent in OpenMP's vcomp100.dll!_vcomp::PartialBarrierN::Block.

I have also done some concurrency profiling:

In 2008, the processing work is distributed over 3 worker threads, which are only lightly active because the processing time is much shorter than the image waiting time.

The same threads appear in 2010, but they are all 100% occupied by the PartialBarrierN::Block function. As I have four cores, these three threads consume about 75% of the total CPU capacity, which is roughly the CPU usage I see.

So it looks like there is a conflict between OpenMP and the Matrox acquisition library (proprietary). But is it a bug in VS2010 or in Matrox? Is there anything I can do? Using VC++ 2010 is mandatory for me, so I cannot just stick with 2008.

Thanks a lot.

Using the new concurrency framework, as suggested by DeadMG, leads to 40% CPU. Profiling shows that the time is spent in processing, so it doesn't exhibit the bug I'm seeing with OpenMP, but in my case its performance is much worse than OpenMP's.

I have installed an evaluation version of the latest Intel C++. It shows exactly the same performance problem!

I have cross-posted this to the MSDN forum.

Tested on Windows 7 64-bit and XP 32-bit, with exactly the same results (on the same machine).

Recommended answer

In 2010 OpenMP, each worker thread does a spin-wait of about 200 ms after task completion. In my case, with an I/O wait followed by a repetitive OpenMP task, this loads the CPU massively.
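The arithmetic behind this can be sketched as a rough model. It is only an estimate under stated assumptions: processing time is treated as negligible, each worker is assumed to spin for the full spin-wait or until the next frame arrives (whichever comes first), and the function name and the frame period are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Rough estimate of the fraction of total CPU capacity burned by worker
// threads spinning after each parallel region.
//   workers   - number of OpenMP worker threads that spin
//   cores     - number of CPU cores in the machine
//   spin_ms   - spin-wait duration after a parallel region
//   period_ms - time between two consecutive frames (assumed value)
double estimated_spin_cpu_fraction(int workers, int cores,
                                   double spin_ms, double period_ms) {
    assert(workers >= 0 && cores > 0 && spin_ms >= 0.0 && period_ms > 0.0);
    // A worker cannot spin longer than the gap until the next frame.
    const double per_worker = std::min(spin_ms, period_ms) / period_ms;
    return workers * per_worker / cores;
}
```

With 3 workers, 4 cores, a 200 ms spin-wait and an assumed 40 ms frame period, the model gives 0.75, which matches the roughly 75% reported in the question. With a spin-wait of 0 ms it gives 0, which is the effect a zero block time aims for.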

The solution is to change this behaviour; Intel C++ has an extension routine for this, kmp_set_blocktime(). However, Visual C++ 2010 offers no such option.

This Autodesk note discusses the same problem with Intel C++. That compiler introduced the behavior first, but allows it to be changed (see above). Visual C++ 2010 switched to the same behavior, but without a workaround like Intel's.

So to sum it up, switching to Intel C++ and using kmp_set_blocktime(0) solved it.
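A minimal sketch of how that call might be wired in is shown below. The wrapper name `set_zero_blocktime` and the preprocessor guard are assumptions of this sketch, not part of the original fix; the guard is there so that the file still compiles with compilers whose `omp.h` does not declare the Intel extension.

```cpp
#include <cassert>
#include <vector>
#if defined(_OPENMP)
#include <omp.h>
#endif

// Asks the Intel OpenMP runtime to put idle workers to sleep immediately
// instead of spinning. Returns true only if the Intel extension was called.
bool set_zero_blocktime() {
#if defined(_OPENMP) && \
    (defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER))
    kmp_set_blocktime(0);  // Intel extension: block time in milliseconds
    return true;
#else
    return false;          // no equivalent call available here
#endif
}

// Same reduction as in the question, used here to show call order.
long long parallel_sum(const std::vector<int>& values) {
    const int size = static_cast<int>(values.size());
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int j = 0; j < size; ++j)
        sum += values[j];
    return sum;
}
```

The intent is to call `set_zero_blocktime()` once at startup, before the acquisition loop begins. With the Intel runtime, setting the `KMP_BLOCKTIME` environment variable to 0 is an alternative that needs no code change.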

Thanks to John Lilley from DataLever Corporation on the other MSDN thread.

The issue has been submitted to MS Connect, and received the "won't fix" feedback.
