OpenMP: don't use hyperthreading cores (half `num_threads()` w/ hyperthreading)

Question

In "Is OpenMP (parallel for) in g++ 4.7 not very efficient? 2.5x at 5x CPU", I determined that the runtime of my programme varies between 11s and 13s (almost always above 12s, and sometimes as slow as 13.4s) at around 500% CPU when using the default #pragma omp parallel for, and that the OpenMP speed-up is only 2.5x at 5x CPU w/ g++-4.7 -O3 -fopenmp on a 4-core 8-thread Xeon.

I tried using schedule(static) num_threads(4), and noticed that my programme always completes in 11.5s to 11.7s (always below 12s) at about 320% CPU, i.e., it runs more consistently and uses fewer resources (even if the best run is half a second slower than the rare outlier with hyperthreading).

Is there any simple OpenMP-way to detect hyperthreading, and reduce num_threads() to the actual number of CPU cores?

(There is a similar question, "Poor performance due to hyper-threading with OpenMP: how to bind threads to cores", but in my testing I found that a mere reduction from 8 to 4 threads somehow already does that job w/ g++-4.7 on Debian 7 wheezy and Xeon E3-1240v3, so this question is merely about reducing num_threads() to the number of cores.)

Recommended answer

If you are running under Linux [also assuming an x86 arch], you can look at /proc/cpuinfo. There are two fields, cpu cores and siblings. The first is the number of [real] cores and the second is the number of hardware threads (hyperthreads), both counted per physical package. (E.g. on my system they are 4 and 8 respectively for my four-core hyperthreaded machine.)

Because Linux can detect this [and from the link in Zulan's comment], the information is also available from the x86 cpuid instruction.

Either way, there is also an environment variable for this, OMP_NUM_THREADS, which may be easier to use in conjunction with a launcher/wrapper script.

One thing you may wish to consider is that beyond a certain number of threads, you can saturate the memory bus, and no increase in threads [or cores] will improve performance and may, in fact, reduce it.

The question "Atomically increment two integers with CAS" links to a video talk from CppCon 2015 that is in two parts: https://www.youtube.com/watch?v=lVBvHbJsg5Y and https://www.youtube.com/watch?v=1obZeHnAwz4

They're about 1.5 hours each, but, IMO, well worth it.

In the talk, the speaker [who has done a lot of multithread/multicore optimization] says that, in his experience, the memory bus/system tends to get saturated after about four threads.
