OpenMP: don't use hyperthreading cores (half `num_threads()` w/ hyperthreading)
Question
In "Is OpenMP (parallel for) in g++ 4.7 not very efficient? 2.5x at 5x CPU", I determined that the performance of my programme varies between 11s and 13s (mostly above 12s, and sometimes as slow as 13.4s) at around 500% CPU when using the default `#pragma omp parallel for`, and that the OpenMP speed-up is only 2.5x at 5x CPU with `g++-4.7 -O3 -fopenmp` on a 4-core, 8-thread Xeon.
I tried using `schedule(static) num_threads(4)`, and noticed that my programme always completes in 11.5s to 11.7s (always below 12s) at about 320% CPU; that is, it runs more consistently and uses fewer resources (even if the best run is half a second slower than the rare outlier with hyperthreading).
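For reference, the clauses mentioned above attach to the loop directive as in the following minimal sketch. The workload and the function name `sum_of_squares` are purely illustrative assumptions, not the programme from the question; the thread count of 4 is the physical core count reported for that machine.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative workload (an assumption, not the asker's programme):
// sums i*i for i in [0, n).
// `threads` is passed to num_threads(); 4 would match the physical core
// count reported in the question. Without -fopenmp the pragma is ignored
// and the loop simply runs serially, giving the same result.
inline std::int64_t sum_of_squares(int n, int threads) {
    assert(threads >= 1);
    std::int64_t total = 0;
    #pragma omp parallel for schedule(static) num_threads(threads) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<std::int64_t>(i) * i;
    }
    return total;
}
```

Built with `g++ -O3 -fopenmp`, a call such as `sum_of_squares(1000, 4)` runs the loop on at most four threads, with iterations split into equal contiguous chunks by `schedule(static)`.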
Is there any simple OpenMP way to detect hyperthreading and reduce `num_threads()` to the actual number of CPU cores?
(There is a similar question, "Poor performance due to hyper-threading with OpenMP: how to bind threads to cores", but in my testing I found that a mere reduction from 8 to 4 threads somehow already does that job with g++-4.7 on Debian 7 wheezy and a Xeon E3-1240v3, so this question is merely about reducing `num_threads()` to the number of cores.)
Recommended answer
If you are running under Linux (and assuming an x86 architecture), you can look at `/proc/cpuinfo`. It has two relevant fields, `cpu cores` and `siblings`. The first is the number of physical cores in the package and the second is the number of logical CPUs (hardware threads) in that package. On my four-core hyperthreaded machine, for example, they are 4 and 8 respectively.
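A minimal sketch of reading those fields, assuming Linux on x86 where `/proc/cpuinfo` exposes `physical id` and `cpu cores`. The helper name `physical_core_count` is hypothetical and not part of OpenMP. It records `cpu cores` once per distinct `physical id` so that multi-socket machines are summed rather than counted once per logical CPU.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts physical cores from text in /proc/cpuinfo
// format. Returns 0 if the expected fields are missing (for example on
// non-x86 kernels or in some virtual machines).
inline int physical_core_count(std::istream& in) {
    std::map<int, int> cores_per_package;  // physical id -> cpu cores
    int current_package = 0;  // assume a single package if "physical id" is absent
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;  // blank separator lines
        std::string key = line.substr(0, colon);
        key.erase(key.find_last_not_of(" \t") + 1);  // trim trailing padding
        const std::string value = line.substr(colon + 1);
        if (key == "physical id") {
            current_package = std::atoi(value.c_str());
        } else if (key == "cpu cores") {
            cores_per_package[current_package] = std::atoi(value.c_str());
        }
    }
    int total = 0;
    for (const auto& entry : cores_per_package) total += entry.second;
    return total;
}

// Convenience overload that reads the live file; also returns 0 if the
// file cannot be opened.
inline int physical_core_count() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    return physical_core_count(cpuinfo);
}
```

A non-zero result could then be passed to `omp_set_num_threads()` or to a `num_threads()` clause; when it is 0, keeping the OpenMP default is probably the safer fallback.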
Because Linux can detect this (see also the link in Zulan's comment), the information is also available from the x86 `cpuid` instruction.
Either way, there is also an environment variable for this, `OMP_NUM_THREADS`, which may be easier to use in conjunction with a launcher/wrapper script.
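To check what the environment variable actually did, the programme can ask the runtime for its default team size. This is a small sketch; the function name `default_team_size` is an assumption for illustration. `omp_get_max_threads()` is the standard OpenMP call that returns the thread count a parallel region would use when no `num_threads()` clause is given, and its initial value comes from `OMP_NUM_THREADS` when that is set.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: reports how many threads a parallel region without
// a num_threads() clause would use. Falls back to 1 when the code is
// compiled without -fopenmp, since the loops then run serially.
inline int default_team_size() {
#ifdef _OPENMP
    const int n = omp_get_max_threads();
    assert(n >= 1);
    return n;
#else
    return 1;
#endif
}
```

Launched as `OMP_NUM_THREADS=4 ./prog` (with `./prog` standing in for the real binary), this should report 4 on an OpenMP build, so a wrapper script can pin the count without recompiling.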
One thing you may wish to consider is that beyond a certain number of threads you can saturate the memory bus; past that point, adding threads (or cores) will not improve performance and may in fact reduce it.
The question "Atomically increment two integers with CAS" links to a video talk from CppCon 2015 that is in two parts: https://www.youtube.com/watch?v=lVBvHbJsg5Y and https://www.youtube.com/watch?v=1obZeHnAwz4
They're about 1.5 hours each, but, IMO, well worth it.
In the talk, the speaker (who has done a lot of multithread/multicore optimization) says that, in his experience, the memory bus/system tends to get saturated after about four threads.