OpenMP doesn't utilize all CPUs (dual socket, Windows and Microsoft Visual Studio)


Question

I have a dual-socket system with 22 physical cores (44 hyperthreads) per CPU. I can get OpenMP to fully utilize the first CPU (22 cores/44 hyperthreads), but I cannot get it to use the second CPU.

I am using CPUID HWMonitor to check my core usage. Every core on the second CPU is always at or near 0%.

Calling:

int nProcessors = omp_get_max_threads();

gives me nProcessors = 44, but I think that is just the 44 hyperthreads of one CPU rather than 44 physical cores (which would be 88 hyperthreads).

After a lot of searching, I'm still not sure how to utilize the other CPU.

Both CPUs are working fine: I can run other parallel programs that utilize all of the cores.

I'm compiling this as 64-bit, though I don't think that matters. I'm using Visual Studio 2017 Professional version 15.2 with OpenMP 2.0 (the only version Visual Studio supports), running on Windows 10 Pro 64-bit with two Intel Xeon E5-2699 v4 @ 2.2 GHz processors.

Recommended answer

I'm answering my own question, with thanks to @AlexG for providing some insight. Please see the comments section of the question.

This is a Microsoft Visual Studio and Windows problem.

First, read up on Windows processor groups.

Basically, if you have 64 or fewer logical processors, this is not a problem. Once you go past that, Windows splits the processors into processor groups, typically one per socket (or whatever other organization Windows chooses). In my case there were exactly two processor groups, each holding 44 hyperthreads and representing one physical CPU socket. By default, every process (program) is given access to only one processor group, which is why I could initially use only the 44 threads of one socket. However, if you create threads manually and use SetThreadGroupAffinity to move a thread to a processor group other than the one the program was initially assigned, your program becomes a multi-group process. This seems like a roundabout way to enable both processors, but it is how it is done. Once you start setting each thread's processor group individually, a call to GetProcessGroupAffinity shows that the number of groups is greater than 1.
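To see how Windows has actually split the machine, the group layout can be queried before touching any affinities. The following is a minimal sketch, not part of the original answer: `processors_per_group` and `print_processor_groups` are hypothetical helper names, and the non-Windows branch exists only so that the snippet compiles elsewhere. It relies on the Win32 calls `GetActiveProcessorGroupCount` and `GetActiveProcessorCount`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns the number of active logical processors
// in each processor group, indexed by group number.
std::vector<unsigned> processors_per_group()
{
    std::vector<unsigned> counts;
#ifdef _WIN32
    const WORD groups = GetActiveProcessorGroupCount();
    for (WORD g = 0; g < groups; ++g)
        counts.push_back(GetActiveProcessorCount(g));
#else
    // Assumption: outside Windows there are no processor groups, so the
    // whole machine is reported as a single group.
    counts.push_back(std::thread::hardware_concurrency());
#endif
    assert(!counts.empty());  // there is always at least one group
    return counts;
}

// Hypothetical helper: prints one line per processor group.
void print_processor_groups()
{
    const std::vector<unsigned> counts = processors_per_group();
    for (std::size_t g = 0; g < counts.size(); ++g)
        std::printf("group %zu: %u logical processors\n", g, counts[g]);
}
```

On a machine like the one described in the question, this would presumably list two groups of 44 logical processors each, but the actual split is decided by Windows.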

I was able to create an OpenMP block like this and assign a processor group to each thread:

...

#pragma omp parallel num_threads( 88 )
{
    // Declared inside the region so that every thread works on its own copies.
    const int tid = omp_get_thread_num();
    GROUP_AFFINITY affinity = {};          // zero-initializes Group, Mask and Reserved
    GROUP_AFFINITY previousAffinity = {};
    char buf[64];

    // Threads 32 and up go to group 0, threads 0-31 go to group 1.
    affinity.Group = (tid >= 32) ? 0 : 1;
    // Cast before shifting: KAFFINITY is 64 bits wide in a 64-bit build, a plain int 1 is not.
    affinity.Mask = (KAFFINITY)1 << (tid % 32);

    if (SetThreadGroupAffinity(GetCurrentThread(), &affinity, &previousAffinity))
    {
        snprintf(buf, sizeof(buf), "Thread set to group %d: %d\n", (int)affinity.Group, tid);
        OutputDebugStringA(buf);
    }
}
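The hard-coded 32 and the two fixed groups in the block above can be generalized. The sketch below is an illustration rather than the original author's code: `GroupSlot` and `slot_for_thread` are hypothetical names, and it assumes every processor group has the same number of logical processors. Unlike the block above, it fills group 0 first. It only computes the group number and single-bit mask; the result would still have to be copied into a `GROUP_AFFINITY` and passed to `SetThreadGroupAffinity`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type mirroring the Group and Mask fields of GROUP_AFFINITY.
struct GroupSlot {
    std::uint16_t group;  // processor group index
    std::uint64_t mask;   // single-bit affinity mask within that group
};

// Maps an OpenMP thread number to one logical processor. Group 0 is filled
// first, then group 1, and so on; the mapping wraps around when there are
// more threads than logical processors.
// Assumption: all groups contain procs_per_group logical processors.
inline GroupSlot slot_for_thread(int tid, int group_count, int procs_per_group)
{
    assert(tid >= 0 && group_count > 0);
    assert(procs_per_group > 0 && procs_per_group <= 64);  // a group holds at most 64

    const int slot = tid % (group_count * procs_per_group);
    GroupSlot s;
    s.group = static_cast<std::uint16_t>(slot / procs_per_group);
    s.mask  = std::uint64_t{1} << (slot % procs_per_group);  // 64-bit shift, no overflow
    return s;
}
```

With two groups of 44, thread 43 lands on the last processor of group 0 and thread 44 on the first processor of group 1.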

With the above code, I was able to force 64 threads to run, 32 threads per socket. I could not get past 64 threads, even when I tried forcing omp_set_num_threads to 88. The reason appears to be that Visual Studio's OpenMP implementation does not allow more than 64 OpenMP threads.

Thanks to everyone who helped me piece together the details that led to this answer!
