Should SPIDs of OpenMP threads change when altering the number of threads in 2 sections?


Question

I have 2 OpenMP parallel regions (I am using C++ with gcc on Linux) with different numbers of threads, let's say 4 in one and 8 in the other. Then, if I run ps -T $(pidof name_of_process), 4 SPIDs stay the same all the time, but the remaining 4 change on every invocation. A sample output:

The first output

  PID  SPID TTY      STAT   TIME COMMAND
 7578  7578 pts/1    Rl+    1:18 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578  7579 pts/1    Rl+    0:57 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578  7580 pts/1    Rl+    0:57 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578  7581 pts/1    Rl+    0:57 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578 19381 pts/1    Rl+    0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578 19382 pts/1    Rl+    0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578 19383 pts/1    Rl+    0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578 19384 pts/1    Rl+    0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000

The second output

  PID  SPID TTY      STAT   TIME COMMAND
 7578  7578 pts/1    Rl+    1:23 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578  7579 pts/1    Rl+    1:01 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578  7580 pts/1    Rl+    1:01 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578  7581 pts/1    Rl+    1:01 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578 22314 pts/1    Rl+    0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578 22315 pts/1    Rl+    0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578 22316 pts/1    Rl+    0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
 7578 22317 pts/1    Sl+    0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
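The same SPID list that ps -T prints can also be read from inside the process, which makes it easy to log between the two parallel regions. This is a minimal, Linux-only sketch that assumes /proc is mounted; the function name live_thread_ids is hypothetical.

```cpp
#include <dirent.h>
#include <cstdlib>
#include <set>

// Returns the kernel thread IDs (SPIDs) currently alive in this process,
// i.e. the same list `ps -T` prints, by reading /proc/self/task (Linux only).
std::set<long> live_thread_ids() {
    std::set<long> tids;
    DIR* dir = opendir("/proc/self/task");
    if (dir == nullptr) return tids;  // /proc not available
    while (dirent* entry = readdir(dir)) {
        char* end = nullptr;
        long tid = std::strtol(entry->d_name, &end, 10);
        // Keep only purely numeric names; this skips "." and "..".
        if (end != entry->d_name && *end == '\0') tids.insert(tid);
    }
    closedir(dir);
    return tids;
}
```

For instance, printing the contents of live_thread_ids() after each 8-thread region shows whether the set of SPIDs stays fixed or keeps changing.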

Does it mean that OpenMP constantly creates 4 new threads when entering the 8-threaded section and destroys them afterwards (or when entering the 4-threaded section)? I would assume so, but many places, such as here, suggest that the threads should persist and wait for their turn. I wouldn't be bothered about the internal workings of OpenMP, but I have a problem where memory mysteriously leaks, and I am starting to suspect that some thread resources are not released (or maybe the memory becomes increasingly fragmented?).

So is this correct behavior? I am using gcc; gcc --version: gcc-8 (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0.

Moreover, if it is the case, is it possible to force OpenMP not to constantly destroy and spawn new threads without making the 2 sections use the same number of threads?

Answer

Yes, these are probably new threads. This is entirely dependent on the platform and the OpenMP implementation. Moreover, the OpenMP specification leaves this unspecified, so it is compliant behavior. However, in practice the GCC runtime (GOMP) and the Intel/Clang one (IOMP) tend to reuse threads as much as possible. On my machine (with 6 cores), I am not able to reproduce your issue with either GOMP from GCC-10.2 or IOMP from Clang-11.0. Moreover, the following program shows the same thread IDs, which likely means they are reused:

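#include <cstdio>
#include <unistd.h>
#include <sys/syscall.h>  // SYS_gettid
#include <sys/types.h>

// Kernel thread ID of the calling thread (the SPID shown by ps -T).
// The gettid() wrapper only exists in glibc >= 2.30 (Ubuntu 18.04 ships 2.27),
// so call the system call directly.
static int thread_id() {
    return static_cast<int>(syscall(SYS_gettid));
}

// Build with: g++ -fopenmp
int main() {
    #pragma omp parallel num_threads(4)
    printf("%d\n", thread_id());

    printf("----------\n");

    #pragma omp parallel num_threads(8)
    printf("%d\n", thread_id());

    printf("----------\n");

    #pragma omp parallel num_threads(4)
    printf("%d\n", thread_id());

/*
    // Update n°1
    printf("----------\n");

    #pragma omp parallel num_threads(8)
    printf("%d\n", thread_id());
*/
}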

You should check the output of this program. If you cannot reproduce your program's behavior with this simple example, the problem is specific to your application. It could be a sign that you are using multiple conflicting OpenMP runtimes. To check this hypothesis, set the environment variable OMP_DISPLAY_ENV=TRUE and look at the output. This behavior also often appears when you use nested regions.
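As a complementary check for the multiple-runtime hypothesis, a process can scan the shared libraries it has mapped and look for more than one OpenMP runtime. This is a Linux-only sketch that assumes the runtimes are named libgomp (GCC), libomp (LLVM/Clang) or libiomp5 (Intel); other names, or a statically linked runtime, would be missed. Both function names are hypothetical.

```cpp
#include <fstream>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Returns the basenames of OpenMP runtime libraries listed in a
// /proc/<pid>/maps dump. The name prefixes below are an assumption.
std::set<std::string> openmp_runtimes_in_maps(std::istream& maps) {
    static const char* const kPrefixes[] = {"libgomp", "libomp", "libiomp5"};
    std::set<std::string> found;
    std::string line;
    while (std::getline(maps, line)) {
        // The path is the only field that contains '/'; anonymous
        // mappings and pseudo-paths such as [heap] have none.
        std::size_t slash = line.find('/');
        if (slash == std::string::npos) continue;
        std::string path = line.substr(slash);
        std::string base = path.substr(path.rfind('/') + 1);
        for (const char* prefix : kPrefixes) {
            if (base.rfind(prefix, 0) == 0) found.insert(base);  // starts with prefix
        }
    }
    return found;  // a library mapped in several segments appears once
}

// Convenience wrapper for the current process (Linux only).
std::set<std::string> loaded_openmp_runtimes() {
    std::ifstream maps("/proc/self/maps");
    return openmp_runtimes_in_maps(maps);
}
```

If loaded_openmp_runtimes() returns two different entries, for example one libgomp and one libomp library, that would support the hypothesis that GOMP and IOMP are both loaded into the process.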

UPDATE n°1: With another 8-thread section, GOMP on GCC-10.2 creates new, unneeded threads, while IOMP on Clang-11.0 does not create additional threads. This might be a bug (or a very surprising behavior of GOMP).
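To observe this from inside a program rather than with ps, a minimal sketch in the spirit of the test program above can alternate a small and a large region many times and count the distinct kernel thread IDs that take part. The function name distinct_worker_tids is hypothetical, and the result is only a heuristic: the kernel may in principle recycle a TID, and OMP_DYNAMIC or thread limits can shrink the teams. Build with -fopenmp; without it the pragmas are ignored and everything runs on one thread.

```cpp
#include <cstddef>
#include <set>
#include <sys/syscall.h>
#include <unistd.h>

// Kernel thread ID of the calling thread (the SPID column of ps -T).
static long current_tid() { return syscall(SYS_gettid); }

// Alternates a region of `small_team` threads with one of `large_team`
// threads, `rounds` times, and returns how many distinct kernel thread IDs
// took part. With a pooled runtime the count stays close to `large_team`;
// if the extra workers are destroyed and respawned, it grows with `rounds`.
std::size_t distinct_worker_tids(int rounds, int small_team, int large_team) {
    std::set<long> tids;  // shared by all threads, guarded by critical
    for (int r = 0; r < rounds; ++r) {
        #pragma omp parallel num_threads(small_team)
        {
            const long tid = current_tid();
            #pragma omp critical
            tids.insert(tid);
        }
        #pragma omp parallel num_threads(large_team)
        {
            const long tid = current_tid();
            #pragma omp critical
            tids.insert(tid);
        }
    }
    return tids.size();
}
```

For example, if distinct_worker_tids(100, 4, 8) returns a value far above 8, the runtime is most likely respawning the extra workers, as described in the update; a value of about 8 suggests the pool is reused.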

UPDATE n°2: While the behavior of a runtime is implementation-defined, you can give the runtime some hints using the environment variable OMP_DYNAMIC. Here is what the OpenMP specification states:

The OMP_DYNAMIC environment variable controls dynamic adjustment of the number of threads to use for executing parallel regions by setting the initial value of the dyn-var ICV. The value of this environment variable must be one of the following: true | false. If the environment variable is set to true, the OpenMP implementation may adjust the number of threads to use for executing parallel regions in order to optimize the use of system resources. If the environment variable is set to false, the dynamic adjustment of the number of threads is disabled. The behavior of the program is implementation defined if the value of OMP_DYNAMIC is neither true nor false.

However, setting OMP_DYNAMIC=TRUE does not fix the problem with GOMP/GCC. Moreover, with both GOMP/GCC and IOMP/Clang, it limits the number of created threads to the number of available hardware threads (at least on my machine).
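The dyn-var ICV that OMP_DYNAMIC initializes can also be set and queried from code with omp_set_dynamic and omp_get_dynamic, and omp_get_num_threads reports the team size the runtime actually granted. This sketch uses them to check how many threads a num_threads request really gets; the helper names are hypothetical, and the _OPENMP guards let it compile (single-threaded) without -fopenmp.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Sets the dyn-var ICV (the programmatic counterpart of OMP_DYNAMIC) and
// returns the value read back. A runtime that does not support dynamic
// adjustment keeps dyn-var false, so asking for true may still return false.
bool set_dynamic_threads(bool enabled) {
#ifdef _OPENMP
    omp_set_dynamic(enabled ? 1 : 0);
    return omp_get_dynamic() != 0;
#else
    (void)enabled;
    return false;  // no OpenMP runtime, so no dynamic adjustment
#endif
}

// Opens a region requesting `requested` threads and returns the size of the
// team the runtime actually created.
int granted_team_size(int requested) {
    int size = 1;
#ifdef _OPENMP
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single
        size = omp_get_num_threads();
    }
#else
    (void)requested;
#endif
    return size;
}
```

With dynamic adjustment disabled, granted_team_size(8) would normally return 8; after set_dynamic_threads(true) it may return fewer, which is consistent with the observation above that OMP_DYNAMIC=TRUE caps teams at the number of hardware threads.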

Keep in mind that the observed behavior of the OpenMP runtimes is compliant with the specification, and your program should not assume that no new threads are created (although you may want to tune the behavior for the sake of performance).
