Detecting not using MPI when running with mpirun/mpiexec

Question

I am writing a program (in C++11) that can optionally be run in parallel using MPI. The project is configured with CMake, which automatically disables MPI if it cannot be found and displays a warning message about it.

However, I am worried about a perfectly plausible use case: a user configures and compiles the program on an HPC cluster, forgets to load the MPI module, and does not notice the warning. That same user might then try to run the program, notice that mpirun is not found, load the MPI module, but forget to recompile. If the user then runs the program with mpirun, this will work, but the program will simply run a number of times without any parallelization, because MPI was disabled at compile time. To prevent the user from thinking the program is running in parallel, I would like the program to display an error message in this case.

My question is: how can I detect that my program is being run in parallel without using MPI library functions (as MPI was disabled at compile time)? As far as I know, mpirun just launches the program a number of times and does not tell the processes it launches that they are being run in parallel.

I thought about letting the program write a test file and then checking whether that file already exists. Apart from the fact that this might be tricky to get right because of concurrency problems, there is no guarantee that mpirun will even launch the various processes on nodes that share a file system.

I also considered using a system variable to communicate between the processes, but as far as I know there is no system-independent way of doing this (and again, this might cause concurrency issues, as there is no way to coordinate system calls between the various processes).

So at the moment I have run out of ideas, and I would very much appreciate any suggestions that might help me achieve this. Preferred solutions should be operating system independent, although a UNIX-only solution would already be of great help.

Recommended answer

Basically, you want to detect, in your non-MPI code path, whether you are being run by mpirun or a similar launcher. There is a very similar question, "How can my program detect, whether it was launched via mpirun", which already presents one non-portable solution.

Check for environment variables that are set by mpirun. See e.g.: http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
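A minimal sketch of such an environment-variable check, meant to be compiled into the non-MPI code path only. `OMPI_COMM_WORLD_SIZE` is one of the variables Open MPI documents; `PMI_SIZE` and `MV2_COMM_WORLD_SIZE` are included on the assumption that MPICH-style and MVAPICH2 launchers export them, so verify the list against the launchers you actually target. The function and array names are hypothetical.

```cpp
#include <cstdlib>
#include <iostream>

// Environment variables that common launchers are assumed to export.
// Illustrative and not exhaustive: check each name against the
// documentation of the MPI implementations you need to support.
static const char* const kLauncherVariables[] = {
    "OMPI_COMM_WORLD_SIZE",  // Open MPI
    "PMI_SIZE",              // assumption: MPICH-style (Hydra/PMI) launchers
    "MV2_COMM_WORLD_SIZE",   // assumption: MVAPICH2
};

// Returns the name of the first launcher variable that is set in the
// environment, or nullptr if none of them is set.
const char* find_launcher_variable() {
    for (const char* name : kLauncherVariables) {
        if (std::getenv(name) != nullptr) {
            return name;
        }
    }
    return nullptr;
}

// Returns true if no launcher variable is set. Otherwise prints an
// error to stderr and returns false, so the caller can abort.
bool check_not_under_mpi_launcher() {
    const char* name = find_launcher_variable();
    if (name == nullptr) {
        return true;
    }
    std::cerr << "Error: this binary was built without MPI, but " << name
              << " is set, so it appears to have been started by an MPI"
                 " launcher. Rebuild with MPI enabled.\n";
    return false;
}
```

In a build without MPI, `main` could call `check_not_under_mpi_launcher()` first and exit with a non-zero status if it returns false. A launcher that sets none of the listed variables will go undetected.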

As another option, you could get the process ID of the parent process, look up its process name, and compare that with a list of known MPI launcher binaries such as orted, slurmstepd, hydra [1]. Unfortunately, everything about that is again non-portable.
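The parent-process idea could be sketched as follows for Linux only, assuming `/proc/<pid>/comm` is available. The launcher names are the answer's from-memory list and are matched as prefixes; the real names depend on the MPI implementation and its version. The function names are hypothetical.

```cpp
#include <fstream>
#include <string>
#include <unistd.h>  // getppid (POSIX)

// Reads the short name of the parent process from /proc/<ppid>/comm.
// Linux-specific; returns an empty string if the file cannot be read.
std::string parent_process_name() {
    std::ifstream comm("/proc/" + std::to_string(getppid()) + "/comm");
    std::string name;
    std::getline(comm, name);
    return name;
}

// Returns true if `name` starts with one of the launcher names below.
// The list is an assumption taken from the answer, not a verified set.
bool is_known_launcher(const std::string& name) {
    static const char* const kLaunchers[] = {"orted", "slurmstepd", "hydra"};
    for (const char* launcher : kLaunchers) {
        const std::string prefix(launcher);
        if (name.compare(0, prefix.size(), prefix) == 0) {
            return true;
        }
    }
    return false;
}
```

A non-MPI build could then treat `is_known_launcher(parent_process_name())` as a hint that it was started by a launcher. Because the check relies on a name list, both false negatives and false positives are possible.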

Since the launch mechanism itself is not clearly defined by the MPI standard, there cannot be a standard way to detect it.

[1]: This list is just from memory; please do not take it literally.

From a user experience point of view, I would argue that always showing a clear message about how the program is being run, such as:

Running FancySimulator serially. If you see this as part of mpirun, rebuild FancySimulator with FANCYSIM_MPI=True.

Running FancySimulator in parallel with 120 MPI processes.

would "solve" the problem. A user getting 120 garbled messages will hopefully notice.
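A small sketch of how such a startup message could be produced. The function name `startup_message` and its parameters are hypothetical; in a real program `mpi_enabled` would come from a compile-time switch and the process count from `MPI_Comm_size` in the MPI build.

```cpp
#include <string>

// Builds the startup banner. In the serial build, pass mpi_enabled = false;
// process_count is ignored in that case.
std::string startup_message(bool mpi_enabled, int process_count) {
    if (!mpi_enabled) {
        return "Running FancySimulator serially. If you see this as part of "
               "mpirun, rebuild FancySimulator with FANCYSIM_MPI=True.";
    }
    return "Running FancySimulator in parallel with " +
           std::to_string(process_count) + " MPI processes.";
}
```

In the MPI build only one rank (for example rank 0) would print the message, whereas a serial binary started 120 times by mpirun prints it 120 times, which is the visible hint the answer relies on.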
