Handling Signals in an MPI Application / Gracefully exit


Question

How can signals be handled safely in an MPI application (for example SIGUSR1, which should tell the application that its runtime has expired and that it should terminate within the next 10 minutes)? I have several constraints:

  • Finish all parallel/serial IO before exiting the application!
  • In all other cases the application can exit normally

How can this be achieved safely, without deadlocks while trying to exit, and while properly leaving the current context, jumping back to main() and calling MPI_FINALIZE()? Somehow the processes have to agree on exiting (I think the same holds in multithreaded applications), but how is that done efficiently without having to communicate too much? Is anybody aware of a standard way of doing this properly?

Below are some thoughts which might or might not work:

Idea 1:
Let's say each process catches the signal in a signal handler, pushes it onto an "unhandled signals stack" (USS), and simply returns from the handler routine. We then have certain termination points in the application, especially before and after IO operations, which handle all signals in the USS. If there is a SIGUSR1 in the USS, for example, each process would then exit at a termination point.

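  • The problem with this idea is that deadlocks are still possible: process 1 catches the signal just before a termination point, while process 2 has already passed that point and is now starting parallel IO. Process 1 would exit, leaving process 2 deadlocked (waiting in the IO for process 1, which has exited).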
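The handler side of Idea 1 could be sketched roughly as follows. This is only an illustration under assumptions: the "unhandled signals stack" is reduced to a single flag, the names `usr1_pending`, `on_sigusr1`, `install_sigusr1_handler` and `exit_requested_locally` are hypothetical, and the code covers only the per-process part. It does nothing about the deadlock described above, because each process still decides on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag standing in for the "unhandled signals stack".
   volatile sig_atomic_t is the type that may safely be written
   from inside a signal handler. */
static volatile sig_atomic_t usr1_pending = 0;

/* The handler only records that the signal arrived and returns. */
static void on_sigusr1(int signo)
{
    (void)signo;
    usr1_pending = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_sigusr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* ask for interrupted system calls to be restarted */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called at a termination point, e.g. before and after an IO phase.
   Purely local: it says nothing about what the other processes saw. */
static bool exit_requested_locally(void)
{
    return usr1_pending != 0;
}
```

Keeping the handler down to a single flag write avoids calling anything that is not async-signal-safe; all real work is deferred to the termination points.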

Idea 2:
Only the master process 0 catches the signal in a signal handler and then sends a broadcast message, "all processes exit!", at a specific point in the application. All processes receive the broadcast and throw an exception, which is caught in main, where MPI_FINALIZE is called.

  • This way the exit happens safely, but at a cost: every process has to keep receiving the broadcast message to see whether it should exit.

Thanks a lot!

Recommended answer

Using signals in an MPI application is, in general, not safe. Some implementations may support it and others may not.

For instance, in MPICH, SIGUSR1 is used by the process manager for internal notification of abnormal failures.

http://lists.mpich.org/pipermail/discuss/2014-October/003242.html

Open MPI, on the other hand, will forward SIGUSR1 and SIGUSR2 from mpiexec to the other processes.

http://www.open-mpi.org/doc/v1.6/man1/mpirun.1.php#sect14

Other implementations will differ. So before you go too far down this route, make sure that the implementation you're using can deal with it.
