A custom interrupt handler for mpirun

Question

Apparently, mpirun uses a SIGINT handler which "forwards" the SIGINT signal to each of the processes it spawned.

This means you can write an interrupt handler for your MPI-enabled code, execute mpirun -np 3 my-mpi-enabled-executable, and SIGINT will then be raised in each of the three processes. Shortly after that, mpirun exits. This works fine when you have a small custom handler that only prints an error message and then exits. However, when the custom interrupt handler does a non-trivial job (e.g. serious computation or persisting data), the handler does not run to completion. I assume this is because mpirun decides to exit too soon.

Here's the stderr upon pressing ctrl-c (i.e. causing SIGINT) after executing my-mpi-enabled-executable directly, without mpirun. This is the desired behavior:

interrupted by signal 2.
running viterbi... done.
persisting parameters... done.
the master process will now exit.

Here's the stderr upon pressing ctrl-c after executing mpirun -np 1 my-mpi-enabled-executable. This is the problematic behavior:

interrupted by signal 2.
running viterbi... mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 8970 on node pharaoh exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished

Answering any of the following questions will solve my problem:

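  • How can I override the mpirun SIGINT handler (if that is possible at all)?
  • How can I keep the processes that mpirun spawned from being terminated right after mpirun terminates?
  • Is there another signal that mpirun may be sending to the child processes before it terminates?
  • Is there a way to catch the so-called "signal 0 (Unknown signal 0)" (see the second stderr above)?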

I'm running openmpi-1.6.3 on Linux.

Recommended answer

As per the Open MPI man page for mpirun, you can send SIGUSR1 or SIGUSR2 to mpirun, which forwards the signal to the processes it spawned and does not shut itself down.
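
To illustrate, here is a minimal sketch of how a worker process might react to a forwarded SIGUSR1. It is written in Python only for brevity and is not the asker's code; the names request_shutdown, run, work_step and cleanup are hypothetical. The handler only sets a flag, and the long-running cleanup (the "viterbi" and "persisting parameters" steps in the question) runs in the main flow after the loop stops, so it is not cut short inside the handler.

```python
import os
import signal

# Flag set by the signal handler; the main loop polls it.
shutdown_requested = False


def request_shutdown(signum, frame):
    """Record that a shutdown was requested; do no heavy work here."""
    global shutdown_requested
    shutdown_requested = True


def run(max_steps, work_step, cleanup):
    """Call work_step() until max_steps is reached or SIGUSR1 arrives,
    then call cleanup() and return the number of completed steps.

    work_step and cleanup are hypothetical callables supplied by the caller.
    """
    # Install the handler for SIGUSR1 (the signal mpirun forwards
    # without terminating the job, per the answer above).
    signal.signal(signal.SIGUSR1, request_shutdown)
    steps = 0
    while steps < max_steps and not shutdown_requested:
        work_step()
        steps += 1
    # The expensive shutdown work runs here, outside the handler.
    cleanup()
    return steps
```

With a handler like this in place, the job would be asked to wind down by sending SIGUSR1 to the mpirun process (for example with kill -USR1 followed by mpirun's PID) instead of pressing ctrl-c.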
