A custom interrupt handler for mpirun
Question
Apparently, mpirun uses a SIGINT handler which "forwards" the SIGINT signal to each of the processes it spawned.
This means you can write an interrupt handler for your MPI-enabled code, run mpirun -np 3 my-mpi-enabled-executable, and SIGINT will then be raised in each of the three processes. Shortly afterwards, mpirun exits. This works fine when the custom handler is small and only prints an error message before exiting. However, when the handler does non-trivial work (e.g. serious computation or persisting data), it does not run to completion. I assume this is because mpirun exits too soon.
Here's the stderr output when I press Ctrl-C (i.e. send SIGINT) after running my-mpi-enabled-executable directly. This is the desired behavior:
interrupted by signal 2.
running viterbi... done.
persisting parameters... done.
the master process will now exit.
Here's the stderr output when I press Ctrl-C after running mpirun -np 1 my-mpi-enabled-executable. This is the problematic behavior:
interrupted by signal 2.
running viterbi... mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 8970 on node pharaoh exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
Answering any of the following questions will solve my problem:
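- How can I override the mpirun SIGINT handler (if that is possible at all)?
- How can I keep the processes spawned by mpirun from being terminated right after mpirun terminates?
- Is there another signal that mpirun might send to the child processes before it terminates?
- Is there a way to "catch" the so-called signal 0 (Unknown signal 0) (see the second stderr output above)?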
I'm running openmpi-1.6.3 on Linux.
Recommended answer
As per the Open MPI manpage, you can send SIGUSR1 or SIGUSR2 to mpirun, which will forward it to the processes it spawned without shutting itself down.