Kill an MPI process
Problem description
Is there a way for one MPI process to send a kill signal to another MPI process?
Or, put differently, is there a way to exit an MPI environment gracefully while one of the processes is still active? (mpi_abort(), for instance, prints an error message.)
Thanks.
Recommended answer
No, this is not possible within an MPI application using the MPI library.
An individual process is not aware of where the other processes are running, nor of their process IDs, and nothing in the MPI specification performs the kind of kill you are asking for.
If you were to do this manually, you would need to call MPI_Alltoall to exchange process IDs and hostnames across the system, and then spawn ssh/rsh to reach the required node whenever you wanted to kill something. All in all, it is neither portable nor clean.
MPI_Abort is the right way to do what you are trying to achieve. From the Open MPI manual:
"This routine makes a "best attempt" to abort all tasks in the group of comm." In other words, MPI_Abort(MPI_COMM_WORLD, -1) is what you need.
Any output produced during MPI_Abort is machine-specific, so you may or may not see the error message you mention.