MPI kill unwanted processes
Problem description
I'm using OpenMPI with C bindings. In my code, there is a required number of processes. If MPI is executed such that more processes are opened than are required, I wish to kill or terminate the extra processes. How can I do that?
When I try to do that in the several ways I can think of, I get the following error:
mpirun has exited due to process rank 3 with PID 24388 on
node pc15-373 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
Recommended answer
I don't have much to add to what High Performance Mark has already written except the following. You can actually call MPI_FINALIZE and exit the processes that come in excess, but you have to be aware that this will disrupt all further collective operations on the world communicator MPI_COMM_WORLD - most of them would simply not complete (with MPI_BARRIER being the one that would certainly hang). To prevent this you might want to first create a new communicator that excludes all unnecessary processes:
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
// Obtain the group of processes in the world communicator
MPI_Group world_group;
MPI_Comm_group(MPI_COMM_WORLD, &world_group);
// Remove all unnecessary ranks (process_limit is the number of processes the program needs)
MPI_Group new_group;
// MPI_Group_range_excl expects an array of (first, last, stride) triplets
int ranges[1][3] = { { process_limit, size-1, 1 } };
MPI_Group_range_excl(world_group, 1, ranges, &new_group);
// Create a new communicator
MPI_Comm newworld;
MPI_Comm_create(MPI_COMM_WORLD, new_group, &newworld);
if (newworld == MPI_COMM_NULL)
{
    // Bye bye cruel world
    MPI_Finalize();
    exit(0);
}
// From now on use newworld instead of MPI_COMM_WORLD
This code first obtains the group of processes in MPI_COMM_WORLD and then creates a new group that excludes all processes from process_limit onwards. Then it creates a new communicator from the new process group. The MPI_COMM_CREATE operation returns MPI_COMM_NULL in those processes that are not part of the new group, and this fact is used to terminate them. Since after this point some of the processes have "disappeared" from MPI_COMM_WORLD, it is no longer usable for collective operations like broadcasts, barriers, etc., and newworld should be used instead.
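One case the description above leaves open is a launch with exactly process_limit processes, or fewer. The triplet would then start at a rank that does not exist in the group, and an MPI library may treat that as an error. A minimal sketch of a guard follows, written as plain C with no MPI calls. The helper name surplus_rank_range is hypothetical and not part of MPI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not an MPI function).
 * Fills `range` with the (first, last, stride) triplet that covers the
 * surplus ranks process_limit .. size-1 and returns true.
 * Returns false, leaving `range` untouched, when there is nothing to exclude. */
bool surplus_rank_range(int process_limit, int size, int range[3])
{
    assert(process_limit > 0 && size > 0);

    if (size <= process_limit)
        return false;           /* no surplus ranks: skip the exclusion step */

    range[0] = process_limit;   /* first surplus rank */
    range[1] = size - 1;        /* last rank in the communicator */
    range[2] = 1;               /* stride 1: every rank in between */
    return true;
}
```

When the helper returns false, the group exclusion can be skipped entirely and MPI_COMM_WORLD (or a duplicate of it) used as is. When it returns true, the filled triplet is the one handed to MPI_Group_range_excl.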
Also, as Mark has pointed out, on some architectures the extra processes might actually linger around even after they have returned from main. For example, on Blue Gene, Cray, or any other system that uses hardware partitions to manage MPI jobs, the additional resources would not be freed until the whole MPI job has finished. This would also be the case if the program is run on a cluster or other system under the control of a resource manager (e.g. SGE, LSF, Torque, PBS, SLURM, etc.).
My usual approach to such cases is very pragmatic:
int size, rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (size != process_limit)
{
    if (rank == 0)
        printf("Please run this program with %d MPI processes\n", process_limit);
    MPI_Finalize();
    exit(1);
}
You could also use MPI_Abort(MPI_COMM_WORLD, 0); instead of MPI_Finalize() to annoy the user :)
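The strict size != process_limit check refuses to run whenever the count is off in either direction. If it seems preferable to refuse only when too few processes were started and to trim the surplus otherwise, the decision could be sketched as below. This is an illustration only, in plain C with no MPI calls. The enum and function names are hypothetical.

```c
/* Hypothetical names, not part of MPI. */
enum launch_action {
    LAUNCH_PROCEED,  /* exactly the required number of processes */
    LAUNCH_TRIM,     /* too many: surplus ranks can be excluded from a new communicator */
    LAUNCH_REFUSE    /* too few: nothing to trim, print a message and exit */
};

/* Decide what to do given the size of MPI_COMM_WORLD and the required count. */
enum launch_action decide_launch_action(int size, int process_limit)
{
    if (size == process_limit)
        return LAUNCH_PROCEED;
    if (size > process_limit)
        return LAUNCH_TRIM;
    return LAUNCH_REFUSE;
}
```

Every rank would evaluate this with the same two values, so all processes reach the same decision without any communication.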
You can also use the process spawning features of MPI, but this would make the code more complex, as you would have to deal with intercommunicators.