MPI kill unwanted processes

Question

I'm using Open MPI with the C bindings. My code requires a specific number of processes. If the program is launched with more processes than required, I want to terminate the extra ones. How can I do that?

Every way I can think of to do that fails with the following error:

mpirun has exited due to process rank 3 with PID 24388 on
node pc15-373 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

Answer

I don't have much to add to what High Performance Mark has already written, except the following. You can call MPI_FINALIZE in the excess processes and let them exit, but be aware that this will disrupt all further collective operations on the world communicator MPI_COMM_WORLD: most of them would simply never complete, and MPI_BARRIER would certainly hang. To prevent this, first create a new communicator that excludes all unnecessary processes:

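int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

// Obtain the group of processes in the world communicator
MPI_Group world_group;
MPI_Comm_group(MPI_COMM_WORLD, &world_group);

// Remove all unnecessary ranks: process_limit .. size-1 with stride 1.
// MPI_Group_range_excl expects an array of (first, last, stride) triplets,
// i.e. int[][3]. Only meaningful when size > process_limit; otherwise the
// triplet names ranks that do not exist.
MPI_Group new_group;
int ranges[1][3] = { { process_limit, size - 1, 1 } };
MPI_Group_range_excl(world_group, 1, ranges, &new_group);

// Create a new communicator (collective: every rank in MPI_COMM_WORLD calls it)
MPI_Comm newworld;
MPI_Comm_create(MPI_COMM_WORLD, new_group, &newworld);

// The group handles are no longer needed
MPI_Group_free(&world_group);
MPI_Group_free(&new_group);

if (newworld == MPI_COMM_NULL)
{
   // Bye bye cruel world
   MPI_Finalize();
   exit(0);   // needs <stdlib.h>
}

// From now on use newworld instead of MPI_COMM_WORLD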

This code first obtains the group of processes in MPI_COMM_WORLD and then creates a new group that excludes all processes from rank process_limit onwards. It then creates a new communicator from the new group. MPI_COMM_CREATE, which must be called by all processes in MPI_COMM_WORLD, returns MPI_COMM_NULL in those processes that are not part of the new group, and this is used to terminate them. Because some processes will have "disappeared" from MPI_COMM_WORLD after this point, it can no longer be used for collective operations such as broadcasts and barriers; use newworld instead.
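To make the (first, last, stride) triplet more concrete, the following plain-C sketch models which ranks a single triplet excludes and which rank each surviving process ends up with in the new group. It does not call MPI at all; the helper names in_triplet and new_rank_after_excl are hypothetical, and -1 is an assumed stand-in for "not in the new group" (the case where MPI_Comm_create hands back MPI_COMM_NULL).

```c
#include <stdbool.h>

/* Hypothetical helper (not an MPI function): true if `rank` is one of
   first, first+stride, first+2*stride, ... without passing `last`,
   i.e. it is covered by one (first, last, stride) triplet of the kind
   passed to MPI_Group_range_excl. Assumes stride != 0. */
bool in_triplet(int rank, int first, int last, int stride)
{
    if (stride > 0) {
        if (rank < first || rank > last)
            return false;
    } else {
        if (rank > first || rank < last)
            return false;
    }
    /* rank must be reachable from first in whole steps of stride */
    return (rank - first) % stride == 0;
}

/* Hypothetical helper: the rank that `world_rank` would have in the group
   produced by excluding one triplet, or -1 if the process is excluded.
   Excluded-group semantics keep the survivors in their original order,
   so the new rank is the old rank minus the number of excluded ranks
   below it. */
int new_rank_after_excl(int world_rank, int first, int last, int stride)
{
    if (in_triplet(world_rank, first, last, stride))
        return -1;

    int excluded_below = 0;
    for (int r = 0; r < world_rank; ++r)
        if (in_triplet(r, first, last, stride))
            ++excluded_below;
    return world_rank - excluded_below;
}
```

For example, with size = 8 and process_limit = 4, the triplet { 4, 7, 1 } leaves ranks 0 to 3 with the same rank numbers and maps ranks 4 to 7 to -1, so in this exclusion-at-the-end case the surviving processes can keep using their old rank values in newworld.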

Also, as Mark has pointed out, on some architectures the extra processes might actually linger even after they have returned from main. For example, on Blue Gene, Cray, or any other system that uses hardware partitions to manage MPI jobs, the additional resources would not be freed until the whole MPI job has finished. The same applies if the program runs on a cluster or other system under the control of a resource manager (e.g. SGE, LSF, Torque, PBS, SLURM).

My usual approach to such cases is very pragmatic:

int size, rank;

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (size != process_limit)
{
   if (rank == 0)
      printf("Please run this program with %d MPI processes\n", process_limit);
   MPI_Finalize();
   exit(1);
}

You could also call MPI_Abort(MPI_COMM_WORLD, 1) instead of MPI_Finalize() to annoy the user :)

You can also use MPI's dynamic process spawning features (MPI_Comm_spawn), but this would make the code more complex, as you would have to deal with intercommunicators.