mpi4py: close MPI Spawn?

Problem description

I have some python code in which I very often Spawn multiple processes. I get an error:

ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file odls_default_module.c at line 809

My code roughly looks like this:

import sys
from mpi4py import MPI  # note: `import mpi4py` alone does not expose the MPI name used below
comm = MPI.COMM_WORLD
...
# spawn no_fronts worker processes running front_process.py
icomm = MPI.COMM_SELF.Spawn(sys.executable, args=["front_process.py", str(rank)], maxprocs=no_fronts)
...
message = icomm.recv(source=MPI.ANY_SOURCE, tag=21)
...
icomm.Free()

The Spawn call happens very often, and I think the spawned processes remain "open" after I am finished with them, despite the icomm.Free() call. How do I properly "close" a spawned process?

Solution

The MPI specification for MPI_COMM_FREE states that "... the object is actually deallocated only if there are no other active references to it." You can disconnect processes by calling MPI_COMM_DISCONNECT on both ends of all intercommunicators that link them. The equivalent mpi4py call is probably icomm.Disconnect().
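
For illustration, here is a minimal sketch of how that could look on both sides, assuming the child script is the front_process.py from the question; the message traffic and the no_fronts value are placeholders rather than the actual code:

# parent side (the spawning script)
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
no_fronts = 4  # placeholder worker count

icomm = MPI.COMM_SELF.Spawn(sys.executable, args=["front_process.py", str(rank)], maxprocs=no_fronts)
# drain one message per child so nothing is pending when we disconnect
for _ in range(no_fronts):
    message = icomm.recv(source=MPI.ANY_SOURCE, tag=21)
icomm.Disconnect()  # instead of icomm.Free()

# child side (front_process.py)
from mpi4py import MPI

parent = MPI.Comm.Get_parent()  # intercommunicator back to the spawning process
parent.send("done", dest=0, tag=21)
parent.Disconnect()  # both ends of the intercommunicator must disconnect

Since MPI_COMM_DISCONNECT is collective and waits for pending communication on the communicator to complete, make sure the children have finished all of their sends and receives before they call Disconnect(), otherwise one side may block.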

Still, the error that you see probably comes from orterun (symlinked as mpirun and mpiexec) and not from the master rank. orterun is the process that launches all MPI processes (the initial ones and those spawned later) and then redirects their standard output to its own standard output so that you can see the output from each rank. When processes are started on the local host, orterun uses a simple fork()/exec() mechanism as part of the odls framework to spawn new ranks and makes use of pipes to detect successful launches and to forward IO. The launch-detection pipes are open only for a very short period of time, but the IO-forwarding pipes remain open for as long as the rank is running. If you have many ranks running at the same time, lots of pipes stay open, hence the error message.

The error message is a bit misleading since there are two cases of "too many descriptors" and Open MPI does not distinguish between them. The first case is when the hard kernel limit is reached, but that is usually a huge value. The second case is when the per-process limit on the number of file descriptors is reached. The latter can be controlled with the ulimit command. You should check the value in your case with ulimit -n and, if necessary, increase it. For example:

user@host$ ulimit -n 123456
user@host$ mpiexec -n 1 ... ./spawning_code.py arg1 arg2 ...

Here 123456 is the desired limit on the number of descriptors and it cannot exceed the hard limit that can be obtained with ulimit -nH. If you are running your program from a script (either for convenience or because you submit jobs to some batch queueing system), you should put the ulimit -n line in the script before the call to mpirun/mpiexec.
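
As a side note, you can also inspect the limit that your ranks inherited from inside Python; this is a small sketch using the standard resource module, and keep in mind that the limit that actually matters here is the one in effect for mpiexec/orterun itself, which the ranks normally inherit from the same shell:

import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors,
# i.e. the values reported by `ulimit -n` (soft) and `ulimit -nH` (hard)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)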

Also, in the text above, the words rank and process are used to refer to the same thing.
