Calling mpi binary in serial as subprocess of mpi application

Question

I have a large parallel (MPI) simulation application which produces large amounts of data. To evaluate this data I use a Python script.

What I now need to do is run this application a large number of times (>1000) and calculate statistical properties from the resulting data.

My approach so far is to have a Python script running in parallel (using mpi4py, e.g. on 48 nodes) that calls the simulation code via subprocess.check_call. I need this call to run my MPI simulation application in serial; I do not need the simulation itself to run in parallel in this case. The Python script can then analyze the data in parallel and, after finishing, start a new simulation run, until a large number of runs has been accumulated.

The goals are:

  • not saving the entire dataset of 2000 runs
  • keeping the intermediate data in memory

Stub MWE:

from mpi4py import MPI
import subprocess

print "Master hello"

call_string = 'python multi_call_slave.py'

comm = MPI.COMM_WORLD

rank = comm.Get_rank()
size = comm.Get_size()

print "rank %d of size %d in master calling: %s" % (rank, size, call_string)

std_outfile = "./sm_test.out"
nr_samples = 1
for samples in range(0, nr_samples):
    with open(std_outfile, 'w') as out:
        subprocess.check_call(call_string, shell=True, stdout=out)
#       analyze_data()
#       communicate_results()

File multi_call_slave.py (this would be the C simulation code):

from mpi4py import MPI

print "Slave hello"

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
print "rank %d of size %d in slave" % (rank, size)

This will not work. Resulting output in stdout:

Master hello
rank 1 of size 2 in master calling: python multi_call_slave_so.py
Master hello
rank 0 of size 2 in master calling: python multi_call_slave_so.py
[cli_0]: write_line error; fd=7 buf=:cmd=finalize
:
system msg for write_line failure : Broken pipe
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(311).....: MPI_Finalize failed
MPI_Finalize(229).....: 
MPID_Finalize(150)....: 
MPIDI_PG_Finalize(126): PMI_Finalize failed, error -1
[cli_1]: write_line error; fd=8 buf=:cmd=finalize
:
system msg for write_line failure : Broken pipe
Fatal error in MPI_Finalize: Other MPI error, error stack:
MPI_Finalize(311).....: MPI_Finalize failed
MPI_Finalize(229).....: 
MPID_Finalize(150)....: 
MPIDI_PG_Finalize(126): PMI_Finalize failed, error -1

Resulting output in sm_test.out:

Slave hello
rank 0 of size 2 in slave

The reason is that the subprocess assumes it is being run as part of a parallel application, whereas I intend to run it as a serial application. As a very "hacky" workaround I did the following:

  • compile all needed MPI-aware libraries with a specific MPI distribution (i.e. Intel MPI)
  • compile the simulation code with a different MPI library (i.e. OpenMPI)

If I now start my parallel Python script using Intel MPI, the underlying simulation is not aware of the surrounding parallel environment, since it uses a different library.
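
To see the session information a subprocess would otherwise inherit, one can dump the MPI-related environment variables inside the child. A small sketch; the exact prefixes are implementation-specific (OMPI_ for Open MPI, I_MPI_ for Intel MPI, PMI_ for MPICH-style process managers) and are an assumption here:

import os

# Print whichever MPI session variables this process inherited; their
# presence is what makes a child attach itself to the parent's MPI job.
for key in sorted(os.environ):
    if key.startswith(('OMPI_', 'I_MPI_', 'PMI_')):
        print("%s=%s" % (key, os.environ[key]))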

This worked fine for a while, but unfortunately it is not very portable and is difficult to maintain on different clusters for various reasons.

I could:

  • put the subprocess calling loop into a shell script using srun
    • would mandate buffering results on the HD
    • not meant to be used like that
    • difficult to determine whether the subprocess has finished
  • change the necessary C code appropriately
  • tried manipulating the environment variables, to no avail
    • also not meant to be used like that
  • using mpirun -n 1 or srun for the subprocess call does not help

Is there any elegant, official way of doing this? I am really out of ideas and appreciate any input!

Answer

No, there is neither an elegant nor an official way to do this. The only officially supported way to execute other programs from within an MPI application is the use of MPI_Comm_spawn. Spawning child MPI processes via simple OS mechanisms like the one provided by subprocess is dangerous and could even have catastrophic consequences in certain cases.
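
In mpi4py this corresponds to Spawn on a communicator. A minimal sketch, assuming each analysis rank launches its own single-process child over MPI.COMM_SELF ('./simulation' is a placeholder for the actual binary):

from mpi4py import MPI

# Spawning over MPI.COMM_SELF makes every child an independent
# single-process MPI job, rather than one collective spawn over
# MPI.COMM_WORLD.
child = MPI.COMM_SELF.Spawn('./simulation', maxprocs=1)

# ... interact with the child via the intercommunicator 'child' ...

child.Disconnect()  # sever the connection once communication is done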

While MPI_Comm_spawn does not provide a mechanism to find out when the child process has exited, you could kind of simulate it with an intercomm barrier. You will still face problems since the MPI_Comm_spawn call does not allow for the standard I/O to be redirected arbitrarily and instead it gets redirected to mpiexec/mpirun.
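
A sketch of that barrier trick in mpi4py (the child side is shown in Python for symmetry; a real C child would make the equivalent MPI calls, i.e. MPI_Comm_get_parent, MPI_Barrier, MPI_Comm_disconnect):

from mpi4py import MPI

# Parent side: Barrier() on the intercommunicator returns only once the
# child has entered the barrier as well, i.e. shortly before it exits.
child = MPI.COMM_SELF.Spawn('./simulation', maxprocs=1)
child.Barrier()
child.Disconnect()

# Child side (at the very end of the spawned program):
parent = MPI.Comm.Get_parent()
# ... do the actual simulation work ...
parent.Barrier()     # signal the parent that the work is done
parent.Disconnect()

Note that this only approximates exit detection: the parent resumes when the child reaches the barrier, not when the child process has actually terminated.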

What you could do is to write a wrapper script that deletes all possible pathways that the MPI library might use in order to pass session information around. For Open MPI that would be any environment variable that starts with OMPI_. For Intel MPI that would be variables that start with I_. And so on. Some libraries might use files or shared memory blocks or some other OS mechanisms and you'll have to take care of those too. Once any possible mechanism to communicate MPI session information has been eradicated, you could simply start the executable and it should form a singleton MPI job (that is, behave as if run with mpiexec -n 1).
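
A sketch of such a wrapper in Python; the prefix list below is an assumption and has to be adapted to the MPI implementations and process managers actually in use:

import os
import subprocess

# Environment-variable prefixes that may carry MPI session information:
# OMPI_ (Open MPI), I_ (Intel MPI), PMI_/PMIX_/HYDRA_ (MPICH-style
# process managers). This list is an assumption; verify it locally.
SESSION_PREFIXES = ('OMPI_', 'I_', 'PMI_', 'PMIX_', 'HYDRA_')

clean_env = {key: value for key, value in os.environ.items()
             if not key.startswith(SESSION_PREFIXES)}

# With the session information stripped, the child should come up as a
# singleton MPI job, as if it had been started with mpiexec -n 1.
subprocess.check_call('./simulation', env=clean_env)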
