Adapt multiprocessing Pool to mpi4py

Question

I'm using multiprocessing Pool to run a parallelized simulation in Python, and it works well on a computer with multiple cores. Now I want to execute the program on a cluster using several nodes. I suppose multiprocessing cannot be applied to distributed memory, but mpi4py seems a good option. So what is the simplest mpi4py equivalent of this code:

from multiprocessing import Pool

pool = Pool(processes=16)

pool.map(functionName,parameters_list)

Recommended answer

There's an old package of mine, pyina, that is built on mpi4py and enables a functional parallel map for MPI jobs. It's not built for speed -- it was built to enable an MPI parallel map from the interpreter onto a compute cluster (i.e. without the need to run mpiexec from the command line). Essentially:

>>> from pyina.launchers import MpiPool, MpiScatter
>>> pool = MpiPool()
>>> jobs = MpiScatter()
>>> def squared(x):
...   return x**2
... 
>>> pool.map(squared, range(4))
[0, 1, 4, 9]
>>> jobs.map(squared, range(4))
[0, 1, 4, 9]

This shows off the "worker pool" strategy and the "scatter-gather" strategy of distributing jobs to the workers. Of course, I wouldn't use it for a job as small as squared, because the overhead of spawning the MPI world is really quite high (much higher than setting up a multiprocessing Pool). However, if you have a big job to run, like you would normally run on a cluster using MPI, then pyina can be a big benefit for you.
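To make the difference between the two strategies concrete, here is a minimal serial sketch in plain Python. It does not use pyina or MPI at all: the function names `scatter_gather_map` and `worker_pool_map` are hypothetical, and the round-robin hand-out is only a stand-in for "give the next item to whichever worker is free". It is meant to illustrate the idea, not how pyina implements it.

```python
from collections import deque


def scatter_gather_map(func, items, n_workers):
    """Scatter-gather: split the input into one contiguous chunk per worker up front."""
    items = list(items)
    # Chunk sizes differ by at most one item.
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for rank in range(n_workers):
        size = base + (1 if rank < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    # Each simulated worker processes its own chunk; results are gathered in rank order.
    partial = [[func(x) for x in chunk] for chunk in chunks]
    return [y for part in partial for y in part], chunks


def worker_pool_map(func, items, n_workers):
    """Worker pool: hand out one item at a time from a shared queue."""
    queue = deque(enumerate(items))
    results = {}
    assignments = [[] for _ in range(n_workers)]
    rank = 0
    while queue:
        index, x = queue.popleft()
        results[index] = func(x)
        assignments[rank].append(x)
        # Round-robin is an assumption standing in for "next idle worker".
        rank = (rank + 1) % n_workers
    # Reassemble results in input order, as a map is expected to do.
    return [results[i] for i in range(len(results))], assignments


def squared(x):
    return x**2


sg_results, sg_chunks = scatter_gather_map(squared, range(5), 2)
wp_results, wp_assignments = worker_pool_map(squared, range(5), 2)
print(sg_chunks)        # [[0, 1, 2], [3, 4]]
print(wp_assignments)   # [[0, 2, 4], [1, 3]]
print(sg_results == wp_results)  # True
```

Both strategies return the same results in the same order; they differ only in how work is assigned. Scatter-gather tends to suit jobs of uniform cost, while a worker pool tends to balance load better when individual jobs vary in run time.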

However, the real advantage of using pyina is that it can spawn jobs not only with MPI but also through a scheduler. pyina understands and abstracts the launch syntax for several schedulers.

A typical call to a pyina map using a scheduler goes like this:

>>> # instantiate and configure a scheduler
>>> from pyina.schedulers import Torque
>>> config = {'nodes':'32:ppn=4', 'queue':'dedicated', 'timelimit':'11:59'}
>>> torque = Torque(**config)
>>> 
>>> # instantiate and configure a worker pool
>>> from pyina.launchers import Mpi
>>> pool = Mpi(scheduler=torque)
>>>
>>> # do a blocking map on the chosen function
>>> pool.map(pow, [1,2,3,4], [5,6,7,8])
[1, 64, 2187, 65536]
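Note that the map call in this session takes two argument lists, in the same way the builtin `map` does, and pairs them up element by element. By contrast, `multiprocessing.Pool.map` accepts a single iterable, and `Pool.starmap` takes pre-zipped argument tuples. The following plain-Python sketch (no pyina or MPI involved) shows the serial equivalents, which can be handy for checking expected results before submitting to a cluster:

```python
from itertools import starmap

bases = [1, 2, 3, 4]
exponents = [5, 6, 7, 8]

# Builtin map with several iterables calls pow(1, 5), pow(2, 6), ...
serial = list(map(pow, bases, exponents))

# The same computation expressed over pre-zipped argument tuples.
paired = list(starmap(pow, zip(bases, exponents)))

print(serial)  # [1, 64, 2187, 65536]
print(paired == serial)  # True
```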

Several common configurations are available as pre-configured maps. The following is identical to the above example:

>>> # instantiate and configure a pre-configured worker pool
>>> from pyina.launchers import TorqueMpiPool
>>> config = {'nodes':'32:ppn=4', 'queue':'dedicated', 'timelimit':'11:59'}
>>> pool = TorqueMpiPool(**config)
>>>
>>> # do a blocking map on the chosen function
>>> pool.map(pow, [1,2,3,4], [5,6,7,8])
[1, 64, 2187, 65536]

pyina needs some TLC, in that it's still python2.7 and hasn't had a release in several years… but it has otherwise been kept up to date (on github) and has been able to "get the job done" for me, running jobs on large-scale computing clusters over the past 10 years -- especially when coupled with pathos (which provides ssh tunneling and a unified interface for multiprocessing and ParallelPython maps). pyina doesn't yet utilize shared memory, but it does handle embarrassingly parallel functional computing pretty well. The interactions with the scheduler are pretty good in general, but can be a bit rough around the edges for several failure cases -- and the non-blocking maps need a lot of work. That having been said, it provides a pretty useful interface for running embarrassingly parallel jobs on a cluster with MPI.

Get pyina (and pathos) here: https://github.com/uqfoundation
