Python multiprocessing within MPI

Question

I have a python script that I've written using the multiprocessing module, for faster execution. The calculation is embarrassingly parallel, so the efficiency scales with the number of processors. Now, I'd like to use this within an MPI program, which manages an MCMC calculation across multiple computers. This code has a call to system() which invokes the python script. However, I'm finding that when it is called this way, the efficiency gain from using python multiprocessing vanishes.

How can I get my python script to retain the speed gains from multiprocessing when called from MPI?

Here is a simple example, which is analogous to the much more complicated codes I want to use but displays the same general behavior. I write an executable python script called junk.py.

#!/usr/bin/python
import multiprocessing
import numpy as np

nproc = 3       # number of worker processes in the pool
nlen = 100000   # length of the array each worker sums over


def f(x):
    # Embarrassingly parallel busy-work: sum shifted slices of a
    # length-nlen array, independent of the other workers.
    print x
    v = np.arange(nlen)
    result = 0.
    for i, y in enumerate(v):
        result += (x+v[i:]).sum()
    return result


def foo():
    # Spread the inputs across nproc worker processes.
    pool = multiprocessing.Pool(processes=nproc)
    xlist = range(2,2+nproc)
    print xlist
    result = pool.map(f, xlist)
    print result

if __name__ == '__main__':
    foo()

When I run this from the shell by itself, using "top" I can see three python processes each taking 100% of cpu on my 16-core machine.

node094:mpi[ 206 ] /usr/bin/time junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
62.68user 0.04system 0:21.11elapsed 297%CPU (0avgtext+0avgdata 16516maxresident)k
0inputs+0outputs (0major+11092minor)pagefaults 0swaps

However, if I invoke this with mpirun, each python process takes 33% of cpu, and overall it takes about three times as long to run. Calling with -np 2 or more results in more processes, but doesn't speed up the computation any.

node094:mpi[ 208 ] /usr/bin/time mpirun -np 1 junk.py
[2, 3, 4]
2
3
4
[333343333400000.0, 333348333450000.0, 333353333500000.0]
61.63user 0.07system 1:01.91elapsed 99%CPU (0avgtext+0avgdata 16520maxresident)k
0inputs+8outputs (0major+13715minor)pagefaults 0swaps

(Additional notes: This is mpirun 1.8.1, python 2.7.3 on Linux Debian version wheezy. I have heard system() is not always allowed within MPI programs, but it's been working for me for the last five years on this computer. For example I have called a pthread-based parallel code from system() within an MPI program, and it's used 100% of cpu for each thread, as desired. Also, in case you were going to suggest running the python script in serial and just calling it on more nodes...the MCMC calculation involves a fixed number of chains which need to move in a synchronized way, so the computation unfortunately can't be reorganized that way.)

Answer

OpenMPI's mpirun, v1.7 and later, defaults to binding processes to cores - that is, when it launches the python junk.py process, it binds it to the core that it will run on. That's fine, and the right default behaviour for most MPI use cases. But here each MPI task is then forking more processes (through the multiprocessing package), and those forked processes inherit the binding state of their parent - so they're all bound to the same core, fighting amongst themselves. (The "P" column in top will show you they're all on the same processor)
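
One way to confirm this from inside the script (a small diagnostic sketch of my own, assuming a Linux /proc filesystem; report_affinity is a made-up helper, not part of the original question or answer) is to print each process's allowed-CPU list:

#!/usr/bin/python
import os

def report_affinity(label):
    # Read the kernel's record of which cores this process may run on.
    # Under "mpirun --bind-to core" the parent and every forked worker
    # report the same single core; with "--bind-to none" the full list
    # of the node's cores appears instead.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('Cpus_allowed_list'):
                print('%s: pid=%d, %s' % (label, os.getpid(), line.strip()))

if __name__ == '__main__':
    report_affinity('parent')

Calling report_affinity() at the start of f() in junk.py would show each pool worker's inherited binding as well.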

If you mpirun -np 2, you'll find two sets of three processes, each on a different core, each contending amongst themselves.

With OpenMPI, you can avoid this by turning off binding,

mpirun -np 1 --bind-to none junk.py

or choosing some other binding which makes sense given the final geometry of your run. MPICH has similar options with hydra.
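
For example (illustrative command lines of my own, not taken from the original answer; check the man pages for your MPI version), OpenMPI 1.8's --map-by modifier can reserve several cores per rank so the forked workers each get a core of their own, and MPICH's hydra launcher accepts an analogous binding switch:

mpirun -np 2 --map-by slot:PE=3 --bind-to core junk.py    # OpenMPI: bind each rank to 3 cores
mpiexec -np 2 -bind-to none junk.py                       # MPICH/hydra: disable binding entirely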

Note that the fork()ing of subprocesses with mpi isn't always safe or supported, particularly with clusters running with infiniband interconnects, but OpenMPI's mpirun/mpiexec will warn you if it isn't safe.
