IPython MPI with a Machinefile

Problem description

I want to use IPython's MPI abilities with distributed computing. Namely I would like MPI to be run with a machine file of sorts so I can add multiple machines.

I forgot to include my configuration.

Configuration

~/.ipython/profile_default/ipcluster_config.py
# The command line arguments to pass to mpiexec.                                
c.MPILauncher.mpi_args = ["-machinefile ~/.ipython/profile_default/machinefile"]


# The mpiexec command to use in starting the process.                           
c.MPILauncher.mpi_cmd = ['mpiexec']

Bash execution

$ dacluster start -n20
2015-06-10 16:16:46.661 [IPClusterStart] Starting ipcluster with [daemon=False]
2015-06-10 16:16:46.661 [IPClusterStart] Creating pid file: /home/aidan/.ipython/profile_default/pid/ipcluster.pid
2015-06-10 16:16:46.662 [IPClusterStart] Starting Controller with MPI
2015-06-10 16:16:46.700 [IPClusterStart] ERROR | IPython cluster: stopping
2015-06-10 16:16:47.667 [IPClusterStart] Starting 20 Engines with MPIEngineSetLauncher
2015-06-10 16:16:49.701 [IPClusterStart] Removing pid file: /home/aidan/.ipython/profile_default/pid/ipcluster.pid

Machinefile

~/.ipython/profile_default/machinefile

localhost slots=8
aidan-slave slots=16
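The machinefile above uses what appears to be Open MPI's hostfile syntax (host slots=N), which gives 8 + 16 = 24 slots, enough for the 20 engines requested with -n20. A minimal sketch for checking that arithmetic, assuming Open MPI-style lines; parse_machinefile is a hypothetical helper, and its handling of lines without slots= is a simplification.

```python
def parse_machinefile(text):
    """Return {hostname: slots} for Open MPI-style hostfile text.

    Assumptions of this sketch: a line without slots= counts as 1 slot
    (real Open MPI versions may default to the core count instead),
    '#' starts a comment, and repeated hosts have their slots summed.
    """
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        host, slots = fields[0], 1
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                slots = int(value)
        hosts[host] = hosts.get(host, 0) + slots
    return hosts


machinefile = """localhost slots=8
aidan-slave slots=16
"""
hosts = parse_machinefile(machinefile)
total_slots = sum(hosts.values())
requested_engines = 20  # from "dacluster start -n20"
print(hosts, total_slots, total_slots >= requested_engines)
# → {'localhost': 8, 'aidan-slave': 16} 24 True
```

Having enough slots only rules out one cause; it says nothing about whether the launcher passes the machinefile to mpiexec correctly.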

I should mention that it works when I run:

mpiexec -machinefile machinefile mpi_hello

The output of that execution includes hostnames, so I am sure it is actually distributing. I also watch the processes in top.

Thanks

Answer

I guess I asked too soon. The problem was in this line:

c.MPILauncher.mpi_args = ["-machinefile ~/.ipython/profile_default/machinefile"]

It should have been split at the space into separate list elements, and the path should be absolute:

c.MPILauncher.mpi_args = ["-machinefile", "/home/aidan/.ipython/profile_default/machinefile"]
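Why the one-string version fails: assuming the launcher hands mpi_cmd + mpi_args + the engine command to subprocess without a shell, each list element becomes exactly one argument. A single string therefore reaches mpiexec as one argument containing a space, and "~" is never expanded because no shell ever sees it. The helper launcher_argv and the "ipengine" placeholder below are hypothetical, for illustration only.

```python
import os


def launcher_argv(mpi_cmd, mpi_args, program_args):
    """Hypothetical helper: build the argv list that a shell-less
    subprocess.Popen call would receive (one list item = one argument)."""
    return list(mpi_cmd) + list(mpi_args) + list(program_args)


program = ["ipengine"]  # placeholder for the per-rank engine command

# Original config: flag and path crammed into ONE list element.
broken = launcher_argv(
    ["mpiexec"],
    ["-machinefile ~/.ipython/profile_default/machinefile"],
    program,
)

# Fixed config: flag and path as separate elements, "~" expanded in Python.
machinefile = os.path.expanduser("~/.ipython/profile_default/machinefile")
fixed = launcher_argv(["mpiexec"], ["-machinefile", machinefile], program)

print(len(broken), broken[1])  # mpiexec would see a single, unknown option
print(len(fixed), fixed[1:3])  # "-machinefile" followed by the expanded path
```

Because ipcluster_config.py is itself Python, the same os.path.expanduser call could stand in for the hard-coded /home/aidan path there (with import os at the top of the config); this is an untested suggestion, not something the original answer tried.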

I hope this can help someone. Note that this only solves the problem shown in the Bash output. MPI does make the connection to the remote server (namely aidan-slave): if I start dacluster, I see a bunch of python sessions start in top, which indicates that IPython sessions are running remotely.

Unfortunately, the DistArray examples, at least pi_montecarlo, hang indefinitely. I traced the issue back to its source and found that the hanging line is line 736 of context.py in distarray's globalapi module:

def _execute(self, lines, targets):
    return self.view.execute(lines, targets=targets, block=True)

I think this is a symptom of a broken or misconfigured MPI connection, because this line appears to execute a command on all the slave processes and block until every one of them replies. I don't know how to fix it.
