bash: /usr/bin/hydra_pmi_proxy: No such file or directory


Problem Description


I am struggling to set up an MPI cluster, following the Setting Up an MPICH2 Cluster in Ubuntu tutorial. I have something running, and my machinefile is this:

pythagoras:2  # this will spawn 2 processes on pythagoras
geomcomp      # this will spawn 1 process on geomcomp

The tutorial states:

and run it (the parameter next to -n specifies the number of processes to spawn and distribute among nodes): mpiu@ub0:~$ mpiexec -n 8 -f machinefile ./mpi_hello

With -n 1 and -n 2 it runs fine, but with -n 3, it fails, as you can see below:

gsamaras@pythagoras:/mirror$ mpiexec -n 1 -f machinefile ./mpi_hello            
Hello from processor 0 of 1
gsamaras@pythagoras:/mirror$ mpiexec -n 2 -f machinefile ./mpi_hello
Hello from processor 0 of 2
Hello from processor 1 of 2
gsamaras@pythagoras:/mirror$ mpiexec -n 3 -f machinefile ./mpi_hello
bash: /usr/bin/hydra_pmi_proxy: No such file or directory
{hangs up}

Maybe the parameter next to -n specifies the number of machines? I mean, the number of processes is stated in the machinefile, isn't it? Also, I have used 2 machines for the MPI cluster (I hope this is the case, and that the output I am getting comes not only from the master node (i.e. pythagoras) but also from the slave one (i.e. geomcomp)).

Edit_1

Well, I think the parameter next to -n actually specifies the number of processes, since the tutorial I linked to uses 4 machines and its machinefile implies that 8 processes will run. Then why do we need the parameter next to -n at all? Whatever the reason is, I still can't see why my run fails with -n 3.
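For what it's worth, my current understanding of the mapping (an assumption based on the tutorial, not something I have verified):

# The machinefile caps the slots per host; -n picks how many processes to start:
#   pythagoras:2  -> ranks 0 and 1
#   geomcomp      -> rank 2
mpiexec -n 3 -f machinefile ./mpi_hello   # the first run that must reach geomcomp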

Edit_2

Following Edit_1, -n 3 is logical, since my machinefile implies that 3 processes will be spawned.

Edit_3

I think the problem arises when it tries to spawn a process on the slave node (i.e. geomcomp).

Edit_4

pythagoras runs on Debian 8, while geomcomp runs on Debian 6. The machines are of the same architecture. The problem lies in geomcomp, since I tried mpiexec -n 1 ./mpi_hello there and it said that no daemon was running.
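The "no daemon" message sounds like the old MPD process manager. A quick check I could run on geomcomp (a sketch, assuming the MPD tools that ship with old MPICH2 are installed):

mpd &        # start a local MPD daemon
mpdtrace     # list the hosts in the MPD ring
mpdallexit   # shut the ring down again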

So, on pythagoras, I got:

gsamaras@pythagoras:~$ mpichversion
MPICH Version:      3.1
MPICH Release date: Thu Feb 20 11:41:13 CST 2014
MPICH Device:       ch3:nemesis
MPICH configure:    --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --enable-shared --prefix=/usr --enable-fc --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/mpich --docdir=/usr/share/doc/mpich --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr
MPICH CC:   gcc -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -g -O2 -fstack-protector-strong -Wformat -Werror=format-security  -O2
MPICH CXX:  g++ -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -g -O2 -fstack-protector-strong -Wformat -Werror=format-security
MPICH F77:  gfortran -g -O2 -fstack-protector-strong -g -O2 -fstack-protector-strong -O2
MPICH FC:   gfortran -g -O2 -fstack-protector-strong -g -O2 -fstack-protector-strong
gsamaras@pythagoras:~$ which mpiexec
/usr/bin/mpiexec
gsamaras@pythagoras:~$ which mpirun
/usr/bin/mpirun

whereas on geomcomp I got:

gsamaras@geomcomp:~$ mpichversion
-bash: mpichversion: command not found
gsamaras@geomcomp:~$ which mpiexec
/usr/bin/mpiexec
gsamaras@geomcomp:~$ which mpirun
/usr/bin/mpirun

I had installed MPICH2 as the tutorial instructed. What should I do? I am working in /mirror on the master node; it is mounted on the slave node.
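For reference, since the error says bash cannot find the proxy binary, one quick check (a sketch, assuming passwordless ssh is set up as in the tutorial):

ls -l /usr/bin/hydra_pmi_proxy                # on the master (pythagoras)
ssh geomcomp ls -l /usr/bin/hydra_pmi_proxy   # same path on the slave?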

1. This relevant question, mpiexec.hydra - how to run MPI process on machines where locations of hydra_pmi_proxy are different?, is different from mine, but it might be the case here too.

2. Damn it, the only Hydra I know is a Greek island; what am I missing? :/

Solution

I'd say you've identified a genuine shortcoming of Hydra: there should be some way to tell it the paths on the other nodes are different.

Where is mpich installed on pythagoras? Where is mpich installed on geomcomp?

In the simplest configuration, you would have, for example, a common home directory, and you would have installed mpich into ${HOME}/soft/mpich.

Hydra might not be starting a "login shell" on the remote machine. If you add the MPICH installation path to your PATH environment variable, you'll have to do so in a file like .bashrc (or whatever the equivalent for your shell is).
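For instance, a minimal sketch, assuming an install under ${HOME}/soft/mpich as above:

# In ~/.bashrc -- bash reads this for the non-login shells that Hydra
# typically gets on the remote side, unlike ~/.bash_profile:
export PATH="${HOME}/soft/mpich/bin:${PATH}"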

To test this, try 'ssh geomcomp mpichversion' and 'ssh pythagoras mpichversion' and plain old 'mpichversion'. That should tell you something about how your environment is set up.
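Spelled out (ssh with a command runs a non-interactive shell, so this exercises roughly the environment Hydra sees):

ssh geomcomp mpichversion      # the remote environment, as Hydra would see it
ssh pythagoras mpichversion
mpichversion                   # local, for comparison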

In your case, your environment is really strange! Debian 8 and Debian 6, and it looks like not even the same version of MPICH. I think, thanks to the ABI initiative, that MPICH-3.1 and newer will work with MPICH-3.1, but if you have a version of MPICH that pre-dates the "MPICH2 to MPICH" conversion, there are no such guarantees.

And setting ABI aside, you've got an MPICH that expects the Hydra launcher (the Debian 8 version) and an MPICH that expects the MPD launcher (the Debian 6 version).
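One way to see which launcher a node's package ships (a sketch; the exact symlink layout is an assumption about the Debian packaging):

# On each node: what does mpiexec actually resolve to?
ls -l /usr/bin/mpiexec
readlink -f /usr/bin/mpiexec   # e.g. mpiexec.hydra vs. the MPD mpiexec script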

And even if you do have recent enough packages, the only way things can work is if you have the same architecture on all machines. ABI, as Ken points out, does not mean support for heterogeneous environments.

Remove the distro packages and build MPICH yourself on both machines.
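A minimal sketch of that, assuming the mpich-3.1 sources from mpich.org and the ${HOME}/soft/mpich prefix used above (the package names to remove differ between Debian 6 and 8, so treat them as placeholders):

# On each node: drop the distro's MPICH, then build the same release from source.
sudo apt-get remove mpich mpich2 libmpich-dev   # whichever is installed
tar xzf mpich-3.1.tar.gz
cd mpich-3.1
./configure --prefix="${HOME}/soft/mpich"
make
make install
# Then put ${HOME}/soft/mpich/bin on PATH in ~/.bashrc on every node, as above.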
