OpenMPI: Simple 2-Node Setup


Problem description

I'm having trouble running an OpenMPI program using only two nodes (one of the nodes is the same machine that is executing the mpiexec command and the other node is a separate machine).

I'll call the machine that runs mpiexec "master" and the other node "slave".

On both master and slave, I've installed OpenMPI in my home directory under ~/mpi.

I have a file called ~/machines.txt on master.

Ideally, ~/machines.txt should contain:

master
slave

However, when I run the following on master:

mpiexec -n 2 --hostfile ~/machines.txt hostname

I get the following error:

bash: orted: command not found

But if ~/machines.txt only contains the name of the node that the command is running on, it works. ~/machines.txt:

master

Command:

mpiexec -n 2 --hostfile ~/machines.txt hostname

Output:

master
master

I've tried running the same command on slave, and changed the machines.txt file to contain only slave, and it worked too. I've made sure that my .bashrc file contains the proper paths for OpenMPI.

What am I doing wrong? In short, there is only a problem when I try to execute a program on a remote machine, but I can run mpiexec perfectly fine on the machine that is executing the command. This makes me believe that it's not a path issue. Am I missing a step in connecting both machines? I have passwordless ssh login capability from master to slave.

Solution

This error message means that you either do not have Open MPI installed on the remote machine, or you do not have your PATH set properly on the remote machine for non-interactive logins (i.e., such that it can't find the installation of Open MPI on the remote machine). "orted" is one of the helper executables that Open MPI uses to launch processes on remote nodes -- so if "orted" was not found, then it didn't even get to the point of trying to launch "hostname" on the remote node.
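
One way to test this diagnosis is to ask a non-interactive shell on the remote node whether it can find orted. The sketch below is only an illustration: check_orted and the REMOTE_SH variable are hypothetical names introduced here, and the host name "slave" is taken from the question. It assumes passwordless ssh already works, as the question states.

```shell
# Hypothetical helper: report whether orted is on the PATH that a
# NON-interactive remote shell sees (the same kind of shell mpiexec uses).
# REMOTE_SH is an assumed override hook; it defaults to plain ssh.
check_orted() {
    host=$1
    if ${REMOTE_SH:-ssh} "$host" 'command -v orted' >/dev/null 2>&1; then
        echo "orted found on $host"
    else
        echo "orted NOT found on $host"
        return 1
    fi
}
```

Calling check_orted slave on master runs the lookup over ssh without opening an interactive session. If it reports that orted was not found while an interactive login on slave can find it, the PATH is being set only for interactive shells.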

Note that your shell startup files (e.g., your .bashrc) may behave differently for interactive and non-interactive logins, so a PATH that looks correct when you log in by hand may not be set when mpiexec starts orted over ssh.
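
As a rough sketch of what this can look like in practice: some stock .bashrc files stop early when the shell is not interactive, so any PATH settings placed below that point never take effect for ssh-launched commands. The fragment below assumes the ~/mpi install prefix from the question and the default bin and lib subdirectories; the variable name MPI_HOME is introduced here for illustration, and your installation may differ (for example, lib64 instead of lib).

```shell
# ~/.bashrc fragment for every node (sketch, assuming Open MPI lives in ~/mpi).
# Place these lines ABOVE any early-exit guard for non-interactive shells.
MPI_HOME="$HOME/mpi"
export PATH="$MPI_HOME/bin:$PATH"
# Append the old LD_LIBRARY_PATH only if it was already set and non-empty.
export LD_LIBRARY_PATH="$MPI_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Example of the kind of guard to look for. Everything after it is skipped
# when the shell is non-interactive:
#   case $- in *i*) ;; *) return ;; esac
```

With the exports above the guard, both interactive logins and commands started over ssh should see the same Open MPI directories.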

Also note that it is considerably simpler to have Open MPI installed in the same path location on all nodes. In that case the prefix method (passing --prefix with the installation directory to mpiexec, or invoking mpiexec by its absolute path) will automatically add the right PATH and LD_LIBRARY_PATH when executing on the remote nodes, and you don't have to muck with your shell startup files.
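
A minimal sketch of the prefix method, assuming Open MPI is installed under ~/mpi on both nodes as in the question. The wrapper name run_with_prefix and the variable MPI_PREFIX are hypothetical conveniences, not part of Open MPI; only the --prefix option itself belongs to mpiexec.

```shell
# Assumed install prefix, identical on every node (from the question: ~/mpi).
MPI_PREFIX="$HOME/mpi"

# Hypothetical wrapper: call mpiexec by its full path and pass --prefix so
# the remote side is told where to find orted and the Open MPI libraries.
run_with_prefix() {
    "$MPI_PREFIX/bin/mpiexec" --prefix "$MPI_PREFIX" "$@"
}
```

For the example in the question, this would be called as: run_with_prefix -n 2 --hostfile ~/machines.txt hostname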

Note that there are a bunch of FAQ items about these kinds of topics on the main Open MPI web site.
