Is it possible to run OpenMPI on a local computer AND a remote cluster?

Question

I have a set of computational operations that need to be performed on a cluster (maybe around 512 MPI processes). Right now, I have the root node on the cluster open a socket and transfer data to my local computer in between the compute operations, but I'm wondering if it's possible to just create two MPI groups, one being my local machine and the other the remote cluster, and to send data between them using MPI commands.

Is this possible?

Recommended answer

Yes, it is possible, as long as there is a network path between the cluster node and your machine. The MPI standard provides the abstract mechanisms to do it, while Open MPI provides a really simple way to make things work. You have to look into the Process Creation and Management section of the standard (Chapter 10 of MPI-2.2), and specifically into the Establishing Communication subsection (§10.4 of MPI-2.2). Basically the steps are:

  1. You start both MPI jobs separately. This is obviously what you already do, so nothing new here.
  2. One of the jobs creates a network port using MPI_Open_port(). This MPI call returns a unique port name, which then has to be published under a well-known service name using MPI_Publish_name(). Once the port is open, it can be used to accept client connections by calling the blocking routine MPI_Comm_accept(). This job has now become the server job.
  3. The other MPI job, referred to as the client job, first resolves the port name from the service name using MPI_Lookup_name(). Once it has the port name, it can call MPI_Comm_connect() to connect to the remote server.
  4. Once MPI_Comm_connect() is paired with the respective MPI_Comm_accept(), the two jobs establish an intercommunicator between them, and messages can then be sent back and forth.
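
The four steps above can be sketched with mpi4py, whose calls map directly onto the MPI routines named in the list (MPI.Open_port, MPI.Publish_name, MPI.Lookup_name, Accept, Connect). This is only an illustration under stated assumptions: it assumes mpi4py 3.x or newer on both sides, and the service name, the message contents, and the file name bridge.py used below are hypothetical. The same sequence applies in C with the MPI_ functions.

```python
import sys

SERVICE_NAME = "compute-bridge"  # hypothetical; any string both jobs agree on


def run_server():
    # Imported here so the module still loads where mpi4py is not installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    port_name = None
    if rank == 0:
        port_name = MPI.Open_port()                # step 2: open a port ...
        MPI.Publish_name(SERVICE_NAME, port_name)  # ... and publish its name
    port_name = comm.bcast(port_name, root=0)

    inter = comm.Accept(port_name, root=0)  # blocks until a client connects
    if rank == 0:
        # On an intercommunicator, ranks refer to the remote (client) job.
        data = inter.recv(source=0, tag=0)
        inter.send(f"server got {data!r}", dest=0, tag=1)
    inter.Disconnect()

    if rank == 0:
        MPI.Unpublish_name(SERVICE_NAME, port_name)
        MPI.Close_port(port_name)


def run_client():
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    port_name = None
    if rank == 0:
        port_name = MPI.Lookup_name(SERVICE_NAME)  # step 3: resolve the port
    port_name = comm.bcast(port_name, root=0)

    inter = comm.Connect(port_name, root=0)  # step 4: pairs with Accept
    if rank == 0:
        inter.send({"step": 1}, dest=0, tag=0)  # illustrative payload
        print(inter.recv(source=0, tag=1))
    inter.Disconnect()


if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role == "server":
        run_server()
    elif role == "client":
        run_client()
    else:
        sys.exit("usage: bridge.py server|client")
```

Each job would be started with its own mpiexec, for example running `python bridge.py server` in one job and `python bridge.py client` in the other. The name lookup in the client only succeeds once the two mpiexec instances know about each other, which is the detail covered next.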

One intricate detail is how the client job looks up the port name given only the service name. This is a less documented part of Open MPI, but it is quite easy: you have to provide the mpiexec command that starts the client job with the URI of the server job's mpiexec, which acts as a sort of directory service. To do that, launch the server job with the --report-uri - argument to make it print its URI to standard output:

$ mpiexec --report-uri - <other arguments like -np> ./server ...

It will give you a long URI of the form 1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351. Now you have to supply this URI to the client mpiexec with the --ompi-server uri option (quote it, because the semicolons would otherwise be treated as command separators by the shell):

$ mpiexec --ompi-server "1221656576.0;tcp://10.1.13.164:36351..." ./client ...
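
If the client is started from a script instead of an interactive shell, passing the URI as a single argv element avoids quoting problems with the semicolons altogether. The following is a minimal sketch; the helper name client_command and the default process count are hypothetical, and only the mpiexec options already mentioned here (--ompi-server, -np) are used.

```python
import shlex
import subprocess


def client_command(server_uri, client_argv, np=1):
    """Build the argv list for the client-side mpiexec (no shell involved)."""
    return ["mpiexec", "--ompi-server", server_uri, "-np", str(np), *client_argv]


if __name__ == "__main__":
    # URI as printed by the server's mpiexec (example value from the text).
    uri = "1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351"
    argv = client_command(uri, ["./client"])
    # shlex.join shows the equivalent, correctly quoted shell command line.
    print(shlex.join(argv))
    subprocess.run(argv, check=True)
```

Because subprocess.run receives a list, the URI reaches mpiexec unchanged; shlex.join is used only to display what the equivalent shell command would look like.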

Note that the URI contains the addresses of all configured and enabled network interfaces on the node where the server's mpiexec is started. You should ensure that the client is able to reach at least one of them. Also ensure that the TCP BTL component is in the list of enabled BTL components, otherwise no messages can flow. The TCP BTL is usually enabled by default, but on some InfiniBand installations it is explicitly disabled, either through the environment variable OMPI_MCA_btl or in the default Open MPI MCA configuration file. MCA parameters can be overridden with the --mca option, for example:

$ mpiexec --mca btl self,sm,openib,tcp --report-uri - ...
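
The reachability requirement mentioned above can be checked from the client machine before launching anything. The sketch below assumes the URI has exactly the form shown earlier (an identifier followed by semicolon-separated tcp://host:port entries); other Open MPI versions may print a different format, and the function names are hypothetical. A successful TCP connect only shows that the port is reachable through the network and any firewalls, not that the MPI handshake will succeed.

```python
import socket
from urllib.parse import urlsplit


def parse_server_uri(uri):
    """Split '<id>;tcp://host:port;...' into (id, [(host, port), ...])."""
    job_id, *endpoints = uri.strip().split(";")
    addresses = []
    for endpoint in endpoints:
        parts = urlsplit(endpoint)
        if parts.scheme == "tcp" and parts.hostname and parts.port:
            addresses.append((parts.hostname, parts.port))
    return job_id, addresses


def reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    uri = "1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351"
    job_id, addresses = parse_server_uri(uri)
    for host, port in addresses:
        status = "reachable" if reachable(host, port) else "unreachable"
        print(f"{host}:{port} {status}")
```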

Also see the answer that I gave to a similar question.
