Can MPI_Publish_name be used for two separately started applications?

Question

I am writing an Open MPI application that consists of a server part and a client part, which are launched separately:

me@server1:~> mpirun server

me@server2:~> mpirun client

The server creates a port using MPI_Open_port. The question is: does Open MPI have a mechanism for communicating the port to the client? I suppose that MPI_Publish_name and MPI_Lookup_name do not work here, because the server would not know which other computer the information should be sent to.

To me, it looks as if only processes that were started by a single mpirun can communicate through MPI_Publish_name.

I also found ompi-server, but its documentation is too minimal for me to understand it. Does anyone know how it is used?

Related: MPICH: How to publish a name so that a client application can look it up?

Recommended answer

MPI_Publish_name takes an MPI info object, which can carry the Open MPI-specific boolean key ompi_global_scope. If this key is set to true, the name is published to the global scope, i.e. to an already running instance of ompi-server. By default, MPI_Lookup_name performs a global name lookup first, provided that the URI of the ompi-server was supplied.

The process involves several steps:

1) Start ompi-server somewhere in the cluster where it can be reached from all nodes. For debugging purposes you may pass it the --no-daemonize -r + arguments. It then starts and prints to standard output a URI similar to this one:

$ ompi-server --no-daemonize -r +
1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351

2) In the server, build an MPI info object and set the ompi_global_scope key to true:

MPI_Info info;

MPI_Info_create(&info);
MPI_Info_set(info, "ompi_global_scope", "true");

Then pass the info object to MPI_Publish_name:

MPI_Publish_name("server", info, port_name);

3) In the client, the call to MPI_Lookup_name automatically performs the lookup in the global context first (this can be changed by providing the proper key in the MPI info object, but in your case the default behaviour should suffice).

For both the client and the server code to know where the ompi-server is located, you have to give its URI to both mpirun commands with the --ompi-server "1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351" option. Quote the URI, because the shell would otherwise treat its semicolons as command separators.

Another option is to have ompi-server write the URI to a file, which can then be read on the node(s) where mpirun is to be run. For example, if you start ompi-server on the same node where both mpirun commands are executed, you could use a file in /tmp. If you start ompi-server on a different node, a shared file system (NFS, Lustre, etc.) would do. Either way, the set of commands would be:

$ ompi-server [--no-daemonize] -r file:/path/to/urifile
...
$ mpirun --ompi-server file:/path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client

Serverless method

If both mpirun commands run on the same node, --ompi-server can also specify the PID of an already running mpirun instance to be used as the name server. This lets you use local name publishing in the server (i.e. skip the "run an ompi-server" and "make an info object" parts). The sequence of commands would be:

head-node$ mpirun --report-pid server
[ note the PID of this mpirun instance ]
...
head-node$ mpirun --ompi-server pid:12345 client

where 12345 should be replaced by the real PID of the server's mpirun.

You can also have the server's mpirun print its URI and pass that URI to the client's mpirun:

$ mpirun --report-uri + server
[ note the URI ]
...
$ mpirun --ompi-server URI client

You can also have the URI written to a file by specifying /path/to/file (note: no file: prefix here) instead of + after the --report-uri option:

$ mpirun --report-uri /path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client

Note that the URI returned by mpirun has the same format as that of ompi-server, i.e. it includes the host IP address, so this also works if the second mpirun is executed on a different node, provided that node can talk to the first node via TCP/IP (and /path/to/urifile lives on a shared file system).

I tested all of the above with Open MPI 1.6.1. Some of the variants might not work with earlier versions.
