Can MPI_Publish_name be used for two separately started applications?
Question
I am writing an Open MPI application that consists of a server part and a client part, which are launched separately:
me@server1:~> mpirun server
and
me@server2:~> mpirun client
server creates a port using MPI_Open_port. The question is: does Open MPI have a mechanism to communicate the port to client? I suppose that MPI_Publish_name and MPI_Lookup_name don't work here, because server wouldn't know to which other computer the information should be sent.
To me, it looks like only processes that were started by a single mpirun can communicate via MPI_Publish_name.
I also found ompi-server, but its documentation is too minimal for me to understand. Does anyone know how it is used?
Related: MPICH: how to publish a name so that a client application can look it up?
Recommended answer
MPI_Publish_name is supplied with an MPI info object, which can carry the Open MPI specific boolean key ompi_global_scope. If this key is set to true, the name is published to the global scope, i.e. to an already running instance of ompi-server. MPI_Lookup_name by default does a global name lookup first, provided the URI of the ompi-server was given.
Everything described in this answer was tested with Open MPI 1.6.1. Some of the variants might not work with earlier versions.
Using a dedicated ompi-server involves several steps:
1) Start ompi-server somewhere in the cluster where it can be reached from all nodes. For debugging purposes you may pass it the --no-daemonize -r + arguments. It then starts and prints to standard output a URI similar to this one:
$ ompi-server --no-daemonize -r +
1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351
2) In the server, build an MPI info object and set the ompi_global_scope key to true:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "ompi_global_scope", "true");
Then pass the info object to MPI_Publish_name:
MPI_Publish_name("server", info, port_name);
3) In the client, the call to MPI_Lookup_name automatically does the lookup in the global context first (this can be changed by providing the proper key in the MPI info object, but in your case the default behaviour should suffice).
In order for both the client and the server code to know where ompi-server is located, you have to give its URI to both mpirun commands with the --ompi-server 1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351 option.
Another option is to have ompi-server write the URI to a file, which can then be read on the node(s) where mpirun is to be run. For example, if you start ompi-server on the same node where both mpirun commands are executed, you could use a file in /tmp. If you start ompi-server on a different node, a shared file system (NFS, Lustre, etc.) would do. Either way, the set of commands would be:
$ ompi-server [--no-daemonize] -r file:/path/to/urifile
...
$ mpirun --ompi-server file:/path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Serverless method
If both mpirun commands run on the same node, --ompi-server can also specify the PID of an already running mpirun instance to be used as a name server. This lets you use local name publishing in the server (i.e. skip the "run an ompi-server" and "make an info object" parts). The sequence of commands would be:
head-node$ mpirun --report-pid server
[ note the PID of this mpirun instance ]
...
head-node$ mpirun --ompi-server pid:12345 client
where 12345 should be replaced by the real PID of the server's mpirun.
You can also have the server's mpirun print its URI and pass that URI to the client's mpirun:
$ mpirun --report-uri + server
[ note the URI ]
...
$ mpirun --ompi-server URI client
You can also have the URI written to a file by specifying /path/to/urifile (note: no file: prefix here) instead of + after the --report-uri option:
$ mpirun --report-uri /path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Note that the URI returned by mpirun has the same format as that of ompi-server, i.e. it includes the host IP address, so this also works if the second mpirun is executed on a different node that can talk to the first node via TCP/IP (and /path/to/urifile lives on a shared file system).