Mapping MPI processes to particular nodes
Question
This question may be off-topic here, but I couldn't help asking. Suppose I have a cluster of 100 nodes, each with 16 cores. I have an MPI application whose communication pattern is already known, and I also know the cluster topology (i.e., the hop distance between nodes). I now know the process-to-node mapping that reduces contention on the network, for example 10->20, 30->90. How do I map the process with rank 10 to node 20? Please help me with this.
Recommended answer
If you are not constrained by any kind of queueing system, you can control the rank-to-node mapping by creating your own machinefile.
For instance, suppose the file my_machine_file has the following 1600 lines:
node001
node002
node003
....
node100
node001
node002
node003
....
node100
...
[repeat 13 more times]
...
node001
node002
node003
....
node100
This corresponds to the mapping
0 -> node001, 1 -> node002, ... 99 -> node100, 100 -> node001, ...
You should then run your application with
mpirun -machinefile my_machine_file -n 1600 my_app
When your application needs fewer than 1600 processes, you can edit your machinefile accordingly.
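Writing 1600 lines by hand is tedious, so the file can be generated. The following is a minimal Python sketch, not part of the original answer. It assumes, as the answer does, that the launcher assigns rank i to the host named on line i of the machinefile (some launchers treat repeated host lines differently, so verify this on your cluster), and that hosts are named node001 to node100. The function name build_machinefile and its parameters are hypothetical. Ranks listed in pinned go to the requested node; all other ranks are placed round-robin on nodes that still have a free core, so no node ends up with more ranks than cores.

```python
def build_machinefile(num_nodes, cores_per_node, pinned):
    """Return one hostname per rank; entry i is the host for rank i.

    pinned maps rank -> node number (1-based), e.g. {10: 20, 30: 90}.
    """
    total_ranks = num_nodes * cores_per_node
    free = {n: cores_per_node for n in range(1, num_nodes + 1)}
    node_of = {}

    # Place the pinned ranks first so their slots are reserved.
    for rank, node in pinned.items():
        if not 0 <= rank < total_ranks:
            raise ValueError(f"rank {rank} is out of range")
        if node not in free:
            raise ValueError(f"node {node} does not exist")
        if free[node] == 0:
            raise ValueError(f"node {node} has no free core left")
        free[node] -= 1
        node_of[rank] = node

    # Fill the remaining ranks round-robin, skipping nodes that are full.
    node = 1
    for rank in range(total_ranks):
        if rank in node_of:
            continue
        while free[node] == 0:
            node = node % num_nodes + 1
        node_of[rank] = node
        free[node] -= 1
        node = node % num_nodes + 1

    return [f"node{node_of[r]:03d}" for r in range(total_ranks)]


hosts = build_machinefile(num_nodes=100, cores_per_node=16,
                          pinned={10: 20, 30: 90})
print(hosts[10], hosts[30])  # node020 node090
```

Writing the entries of hosts to my_machine_file, one per line, gives a file that can be passed to the mpirun command above. If you already have a complete rank-to-node mapping from your topology analysis, pass all of it in pinned and the round-robin fallback is never used.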
Please remember, though, that the cluster admin has probably already numbered the nodes to reflect the topology of the interconnect. Even so, there are reports of noticeable performance gains (on the order of 10%-20%) from careful exploitation of the cluster topology. (References to follow).
Note: Starting an MPI program with mpirun is neither standardized nor portable. However, the question here clearly relates to a specific compute cluster and a specific implementation (OpenMPI) and does not ask for a portable solution.