Mapping MPI processes to particular nodes


Question

I suspect this question may be off-topic here, but I couldn't help asking. Suppose I have a cluster of 100 nodes, each with 16 cores. I have an MPI application whose communication pattern is already known, and I also know the cluster topology (i.e., the hop distance between nodes). From these I have worked out a process-to-node mapping that reduces contention on the network, for example 10->20 and 30->90. How do I map the process with rank 10 to node 20?

Recommended answer

If you are not constrained by a queueing system, you can control the rank-to-node mapping by creating your own machinefile.

For instance, suppose the file my_machine_file contains the following 1600 lines:

   node001
   node002
   node003
   ....
   node100
   node001
   node002
   node003
   ....
   node100
   ...
   [repeat 13 more times]
   ...
   node001
   node002
   node003
   ....
   node100

This corresponds to the mapping

  0-> node001, 1 -> node002, ... 99 -> node100, 100 -> node001, ...
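The round-robin listing above does not have to be typed by hand. Here is a minimal Python sketch that generates it, assuming the node001 ... node100 naming from the listing and the 100-node, 16-core cluster from the question. The function name `round_robin_hosts` is made up for this illustration.

```python
NUM_NODES = 100       # nodes in the cluster (from the question)
CORES_PER_NODE = 16   # cores per node (from the question)


def round_robin_hosts(num_nodes=NUM_NODES, cores_per_node=CORES_PER_NODE):
    """Return one hostname per rank: the full node list repeated once per core."""
    one_pass = [f"node{i:03d}" for i in range(1, num_nodes + 1)]
    return one_pass * cores_per_node


hosts = round_robin_hosts()
print(len(hosts))  # 1600 lines, one per rank

# Line i of the machinefile is the host intended for rank i.
for rank in (0, 1, 99, 100):
    print(rank, "->", hosts[rank])
```

Writing `"\n".join(hosts)` plus a trailing newline to my_machine_file reproduces the file shown above.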

Run your application with

  mpirun -machinefile my_machine_file -n 1600 my_app

When your application needs fewer than 1600 processes, you can edit your machinefile accordingly.
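The same file can carry an arbitrary assignment such as the 10->20 and 30->90 example from the question. The sketch below is one possible way to build it, not a definitive tool: it starts from the round-robin order and swaps entries so that each pinned rank lands on its requested node while every node still appears exactly 16 times. The names `node_name`, `build_machinefile` and `pinned` are hypothetical, and the zero-padded, 1-based node names are assumed from the listing. It also assumes the launcher places rank i on the host in line i. Whether that holds depends on the MPI implementation and its mapping policy; some launchers may treat repeated hostnames as slot counts and need a by-node mapping option or a rankfile instead.

```python
def node_name(index):
    """Hostname for a 1-based node index, e.g. 20 -> 'node020' (assumed naming)."""
    return f"node{index:03d}"


def build_machinefile(pinned, num_nodes=100, cores_per_node=16):
    """Return one hostname per rank, honouring the rank -> node pairs in `pinned`."""
    total = num_nodes * cores_per_node
    # Start from the round-robin order: rank r -> node (r mod num_nodes) + 1.
    hosts = [node_name(rank % num_nodes + 1) for rank in range(total)]
    for rank, node in pinned.items():
        target = node_name(node)
        if hosts[rank] == target:
            continue
        # Find an unpinned rank currently on the target node and swap hosts,
        # so the number of ranks per node stays unchanged.
        donor = next(
            (j for j, h in enumerate(hosts) if h == target and j not in pinned),
            None,
        )
        if donor is None:
            raise ValueError(f"no free slot left on {target}")
        hosts[rank], hosts[donor] = hosts[donor], hosts[rank]
    return hosts


hosts = build_machinefile({10: 20, 30: 90})
print(hosts[10], hosts[30])  # node020 node090
```

Writing the returned list to my_machine_file, one hostname per line, gives a file that can be passed to the mpirun command shown earlier.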

Keep in mind, though, that the cluster admin has probably already numbered the nodes to reflect the topology of the interconnect. Even so, there are reports of noticeable performance gains (on the order of 10%-20%) from careful exploitation of the cluster topology. (References to follow.)

Note: Starting an MPI program with mpirun is neither standardized nor portable. However, the question here clearly concerns a specific compute cluster and a specific implementation (OpenMPI), and does not ask for a portable solution.
