Initialize MPI cluster using Rmpi


Problem description

Recently I have been trying to use the department cluster for parallel computing in R. The cluster system is managed by SGE. Open MPI has been installed and passed the installation test.

I submit my job to the cluster via the qsub command. In the script, I specify the number of nodes I want to use via the following directive.
#PBS -l nodes=2:ppn=24 (two nodes with 24 threads each)
Then, mpirun -np 1 R --slave -f test.R
I have checked $PBS_NODEFILE afterwards. Two nodes are allocated as I wish: I can find the two nodes' names, node1 and node2, and each of them appears 24 times.
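As a quick sanity check, the per-node slot counts can be read straight out of that file; a minimal sketch, assuming the usual one-hostname-per-allocated-slot PBS nodefile format:

```shell
# Summarise a PBS_NODEFILE-style list (one hostname per allocated slot):
# prints "count hostname" pairs, one per node. For the allocation above
# this should show a count of 24 for each of node1 and node2.
summarise_nodefile() {
    sort "$1" | uniq -c
}

# Example usage inside a job: summarise_nodefile "$PBS_NODEFILE"
```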

The content of test.R is listed as follows.

library(Rmpi)
library(snow)

cl <- makeCluster(41,type="MPI")
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
stopCluster(cl)
mpi.quit()

The output of clusterCall() is quite disappointing. Only one node's name, node1, appears, and it appears 41 times. This is definitely wrong since there are only 24 threads on node1. It seems that my R script finds only one of the nodes, or even only one thread on it. I just wonder: what is the right way to construct an MPI cluster?

Answer

First of all, your cluster is definitely not managed by SGE, even if the latter is installed. SGE does not understand the #PBS sentinel in job files and does not export the PBS_NODEFILE environment variable (most environment variables that SGE exports start with SGE_). Nor would it accept the nodes=2:ppn=24 resource request, since under SGE the distribution of slots among the allocated nodes is controlled by the specified parallel environment. What you have is either PBS Pro or Torque. But SGE names its command-line utilities the same, and qsub takes more or less the same arguments, which is probably why you think you have SGE.
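A quick way to confirm which scheduler is actually running your jobs is to look at the environment variables it exports, along the lines of the distinction above. A minimal heuristic sketch (the variable names are the standard ones, but run this inside a submitted job, not on the login node):

```shell
# Heuristic scheduler check: PBS Pro/Torque export PBS_* variables into a
# job's environment, while SGE exports SGE_* ones.
detect_scheduler() {
    if [ -n "${PBS_NODEFILE:-}" ]; then
        echo "PBS Pro or Torque"
    elif [ -n "${SGE_ROOT:-}" ]; then
        echo "SGE"
    else
        echo "unknown"
    fi
}
```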

The problem you describe usually occurs when Open MPI is unable to obtain the node list from the environment, e.g. when it was not compiled with support for PBS Pro/Torque. In that case it starts all MPI processes on the node where mpirun was executed. Check that the proper RAS (resource allocation subsystem) module was compiled by running:

ompi_info | grep ras

It should list the various RAS modules, and among them should be one called tm:

...
MCA ras: tm (MCA v2.0, API v2.0, Component v1.6.5)
...

If the tm module is not listed, Open MPI will not obtain the node list automatically and the hostfile must be specified explicitly:

mpiexec ... -machinefile $PBS_NODEFILE ...
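Put together with the directive and script name from the question, the submission script would then look roughly like this sketch (whether -np 1 plus snow-side spawning of 41 workers is right for your site is an assumption carried over from the original setup):

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=24

cd "$PBS_O_WORKDIR"

# Pass the hostfile explicitly so the launcher sees both allocated nodes
# even without the tm RAS module; -np 1 starts a single master R process,
# from which snow/Rmpi spawn the workers.
mpiexec -np 1 -machinefile "$PBS_NODEFILE" R --slave -f test.R
```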

Under PBS Pro/Torque, Open MPI also needs the tm PLM module. Without that module, Open MPI cannot use the TM API to launch processes remotely on the second node and will therefore fall back to using SSH. In such a case, you should make sure that passwordless SSH login, e.g. using public-key authentication, is possible from each cluster node to every other node.

Your first step in solving the issue is to check for the presence of the correct modules as shown above. If the modules are there, launch hostname under mpiexec and check whether that works, e.g.:

#PBS -l nodes=2:ppn=24

echo "Allocated nodes:"
cat $PBS_NODEFILE
echo "MPI nodes:"
mpiexec --mca ras_base_display_alloc 1 hostname

then compare the two lists and also examine the ALLOCATED NODES block. The lists should be more or less equal, and both nodes should appear in the allocated-nodes table with 24 slots each (cf. Num slots). If the second list contains only one hostname, then Open MPI is not obtaining the hostfile properly because something is preventing the tm modules (given that they do exist) from initialising or being selected. This could be either the system-wide Open MPI configuration or some other RAS module having higher priority. Passing --mca ras_base_verbose 10 to mpiexec helps determine whether that is the case.

