Infinite wait during an Open MPI run on a cluster of servers?

Question

I have successfully set up passwordless SSH between the servers and my computer. I have a simple Open MPI program that runs well on a single computer. Unfortunately, when I try it on a cluster, I do not get a password prompt (as expected, since I have set up SSH authorization), and the execution does not move forward either.

The hostfile looks like this:

# The Hostfile for Open MPI

# The master node, 'slots=8' is used because it has 8 cores
  localhost slots=8
# The following slave nodes are single processor machines:
  gautam@pcys13.grm.polymtl.ca slots=8 
  gautam@srvgrm04 slots=160

I am running a hello world MPI program on the cluster:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  double t;

  MPI_Init(&argc, &argv);
  t = MPI_Wtime();                           /* start time (not used further) */
  (void)t;
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  /* total number of processes */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* rank of this process */
  MPI_Get_processor_name(processor_name, &namelen);

  printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
  MPI_Finalize();
  return 0;
}

I run it like this: mpirun -np 16 --hostfile hostfile ./hello

When using the -d option, the log looks like this:

[gautam@pcys33:~/LTE/check ]% mpirun -np 16 --hostfile hostfile -d ./hello
[pcys33.grm.polymtl.ca:02686] procdir: /tmp/openmpi-sessions-gautam@pcys33.grm.polymtl.ca_0/60067/0/0
[pcys33.grm.polymtl.ca:02686] jobdir: /tmp/openmpi-sessions-gautam@pcys33.grm.polymtl.ca_0/60067/0
[pcys33.grm.polymtl.ca:02686] top: openmpi-sessions-gautam@pcys33.grm.polymtl.ca_0
[pcys33.grm.polymtl.ca:02686] tmp: /tmp
[srvgrm04:77812] procdir: /tmp/openmpi-sessions-gautam@srvgrm04_0/60067/0/1
[srvgrm04:77812] jobdir: /tmp/openmpi-sessions-gautam@srvgrm04_0/60067/0
[srvgrm04:77812] top: openmpi-sessions-gautam@srvgrm04_0
[srvgrm04:77812] tmp: /tmp

Can you infer anything from the logs?

Recommended answer

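You just need to disable the firewall on each machine. Passwordless SSH only gets the remote daemons started; after that, Open MPI's daemons and processes connect to each other over TCP on dynamically chosen ports, so a firewall that allows only SSH leaves mpirun waiting indefinitely. If disabling the firewall entirely is not an option, it must at least allow TCP traffic between the cluster nodes.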
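To check whether a firewall is in the way before touching its configuration, a small TCP reachability test between two nodes can help. The following is a minimal sketch, not part of Open MPI: the function name tcp_reachable is hypothetical, and it assumes something is already listening on the port you test on the remote node (Open MPI picks its own ports dynamically, so this only shows whether ordinary TCP traffic gets through).

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a TCP connection to host:port succeeds
 * within timeout_sec seconds, 0 if it is refused, times out, or the host
 * name cannot be resolved. */
int tcp_reachable(const char *host, const char *port, int timeout_sec)
{
    struct addrinfo hints, *res, *ai;
    int ok = 0;

    assert(host != NULL && port != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return 0;

    for (ai = res; ai != NULL && !ok; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Non-blocking connect, so silently dropped packets cannot hang us. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            ok = 1;
        } else if (errno == EINPROGRESS) {
            fd_set wfds;
            struct timeval tv;

            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            tv.tv_sec = timeout_sec;
            tv.tv_usec = 0;

            /* Wait until the socket is writable, then read the final status. */
            if (select(fd + 1, NULL, &wfds, NULL, &tv) > 0) {
                int err = 0;
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
                    ok = 1;
            }
        }
        close(fd);
    }

    freeaddrinfo(res);
    return ok;
}
```

Called from a small main on pcys33 as, for example, tcp_reachable("srvgrm04", "22", 3), it should return 1 because SSH already works. If the same call returns 0 for another port on which you have started a listener on srvgrm04, a firewall between the nodes is a likely cause.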
