MPI: Communication error with rank 1: Connection refused


Question

I have a simple MPI program:

#include <cstdio>
#include <mpi.h>

int main(int argc, char* argv[]) {
    // Use the C API consistently; the MPI:: C++ bindings were removed in MPI-3.
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int number;
    if (world_rank == 0) {
        // Rank 0 sends a single int to rank 1.
        number = -1;
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (world_rank == 1) {
        // Rank 1 blocks until the int from rank 0 arrives.
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        std::printf("Process 1 received number %d from process 0\n",
                    number);
    }

    // Shut MPI down cleanly before exiting.
    MPI_Finalize();
    return 0;
}

I have specified a host file with two entries. Server is the local machine I am running MPI on, and ubuntu is the remote machine.

Server
ubuntu

When I run the executable with mpirun -np 2 --hostfile hosts ./test, it fails with the error Communication error with rank 1: Connection refused. However, if I reverse the order of the hosts in my host file:

ubuntu
Server

It works fine. I cannot understand why. Does the order of the hosts in the host file matter?

Recommended answer

I ran into the same runtime error as you did.

First make sure you can ssh into the client machines and log in without a password. After that, the main reason the MPICH master cannot reach the client (slave) is the client's /etc/hosts file: it assigns 127.0.0.1 both to the client's own name and to localhost itself. For example, run sudo vim /etc/hosts on the client and you will see the host name list for that machine:

127.0.0.1 clientUsrName
127.0.0.1 localhost

The order does not matter at all. All you have to do is comment out the line that wrongly maps clientUsrName to 127.0.0.1, the localhost IP. For example:

# 127.0.0.1 clientUsrName
127.0.0.1 localhost
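
To spot this kind of entry without reading the file by eye, a small checker along these lines may help. It is only a sketch and not part of MPICH: the function name loopback_aliases is made up for illustration, and treating every 127.x.x.x address as loopback and skipping only localhost and localhost.localdomain are assumptions of this example. It takes the text of a hosts file and returns every other name that is mapped to a loopback address.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every name in /etc/hosts-style text that is
// mapped to a 127.x.x.x address, other than the usual "localhost" names.
std::vector<std::string> loopback_aliases(const std::string& hosts_text) {
    std::vector<std::string> suspicious;
    std::istringstream lines(hosts_text);
    std::string line;
    while (std::getline(lines, line)) {
        // Drop comments: everything from '#' to the end of the line.
        const std::string::size_type hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);

        std::istringstream fields(line);
        std::string address;
        if (!(fields >> address)) continue;           // blank or comment-only line
        if (address.rfind("127.", 0) != 0) continue;  // not an IPv4 loopback address

        // Every remaining field on the line is a name for that address.
        std::string name;
        while (fields >> name) {
            if (name != "localhost" && name != "localhost.localdomain") {
                suspicious.push_back(name);
            }
        }
    }
    return suspicious;
}

// Self-check using the two /etc/hosts variants from this answer.
void check_example() {
    const std::string before = "127.0.0.1 clientUsrName\n127.0.0.1 localhost\n";
    const std::string after = "# 127.0.0.1 clientUsrName\n127.0.0.1 localhost\n";
    const std::vector<std::string> found = loopback_aliases(before);
    assert(found.size() == 1 && found[0] == "clientUsrName");
    assert(loopback_aliases(after).empty());
}
```

To use it, read the contents of /etc/hosts on each machine into a string and pass it to loopback_aliases. If the machine's own host name comes back in the result, that line is a candidate for commenting out as described above.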
