MPICH example cpi generates an error when it runs on multiple freshly installed VPSes

Problem description


I have just begun learning about MPI, so I bought 3 VPSes to create an experiment environment. I successfully installed and configured ssh and MPICH. The three nodes can ssh to each other (but not to themselves) without a password, and the cpi example passes without any problem on the local machine. When I try to run it on all 3 nodes, the cpi program always exits with the error Fatal error in PMPI_Reduce: Unknown error class, error stack:. Here is the full description of what I did and what the error said.
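
For context, this is roughly how such a setup is usually prepared; the commands below are an illustrative sketch, not taken from the original post (the key type, target user and process count are assumptions):

[root@fire ~]# ssh-keygen -t rsa              # generate a key pair on each node
[root@fire ~]# ssh-copy-id root@mpi1          # push the public key to the other nodes
[root@fire ~]# ssh-copy-id root@mpi2
[root@fire examples]# mpiexec -n 4 ./cpi      # local-only run, which passes without problems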

[root@fire examples]# mpiexec -f ~/mpi/machinefile  -n 6 ./cpi
Process 3 of 6 is on mpi0
Process 0 of 6 is on mpi0
Process 1 of 6 is on mpi1
Process 2 of 6 is on mpi2
Process 4 of 6 is on mpi1
Process 5 of 6 is on mpi2
Fatal error in PMPI_Reduce: Unknown error class, error stack:
PMPI_Reduce(1263)...............: MPI_Reduce(sbuf=0x7fff1c18c440, rbuf=0x7fff1c18c448, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
MPIR_Reduce_impl(1075)..........:
MPIR_Reduce_intra(826)..........:
MPIR_Reduce_impl(1075)..........:
MPIR_Reduce_intra(881)..........:
MPIR_Reduce_binomial(188).......:
MPIDI_CH3U_Recvq_FDU_or_AEP(636): Communication error with rank 1
MPIR_Reduce_binomial(188).......:
MPIDI_CH3U_Recvq_FDU_or_AEP(636): Communication error with rank 2
MPIR_Reduce_intra(846)..........:
MPIR_Reduce_impl(1075)..........:
MPIR_Reduce_intra(881)..........:
MPIR_Reduce_binomial(250).......: Failure during collective

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 1563 RUNNING AT mpi0
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:2@mpi2] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:2@mpi2] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:2@mpi2] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1@mpi1] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:885): assert (!closed) failed
[proxy:0:1@mpi1] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@mpi1] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@mpi0] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@mpi0] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@mpi0] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@mpi0] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion


I have no clue what happened; any insights? As the comments suggested, here is the MPI cpi code.

#include "mpi.h"
#include <stdio.h>
#include <math.h>

double f(double);

double f(double a)
{
    return (4.0 / (1.0 + a*a));
}

int main(int argc,char *argv[])
{
    int    n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    double startwtime = 0.0, endwtime;
    int    namelen;
    char   processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Get_processor_name(processor_name,&namelen);

    fprintf(stdout,"Process %d of %d is on %s\n",
    myid, numprocs, processor_name);
    fflush(stdout);

    n = 10000;          /* default # of rectangles */
    if (myid == 0)
        startwtime = MPI_Wtime();

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h   = 1.0 / (double) n;
    sum = 0.0;
    /* A slightly better approach starts from large i and works back */
    for (i = myid + 1; i <= n; i += numprocs)
    {
        x = h * ((double)i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0) {
        endwtime = MPI_Wtime();
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
        printf("wall clock time = %f\n", endwtime-startwtime);         
        fflush(stdout);
    }

    MPI_Finalize();
    return 0;
}
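
For completeness, the program above can be compiled with MPICH's compiler wrapper and launched the same way as in the question; the source file name cpi.c is an assumption here:

[root@fire examples]# mpicc -o cpi cpi.c
[root@fire examples]# mpiexec -f ~/mpi/machinefile -n 6 ./cpi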

Recommended answer


It is probably too late, but I will provide my answer anyway. I encountered the same problem, and after some research I figured out the issue.


If your machinefile contains hostnames instead of IP addresses and the machines are connected on a local network, then you should also have a nameserver running locally; otherwise, change the entries in your machinefile from hostnames to IP addresses. Having the hostnames only in /etc/hosts will not solve the issue.
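
To make this concrete, here is a sketch of what the change looks like (the hostnames are the ones from the output above; the IP addresses are placeholders, replace them with your nodes' real addresses):

[root@fire examples]# cat ~/mpi/machinefile
mpi0
mpi1
mpi2

After the change, the same file would contain only IP addresses:

[root@fire examples]# cat ~/mpi/machinefile
192.0.2.10
192.0.2.11
192.0.2.12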

This turned out to be my problem as well, and once I changed the entries in the machinefile to IP addresses it worked.
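
One way to check whether the hostnames are actually resolvable through a nameserver (nslookup queries DNS directly and does not consult /etc/hosts) is to run, on each node:

[root@fire examples]# nslookup mpi1
[root@fire examples]# nslookup mpi2

If these queries fail, switching the machinefile to IP addresses as described above is the simpler fix.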

Regards, GOPI
