Why does this MPI code execute out of order?

Problem description

I'm trying to create a "Hello, world!" application in (Open)MPI in which each process prints its greeting in rank order.

My idea was to have the first process send a message to the second when it's finished, then the second to the third, etc.:

#include <mpi.h>
#include <stdio.h>

int main(int argc,char **argv) {

    int rank, size;

    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // See: http://mpitutorial.com/mpi-send-and-receive/
    if (rank == 0) {
        // This is the first process.
        // Print out immediately.
        printf("Hello, World! I am rank %d of %d.\n", rank, size);
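        if (size > 1) {
            // Pass the token to rank 1 (there is no rank 1 when running on a single process).
            MPI_Send(&rank, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        }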
    } else {
        // Wait until the previous one finishes.
        int receivedData;
        MPI_Recv(&receivedData, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Hello, World! I am rank %d of %d (message: %d).\n", rank, size, receivedData);
        if (rank + 1 < size) {
            // We're not the last one. Send out a message.
            MPI_Send(&rank, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
        } else {
            printf("Hello world completed!\n");
        }
    }

    MPI_Finalize();
    return 0;
}

When I run this on an eight-core cluster, it runs perfectly every time. However, when I run it on a sixteen-core cluster, sometimes it works, and sometimes it outputs something like this:

Hello, world, I am rank 0 of 16.
Hello, world, I am rank 1 of 16 (message: 0).
Hello, world, I am rank 2 of 16 (message: 1).
Hello, world, I am rank 3 of 16 (message: 2).
Hello, world, I am rank 4 of 16 (message: 3).
Hello, world, I am rank 5 of 16 (message: 4).
Hello, world, I am rank 6 of 16 (message: 5).
Hello, world, I am rank 7 of 16 (message: 6).
Hello, world, I am rank 10 of 16 (message: 9).
Hello, world, I am rank 11 of 16 (message: 10).
Hello, world, I am rank 8 of 16 (message: 7).
Hello, world, I am rank 9 of 16 (message: 8).
Hello, world, I am rank 12 of 16 (message: 11).
Hello, world, I am rank 13 of 16 (message: 12).
Hello, world, I am rank 14 of 16 (message: 13).
Hello, world, I am rank 15 of 16 (message: 14).
Hello world completed!

That is, most of the output is in order, but some is out of place.

Why is this happening? How is it even possible? How can I fix it?

Recommended answer

MPI does not guarantee that output from different processes appears in any particular order. This is especially true when running on multiple nodes, but it is still true on a single node.

The sequential sends and receives do enforce the order in which the processes call printf, but that output is not written directly to your terminal. Each process's standard output is captured by the MPI runtime and forwarded to the mpiexec/mpirun process, which then prints it to the screen. This forwarding uses a completely separate communication path from your MPI messages, so lines from different ranks can reach mpirun in any order, regardless of the order in which they were printed. If you really must ensure that messages are printed in order, make sure that a single MPI rank prints all of them.
