MPI_ERR_TRUNCATE: On Broadcast

Problem description

I have an int that I intend to broadcast from the root (rank == FIELD, where FIELD = 0).

int winner;

if (rank == FIELD) {
    winner = something;
}

MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&winner, 1, MPI_INT, FIELD, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
if (rank != FIELD) {
    cout << rank << " informed that winner is " << winner << endl;
}

But it seems that I get:

[JM:6892] *** An error occurred in MPI_Bcast
[JM:6892] *** on communicator MPI_COMM_WORLD
[JM:6892] *** MPI_ERR_TRUNCATE: message truncated
[JM:6892] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

I found that I can increase the count in the Bcast:

MPI_Bcast(&winner, NUMPROCS, MPI_INT, FIELD, MPI_COMM_WORLD);

where NUMPROCS is the number of running processes (actually, it seems I only need it to be 2). Then it runs, but gives unexpected output:

1 informed that winner is 103
2 informed that winner is 103
3 informed that winner is 103
5 informed that winner is 103
4 informed that winner is 103

When I print winner with cout, it should be -1.

Recommended answer

The error occurs earlier in the code:

if (rank == FIELD) {
   // randomly place ball, then broadcast to players
   ballPos[0] = rand() % 128;
   ballPos[1] = rand() % 64;
   MPI_Bcast(ballPos, 2, MPI_INT, FIELD, MPI_COMM_WORLD);
}

This is a very common mistake. MPI_Bcast is a collective operation, and it must be called by all processes in the communicator in order to complete. What happens in your case is that this broadcast is called only by the root and not by the other processes in MPI_COMM_WORLD, so it interferes with the next broadcast operation, namely the one inside the loop. The second broadcast operation actually receives the message sent by the first one (two int elements) into a buffer that holds just one int, hence the truncation error. In Open MPI each broadcast internally uses the same message tag values, so different broadcasts can interfere with each other if they are not issued in sequence. This is compliant with the (old) MPI standard: one cannot have more than one outstanding collective operation in MPI-2.2 (in MPI-3.0 one can have several outstanding non-blocking collective operations). You should rewrite the code as:

if (rank == FIELD) {
   // randomly place ball, then broadcast to players
   ballPos[0] = rand() % 128;
   ballPos[1] = rand() % 64;
}
MPI_Bcast(ballPos, 2, MPI_INT, FIELD, MPI_COMM_WORLD);
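To illustrate the matching behaviour described above, here is a minimal toy model in plain C++ that does not use MPI at all. It is only a sketch under stated assumptions, not how Open MPI is implemented: every name in it (ToyComm, toy_bcast_root, toy_bcast_recv, buggy_run, widened_run, fixed_run) is hypothetical, and the ball position values 103 and 7 are made up. The value 103 is chosen to mirror the question's output on the assumption, which the answer does not confirm, that the unexpected 103 was ballPos[0]. The model assumes that a non-root rank's broadcast simply consumes the oldest pending message from the root, whichever broadcast call produced it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy model only (an assumption for illustration, not Open MPI internals):
// each rank has a FIFO inbox, and broadcasts are matched purely by order.
struct ToyComm {
    explicit ToyComm(std::size_t size) : inbox(size) {}
    std::vector<std::deque<std::vector<int>>> inbox;  // one FIFO per rank
};

// Root side of a broadcast: queue the payload for every other rank.
void toy_bcast_root(ToyComm& comm, std::size_t root,
                    const std::vector<int>& payload) {
    assert(root < comm.inbox.size());
    for (std::size_t r = 0; r < comm.inbox.size(); ++r) {
        if (r != root) comm.inbox[r].push_back(payload);
    }
}

// Non-root side: take the oldest pending message, whatever call sent it.
// Returns "ok", "truncated", or "no message" (real MPI would block instead).
std::string toy_bcast_recv(ToyComm& comm, std::size_t rank,
                           std::vector<int>& buffer) {
    assert(rank < comm.inbox.size());
    if (comm.inbox[rank].empty()) return "no message";
    std::vector<int> msg = comm.inbox[rank].front();
    comm.inbox[rank].pop_front();
    if (msg.size() > buffer.size()) return "truncated";  // buffer too small
    std::copy(msg.begin(), msg.end(), buffer.begin());
    return "ok";
}

// Pattern from the question: rank 1 skips the ballPos broadcast, so its
// one-int winner buffer is matched with the two-int ballPos message.
std::string buggy_run() {
    ToyComm comm(2);
    toy_bcast_root(comm, 0, {103, 7});  // ballPos, issued by the root only
    toy_bcast_root(comm, 0, {-1});      // winner
    std::vector<int> winner(1);
    return toy_bcast_recv(comm, 1, winner);
}

// Same bug with the receive count widened to 2: no truncation, but the
// "winner" slot now holds the first ballPos value.
int widened_run() {
    ToyComm comm(2);
    toy_bcast_root(comm, 0, {103, 7});
    toy_bcast_root(comm, 0, {-1});
    std::vector<int> winner(2);
    toy_bcast_recv(comm, 1, winner);
    return winner[0];
}

// Fixed pattern: rank 1 takes part in both broadcasts, in the same order.
std::string fixed_run() {
    ToyComm comm(2);
    toy_bcast_root(comm, 0, {103, 7});
    toy_bcast_root(comm, 0, {-1});
    std::vector<int> ballPos(2), winner(1);
    std::string first = toy_bcast_recv(comm, 1, ballPos);
    std::string second = toy_bcast_recv(comm, 1, winner);
    return (first == "ok" && second == "ok" && winner[0] == -1) ? "ok"
                                                                : "mismatch";
}
```

In this model buggy_run() reports a truncation, widened_run() hands back the ball position instead of the winner, and fixed_run() delivers -1 as intended. Note that in the real program, passing &winner with a count of 2 also writes past the end of a single int, which is undefined behaviour, so widening the count is not a fix.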
