MPI send and receive deadlock

Problem description

I'm very new to MPI and am writing a basic send and receive module in which I send 12 months to n processes, and each process receives its months and prints their values. The values are sent correctly and all of them are received, but my program gets stuck: it never prints "After program is complete" at the end. Can you please help?

#include <stdio.h>
#include <string.h>
#include "mpi.h"
#include<math.h>

int main(int argc, char* argv[]){
int  my_rank; /* rank of process */
int  p;       /* number of processes */

int tag=0;    /* tag for messages */

MPI_Status status ;   /* return status for receive */
int i;
int pro;
/* start up MPI */

MPI_Init(&argc, &argv);

// find out process rank
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank); 

//find out number of processes
MPI_Comm_size(MPI_COMM_WORLD, &p); 
if (my_rank==0)
{
    for(i=1;i<=12;i++)
    {
        pro = (i-1)%p;
        MPI_Send(&i, 1, MPI_INT,pro, tag, MPI_COMM_WORLD);
        printf("Value of Processor is %d Month %d\n",pro,i);
    }
}

//else{
for(int n=0;n<=p;n++)
{

    MPI_Recv(&i, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
    printf("My Month is %d and rank is %d\n",i,my_rank);

}
//}
MPI_Barrier(MPI_COMM_WORLD);
if(my_rank==0)
{
    printf("After program is complete\n");
}
/* shut down MPI */

MPI_Finalize(); 
return 0;
}

Below is the output:
Value of Processor is 0 Month 1
Value of Processor is 1 Month 2
Value of Processor is 2 Month 3
Value of Processor is 3 Month 4
Value of Processor is 4 Month 5
Value of Processor is 0 Month 6
Value of Processor is 1 Month 7
Value of Processor is 2 Month 8
Value of Processor is 3 Month 9
Value of Processor is 4 Month 10
Value of Processor is 0 Month 11
My Month is 2 and rank is 1
My Month is 7 and rank is 1
My Month is 3 and rank is 2
My Month is 8 and rank is 2
Value of Processor is 1 Month 12
My Month is 1 and rank is 0
My Month is 6 and rank is 0
My Month is 11 and rank is 0
My Month is 12 and rank is 1
My Month is 4 and rank is 3
My Month is 9 and rank is 3
My Month is 5 and rank is 4
My Month is 10 and rank is 4

Recommended answer

First: you violate one of the basic rules of MPI: every send must be matched by exactly one receive.

In your example run, you use 5 processes (ranks) and, as you can see, rank 0 sends 3 messages each to ranks 0 and 1 and 2 messages each to the remaining ranks. However, each rank posts p + 1 = 6 receives, because the loop around MPI_Recv runs for n = 0 to p inclusive. So every rank naturally gets stuck waiting for messages that are never sent. Remember that the code in the loop around MPI_Recv is executed by all ranks, so there will be a total of 5 * 6 = 30 receives for only 12 sends.

You can fix that by filtering inside the loop so that a rank only receives when it is its turn. But that depends on whether you actually know beforehand how many messages rank 0 is going to send; if you don't, you may need a more complicated mechanism.

Second: your rank 0 sends a blocking message to itself (without posting a non-blocking receive first). That alone can cause a deadlock. Remember that MPI_Send is never guaranteed to return before the matching receive has been posted, even though it sometimes does in practice.

Third: the loop for(int n=0;n<=p;n++) runs p + 1 times (6 times with 5 ranks). You almost certainly didn't want that, and running it 12 times would not be correct either, since with 5 ranks no rank is sent that many messages.

Finally: for this specific example, the preferred solution would be to store the months in an array and distribute it across all processes using MPI_Scatterv.
