MPI Non-blocking Irecv didn't receive data?


Problem Description

I use MPI non-blocking communication (MPI_Irecv, MPI_Isend) to monitor the slaves' idle states. The code is shown below.

rank 0:

int dest = -1;
while( dest <= 0){
   int i;
   for(i=1;i<=slaves_num;i++){
      printf("slave %d, now is %d \n",i,idle_node[i]);
      if (idle_node[i]== 1) {
         idle_node[i] = 0;
         dest = i;
         break;
      }
   }
   if(dest <= 0){
      MPI_Irecv(&idle_node[1],1,MPI_INT,1,MSG_IDLE,MPI_COMM_WORLD,&request);
      MPI_Irecv(&idle_node[2],1,MPI_INT,2,MSG_IDLE,MPI_COMM_WORLD,&request);
      MPI_Irecv(&idle_node[3],1,MPI_INT,3,MSG_IDLE,MPI_COMM_WORLD,&request);
      // MPI_Wait(&request,&status);
   }
   usleep(100000);
}

idle_node[dest] = 0;//indicates this slave is busy now

ranks 1, 2, 3:

while(1)
{
   ...//do something
   MPI_Isend(&idle,1,MPI_INT,0,MSG_IDLE,MPI_COMM_WORLD,&request);
   MPI_Wait(&request,&status);
}

It works, but I want it to be faster, so I deleted this line:

usleep(100000);

Then rank 0 gets stuck in an endless loop like this:

slave 1, now is 0
slave 2, now is 0
slave 3, now is 0 
slave 1, now is 0
slave 2, now is 0
slave 3, now is 0 
...

So does this indicate that when I use MPI_Irecv, it just tells MPI that I want to receive a message here (the message has not actually been received yet), and MPI needs additional time to receive the real data? Or is there some other reason?

Solution

The use of non-blocking operations has been discussed over and over again here. From the MPI specification (section Nonblocking Communication):

Similarly, a nonblocking receive start call initiates the receive operation, but does not complete it. The call can return before a message is stored into the receive buffer. A separate receive complete call is needed to complete the receive operation and verify that the data has been received into the receive buffer. With suitable hardware, the transfer of data into the receiver memory may proceed concurrently with computations done after the receive was initiated and before it completed.

(the text is copied verbatim from the standard; the emphasis on the last sentence is mine)

The key sentence is the last one. The standard does not give any guarantee that a non-blocking receive operation will ever complete (or even start) unless MPI_WAIT[ALL|SOME|ANY] or MPI_TEST[ALL|SOME|ANY] was called (with MPI_TEST* setting a value of true for the completion flag).

By default, Open MPI comes as a single-threaded library, and without special hardware acceleration the only way to progress non-blocking operations is to periodically call into some non-blocking calls (the primary example being MPI_TEST*) or to call into a blocking one (the primary example being MPI_WAIT*).
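
As an illustration, rank 0 in the question could keep its polling loop but drive progress explicitly with MPI_Test. This is only a minimal sketch under the question's setup (three slaves, the same idle_node[] and MSG_IDLE as above, one outstanding receive per slave); the MPI_Waitsome version further below is the cleaner way to do it:

int dest = -1, i, flag;
MPI_Request reqs[3];

// Post one receive per slave, keeping a separate request handle for each
for (i = 1; i <= 3; i++)
   MPI_Irecv(&idle_node[i], 1, MPI_INT, i, MSG_IDLE, MPI_COMM_WORLD, &reqs[i-1]);

while (dest <= 0) {
   for (i = 1; i <= 3; i++) {
      // MPI_Test drives progress inside the library and sets flag to 1 once
      // the matching message has actually been delivered into idle_node[i]
      MPI_Test(&reqs[i-1], &flag, MPI_STATUS_IGNORE);
      if (flag) {
         dest = i;
         idle_node[dest] = 0;   // indicates this slave is busy now
         break;
      }
   }
}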

Also your code leads to a nasty leak that will sooner or later result in resource exhaustion: you are calling MPI_Irecv multiple times with the same request variable, effectively overwriting its value and losing the reference to the previously started requests. Requests that are not waited upon are never freed and therefore remain in memory.
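
To make the leak concrete, here is a minimal sketch of what the posted loop effectively does. Every MPI_Irecv allocates a new request object, and the variable request holds the only reference to it:

MPI_Irecv(&idle_node[1], 1, MPI_INT, 1, MSG_IDLE, MPI_COMM_WORLD, &request);  // creates request A
MPI_Irecv(&idle_node[2], 1, MPI_INT, 2, MSG_IDLE, MPI_COMM_WORLD, &request);  // creates request B and overwrites
                                                                              // the handle; A can no longer be
                                                                              // waited on or freed

Keeping one request handle per slave, as in the MPI_Waitsome example below, avoids this.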

There is absolutely no need to use non-blocking operations in your case. If I understand the logic correctly, you can achieve what you want with code as simple as:

MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, MSG_IDLE, MPI_COMM_WORLD, &status);
idle_node[status.MPI_SOURCE] = 0;
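
Dropped into the loop from the question, this might look as follows; dummy is a hypothetical scratch variable for the received value, the other names are taken from the question:

int dummy;
MPI_Status status;

// Block until any slave announces that it is idle
MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, MSG_IDLE, MPI_COMM_WORLD, &status);

int dest = status.MPI_SOURCE;   // the rank of the idle slave
idle_node[dest] = 0;            // indicates this slave is busy now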

If you'd like to process more than one worker process at the same time, it is a bit more involved:

MPI_Request reqs[slaves_num];
int indices[slaves_num], num_completed;

for (i = 0; i < slaves_num; i++)
   reqs[i] = MPI_REQUEST_NULL;

while (1)
{
   // Repost all completed (or never started) receives
   for (i = 1; i <= slaves_num; i++)
      if (reqs[i-1] == MPI_REQUEST_NULL)
         MPI_Irecv(&idle_node[i], 1, MPI_INT, i, MSG_IDLE,
                   MPI_COMM_WORLD, &reqs[i-1]);

   MPI_Waitsome(slaves_num, reqs, &num_completed, indices, MPI_STATUSES_IGNORE);

   // Examine num_completed and indices and feed the workers with data
   ...
}

After the call to MPI_Waitsome there will be one or more completed requests. The exact number will be in num_completed and the indices of the completed requests will be filled in the first num_completed elements of indices[]. The completed requests will be freed and the corresponding elements of reqs[] will be set to MPI_REQUEST_NULL.
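
The part marked with ... could, for example, walk over the completed entries like this (dispatch_work_to is a hypothetical helper that stands for whatever rank 0 actually sends to a slave):

int k;
for (k = 0; k < num_completed; k++) {
   int slave = indices[k] + 1;   // indices[] is 0-based, slave ranks start at 1
   idle_node[slave] = 0;         // indicates this slave is busy now
   dispatch_work_to(slave);      // hypothetical: hand the next piece of work to the slave
}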

Also, there appears to be a common misconception about using non-blocking operations. A non-blocking send can be matched by a blocking receive, and a blocking send can equally be matched by a non-blocking receive. That makes constructs like the following nonsensical:

// Receiver
MPI_Irecv(..., &request);
... do something ...
MPI_Wait(&request, &status);

// Sender
MPI_Isend(..., &request);
MPI_Wait(&request, MPI_STATUS_IGNORE);

MPI_Isend immediately followed by MPI_Wait is equivalent to MPI_Send, and the following code is perfectly valid (and easier to understand):

// Receiver
MPI_Irecv(..., &request);
... do something ...
MPI_Wait(&request, &status);

// Sender
MPI_Send(...);
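
Applied to the worker loop from the question, ranks 1, 2 and 3 could therefore be written simply as (a sketch reusing idle and MSG_IDLE from the question):

while (1)
{
   ...   // do something
   // equivalent to MPI_Isend immediately followed by MPI_Wait
   MPI_Send(&idle, 1, MPI_INT, 0, MSG_IDLE, MPI_COMM_WORLD);
}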
