What is the right way to "notify" processors without blocking?

Problem description

Suppose I have a very large array of things and I have to perform some operation on each of them. If the operation fails for one element, I want to stop the work across the whole array (the work is distributed across a number of processors).

I want to achieve this while keeping the number of sent/received messages to a minimum. Also, I don't want to block processors unless it is necessary.

How can I do this using MPI?

Recommended answer

A possible strategy to derive this global stop condition in a non-blocking fashion is to rely on MPI_Test, which checks whether a pending request has completed and returns immediately either way.
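How often to poll is a tuning choice. The sketch below is one way to interleave work and polling; it is MPI-free so it can be read in isolation. All names in it (run_with_polling, work_fn, poll_fn, demo_ctx and the demo callbacks) are hypothetical. In a real program the poll callback would wrap the MPI_Test call on the pending receive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types: work_fn processes one element and returns
   nonzero on failure; poll_fn is a non-blocking check that returns nonzero
   once a stop notification has arrived (with MPI it would wrap MPI_Test). */
typedef int (*work_fn)(size_t index, void *ctx);
typedef int (*poll_fn)(void *ctx);

enum run_result { RUN_DONE = 0, RUN_LOCAL_FAILURE = 1, RUN_REMOTE_STOP = 2 };

/* Process elements [0, count), polling once at the start of every chunk.
   *processed receives the number of elements that completed successfully. */
static enum run_result run_with_polling(size_t count, size_t chunk,
                                        work_fn work, poll_fn poll,
                                        void *ctx, size_t *processed)
{
    assert(chunk > 0 && work != NULL && poll != NULL && processed != NULL);
    *processed = 0;
    for (size_t i = 0; i < count; ++i) {
        /* Poll only at chunk boundaries so the check stays cheap. */
        if (i % chunk == 0 && poll(ctx))
            return RUN_REMOTE_STOP;
        if (work(i, ctx) != 0)
            return RUN_LOCAL_FAILURE;
        ++*processed;
    }
    return RUN_DONE;
}

/* Demo callbacks, for illustration only. */
struct demo_ctx {
    size_t fail_index;       /* element whose operation fails */
    size_t stop_after_polls; /* polls that report "nothing yet" */
    size_t polls;            /* polls performed so far */
};

static int demo_work(size_t index, void *ctx)
{
    const struct demo_ctx *c = ctx;
    return index == c->fail_index;
}

static int demo_poll(void *ctx)
{
    struct demo_ctx *c = ctx;
    return ++c->polls > c->stop_after_polls;
}
```

A larger chunk means fewer polls but a longer delay before a rank notices the notification; a chunk of 1 polls before every element.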

Consider that each process posts an asynchronous receive (MPI_Irecv) of one MPI_INT from its left rank with a given tag, so that the processes form a ring. Then start your computation. If a rank encounters the stop condition, it sends its own rank to its right rank. Meanwhile, each rank uses MPI_Test during the computation to check whether the MPI_Irecv has completed. If it has, the rank enters a branch where it takes the received rank (the originator) and forwards it to its right rank, unless the right rank is equal to the payload of the message (so the notification does not loop).
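The ring bookkeeping described above can be isolated into a few small helpers. This is a minimal sketch; the function names (ring_left, ring_right, ring_should_forward) are hypothetical and not part of MPI.

```c
#include <assert.h>

/* Rank this process receives from (its left neighbour in the ring). */
static int ring_left(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + size - 1) % size;
}

/* Rank this process sends to (its right neighbour in the ring). */
static int ring_right(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + 1) % size;
}

/* Forwarding rule: a rank holding a notification raised by `origin` passes
   it on unless its right neighbour is the origin itself. This is what stops
   the message after one trip around the ring, and it also covers the
   single-process case, where a rank is its own neighbour. */
static int ring_should_forward(int rank, int size, int origin)
{
    assert(origin >= 0 && origin < size);
    return ring_right(rank, size) != origin;
}
```

With 4 ranks and a failure on rank 2, ranks 2, 3 and 0 forward the notification and rank 1 does not, because its right neighbour is the originator.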

Once this is done, all processes should have entered the branch and be ready to trigger an arbitrary recovery operation.

The topology retained is a ring because it minimizes the number of messages (at most n-1); however, it increases the propagation time. Other topologies could be retained that use more messages but propagate faster, for example a tree with n·ln(n) complexity.
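To make the ring trade-off concrete, the propagation can be replayed sequentially without MPI. This is a sketch of the single-originator case only; ring_simulate and hop_of are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Replay a notification raised by `origin` in a ring of `size` ranks.
   Returns the number of messages sent. hop_of[r] receives the number of
   hops after which rank r learns about the stop condition (0 for origin). */
static int ring_simulate(int size, int origin, int *hop_of)
{
    assert(size > 0 && origin >= 0 && origin < size && hop_of != NULL);
    int messages = 0;
    int holder = origin;
    hop_of[origin] = 0;
    for (;;) {
        int right = (holder + 1) % size;
        if (right == origin)   /* do not loop back to the originator */
            break;
        ++messages;            /* holder sends to its right neighbour */
        hop_of[right] = messages;
        holder = right;
    }
    return messages;
}
```

For 5 ranks with the failure on rank 2, the function returns 4 messages, and rank 1 (the left neighbour of the originator) is informed last, after 4 hops. Each send depends on the previous one, so the number of messages and the worst-case latency are both n-1.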

Something like this:

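#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Ring neighbours: receive from the left, send to the right */
    int left_rank  = (rank + size - 1) % size;
    int right_rank = (rank + 1) % size;

    int stop_cond_rank = -1;   /* rank that raised the stop condition */
    int stop_cond = 0;
    int work_done = 0;         /* placeholder: set when the local work is finished */
    MPI_Request stop_cond_request;

    /* Post the receive once, before the loop: re-posting it on every
       iteration would leak the previous, still pending request */
    MPI_Irecv(&stop_cond_rank, 1, MPI_INT, left_rank, 123, MPI_COMM_WORLD, &stop_cond_request);

    while (!work_done)
    {
        /* Compute one chunk here; set stop_cond on failure and work_done at the end */

        if (stop_cond)
        {
            /* Local failure: send our own rank to the right (unless we are alone) */
            if (rank != right_rank)
                MPI_Send(&rank, 1, MPI_INT, right_rank, 123, MPI_COMM_WORLD);
            break;
        }

        int did_recv = 0;
        MPI_Test(&stop_cond_request, &did_recv, MPI_STATUS_IGNORE);
        if (did_recv)
        {
            /* MPI_Test has already completed the request, so no MPI_Wait is needed */
            stop_cond = 1;
            /* Forward the originator's rank unless the right neighbour is the originator */
            if (right_rank != stop_cond_rank)
                MPI_Send(&stop_cond_rank, 1, MPI_INT, right_rank, 123, MPI_COMM_WORLD);
            break;
        }
    }

    /* Cleanup: a receive that is still pending must be cancelled and then
       completed, otherwise the request is never released */
    if (stop_cond_request != MPI_REQUEST_NULL)
    {
        MPI_Cancel(&stop_cond_request);
        MPI_Wait(&stop_cond_request, MPI_STATUS_IGNORE);
    }

    if (stop_cond)
    {
        /* Handle the stop condition */
    }

    MPI_Finalize();
    return 0;
}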
