MPI_Gather of columns

Question

I have an array which is split up by columns among the processes for my calculation. Afterwards I want to gather this array on one process (process 0).

Each process has its columns saved in array A; process 0 has an array F for collecting the data. The F array is of size n*n, and each process has part_size columns, so the local arrays A are n*part_size. Columns are sent to alternating processes: c0 goes to p0, c1 to p1, c2 to p0 again, and so on.

I created new datatypes for sending and receiving the columns.

On all processes:

MPI_Type_vector(n, 1, part_size, MPI::FLOAT, &col_send);
MPI_Type_commit(&col_send);

On process 0:

MPI_Type_vector(n, 1, n, MPI::FLOAT, &col_recv);
MPI_Type_commit(&col_recv);

Now I would like to gather the array as follows:

MPI_Gather(&A, part_size, col_send, &F, part_size, col_recv, 0, MPI::COMM_WORLD);

However the result is not as expected. My example has n = 4 and two processes. As a result, the values from p0 should end up in columns 0 and 2 of F, and the values from p1 in columns 1 and 3. Instead, both columns of p0 are stored in columns 0 and 1, while the values of p1 are not there at all.

0: F[0][0]: 8.31786
0: F[0][1]: 3.90439
0: F[0][2]: -60386.2
0: F[0][3]: 4.573e-41
0: F[1][0]: 0
0: F[1][1]: 6.04768
0: F[1][2]: -60386.2
0: F[1][3]: 4.573e-41
0: F[2][0]: 0
0: F[2][1]: 8.88266
0: F[2][2]: -60386.2
0: F[2][3]: 4.573e-41
0: F[3][0]: 0
0: F[3][1]: 0
0: F[3][2]: -60386.2
0: F[3][3]: 4.573e-41

I'll admit that I'm out of ideas on this one. I have obviously misunderstood how Gather or Type_vector works and stores values. Could someone point me in the right direction? Any help would be much appreciated.

Answer

The problem that I see is that the datatype created with MPI_Type_vector() has an extent going from the first to the last item. For example:

The extent for your col_recv datatype is between > and < (I hope this representation of the mask is clear enough):

>x . . .
 x . . .
 x . . .
 x<. . .

That is 13 MPI_FLOAT items (to be read by rows, that is C ordering). Receiving two of them will lead to:

>x . . .
 x . . .
 x . . .
 x y . .
 . y . .
 . y . .
 . y . .

That is clearly not what you want.
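
One way to see this concretely is to query the type's extent with the standard MPI_Type_get_extent() call. A small sketch, reusing the col_recv handle from the question (n = 4):

MPI_Aint lb, extent;
MPI_Type_get_extent(col_recv, &lb, &extent);
/* For MPI_Type_vector(4, 1, 4, MPI_FLOAT) this reports an extent of
   52 bytes, i.e. 13 floats spanning the first to the last element. */
printf("lb = %ld, extent = %ld bytes\n", (long)lb, (long)extent);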

To let MPI_Gather() properly skip data on the receiver, you need to set the extent of col_recv to exactly ONE ELEMENT. You can do this by using MPI_Type_create_resized():

>x<. . .
 x . . .
 x . . .
 x . . .

so that receiving successive blocks gets correctly interleaved:

   x y . . 
   x y . . 
   x y . . 
   x y . . 
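
For reference, the resize itself is a single call on the receiving side. A minimal sketch, reusing the col_recv type from the question (the complete program at the end of this answer does the same thing):

MPI_Datatype col_recv_resized;
MPI_Type_create_resized(col_recv, 0, sizeof(float), &col_recv_resized);
MPI_Type_commit(&col_recv_resized);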

However, receiving two columns instead of one will lead to:

   x x y y
   x x y y
   x x y y
   x x y y

That again is not what you want, even if it is closer.

Since you want interleaved columns, you need to create a more complex datatype, capable of describing all of your columns, again with an extent of one item:

Each column is separated by a stride of one ELEMENT, that is, the extent (not the size, which is 4 elements) of the previously defined column:

  >x<. x .
   x . x .
   x . x .
   x . x .

Receiving one of them per processor, you'll get what you want:

   x y x y
   x y x y
   x y x y
   x y x y
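
Concretely, this "column of columns" can be built as another vector type on top of the resized single-column type, and then resized once more to a one-float extent. A sketch; the same construction appears in the full program below:

MPI_Type_vector(NPART, 1, NPROCS, column_recv_type1, &matrix_columns_type);
MPI_Type_create_resized(matrix_columns_type, 0, sizeof(float), &matrix_columns_type1);
MPI_Type_commit(&matrix_columns_type1);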

You can do this with MPI_Type_create_darray() as well, since it allows creating datatypes suited to the block-cyclic distribution used by ScaLAPACK, your layout being a 1D subcase of it.

I have also tried it. Here is working code for two processors:

#include <stdio.h>
#include <mpi.h>

#define N      4
#define NPROCS 2
#define NPART  (N/NPROCS)

int main(int argc, char **argv) {
  float a_send[N][NPART];
  float a_recv[N][N] = {0};
  MPI_Datatype column_send_type;
  MPI_Datatype column_recv_type;
  MPI_Datatype column_send_type1;
  MPI_Datatype column_recv_type1;
  MPI_Datatype matrix_columns_type;
  MPI_Datatype matrix_columns_type1;

  MPI_Init(&argc, &argv);
  int my_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

  /* Fill the local columns with values that encode rank, row and column. */
  for(int i=0; i<N; ++i) {
    for(int j=0; j<NPART; ++j) {
      a_send[i][j] = my_rank*100+10*(i+1)+(j+1);
    }
  }

  /* One column of the local N x NPART array... */
  MPI_Type_vector(N, 1, NPART, MPI_FLOAT, &column_send_type);
  MPI_Type_commit(&column_send_type);

  /* ...resized to an extent of one float, so that consecutive counts
     in the send buffer address consecutive local columns. */
  MPI_Type_create_resized(column_send_type, 0, sizeof(float), &column_send_type1);
  MPI_Type_commit(&column_send_type1);

  /* One column of the global N x N array... */
  MPI_Type_vector(N, 1, N, MPI_FLOAT, &column_recv_type);
  MPI_Type_commit(&column_recv_type);

  /* ...resized to an extent of one float. */
  MPI_Type_create_resized(column_recv_type, 0, sizeof(float), &column_recv_type1);
  MPI_Type_commit(&column_recv_type1);

  /* All NPART columns belonging to one rank, strided NPROCS columns apart... */
  MPI_Type_vector(NPART, 1, NPROCS, column_recv_type1, &matrix_columns_type);
  MPI_Type_commit(&matrix_columns_type);

  /* ...resized once more so that rank r's contribution starts at column r. */
  MPI_Type_create_resized(matrix_columns_type, 0, sizeof(float), &matrix_columns_type1);
  MPI_Type_commit(&matrix_columns_type1);

  MPI_Gather(a_send, NPART, column_send_type1, a_recv, 1, matrix_columns_type1, 0, MPI_COMM_WORLD);

  if (my_rank==0) {
    for(int i=0; i<N; ++i) {
      for(int j=0; j<N; ++j) {
        printf("%4.0f  ",a_recv[i][j]);
      }
      printf("\n");
    }
  }

  MPI_Finalize();
}
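
For comparison, the MPI_Type_create_darray() route mentioned above could look roughly like the sketch below. This is an assumption on my part, not tested code: the rank variable r is illustrative, and since MPI_Gather takes only a single receive type, the resulting per-rank type would have to be used with one receive per rank (or with MPI-IO) rather than with a plain gather.

/* Sketch (untested): describe, inside the global N x N array, the columns
   owned by rank r under a 1D cyclic column distribution with block size 1. */
int gsizes[2]   = {N, N};
int distribs[2] = {MPI_DISTRIBUTE_NONE, MPI_DISTRIBUTE_CYCLIC};
int dargs[2]    = {MPI_DISTRIBUTE_DFLT_DARG, 1};  /* cyclic, block size 1 */
int psizes[2]   = {1, NPROCS};                    /* 1 x NPROCS process grid */
MPI_Datatype darray_type;
MPI_Type_create_darray(NPROCS, r, 2, gsizes, distribs, dargs, psizes,
                       MPI_ORDER_C, MPI_FLOAT, &darray_type);
MPI_Type_commit(&darray_type);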
