MPI_Gather of columns

Question

I have an array which is split up by columns between the processes for my calculation. Afterwards I want to gather this array in one process (0).

Each process has its columns saved in array A; process 0 has an array F for collecting the data. The F array is of size n*n, and each process has part_size columns, so the local arrays A are n*part_size. Columns are sent to alternating processes - c0 goes to p0, c1 to p1, c2 to p0 again, and so on.
I created new datatypes for sending and receiving the columns.

On all processes:
MPI_Type_vector(n, 1, part_size, MPI::FLOAT, &col_send);
MPI_Type_commit(&col_send);
On process 0:
MPI_Type_vector(n, 1, n, MPI::FLOAT, &col_recv);
MPI_Type_commit(&col_recv);
Now I would like to gather the array as follows:
MPI_Gather(&A, part_size, col_send, &F, part_size, col_recv, 0, MPI::COMM_WORLD);
However the result is not as expected. My example has n = 4 and two processes. The values from p0 should end up in columns 0 and 2 of F, and those from p1 in columns 1 and 3. Instead both columns of p0 are stored in columns 0 and 1, while the values of p1 do not appear at all.
0: F[0][0]: 8.31786
0: F[0][1]: 3.90439
0: F[0][2]: -60386.2
0: F[0][3]: 4.573e-41
0: F[1][0]: 0
0: F[1][1]: 6.04768
0: F[1][2]: -60386.2
0: F[1][3]: 4.573e-41
0: F[2][0]: 0
0: F[2][1]: 8.88266
0: F[2][2]: -60386.2
0: F[2][3]: 4.573e-41
0: F[3][0]: 0
0: F[3][1]: 0
0: F[3][2]: -60386.2
0: F[3][3]: 4.573e-41
I'll admit that I'm out of ideas on this one. I obviously misunderstood how MPI_Gather or MPI_Type_vector works and stores values. Could someone point me in the right direction? Any help would be much appreciated.
Answer
The problem that I see is that a datatype created with MPI_Type_vector() has an extent spanning from its first to its last item. For example:
The extent of your col_recv datatype runs from > to < (I hope this representation of the mask is clear enough):
>x . . .
x . . .
x . . .
x<. . .
That is 13 MPI_FLOAT items (the mask must be read by rows - that is C ordering). Receiving two of them leads to:
>x . . .
x . . .
x . . .
x y . .
. y . .
. y . .
. y . .
That is clearly not what you want.
To let MPI_Gather() properly skip data on the receiver, you need to set the extent of col_recv to exactly ONE ELEMENT. You can do this with MPI_Type_create_resized():
>x<. . .
x . . .
x . . .
x . . .
so that successive received blocks get correctly interleaved:
x y . .
x y . .
x y . .
x y . .
However receiving two columns instead of one will lead to:
x x y y
x x y y
x x y y
x x y y
That is still not what you want, even if it is closer.
Since you want interleaved columns, you need to create a more complex datatype, one that describes all of a rank's columns, again with a one-element extent:
The columns are strided in units of ONE ELEMENT, because that is the extent (not the size, which is 4 elements) of the previously defined column type:
>x<. x .
x . x .
x . x .
x . x .
Receiving one of these per processor gives you what you want:
x y x y
x y x y
x y x y
x y x y
You can also do this with MPI_Type_create_darray(), since it allows creating datatypes suitable for the block-cyclic distribution used by scalapack; your case is a 1D subcase of it.
I have also tried it. Here is working code for two processors:
#include <mpi.h>
#include <stdio.h>

#define N 4
#define NPROCS 2
#define NPART (N/NPROCS)

int main(int argc, char **argv) {
    float a_send[N][NPART];
    float a_recv[N][N] = {0};

    MPI_Datatype column_send_type;
    MPI_Datatype column_recv_type;
    MPI_Datatype column_send_type1;
    MPI_Datatype column_recv_type1;
    MPI_Datatype matrix_columns_type;
    MPI_Datatype matrix_columns_type1;

    MPI_Init(&argc, &argv);
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < NPART; ++j) {
            a_send[i][j] = my_rank*100 + 10*(i+1) + (j+1);
        }
    }

    /* one local column, resized to an extent of one float */
    MPI_Type_vector(N, 1, NPART, MPI_FLOAT, &column_send_type);
    MPI_Type_commit(&column_send_type);
    MPI_Type_create_resized(column_send_type, 0, sizeof(float), &column_send_type1);
    MPI_Type_commit(&column_send_type1);

    /* one column of the full matrix, resized to an extent of one float */
    MPI_Type_vector(N, 1, N, MPI_FLOAT, &column_recv_type);
    MPI_Type_commit(&column_recv_type);
    MPI_Type_create_resized(column_recv_type, 0, sizeof(float), &column_recv_type1);
    MPI_Type_commit(&column_recv_type1);

    /* all NPART columns of one rank, strided NPROCS columns apart,
       again resized to an extent of one float */
    MPI_Type_vector(NPART, 1, NPROCS, column_recv_type1, &matrix_columns_type);
    MPI_Type_commit(&matrix_columns_type);
    MPI_Type_create_resized(matrix_columns_type, 0, sizeof(float), &matrix_columns_type1);
    MPI_Type_commit(&matrix_columns_type1);

    MPI_Gather(a_send, NPART, column_send_type1,
               a_recv, 1, matrix_columns_type1, 0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                printf("%4.0f ", a_recv[i][j]);
            }
            printf("\n");
        }
    }

    MPI_Finalize();
    return 0;
}