MPI_Gather() the central elements into a global matrix
Question
This is a follow-up to the question "MPI_Gather 2D array". Here is the situation:
id = 0 has this submatrix
|16.000000| |11.000000| |12.000000| |15.000000|
|6.000000| |1.000000| |2.000000| |5.000000|
|8.000000| |3.000000| |4.000000| |7.000000|
|14.000000| |9.000000| |10.000000| |13.000000|
-----------------------
id = 1 has this submatrix
|12.000000| |15.000000| |16.000000| |11.000000|
|2.000000| |5.000000| |6.000000| |1.000000|
|4.000000| |7.000000| |8.000000| |3.000000|
|10.000000| |13.000000| |14.000000| |9.000000|
-----------------------
id = 2 has this submatrix
|8.000000| |3.000000| |4.000000| |7.000000|
|14.000000| |9.000000| |10.000000| |13.000000|
|16.000000| |11.000000| |12.000000| |15.000000|
|6.000000| |1.000000| |2.000000| |5.000000|
-----------------------
id = 3 has this submatrix
|4.000000| |7.000000| |8.000000| |3.000000|
|10.000000| |13.000000| |14.000000| |9.000000|
|12.000000| |15.000000| |16.000000| |11.000000|
|2.000000| |5.000000| |6.000000| |1.000000|
-----------------------
The global matrix:
|1.000000| |2.000000| |5.000000| |6.000000|
|3.000000| |4.000000| |7.000000| |8.000000|
|11.000000| |12.000000| |15.000000| |16.000000|
|-3.000000| |-3.000000| |-3.000000| |-3.000000|
What I am trying to do is gather only the central elements (the ones not on the borders) into the global grid, so the global grid should look like this:
|1.000000| |2.000000| |5.000000| |6.000000|
|3.000000| |4.000000| |7.000000| |8.000000|
|9.000000| |10.000000| |13.000000| |14.000000|
|11.000000| |12.000000| |15.000000| |16.000000|
and not like the one I am getting. This is the code I have:
float **gridPtr;
float **global_grid;
lengthSubN = N/pSqrt; // N is the dim of the global grid and pSqrt the sqrt of the number of processes
MPI_Type_contiguous(lengthSubN, MPI_FLOAT, &rowType);
MPI_Type_commit(&rowType);
if(id == 0) {
    MPI_Gather(&gridPtr[1][1], 1, rowType, global_grid[0], 1, rowType, 0, MPI_COMM_WORLD);
    MPI_Gather(&gridPtr[2][1], 1, rowType, global_grid[1], 1, rowType, 0, MPI_COMM_WORLD);
} else {
    MPI_Gather(&gridPtr[1][1], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD);
    MPI_Gather(&gridPtr[2][1], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD);
}
...
float** allocate2D(float** A, const int N, const int M) {
    int i;
    float *t0;
    A = malloc(M * sizeof (float*)); /* Allocating pointers */
    if(A == NULL)
        printf("MALLOC FAILED in A\n");
    t0 = malloc(N * M * sizeof (float)); /* Allocating data */
    if(t0 == NULL)
        printf("MALLOC FAILED in t0\n");
    for (i = 0; i < M; i++)
        A[i] = t0 + i * (N);
    return A;
}
Here is my attempt without MPI_Gather(), but with a subarray type:
MPI_Datatype mysubarray;
int starts[2] = {1, 1};
int subsizes[2] = {lengthSubN, lengthSubN};
int bigsizes[2] = {N_glob, M_glob};
MPI_Type_create_subarray(2, bigsizes, subsizes, starts,
                         MPI_ORDER_C, MPI_FLOAT, &mysubarray);
MPI_Type_commit(&mysubarray);
MPI_Isend(&(gridPtr[0][0]), 1, mysubarray, 0, 3, MPI_COMM_WORLD, &req[0]);
MPI_Type_free(&mysubarray);
MPI_Barrier(MPI_COMM_WORLD);
if(id == 0) {
    for(i = 0; i < p; ++i) {
        MPI_Irecv(&(global_grid[i][0]), lengthSubN * lengthSubN, MPI_FLOAT, i, 3, MPI_COMM_WORLD, &req[0]);
    }
}
if(id == 0)
    print(global_grid, N_glob, N_glob);
But the result is:
|1.000000| |2.000000| |3.000000| |4.000000|
|5.000000| |6.000000| |7.000000| |8.000000|
|9.000000| |10.000000| |11.000000| |12.000000|
|13.000000| |14.000000| |15.000000| |16.000000|
which is not exactly what I want. I have to find a way to tell the receive that it should place the data in a different layout. So, if I do:
MPI_Irecv(&(global_grid[0][0]), 1, mysubarray, 0, 3, MPI_COMM_WORLD, &req[0]);
then I get:
|-3.000000| |-3.000000| |-3.000000| |-3.000000|
|-3.000000| |1.000000| |2.000000| |-3.000000|
|-3.000000| |3.000000| |4.000000| |-3.000000|
|-3.000000| |-3.000000| |-3.000000| |-3.000000|
Recommended answer
I cannot give a full solution, but I will explain why your original example using MPI_Gather does not work as expected.
With lengthSubN=2, this line defines a new datatype consisting of 2 floats stored adjacently in memory:
MPI_Type_contiguous(lengthSubN, MPI_FLOAT, &rowType);
Now, let's take a look at the first MPI_Gather call, which is:
if(id == 0) {
    MPI_Gather(&gridPtr[1][1], 1, rowType, global_grid[0], 1, rowType, 0, MPI_COMM_WORLD);
} else {
    MPI_Gather(&gridPtr[1][1], 1, rowType, NULL, 0, rowType, 0, MPI_COMM_WORLD);
}
It takes 1 rowType, that is, 2 adjacent floats starting at element gridPtr[1][1], from each rank. These are the values:
id 0: 1.0 2.0
id 1: 5.0 6.0
id 2: 9.0 10.0
id 3: 13.0 14.0
and places them next to each other, in rank order, in the receive buffer pointed to by global_grid[0]. This pointer actually points to the start of the first row, so the memory is filled with:
1.0 2.0 5.0 6.0 9.0 10.0 13.0 14.0
But global_grid has only 4 columns per row, so the last 4 values wrap into the second row, the one pointed to by global_grid[1] (*). This may even be undefined behaviour. Thus, after this MPI_Gather the contents of global_grid are:
1.0 2.0 5.0 6.0
9.0 10.0 13.0 14.0
-3.0 -3.0 -3.0 -3.0
-3.0 -3.0 -3.0 -3.0
The second MPI_Gather works the same way, but starts writing at the second row of global_grid:
3.0 4.0 7.0 8.0 11.0 12.0 15.0 16.0
It thus overwrites the second row written by the first call and spills into the third row, and the result is as observed:
1.0 2.0 5.0 6.0
3.0 4.0 7.0 8.0
11.0 12.0 15.0 16.0
-3.0 -3.0 -3.0 -3.0
(*) allocate2D actually allocates contiguous memory for the two-dimensional data buffer, so consecutive rows sit back to back.