使用MPI_Gather在Fortran中发送2D数组 [英] Sending 2D arrays in Fortran with MPI_Gather

查看:226
本文介绍了使用MPI_Gather在Fortran中发送2D数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用MPI_GATHER发送2d块数据。例如,我在每个节点上都有2x3数组,如果我有4个节点,我想在根上使用8x3数组。对于1d数组MPI_GATHER根据MPI排序数据进行排序,但对于2d数据则会造成混乱!

我期待这段代码的输出:

  program testmpi 
use mpi
implicit none
integer :: send(2,3)
integer :: rec(4,3)
integer :: ierror,my_rank,i,j
call MPI_Init(ierror)
MPI_DATA_TYPE type_col
!找出进程等级
调用MPI_Comm_rank(MPI_COMM_WORLD,my_rank,ierror)
if(my_rank == 0)then
send = 1
do i = 1,2
print *,(send(i,j),j = 1,3)
enddo
endif
if(my_rank == 1)then
send = 5
!做1,2
! print *,(send(i,j),j = 1,3)
! enddo
endif
call MPI_GATHER(发送,6,MPI_INTEGER,rec,6,MPI_INTEGER,0,MPI_COMM_WORLD,错误)
if(my_rank == 0)then
print *, '<><> rec'
do i = 1,4
print *,(rec(i,j),j = 1,3)
enddo
endif
call MPI_Finalize(ierror)
end program testmpi

是这样的:

  1 1 1 
1 1 1
5 5 5
5 5 5

但它看起来像这样:

  1 1 5 
1 1 5
1 5 5
1 5 5


解决方案

以下是此答案。我曾认为这是不必要的,但数组索引和内存布局的多重差异可能意味着它值得做一个Fortran版本。



让我首先说你一般不会真的想这样做 - 从一些主过程中分散并收集大量的数据。通常情况下,您希望每项任务都能摆脱它自己的难题,并且您应该致力于永远不要让一个处理器需要对整个数据进行全局观察;只要你需要,你可以限制可伸缩性和问题的大小。如果你正在为I / O做这件事 - 一个进程读取数据,然后分散它,然后将它收集起来写入,你最终会想要查看MPI-IO。



尽管如此,MPI拥有非常好的方式将任意数据从内存中抽出,并将其分散/收集到一组处理器中。不幸的是,这需要相当数量的MPI概念 - MPI类型,范围和集体操作。在这个问题的答案中讨论了很多基本思想 - MPI_Type_create_subarray和MPI_Gather

考虑一个1d整数全局数组,任务0有你想要分配给MPI任务的数量,这样他们每个人都可以得到一块在他们的本地阵列中。假设你有4个任务,全局数组是[0,1,2,3,4,5,6,7]。你可以让任务0发送四条消息(包括一条本身)来分发这条消息,当它是重新组装的时候,接收四条消息将它们捆绑在一起;但在大量流程中显然会非常耗时。有针对这些操作的优化例程 - 分散/收集操作。所以在这个1d的情况下,你应该这样做:

  integer,dimension(8):: global!只有root有这个
整数,维(2):: local!每个人都有这个
整数,参数:: root = 0
integer :: rank,comsize
integer :: i,ierr

call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD,comsize,ierr)
call MPI_Comm_rank(MPI_COMM_WORLD,rank,ierr)

if(rank == root)then
global = [(i, i = 1,8)]
endif

call MPI_Scatter(全局,2,MPI_INTEGER,&!每个人从全局
local,2,MPI_INTEGER,& !每个proc收到2到
root,&!发送进程是root,
MPI_COMM_WORLD,ierr!)! COMM_WORLD中的所有过程参与

在此之后,处理器的数据看起来像

 任务0:本地:[1,2]全局:[1,2,3,4,5,6,7,8] 
任务1:本地:[3,4]全局:[垃圾]
任务2:本地:[5,6]全局:[垃圾]
任务3:本地:[7,8]全局:[垃圾]

也就是说,分散操作采用全局数组并发送连续的2-int块到所有的处理器。

为了重新组装数组,我们使用MPI_Gather()操作,它的工作方式完全相同,但相反:

  local = local + rank 

call MPI_Gather(local,2,MPI_INTEGER,&!everyone everyone from 2 $ ints b $ b global,2,MPI_INTEGER,&!root每个进程接收2个整数到全局
root,&!接收进程是root,
MPI_COMM_WORLD,ierr)! COMM_WORLD中的所有过程参与

现在数组看起来像:

 任务0:本地:[1,2]全局:[1,2,4,5,7,8,10,11] 
任务1 :本地:[4,5]全局:[垃圾 - ]
任务2:本地:[7,8]全局:[垃圾 - ]
任务3:本地:[10,11]全局:

聚集带回所有数据。



如果数据点的数量没有均匀分配进程数量,我们需要向每个进程发送不同数量的项目会发生什么?那么你需要一个通用的分散版本, MPI_Scatterv ,它可以让您指定每个处理器的计数,以及位移 - 全局数组中的数据开始处的位置。所以,让我们假设同样的4个任务有9个字符的字符数组[a,b,c,d,e,f,g,h,i],并且您将为每个进程分配除最后一个字符外的两个字符,那有三个。然后,您需要

 字符,维度(9)::全局
字符,维度(3)::本地
整数,维度(4)::计数
整数,维度(4)::显示

if(rank == root)然后
global = [ (achar(i + ichar('a')),i = 0,8)]
endif
local = [' - ',' - ',' - ']

counts = [2,2,2,3]
displs = [0,2,4,6]

mycounts =计数(等级+1)

call MPI_Scatterv(全局,计数,显示,&!proc我得到计数​​(i)字符从显示(i)
MPI_CHARACTER,&
本地,mycounts,MPI_CHARACTER,&!我得到mycounts字符转换为
root,&!根级别发送
MPI_COMM_WORLD,ierr)! COMM_WORLD中的所有过程参与

现在数据看起来像

 任务0:本地:ab-全局:abcdefghi
任务1:本地:cd-全局:*垃圾*
任务2:local:ef-global:* garbage *
任务3:local:ghiglobal:* garbage *

您现在已经使用scatterv来分配不规则的数据量。每种情况下的位移是从数组的起始位置开始的两个*等级(以字符测量的;位移以散列发送或收集的类型为单位,通常不以字节或其他方式为单位),而计数是[2,2,2,3]。如果它是我们想要有3个字符的第一个处理器,那么我们可以设置计数= [3,2,2,2],位移应该是[0,3,5,7]。 Gatherv再次工作完全相同,但相反;计数和显示数组将保持不变。



现在,对于2D来说,这有点棘手。如果我们想发送2d数组的2d子块,我们现在发送的数据不再是连续的。如果我们发送(比如说)一个6x6阵列的3x3子区块到4个处理器,我们发送的数据就有了漏洞:

  2D阵列

---------
| 000 | 222 |
| 000 | 222 |
| 000 | 222 |
| --- + --- |
| 111 | 333 |
| 111 | 333 |
| 111 | 333 |
---------

内存中的实际布局

[000111000111000111222333222333222333]


如果我们想要将标记为1的数据发送给任务1,则需要跳过三个值,发送三个值,跳过三个值,发送三个值,跳过三个值,发送三个值。第二个复杂情况是分区域停止和开始的地方;注意区域1在区域0停止的位置不开始;在区域0的最后一个元素之后,内存中的下一个位置是通过区域1的中途。



让我们首先解决第一个布局问题 - 如何只提取我们想发送的数据。我们总是可以将所有0区域数据复制到另一个连续的数组中,并发送该数据;如果我们足够小心地进行计划,我们甚至可以这样做,以便我们可以对结果调用MPI_Scatter。但我们宁愿不必以这种方式转换整个主数据结构。



到目前为止,我们使用的所有MPI数据类型都是简单的 - MPI_INTEGER指定(比如说)连续4个字节。但是,MPI允许您创建自己的数据类型,用于描述内存中任意复杂的数据布局。而这种情况 - 数组的矩形子区域 - 已经足够普遍,以至于有一个具体的要求。对于上面描述的二维情况,

  integer :: newtype; 
整数,维(2)::大小,分类大小,开始

大小= [6,6]!全局数组的大小
subsizes = [3,3]!子区域的大小
starts = [0,0]!假设我们正在查看区域0

$ b $ MPI_Type_create_subarray(2,size,subsizes,starting,MPI_ORDER_FORTRAN,MPI_INTEGER,newtype,ierr)
call MPI_Type_commit(newtype,ierr)

这会创建一个从全局数组中只挑选出区域0的类型。请注意,即使在Fortran中,起始参数也是从数组的开始(而不是索引(例如,基于1))以偏移量(例如,基于0的形式)给出的。



我们现在可以将这部分数据发送给另一个处理器。

 调用MPI_Send(全局,1,newtype,dest,标记,MPI_COMM_WORLD,ierr)!发送区域0

并且接收进程可以将它接收到本地数组中。请注意,接收过程(如果仅将其接收到3x3数组中)无法描述它作为新类型的接收方式;不再描述内存布局,因为在一行的结尾和下一行的开头之间没有大的跳跃。相反,它只是接收一个3 * 3 = 9整数块:

$ p $ call $ MPI_Recv(local,3 * 3,MPI_INTEGER, 0,tag,MPI_COMM_WORLD,ierr)

请注意,我们也可以为其他子区域执行此操作,或者为其他块创建一个不同的类型(使用不同的起始数组),或者从特定块的第一个位置开始发送:

  if(rank == root)then 
call MPI_Send(global(4,1​​),1,newtype,1,tag,MPI_COMM_WORLD,ierr)
call MPI_Send(global ,4),1,newtype,2,tag,MPI_COMM_WORLD,ierr)
call MPI_Send(global(4,4),1,newtype,3,tag,MPI_COMM_WORLD,ierr)
local = global 1:3,1:3)
else
call MPI_Recv(local,3 * 3,MPI_INTEGER,0,tag,MPI_COMM_WORLD,rstatus,ierr)
endif

现在我们已经知道如何指定子区域,在使用分散/聚集操作之前还有一件事要讨论,这就是这些类型的大小。我们不能仅仅使用这些类型的MPI_Scatter()(或者甚至是scatterv),因为这些类型有15个整数的范围;也就是说,在它们开始后它们结束的位置是15个整数,并且它们结束的位置与下一个块的开始位置不匹配,所以我们不能只使用分散 - 它会选择错误的地方开始发送数据到下一个处理器。

当然,我们可以使用MPI_Scatterv()并自己指定位移,这就是我们要做的 - 除了位移是以发送类型的大小,这也没有帮助我们;块从全局数组起始处的(0,3,18,21)整数偏移处开始,并且块从其开始位置结束15个整数的事实不允许我们以整数倍数表示这些位移。

为了解决这个问题,MPI允许您为这些计算的目的设置类型的范围。它不截断类型;它只是用于确定下一个元素开始给出最后一个元素的位置。对于像这些洞中有洞的类型,将范围设置为比内存中的距离小到实际类型的结尾通常比较方便。



我们可以将程度设置为对我们来说很方便的任何事情。我们可以制作1个整数的范围,然后以整数为单位设置位移。不过,在这种情况下,我喜欢将范围设置为3个整数 - 子列的大小 - 这样,块1在块0之后立即开始,并且块3在块 2\" 。不幸的是,当从块2跳到块3时,它不会很好地工作,但这是无法帮助的。

因此,要分散在这种情况下,我们会执行以下操作:

 整数(kind = MPI_ADDRESS_KIND):: extent 

starts = [0,0]
尺寸= [6,6]
子尺寸= [3,3]

调用MPI_Type_create_subarray(2,尺寸,子尺寸,开始,&
MPI_ORDER_FORTRAN,MPI_INTEGER,&
newtype,ierr)
调用MPI_Type_size(MPI_INTEGER,intsize,ierr)
extent = 3 * intsize
call MPI_Type_create_resized
call MPI_Type_commit(resizedtype,ierr)

这里我们已经创建了与之前相同的块类型,但是我们已经调整了它的大小;我们没有改变类型开始(0)的位置,但我们改变了结束(3个整数)的位置。我们之前没有提及过,但是 MPI_Type_commit 需要能够使用该类型;但你只需要提交实际使用的最终类型,而不是任何中间步骤。当你完成后,你可以使用 MPI_Type_free 来释放提交的类型。

现在,最后,我们可以散布块:上面的数据操作有点复杂,但一旦完成,scatterv看起来就像之前: 1!我们会将这些新类型之一发送给所有人
displs = [0,1,6,7]!每个人的数据起点
!在全局数组中,在块范围

中调用MPI_Scatterv(全局,计数,显示,&!proc我从显示(i)获得计数(i)类型
resizedtype,& $ (根,MPI_COMM_WORLD)

$ p
$ b现在我们完成了散点图,聚集点和MPI派生类型的一些小游览。



下面是一个示例代码,其中显示了收集和分散操作以及字符数组。运行程序:

  $ mpirun -np 4 ./scatter2d 
全局数组是:
000222
000222
000222
111333
111333
111333
收到的等级0:
000
000
000
等级1收到:
111
111
111
收到的等级2:
222
222
222
已收到等级3 :
333
333
333
排名0发送:
111
111
111
排名1发送:
222
222
222
等级2发送:
333
333
333
等级3发送:
444
444
444
收到的根源:
111333
111333
111333
222444
222444
222444

,代码如下:

  program scatter 
use mpi
implicit none

integer,parameter :: gridsize = 6!数组的大小
整数,参数:: procgridsize = 2!尺寸(:,:) ::全局,本地
整数,维(procgridsize ** 2)::计数,显示
整数,参数:: root = 0
integer :: rank,comsize
integer :: localsize
integer :: i,j,row,col,ierr,p,charsize
integer,dimension(2 ):: sizes,subsizes,starts

integer :: newtype,resizedtype
integer,parameter :: tag = 1
整数,维度(MPI_STATUS_SIZE):: rstatus
integer(kind = MPI_ADDRESS_KIND):: extent,begin

call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD,comsize,ierr)
call MPI_Comm_rank(MPI_COMM_WORLD,rank,ierr )

if(comsize / = procgridsize ** 2)then
if(rank == root)then
print *,'只适用于np =',procgridsize ** 2,'for now。'
endif
call MPI_Finalize(ierr)
stop
endif

localsize = gridsize / procgridsize
allocate(local(localsize,localsize))
if(rank == root)then
allocate(global(gridsize,gridsize))
forall(col = 1:procgridsize,row = 1:procgridsize)
global((row-1)* localsize + 1:row * localsize,&
(col-1)* localsize + 1:col * localsize)=&
achar(ichar('0')+(row-1)+(col-1)* procgridsize)
end fbull
$ b'全局数组是:'
do i = 1,gridsize
print *,global(i,:)
enddo
endif
starts = [0,0]
sizes = [gridsize,gridsize]
subsizes = [localsize,localsize]

call MPI_Type_create_subarray(2,sizes,subsizes,starts,&
MPI_ORDER_FORTRAN,MPI_CHARACTER,&
newtype,ierr)
调用MPI_Type_size(MPI_CHARACTER,charsize,ierr)
extent = localsize * charsize
begin = 0
调用MPI_Type_create_resized(newtype,begin,extent,resizedtype,ierr )
call MPI_Type_commit(resizedtype,ierr)

counts = 1!我们会将这些新类型之一发送给所有人
forall(col = 1:procgridsize,row = 1:procgridsize)
displs(1+(row-1)+ procgridsize *(col-1)) =(row-1)+ localsize * procgridsize *(col-1)
endforall

call MPI_Scatterv(global,counts,displs,&!proc) (i)
resizedtype,&
local,localsize ** 2,MPI_CHARACTER,&!我收到localsize ** 2个字符
root,MPI_COMM_WORLD,ierr)!.. 。from(root,MPI_COMM_WORLD)

do p = 1,comsize
if(rank == p-1)then
print *,'Rank',rank,'received :'
do i = 1,localsize
print *,local(i,:)
enddo
endif
call MPI_Barrier(MPI_COMM_WORLD,ierr)
enddo

local = achar(ichar(local)+ 1)

do p = 1,comsize
if(rank == p-1)then
print *,'Rank',rank,'sending:'
do i = 1,localsize
print *,local(i,:)
enddo
endif
call MPI_Barrier(MPI_COMM_WORLD,ierr)
enddo

call MPI_Gatherv(local,localsize ** 2,MPI_CHARACTER,& !我发送本地化** 2个字符
全局,计数,显示,重新调整类型,&
root,MPI_COMM_WORLD,ierr)

if(rank == root)然后
print *,'Root received:'
do i = 1,gridsize
print *,global(i,:)
enddo
endif

call MPI_Type_free(newtype,ierr)
if(rank == root)deallocate(global )
取消分配(本地)
调用MPI_Finalize(ierr)

结束程序分散

这就是一般的解决方案。对于您的特定情况,我们只需按行添加,我们不需要Gatherv,我们可以使用聚集,因为在这种情况下,所有的位移都是相同的 - 在2d块的情况下,我们有一个位移正在'下移',然后随着你穿过'跨越'到下一列的块而跳跃。在这里,位移总是比前一个位移一个程度,所以我们不需要明确地给出位移。因此,最终的代码如下所示:

 程序testmpi 
使用mpi
隐式无
整数,dimension(:, :),allocatable :: send,recv
整数,参数:: nsendrows = 2,nsendcols = 3
整数,参数:: root = 0
整数:: ierror ,my_rank,comsize,i,j,ierr
integer :: blocktype,resizedtype
integer,dimension(2):: starts,sizes,存储子
整数(kind = MPI_Address_kind):: start ,范围
integer :: intsize

call MPI_Init(ierror)
call MPI_Comm_rank(MPI_COMM_WORLD,my_rank,ierror)
call MPI_Comm_size(MPI_COMM_WORLD,comsize,ierror)

分配(发送(nsendrows,nsendcols))

发送= my_rank

if(my_rank == root)然后
!我们将追加本地数组
!作为发送行的组
allocate(recv(nsendrows * comsize,nsendcols))
endif

!描述这些子块在完整连接数组里面的样子
size = [nsendrows * comsize,nsendcols]
subsizes = [nsendrows,nsendcols]
starts = [0,0]

调用MPI_Type_create_subarray(2,size,subsizes,starts,&
MPI_ORDER_FORTRAN,MPI_INTEGER,&
blocktype,ierr)

start = 0
调用MPI_Type_size(MPI_INTEGER,intsize,ierr)
extent = intsize * nsendrows

调用MPI_Type_create_resized(blocktype,start,extent,resizedtype,ierr)
调用MPI_Type_commit(resizedtype,ierr)

call MPI_Gather(发送,nsendrows * nsendcols,MPI_INTEGER,&!每个人发送3 * 2整数
recv,1,resizedtype,&!root从所有人获取1个调整大小的类型
root,MPI_COMM_WORLD,ierr)

if(my_rank == 0)then
print *,'<><><><> recv'
do i = 1,nsendrows * comsize
print *,(recv(i,j ),j = 1,nsendcols)
enddo
endif
call MPI_Finalize(ierror)

end program testmpi

使用3个过程运行此命令:

  $ mpirun -np 3 ./testmpi 
<><><> recv
0 0 0
0 0 0
1 1 1
1 1 1
2 2 2
2 2 2


I want to send 2d chunks of data using MPI_GATHER.For example I have 2x3 arrays on each node and I want 8x3 array on root, if I have 4 nodes. for 1d arrays MPI_GATHER sort data according MPI ranks but for 2d data it create mess!. What is the clean way to put chunks in order?

I expected the output of this code:

program testmpi
    use mpi
implicit none
integer :: send (2,3)
integer :: rec (4,3)
integer :: ierror,my_rank,i,j
call MPI_Init(ierror)
MPI_DATA_TYPE type_col
! find out process rank
call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
if (my_rank==0) then
send=1
do i=1,2
    print*,(send(i,j),j=1,3)
enddo
endif
if (my_rank==1) then
send=5
! do 1,2
!   print*,(send(i,j),j=1,3)
! enddo
endif
call MPI_GATHER(send,6,MPI_INTEGER,rec,6,MPI_INTEGER,0,MPI_COMM_WORLD,ierror)
if (my_rank==0) then
    print*,'<><><><><>rec'
do i=1,4
    print*,(rec(i,j),j=1,3)
enddo
endif
call MPI_Finalize(ierror)
end program testmpi

to be something like this :

   1           1           1
   1           1           1
   5           5           5
   5           5           5

but it looks like this:

   1           1           5
   1           1           5
   1           5           5
   1           5           5

解决方案

The following a literal Fortran translation of this answer. I had thought this was unnecessary, but the multiple differences in array indexing and memory layout might mean that it is worth doing a Fortran version.

Let me start by saying that you generally don't really want to do this - scatter and gather huge chunks of data from some "master" process. Normally you want each task to be chugging away at its own piece of the puzzle, and you should aim to never have one processor need a "global view" of the whole data; as soon as you require that, you limit scalability and the problem size. If you're doing this for I/O - one process reads the data, then scatters it, then gathers it back for writing, you'll want eventually to look into MPI-IO.

Getting to your question, though, MPI has very nice ways of pulling arbitrary data out of memory, and scatter/gathering it to and from a set of processors. Unfortunately that requires a fair number of MPI concepts - MPI Types, extents, and collective operations. A lot of the basic ideas are discussed in the answer to this question -- MPI_Type_create_subarray and MPI_Gather .

Consider a 1d integer global array that task 0 has that you want to distribute to a number of MPI tasks, so that they each get a piece in their local array. Say you have 4 tasks, and the global array is [0,1,2,3,4,5,6,7]. You could have task 0 send four messages (including one to itself) to distribute this, and when it's time to re-assemble, receive four messages to bundle it back together; but that obviously gets very time consuming at large numbers of processes. There are optimized routines for these sorts of operations - scatter/gather operations. So in this 1d case you'd do something like this:

integer, dimension(8) :: global      ! only root has this
integer, dimension(2) :: local       ! everyone has this
integer, parameter    :: root = 0
integer :: rank, comsize
integer :: i, ierr

call MPI_Init(ierr)
call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

if (rank == root) then
    global = [ (i, i=1,8) ]
endif

call MPI_Scatter(global, 2, MPI_INTEGER, &    ! send everyone 2 ints from global
                 local,  2, MPI_INTEGER, &    ! each proc recieves 2 into
                 root,                   &    ! sending process is root,
                 MPI_COMM_WORLD, ierr)        ! all procs in COMM_WORLD participate

After this, the processors' data would look like

task 0:  local:[1,2]  global: [1,2,3,4,5,6,7,8]
task 1:  local:[3,4]  global: [garbage]
task 2:  local:[5,6]  global: [garbage]
task 3:  local:[7,8]  global: [garbage]

That is, the scatter operation takes the global array and sends contiguous 2-int chunks to all the processors.

To re-assemble the array, we use the MPI_Gather() operation, which works exactly the same but in reverse:

local = local + rank

call MPI_Gather (local,  2, MPI_INTEGER, &    ! everyone sends 2 ints from local
                 global, 2, MPI_INTEGER, &    ! root receives 2 ints each proc into global
                 root,                   &    ! receiving process is root,
                 MPI_COMM_WORLD, ierr)        ! all procs in COMM_WORLD participate

And now the arrays look like:

task 0:  local:[1,2]    global: [1,2,4,5,7,8,10,11]
task 1:  local:[4,5]    global: [garbage-]
task 2:  local:[7,8]    global: [garbage-]
task 3:  local:[10,11]  global: [garbage-]

Gather brings all the data back.

What happens if the number of data points doesn't evenly divide the number of processes, and we need to send different numbers of items to each process? Then you need a generalized version of scatter, MPI_Scatterv, which lets you specify the counts for each processor, and displacements -- where in the global array that piece of data starts. So let's say with the same 4 tasks you had an array of characters [a,b,c,d,e,f,g,h,i] with 9 characters, and you were going to assign every process two characters except the last, that got three. Then you'd need

character, dimension(9) :: global
character, dimension(3) :: local
integer, dimension(4)   :: counts
integer, dimension(4)   :: displs

if (rank == root) then
    global = [ (achar(i+ichar('a')), i=0,8) ]
endif
local = ['-','-','-']

counts = [2,2,2,3]
displs = [0,2,4,6]

mycounts = counts(rank+1)

call MPI_Scatterv(global, counts, displs,         & ! proc i gets counts(i) chars from displs(i)
                  MPI_CHARACTER,                  &
                  local, mycounts, MPI_CHARACTER, & ! I get mycounts chars into
                  root,                           & ! root rank does sending
                  MPI_COMM_WORLD, ierr)             ! all procs in COMM_WORLD participate

Now the data looks like

task 0:  local:"ab-"  global: "abcdefghi"
task 1:  local:"cd-"  global: *garbage*
task 2:  local:"ef-"  global: *garbage*
task 3:  local:"ghi"  global: *garbage*

You've now used scatterv to distribute the irregular amounts of data. The displacement in each case is two*rank (measured in characters; the displacement is in unit of the types being sent for a scatter or received for a gather; it's not generally in bytes or something) from the start of the array, and the counts are [2,2,2,3]. If it had been the first processor we wanted to have 3 characters, we would have set counts=[3,2,2,2] and displacements would have been [0,3,5,7]. Gatherv again works exactly the same but reverse; the counts and displs arrays would remain the same.

Now, for 2D, this is a bit trickier. If we want to send 2d sublocks of a 2d array, the data we're sending now no longer is contiguous. If we're sending (say) 3x3 subblocks of a 6x6 array to 4 processors, the data we're sending has holes in it:

2D Array

   ---------
   |000|222|
   |000|222|
   |000|222|
   |---+---|
   |111|333|
   |111|333|
   |111|333|
   ---------

Actual layout in memory

   [000111000111000111222333222333222333]

(Note that all high-performance computing comes down to understanding the layout of data in memory.)

If we want to send the data that is marked "1" to task 1, we need to skip three values, send three values, skip three values, send three values, skip three values, send three values. A second complication is where the subregions stop and start; note that region "1" doesn't start where region "0" stops; after the last element of region "0", the next location in memory is partway-way through region "1".

Let's tackle the first layout problem first - how to pull out just the data we want to send. We could always just copy out all the "0" region data to another, contiguous array, and send that; if we planned it out carefully enough, we could even do that in such a way that we could call MPI_Scatter on the results. But we'd rather not have to transpose our entire main data structure that way.

So far, all the MPI data types we've used are simple ones - MPI_INTEGER specifies (say) 4 bytes in a row. However, MPI lets you create your own data types that describe arbitrarily complex data layouts in memory. And this case -- rectangular subregions of an array -- is common enough that there's a specific call for that. For the 2-dimensional case we're describing above,

integer :: newtype;
integer, dimension(2) :: sizes, subsizes, starts

sizes    = [6,6]     ! size of global array
subsizes = [3,3]     ! size of sub-region 
starts   = [0,0]     ! let's say we're looking at region "0"
                     ! which begins at offset [0,0] 

call MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierr)
call MPI_Type_commit(newtype, ierr)

This creates a type which picks out just the region "0" from the global array. Note that even in Fortran, the start parameter is given as an offset (eg, 0-based) from the start of the array, not an index (eg, 1-based).

We could send just that piece of data now to another processor

call MPI_Send(global, 1, newtype, dest, tag, MPI_COMM_WORLD, ierr)  ! send region "0"

and the receiving process could receive it into a local array. Note that the receiving process, if it's only receiving it into a 3x3 array, can not describe what it's receiving as a type of newtype; that no longer describes the memory layout, because there aren't big skips between the end of one row and the start of the next. Instead, it's just receiving a block of 3*3 = 9 integers:

call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, ierr)

Note that we could do this for other sub-regions, too, either by creating a different type (with different start array) for the other blocks, or just by sending starting from the first location of the particular block:

if (rank == root) then
    call MPI_Send(global(4,1), 1, newtype, 1, tag, MPI_COMM_WORLD, ierr)
    call MPI_Send(global(1,4), 1, newtype, 2, tag, MPI_COMM_WORLD, ierr)
    call MPI_Send(global(4,4), 1, newtype, 3, tag, MPI_COMM_WORLD, ierr)
    local = global(1:3, 1:3)
else
    call MPI_Recv(local, 3*3, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, rstatus, ierr)
endif

Now that we understand how to specify subregions, there's only one more thing to discuss before using scatter/gather operations, and that's the "size" of these types. We couldn't just use MPI_Scatter() (or even scatterv) with these types yet, because these types have an extent of 15 integers; that is, where they end is 15 integers after they start -- and where they end doesn't line up nicely with where the next block begins, so we can't just use scatter - it would pick the wrong place to start sending data to the next processor.

Of course, we could use MPI_Scatterv() and specify the displacements ourselves, and that's what we'll do - except the displacements are in units of the send-type size, and that doesn't help us either; the blocks start at offsets of (0,3,18,21) integers from the start of the global array, and the fact that a block ends 15 integers from where it starts doesn't let us express those displacements in integer multiples at all.

To deal with this, MPI lets you set the extent of the type for the purposes of these calculations. It doesn't truncate the type; it's just used for figuring out where the next element starts given the last element. For types like these with holes in them, it's frequently handy to set the extent to be something smaller than the distance in memory to the actual end of the type.

We can set the extent to be anything that's convenient to us. We could just make the extent 1 integer, and then set the displacements in units of integers. In this case, though, I like to set the extent to be 3 integers - the size of a sub-column - that way, block "1" starts immediately after block "0", and block "3" starts immediately after block "2". Unfortunately, it doesn't quite work as nicely when jumping from block "2" to block "3", but that can't be helped.

So to scatter the subblocks in this case, we'd do the following:

integer(kind=MPI_ADDRESS_KIND) :: extent

starts   = [0,0]
sizes    = [6, 6]
subsizes = [3, 3]

call MPI_Type_create_subarray(2, sizes, subsizes, starts,        &
                              MPI_ORDER_FORTRAN, MPI_INTEGER,  &
                              newtype, ierr)
call MPI_Type_size(MPI_INTEGER, intsize, ierr)
extent = 3*intsize
call MPI_Type_create_resized(newtype, 0, extent, resizedtype, ierr)
call MPI_Type_commit(resizedtype, ierr)

Here we've created the same block type as before, but we've resized it; we haven't changed where the type "starts" (the 0) but we've changed where it "ends" (3 integers). We didn't mention this before, but the MPI_Type_commit is required to be able to use the type; but you only need to commit the final type you actually use, not any intermediate steps. You use MPI_Type_free to free the committed type when you're done.

So now, finally, we can scatterv the blocks: the data manipulations above are a little complicated, but once it's done, the scatterv looks just like before:

counts = 1          ! we will send one of these new types to everyone
displs = [0,1,6,7]  ! the starting point of everyone's data
                    ! in the global array, in block extents

call MPI_Scatterv(global, counts, displs, & ! proc i gets counts(i) types from displs(i) 
        resizedtype,                      &
        local, 3*3, MPI_INTEGER,          & ! I'm receiving 3*3 int
        root, MPI_COMM_WORLD, ierr)         !... from (root, MPI_COMM_WORLD)

And now we're done, after a little tour of scatter, gather, and MPI derived types.

An example code which shows both the gather and the scatter operation, with character arrays, follows. Running the program:

$ mpirun -np 4 ./scatter2d
 global array is:
 000222
 000222
 000222
 111333
 111333
 111333
 Rank            0  received:
 000
 000
 000
 Rank            1  received:
 111
 111
 111
 Rank            2  received:
 222
 222
 222
 Rank            3  received:
 333
 333
 333
 Rank            0  sending:
 111
 111
 111
 Rank            1  sending:
 222
 222
 222
 Rank            2  sending:
 333
 333
 333
 Rank            3  sending:
 444
 444
 444
  Root received:
 111333
 111333
 111333
 222444
 222444
 222444

and the code follows:

program scatter
    use mpi
    implicit none

    integer, parameter :: gridsize = 6    ! size of array
    integer, parameter :: procgridsize = 2 ! size of process grid
    character, allocatable, dimension (:,:) :: global, local
    integer, dimension(procgridsize**2)   :: counts, displs
    integer, parameter    :: root = 0
    integer :: rank, comsize
    integer :: localsize
    integer :: i, j, row, col, ierr, p, charsize
    integer, dimension(2) :: sizes, subsizes, starts

    integer :: newtype, resizedtype
    integer, parameter :: tag = 1
    integer, dimension(MPI_STATUS_SIZE) :: rstatus
    integer(kind=MPI_ADDRESS_KIND) :: extent, begin

    call MPI_Init(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    if (comsize /= procgridsize**2) then
        if (rank == root) then
            print *, 'Only works with np = ', procgridsize**2, ' for now.'
        endif
        call MPI_Finalize(ierr)
        stop
    endif

    localsize = gridsize/procgridsize
    allocate( local(localsize, localsize) )
    if (rank == root) then
        allocate( global(gridsize, gridsize) )
        forall( col=1:procgridsize, row=1:procgridsize )
            global((row-1)*localsize+1:row*localsize, &
                   (col-1)*localsize+1:col*localsize) = &
                    achar(ichar('0')+(row-1)+(col-1)*procgridsize)
        end forall

        print *, 'global array is: '
        do i=1,gridsize
            print *, global(i,:)
        enddo
    endif
    starts   = [0,0]
    sizes    = [gridsize, gridsize]
    subsizes = [localsize, localsize]

    call MPI_Type_create_subarray(2, sizes, subsizes, starts,        &
                                  MPI_ORDER_FORTRAN, MPI_CHARACTER,  &
                                  newtype, ierr)
    call MPI_Type_size(MPI_CHARACTER, charsize, ierr)
    extent = localsize*charsize
    begin  = 0
    call MPI_Type_create_resized(newtype, begin, extent, resizedtype, ierr)
    call MPI_Type_commit(resizedtype, ierr)

    counts = 1          ! we will send one of these new types to everyone
    forall( col=1:procgridsize, row=1:procgridsize )
       displs(1+(row-1)+procgridsize*(col-1)) = (row-1) + localsize*procgridsize*(col-1)
    endforall

    call MPI_Scatterv(global, counts, displs,   & ! proc i gets counts(i) types from displs(i)
            resizedtype,                        &
            local, localsize**2, MPI_CHARACTER, & ! I'm receiving localsize**2 chars
            root, MPI_COMM_WORLD, ierr)           !... from (root, MPI_COMM_WORLD)

    do p=1, comsize
        if (rank == p-1) then
            print *, 'Rank ', rank, ' received: '
            do i=1, localsize
                print *, local(i,:)
            enddo
        endif
        call MPI_Barrier(MPI_COMM_WORLD, ierr)
    enddo

    local = achar( ichar(local) + 1 )

    do p=1, comsize
        if (rank == p-1) then
            print *, 'Rank ', rank, ' sending: '
            do i=1, localsize
                print *, local(i,:)
            enddo
        endif
        call MPI_Barrier(MPI_COMM_WORLD, ierr)
    enddo

    call MPI_Gatherv( local, localsize**2, MPI_CHARACTER, & ! I'm sending localsize**2 chars
                      global, counts, displs, resizedtype,&
                      root, MPI_COMM_WORLD, ierr)

    if (rank == root) then
        print *, ' Root received: '
        do i=1,gridsize
            print *, global(i,:)
        enddo
    endif

    call MPI_Type_free(newtype,ierr)
    if (rank == root) deallocate(global)
    deallocate(local)
    call MPI_Finalize(ierr)

end program scatter

So that's the general solution. For your particular case, where we are just appending by rows, we don't need a Gatherv, we can just use a gather, because in this case, all of the displacements are the same -- before, in the 2d block case we had one displacement going 'down', and then jumps in that displacement as you went 'across' to the next column of blocks. Here, the displacement is always one extent from the previous one, so we don't need to give displacements explicitly. So a final code looks like:

program testmpi
use mpi
    implicit none
    integer, dimension(:,:), allocatable :: send, recv
    integer, parameter :: nsendrows = 2, nsendcols = 3
    integer, parameter :: root = 0
    integer :: ierror, my_rank, comsize, i, j, ierr
    integer :: blocktype, resizedtype
    integer, dimension(2) :: starts, sizes, subsizes
    integer (kind=MPI_Address_kind) :: start, extent
    integer :: intsize

    call MPI_Init(ierror)
    call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
    call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierror)

    allocate( send(nsendrows, nsendcols) )

    send = my_rank

    if (my_rank==root) then
        ! we're going to append the local arrays
        ! as groups of send rows
        allocate( recv(nsendrows*comsize, nsendcols) )
    endif

    ! describe what these subblocks look like inside the full concatenated array
    sizes    = [ nsendrows*comsize, nsendcols ]
    subsizes = [ nsendrows, nsendcols ]
    starts   = [ 0, 0 ]

    call MPI_Type_create_subarray( 2, sizes, subsizes, starts,     &
                                   MPI_ORDER_FORTRAN, MPI_INTEGER, &
                                   blocktype, ierr)

    start = 0
    call MPI_Type_size(MPI_INTEGER, intsize, ierr)
    extent = intsize * nsendrows

    call MPI_Type_create_resized(blocktype, start, extent, resizedtype, ierr)
    call MPI_Type_commit(resizedtype, ierr)

    call MPI_Gather( send, nsendrows*nsendcols, MPI_INTEGER, &  ! everyone send 3*2 ints
                     recv, 1, resizedtype,                   &  ! root gets 1 resized type from everyone
                     root, MPI_COMM_WORLD, ierr)

    if (my_rank==0) then
    print*,'<><><><><>recv'
    do i=1,nsendrows*comsize
        print*,(recv(i,j),j=1,nsendcols)
    enddo
    endif
    call MPI_Finalize(ierror)

end program testmpi

Running this with 3 processes gives:

$ mpirun -np 3 ./testmpi
 <><><><><>recv
           0           0           0
           0           0           0
           1           1           1
           1           1           1
           2           2           2
           2           2           2

这篇关于使用MPI_Gather在Fortran中发送2D数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆