Exchange Data Between MPI processes (halo)
Problem description
Given the following scenario: I have N MPI processes, each with an object. When the communication stage comes, data (usually small) from these objects will be exchanged. In general, there may be data exchange between any two nodes.
What is the best strategy?
- In every node X, create two buffers for each other node that has a connection with X, and then do send/receive on a peer-to-peer basis.
- In each node X, create one buffer to gather all the halo data to be communicated, and then "bcast" that buffer.
Is there any other strategy I am not aware of?
For nearest-neighbour style halo swaps, one of the most efficient implementations is usually a set of MPI_Sendrecv calls, typically two per dimension:
Half-step one - transfer of data in the positive direction: each rank receives from the one on its left into its left halo, and sends data to the rank on its right.
+-+-+---------+-+-+ +-+-+---------+-+-+ +-+-+---------+-+-+
--> |R| | (i,j-1) |S| | --> |R| | (i,j) |S| | --> |R| | (i,j+1) |S| | -->
+-+-+---------+-+-+ +-+-+---------+-+-+ +-+-+---------+-+-+
(S designates the part of the local data being communicated, R designates the halo into which data is being received, and (i,j) are the coordinates of the rank in the process grid.)
Half-step two - transfer of data in the negative direction: each rank receives from the one on its right into its right halo, and sends data to the rank on its left.
+-+-+---------+-+-+ +-+-+---------+-+-+ +-+-+---------+-+-+
<-- |X|S| (i,j-1) | |R| <-- |X|S| (i,j) | |R| <-- |X|S| (i,j+1) | |R| <--
+-+-+---------+-+-+ +-+-+---------+-+-+ +-+-+---------+-+-+
(X is the part of the halo region that was already populated in the previous half-step.)
Most switched networks support multiple simultaneous bi-directional (full duplex) communications, and both of the above half-steps are repeated as many times as the dimensionality of the domain decomposition, so the latency of the whole exchange grows with the number of dimensions rather than with the number of ranks.
The process is simplified even further in version 3.0 of the standard, which introduces the so-called neighbourhood collective communications. The whole multidimensional halo swap can be performed using a single call to MPI_Neighbor_alltoallw.