C - MPI - Send/Receive Subarrays to Array

Problem Description

So... My question is simple.

Let's assume we have a master MPI process with a master_array of 6*6 cells:

  Master
-----------
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0

And that we have 4 worker MPI processes with worker_array of 3*3 cells.

Worker 1 | Worker 2 | Worker 3 | Worker 4 |
-------  | -------  | -------  | -------  |
 1 1 1   |  2 2 2   |  3 3 3   | 4 4 4    |
 1 1 1   |  2 2 2   |  3 3 3   | 4 4 4    |
 1 1 1   |  2 2 2   |  3 3 3   | 4 4 4    |

Now, I want to send the worker arrays to the master array like this:

  Master
-----------
1 1 1 2 2 2
1 1 1 2 2 2
1 1 1 2 2 2
3 3 3 4 4 4 
3 3 3 4 4 4 
3 3 3 4 4 4 

How do I end up with this using some-kind-of-MPI-send/receive, MPI_datatypes or MPI_vectors or MPI_subarrays or MPI_whatever-does-the-trick?

I hope you get my point.

Answers with detailed and working code will be deeply appreciated.

Answer

Here is a working code that uses both point-to-point and collectives (the collective version is commented out below but works OK). You need to define a vector type to correspond to the non-contiguous data at the receive side on the master. To use a collective gather, you need to mess about with the size of this vector to ensure gather puts all the pieces in the correct place and you need to use the gatherv version.
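
For the exact layout in the question (a 6x6 master assembled from four 3x3 blocks), a minimal sketch of that idea could look like the fragment below. It assumes MPI is already initialised with exactly 4 processes, and the names masterarr and workerarr are illustrative only; the full, tested code (for a 6x12 case) follows.

// Sketch only: gather four 3x3 worker blocks into a 6x6 master on rank 0

int masterarr[6][6];                  // significant on rank 0 only
int workerarr[3][3];                  // each rank's 3x3 block, filled elsewhere

int counts[4] = {1, 1, 1, 1};         // one resized block per worker
int displs[4] = {0, 3, 18, 21};       // start offset of each block, counted in ints

MPI_Datatype block, blockresized;

MPI_Type_vector(3, 3, 6, MPI_INT, &block);                       // 3 rows of 3 ints, row stride 6
MPI_Type_create_resized(block, 0, sizeof(int), &blockresized);   // extent = 1 int so displs work
MPI_Type_commit(&blockresized);

MPI_Gatherv(workerarr, 3*3, MPI_INT,
            masterarr, counts, displs, blockresized,
            0, MPI_COMM_WORLD);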

It's easy to get the array indices messed up, so for generality I have used a 2x3 array of processes on a 6x12 matrix so that things are deliberately not square.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define M 6
#define N 12

#define MP 2
#define NP 3

#define MLOCAL (M/MP)
#define NLOCAL (N/NP)

#define TAG 0

int main(void)
{
  int master[M][N];
  int local[MLOCAL][NLOCAL];

  MPI_Comm comm = MPI_COMM_WORLD;
  int rank, size, src;
  int i, j;
  int istart, jstart;
  int displs[MP*NP], counts[MP*NP];

  MPI_Status status;
  MPI_Request request;
  MPI_Datatype block, blockresized;

  MPI_Init(NULL, NULL);

  MPI_Comm_size(comm, &size);
  MPI_Comm_rank(comm, &rank);

  if (size != MP*NP)
    {
      if (rank == 0) printf("Size %d not equal to MP*NP = %d\n", size, MP*NP);
      MPI_Finalize();
      return 1;
    }

  for (i=0; i < M; i++)
    {
      for (j=0; j < N; j++)
        {
          master[i][j] = rank;
        }
    }

  for (i=0; i < MLOCAL; i++)
    {
      for (j=0; j < NLOCAL; j++)
        {
          local[i][j] = rank+1;
        }
    }

  // Define vector type appropriate for subsections of master array

  MPI_Type_vector(MLOCAL, NLOCAL, N, MPI_INT, &block);
  MPI_Type_commit(&block);

  // Non-blocking send to avoid deadlock with rank 0 sending to itself

  MPI_Isend(local, MLOCAL*NLOCAL, MPI_INT, 0, TAG, comm, &request);

  // Receive from all the workers

  if (rank == 0)
    {
      for (src=0; src < size; src++)
        {
          // Find out where this block should go

          istart = (src/NP) * MLOCAL;
          jstart = (src%NP) * NLOCAL;

          // receive a single block

          MPI_Recv(&master[istart][jstart], 1, block, src, TAG, comm, &status);
        }
    }

  // Wait for send to complete

  MPI_Wait(&request, &status);

  /* comment out collective

  // Using collectives -- currently commented out!

  MPI_Type_create_resized(block, 0, sizeof(int), &blockresized);
  MPI_Type_commit(&blockresized);

  // Work out displacements in master in counts of integers

  for (src=0; src < size; src++)
    {
      istart = (src/NP) * MLOCAL;
      jstart = (src%NP) * NLOCAL;

      displs[src] = istart*N + jstart;
      counts[src] = 1;
    }

  // Call collective

  MPI_Gatherv(local, MLOCAL*NLOCAL, MPI_INT,
              master, counts, displs, blockresized,
              0, comm);

  */

  // Print out

  if (rank == 0)
    {
      for (i=0; i < M; i++)
        {
          for (j=0; j < N; j++)
            {
              printf("%d ", master[i][j]);
            }
          printf("\n");
        }
    }

  MPI_Finalize();

  return 0;
}
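
To build it (assuming the listing above is saved as arraygather.c), the standard MPI wrapper compiler is enough:

mpicc -o arraygather arraygather.c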

It seems to work OK on 6 processes:

mpiexec -n 6 ./arraygather
1 1 1 1 2 2 2 2 3 3 3 3 
1 1 1 1 2 2 2 2 3 3 3 3 
1 1 1 1 2 2 2 2 3 3 3 3 
4 4 4 4 5 5 5 5 6 6 6 6 
4 4 4 4 5 5 5 5 6 6 6 6 
4 4 4 4 5 5 5 5 6 6 6 6 

This should work in any situation where the matrix decomposes exactly onto the process grid. It'll be a bit more complicated if the processes do not all have exactly the same size of sub-matrix.
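
If the blocks do differ, one possible direction (a sketch only, not part of the tested code above; it reuses src, size, N, TAG, comm, status and master from the listing, while mrows, ncols, istarts and jstarts are hypothetical per-source arrays describing each worker's block) is to keep the point-to-point version and build a separate vector type for each source on the master:

// Sketch: per-source block types when workers own differently sized blocks
for (src = 0; src < size; src++)
  {
    MPI_Datatype srcblock;

    // mrows[src] x ncols[src] block, still with row stride N in the master
    MPI_Type_vector(mrows[src], ncols[src], N, MPI_INT, &srcblock);
    MPI_Type_commit(&srcblock);

    MPI_Recv(&master[istarts[src]][jstarts[src]], 1, srcblock,
             src, TAG, comm, &status);

    MPI_Type_free(&srcblock);
  }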
