Sending blocks of 2D array in C using MPI

Question

How do you send blocks of a 2D array to different processors? Suppose the 2D array size is 400x400 and I want to send blocks of size 100x100 to different processors. The idea is that each processor will perform computation on its separate block and send its result back to the first processor for the final result.
I am using MPI in C programs.

Recommended answer

Let me start by saying that you generally don't really want to do this - scatter and gather huge chunks of data from some "master" process. Normally you want each task to be chugging away at its own piece of the puzzle, and you should aim to never have one processor need a "global view" of the whole data; as soon as you require that, you limit scalability and the problem size. If you're doing this for I/O - one process reads the data, then scatters it, then gathers it back for writing, you'll want eventually to look into MPI-IO.

Getting to your question, though, MPI has very nice ways of pulling arbitrary data out of memory, and scatter/gathering it to and from a set of processors. Unfortunately that requires a fair number of MPI concepts - MPI Types, extents, and collective operations. A lot of the basic ideas are discussed in the answer to the question "MPI_Type_create_subarray and MPI_Gather".

Update - In the cold light of day, this is a lot of code and not a lot of explanation. So let me expand a little bit.

Consider a 1d integer global array that task 0 has that you want to distribute to a number of MPI tasks, so that they each get a piece in their local array. Say you have 4 tasks, and the global array is [01234567]. You could have task 0 send four messages (including one to itself) to distribute this, and when it's time to re-assemble, receive four messages to bundle it back together; but that obviously gets very time consuming at large numbers of processes. There are optimized routines for these sorts of operations - scatter/gather operations. So in this 1d case you'd do something like this:

int global[8];   /* only task 0 has this */
int local[2];    /* everyone has this */
const int root = 0;   /* the processor with the initial global data */

if (rank == root) {
   for (int i=0; i<8; i++) global[i] = i;   /* fill all 8 entries */
}

MPI_Scatter(global, 2, MPI_INT,      /* send everyone 2 ints from global */
            local,  2, MPI_INT,      /* each proc receives 2 ints into local */
            root, MPI_COMM_WORLD);   /* sending process is root, all procs in */
                                     /* MPI_COMM_WORLD participate */

After this, the processors' data would look like

task 0:  local:[01]  global: [01234567]
task 1:  local:[23]  global: [garbage-]
task 2:  local:[45]  global: [garbage-]
task 3:  local:[67]  global: [garbage-]

That is, the scatter operation takes the global array and sends contiguous 2-int chunks to all the processors.

To re-assemble the array, we use the MPI_Gather() operation, which works exactly the same but in reverse:

for (int i=0; i<2; i++) 
   local[i] = local[i] + rank;

MPI_Gather(local,  2, MPI_INT,      /* everyone sends 2 ints from local */
           global, 2, MPI_INT,      /* root receives 2 ints from each proc into global */
           root, MPI_COMM_WORLD);   /* recv'ing process is root, all procs in */
                                    /* MPI_COMM_WORLD participate */

Now the data would look like

task 0:  local:[01]  global: [0134679a]
task 1:  local:[34]  global: [garbage-]
task 2:  local:[67]  global: [garbage-]
task 3:  local:[9a]  global: [garbage-]

Gather brings all the data back, and here a is 10 because I didn't think my formatting through carefully enough upon starting this example.
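
To make the data movement concrete, here is a minimal sketch in plain C (no MPI calls) that mimics what the scatter and gather above do to the 8-int array. The names sim_scatter, sim_gather and sim_round_trip are made up for this illustration, and the 4 tasks are emulated by a loop in a single process.

```c
#include <assert.h>
#include <string.h>

#define NPROCS 4   /* assumed number of tasks, as in the example above */
#define CHUNK  2   /* ints handed to each task */

/* What MPI_Scatter does from one rank's point of view: copy that rank's
   contiguous chunk of the root's global array into local. */
void sim_scatter(const int *global, int rank, int *local) {
    assert(rank >= 0 && rank < NPROCS);
    memcpy(local, global + rank * CHUNK, CHUNK * sizeof(int));
}

/* What MPI_Gather does for one rank: copy local back into the same slot. */
void sim_gather(const int *local, int rank, int *global) {
    assert(rank >= 0 && rank < NPROCS);
    memcpy(global + rank * CHUNK, local, CHUNK * sizeof(int));
}

/* Scatter, add the rank to every local element, then gather back. */
void sim_round_trip(int *global) {
    int local[CHUNK];
    for (int rank = 0; rank < NPROCS; rank++) {
        sim_scatter(global, rank, local);
        for (int i = 0; i < CHUNK; i++)
            local[i] += rank;
        sim_gather(local, rank, global);
    }
}
```

Calling sim_round_trip on an array holding 0..7 leaves 0,1,3,4,6,7,9,10 in it, which is the [0134679a] shown above.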

What happens if the number of processes doesn't evenly divide the number of data points, and we need to send different numbers of items to each process? Then you need a generalized version of scatter, MPI_Scatterv(), which lets you specify the counts for each processor, and displacements -- where in the global array that piece of data starts. So let's say you had an array of characters [abcdefghi] with 9 characters, and you were going to assign every process two characters except the last, which gets three. Then you'd need

char global[9];   /* only task 0 has this */
char local[3]={'-','-','-'};    /* everyone has this */
int  mynum;                     /* how many items */
const int root = 0;   /* the processor with the initial global data */

if (rank == 0) {
   for (int i=0; i<9; i++) global[i] = 'a'+i;   /* fill all 9 entries */
}

int counts[4] = {2,2,2,3};   /* how many pieces of data everyone has */
mynum = counts[rank];
int displs[4] = {0,2,4,6};   /* the starting point of everyone's data */
                             /* in the global array */

MPI_Scatterv(global, counts, displs, /* proc i gets counts[i] pts from displs[i] */
            MPI_CHAR,
            local, mynum, MPI_CHAR,  /* I'm receiving mynum MPI_CHARs into local */
            root, MPI_COMM_WORLD);

Now the data would look like

task 0:  local:[ab-]  global: [abcdefghi]
task 1:  local:[cd-]  global: [garbage--]
task 2:  local:[ef-]  global: [garbage--]
task 3:  local:[ghi]  global: [garbage--]

You've now used scatterv to distribute the irregular amounts of data. The displacement in each case is two*rank (measured in characters; the displacement is in unit of the types being sent for a scatter or received for a gather; it's not generally in bytes or something) from the start of the array, and the counts are {2,2,2,3}. If it had been the first processor we wanted to have 3 characters, we would have set counts={3,2,2,2} and displacements would have been {0,3,5,7}. Gatherv again works exactly the same but reverse; the counts and displs arrays would remain the same.
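
If you'd rather not hard-code the two arrays, they can be computed. The following is a small sketch, assuming the same policy as above (everyone gets an equal share and the last task also takes the remainder); make_counts_displs is a name invented for this example, not an MPI routine.

```c
#include <assert.h>

/* Fill counts[] and displs[] for nitems spread over nprocs tasks.
   Every task gets nitems / nprocs items; the last task also takes the
   remainder. Displacements are in units of items, not bytes. */
void make_counts_displs(int nitems, int nprocs, int *counts, int *displs) {
    assert(nprocs > 0 && nitems >= 0);
    int base = nitems / nprocs;
    int offset = 0;
    for (int i = 0; i < nprocs; i++) {
        counts[i] = base;
        if (i == nprocs - 1)
            counts[i] += nitems % nprocs;   /* leftovers go to the last task */
        displs[i] = offset;
        offset += counts[i];
    }
}
```

With nitems=9 and nprocs=4 this gives counts {2,2,2,3} and displs {0,2,4,6}, which could then be passed to MPI_Scatterv or MPI_Gatherv as in the snippet above.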

Now, for 2D, this is a bit trickier. If we want to send 2d subblocks of a 2d array, the data we're sending now no longer is contiguous. If we're sending (say) 3x3 subblocks of a 6x6 array to 4 processors, the data we're sending has holes in it:

2D Array

   ---------
   |000|111|
   |000|111|
   |000|111|
   |---+---|
   |222|333|
   |222|333|
   |222|333|
   ---------

Actual layout in memory

   [000111000111000111222333222333222333]

(Note that all high-performance computing comes down to understanding the layout of data in memory.)

If we want to send the data that is marked "1" to task 1, we need to skip three values, send three values, skip three values, send three values, skip three values, send three values. A second complication is where the subregions stop and start; note that region "1" doesn't start where region "0" stops; after the last element of region "0", the next location in memory is partway through region "1".
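
The skip/send pattern is easy to see if you list the flat memory offsets that a block touches. Here is a short sketch in plain C (block_offsets is a helper invented for this illustration) that does just that for a square row-major array:

```c
#include <assert.h>

/* Write the row-major offsets (in elements) of the bs x bs block whose
   top-left corner is (row0, col0) in an n x n array into out[].
   Returns the number of offsets written (bs * bs). */
int block_offsets(int n, int bs, int row0, int col0, int *out) {
    assert(row0 >= 0 && col0 >= 0 && row0 + bs <= n && col0 + bs <= n);
    int k = 0;
    for (int r = 0; r < bs; r++)
        for (int c = 0; c < bs; c++)
            out[k++] = (row0 + r) * n + (col0 + c);
    return k;
}
```

For the 6x6 example, region "1" (corner at row 0, column 3) comes out as offsets 3,4,5, 9,10,11, 15,16,17, while region "0" ends at offset 14; the very next location, 15, is already in the last row of region "1".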

Let's tackle the first layout problem first - how to pull out just the data we want to send. We could always just copy out all the "0" region data to another, contiguous array, and send that; if we planned it out carefully enough, we could even do that in such a way that we could call MPI_Scatter on the results. But we'd rather not have to transpose our entire main data structure that way.
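
For comparison, this is roughly what the copy-it-out approach looks like: a sketch in plain C, with a made-up helper name pack_blocks, that rearranges the whole array so that each task's block is contiguous and the blocks sit in rank order. A plain MPI_Scatter with a count of bs*bs would then work on the packed buffer, at the cost of a second full-size copy of the data.

```c
#include <assert.h>

/* Copy an n x n row-major array into packed[] so that the bs x bs block
   of each task is contiguous and the blocks appear in rank order
   (block row first, then block column). n must be a multiple of bs. */
void pack_blocks(const int *global, int n, int bs, int *packed) {
    assert(bs > 0 && n % bs == 0);
    int nb = n / bs;               /* blocks per dimension */
    int k = 0;
    for (int bi = 0; bi < nb; bi++)
        for (int bj = 0; bj < nb; bj++)
            for (int r = 0; r < bs; r++)
                for (int c = 0; c < bs; c++)
                    packed[k++] = global[(bi * bs + r) * n + (bj * bs + c)];
}
```

That extra copy is exactly what the derived datatypes described next let you avoid.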

So far, all the MPI data types we've used are simple ones - MPI_INT specifies (say) 4 bytes in a row. However, MPI lets you create your own data types that describe arbitrarily complex data layouts in memory. And this case -- rectangular subregions of an array -- is common enough that there's a specific call for that. For the 2-dimensional case we're describing above,

    MPI_Datatype newtype;
    int sizes[2]    = {6,6};  /* size of global array */
    int subsizes[2] = {3,3};  /* size of sub-region */
    int starts[2]   = {0,0};  /* let's say we're looking at region "0",
                                 which begins at index [0,0] */

    MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C, MPI_INT, &newtype);
    MPI_Type_commit(&newtype);

This creates a type which picks out just the region "0" from the global array; we could send just that piece of data now to another processor

    MPI_Send(&(global[0][0]), 1, newtype, dest, tag, MPI_COMM_WORLD);  /* region "0" */

and the receiving process could receive it into a local array. Note that the receiving process, if it's only receiving it into a 3x3 array, can not describe what it's receiving as a type of newtype; that no longer describes the memory layout. Instead, it's just receiving a block of 3*3 = 9 integers:

    MPI_Recv(&(local[0][0]), 3*3, MPI_INT, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

Note that we could do this for other sub-regions, too, either by creating a different type (with different start array) for the other blocks, or just by sending at the starting point of the particular block:

    MPI_Send(&(global[0][3]), 1, newtype, dest, tag, MPI_COMM_WORLD);  /* region "1" */
    MPI_Send(&(global[3][0]), 1, newtype, dest, tag, MPI_COMM_WORLD);  /* region "2" */
    MPI_Send(&(global[3][3]), 1, newtype, dest, tag, MPI_COMM_WORLD);  /* region "3" */

Finally, note that we require global and local to be contiguous chunks of memory here; that is, &(global[0][0]) and &(local[0][0]) (or, equivalently, *global and *local) point to contiguous 6*6 and 3*3 chunks of memory; that isn't guaranteed by the usual way of allocating dynamic multi-d arrays. It's shown how to do this below.
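
As a quick illustration for the int arrays used in these snippets, here is one way to get that guarantee: a sketch with hypothetical helpers alloc2d_int, free2d_int and is_contiguous that allocates the data as one block and points the rows into it.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate an n x m int array as ONE block plus a table of row pointers,
   so that &a[0][0] addresses n*m contiguous ints. Returns NULL on failure. */
int **alloc2d_int(int n, int m) {
    assert(n > 0 && m > 0);
    int *data = malloc((size_t)n * m * sizeof *data);
    int **rows = malloc((size_t)n * sizeof *rows);
    if (!data || !rows) {
        free(data);
        free(rows);
        return NULL;
    }
    for (int i = 0; i < n; i++)
        rows[i] = data + (size_t)i * m;
    return rows;
}

/* Release an array obtained from alloc2d_int. */
void free2d_int(int **a) {
    if (a) {
        free(a[0]);   /* the data block starts at row 0 */
        free(a);
    }
}

/* Return 1 if every row starts exactly m ints after the previous one. */
int is_contiguous(int **a, int n, int m) {
    for (int i = 1; i < n; i++)
        if (a[i] != a[i - 1] + m)
            return 0;
    return 1;
}
```

An array built row by row with a separate malloc per row would generally fail the is_contiguous check, and passing its &(a[0][0]) to MPI with a 2D subarray type would read past the end of the first row.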

Now that we understand how to specify subregions, there's only one more thing to discuss before using scatter/gather operations, and that's the "size" of these types. We couldn't just use MPI_Scatter() (or even scatterv) with these types yet, because of their extent: a type built by MPI_Type_create_subarray has the extent of the whole array it was cut from, here 6*6 = 36 integers. Scatter assumes the piece for the next processor begins one extent after the current one, and that doesn't line up with where the next block actually begins (3 integers later), so we can't just use scatter - it would pick the wrong place to start sending data to the next processor.

Of course, we could use MPI_Scatterv() and specify the displacements ourselves, and that's what we'll do - except the displacements are in units of the send type's extent, and that doesn't help us either; the blocks start at offsets of (0,3,18,21) integers from the start of the global array, and with an extent that large we can't express those displacements as integer multiples of it at all.

To deal with this, MPI lets you set the extent of the type for the purposes of these calculations. It doesn't truncate the type; it's just used for figuring out where the next element starts given the last element. For types like these with holes in them, it's frequently handy to set the extent to be something smaller than the distance in memory to the actual end of the type.

We can set the extent to be anything that's convenient to us. We could just make the extent 1 integer, and then set the displacements in units of integers. In this case, though, I like to set the extent to be 3 integers - the size of a sub-row - that way, block "1" starts immediately after block "0", and block "3" starts immediately after block "2". Unfortunately, it doesn't quite work as nicely when jumping from block "1" to block "2", but that can't be helped.
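
With the extent set to one sub-row, the displacements follow a simple formula. The sketch below (plain C; block_displs is a name made up here, and it assumes a square array that divides evenly into square blocks) computes them for any grid:

```c
#include <assert.h>

/* For an n x n global array split into bs x bs blocks on a
   (n/bs) x (n/bs) process grid, compute the starting displacement of
   each block in units of a resized type whose extent is bs elements
   (one sub-row). displs must hold (n/bs)*(n/bs) entries. */
void block_displs(int n, int bs, int *displs) {
    assert(bs > 0 && n % bs == 0);
    int nb = n / bs;
    for (int bi = 0; bi < nb; bi++)
        for (int bj = 0; bj < nb; bj++) {
            int elem_offset = bi * bs * n + bj * bs;   /* in elements */
            displs[bi * nb + bj] = elem_offset / bs;   /* in extents  */
        }
}
```

For the 6x6 array with 3x3 blocks, the element offsets (0,3,18,21) become displacements {0,1,6,7}. For the 400x400 array with 100x100 blocks from the question, the 16 blocks start at 0,1,2,3, then 400,401,402,403, and so on up to 1203.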

So to scatter the subblocks in this case, we'd do the following:

    MPI_Datatype type, resizedtype;
    int sizes[2]    = {6,6};  /* size of global array */
    int subsizes[2] = {3,3};  /* size of sub-region */
    int starts[2]   = {0,0};  /* let's say we're looking at region "0",
                                 which begins at index [0,0] */

    /* as before */
    MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C, MPI_INT, &type);  
    /* change the extent of the type */
    MPI_Type_create_resized(type, 0, 3*sizeof(int), &resizedtype);
    MPI_Type_commit(&resizedtype);

Here we've created the same block type as before, but we've resized it; we haven't changed where the type "starts" (the 0) but we've changed where it "ends" (3 ints). We didn't mention this before, but the MPI_Type_commit is required to be able to use the type; but you only need to commit the final type you actually use, not any intermediate steps. You use MPI_Type_free to free the type when you're done.

So now, finally, we can scatterv the blocks: the data manipulations above are a little complicated, but once it's done, the scatterv looks just like before:

int counts[4] = {1,1,1,1};   /* how many pieces of data everyone has, in units of blocks */
int displs[4] = {0,1,6,7};   /* the starting point of everyone's data */
                             /* in the global array, in block extents */

MPI_Scatterv(global, counts, displs, /* proc i gets counts[i] types from displs[i] */
            resizedtype,
            local, 3*3, MPI_INT,   /* I'm receiving 3*3 MPI_INTs into local */
            root, MPI_COMM_WORLD);

And now we're done, after a little tour of scatter, gather, and MPI derived types.
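
If it helps to check your understanding, the effect of that scatterv on one rank can be emulated in a few lines of plain C. This is only a sketch of the addressing rule, not of how an MPI library implements it, and sim_scatterv_block is a name invented for the illustration.

```c
#include <assert.h>

/* Emulate what one rank receives from a scatterv that sends one element
   of the resized subarray type: a bs x bs block of an n x n row-major
   array, described from [0,0], with its extent set to `extent` elements.
   The block for a rank starts displ * extent elements into global; the
   extent only moves the starting point and does not shorten the block. */
void sim_scatterv_block(const int *global, int n, int bs,
                        int extent, int displ, int *local) {
    int start = displ * extent;
    assert(start >= 0 && start + (bs - 1) * n + bs <= n * n);
    for (int r = 0; r < bs; r++)
        for (int c = 0; c < bs; c++)
            local[r * bs + c] = global[start + r * n + c];
}
```

Running this with n=6, bs=3, extent=3 and the displacements {0,1,6,7} on the 6x6 array from the diagram hands rank 0 nine 0s, rank 1 nine 1s, and so on.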

An example code which shows both the gather and the scatter operation, with character arrays, follows. Running the program:

$ mpirun -n 4 ./gathervarray
Global array is:
0123456789
3456789012
6789012345
9012345678
2345678901
5678901234
8901234567
1234567890
4567890123
7890123456
Local process on rank 0 is:
|01234|
|34567|
|67890|
|90123|
|23456|
Local process on rank 1 is:
|56789|
|89012|
|12345|
|45678|
|78901|
Local process on rank 2 is:
|56789|
|89012|
|12345|
|45678|
|78901|
Local process on rank 3 is:
|01234|
|34567|
|67890|
|90123|
|23456|
Processed grid:
AAAAABBBBB
AAAAABBBBB
AAAAABBBBB
AAAAABBBBB
AAAAABBBBB
CCCCCDDDDD
CCCCCDDDDD
CCCCCDDDDD
CCCCCDDDDD
CCCCCDDDDD

The code follows.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include "mpi.h"

int malloc2dchar(char ***array, int n, int m) {

    /* allocate the n*m contiguous items */
    char *p = (char *)malloc(n*m*sizeof(char));
    if (!p) return -1;

    /* allocate the row pointers into the memory */
    (*array) = (char **)malloc(n*sizeof(char*));
    if (!(*array)) {
       free(p);
       return -1;
    }

    /* set up the pointers into the contiguous memory */
    for (int i=0; i<n; i++)
       (*array)[i] = &(p[i*m]);

    return 0;
}

int free2dchar(char ***array) {
    /* free the memory - the first element of the array is at the start */
    free(&((*array)[0][0]));

    /* free the pointers into the memory */
    free(*array);

    return 0;
}

int main(int argc, char **argv) {
    char **global, **local;
    const int gridsize=10; // size of grid
    const int procgridsize=2;  // size of process grid
    int rank, size;        // rank of current process and no. of processes

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);


    if (size != procgridsize*procgridsize) {
        fprintf(stderr,"%s: Only works with np=%d for now\n", argv[0], procgridsize*procgridsize);
        MPI_Abort(MPI_COMM_WORLD,1);
    }


    if (rank == 0) {
        /* fill in the array, and print it */
        malloc2dchar(&global, gridsize, gridsize);
        for (int i=0; i<gridsize; i++) {
            for (int j=0; j<gridsize; j++)
                global[i][j] = '0'+(3*i+j)%10;
        }


        printf("Global array is:\n");
        for (int i=0; i<gridsize; i++) {
            for (int j=0; j<gridsize; j++)
                putchar(global[i][j]);

            printf("\n");
        }
    }

    /* create the local array which we'll process */
    malloc2dchar(&local, gridsize/procgridsize, gridsize/procgridsize);

    /* create a datatype to describe the subarrays of the global array */

    int sizes[2]    = {gridsize, gridsize};         /* global size */
    int subsizes[2] = {gridsize/procgridsize, gridsize/procgridsize};     /* local size */
    int starts[2]   = {0,0};                        /* where this one starts */
    MPI_Datatype type, subarrtype;
    MPI_Type_create_subarray(2, sizes, subsizes, starts, MPI_ORDER_C, MPI_CHAR, &type);
    MPI_Type_create_resized(type, 0, gridsize/procgridsize*sizeof(char), &subarrtype);
    MPI_Type_commit(&subarrtype);

    char *globalptr=NULL;
    if (rank == 0) globalptr = &(global[0][0]);

    /* scatter the array to all processors */
    int sendcounts[procgridsize*procgridsize];
    int displs[procgridsize*procgridsize];

    if (rank == 0) {
        for (int i=0; i<procgridsize*procgridsize; i++) sendcounts[i] = 1;
        int disp = 0;
        for (int i=0; i<procgridsize; i++) {
            for (int j=0; j<procgridsize; j++) {
                displs[i*procgridsize+j] = disp;
                disp += 1;
            }
            disp += ((gridsize/procgridsize)-1)*procgridsize;
        }
    }


    MPI_Scatterv(globalptr, sendcounts, displs, subarrtype, &(local[0][0]),
                 gridsize*gridsize/(procgridsize*procgridsize), MPI_CHAR,
                 0, MPI_COMM_WORLD);

    /* now all processors print their local data: */

    for (int p=0; p<size; p++) {
        if (rank == p) {
            printf("Local process on rank %d is:\n", rank);
            for (int i=0; i<gridsize/procgridsize; i++) {
                putchar('|');
                for (int j=0; j<gridsize/procgridsize; j++) {
                    putchar(local[i][j]);
                }
                printf("|\n");
            }
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }

    /* now each processor has its local array, and can process it */
    for (int i=0; i<gridsize/procgridsize; i++) {
        for (int j=0; j<gridsize/procgridsize; j++) {
            local[i][j] = 'A' + rank;
        }
    }

    /* it all goes back to process 0 */
    MPI_Gatherv(&(local[0][0]), gridsize*gridsize/(procgridsize*procgridsize),  MPI_CHAR,
                 globalptr, sendcounts, displs, subarrtype,
                 0, MPI_COMM_WORLD);

    /* don't need the local data anymore */
    free2dchar(&local);

    /* or the MPI data type */
    MPI_Type_free(&subarrtype);

    if (rank == 0) {
        printf("Processed grid:\n");
        for (int i=0; i<gridsize; i++) {
            for (int j=0; j<gridsize; j++) {
                putchar(global[i][j]);
            }
            printf("\n");
        }

        free2dchar(&global);
    }


    MPI_Finalize();

    return 0;
}
