Sending distributed chunks of a 2D array to the root process in MPI


Problem description


I have a 2D array which is distributed across an MPI process grid (3 x 2 processes in this example). The values of the array are generated within the process to which that chunk of the array is distributed, and I want to gather all of those chunks together at the root process to display them.

So far, I have the code below. This creates a cartesian communicator, finds out the co-ordinates of each MPI process and works out how much of the array it should get based on that (as the array size need not be a multiple of the cartesian grid size). I then create a new MPI derived datatype which will send the whole of that process's subarray as one item (that is, the stride, blocklength and count are different for each process, as each process has a different-sized array). However, when I come to gather the data together with MPI_Gather, I get a segmentation fault.

I think this is because I shouldn't be using the same datatype for sending and receiving in the MPI_Gather call. The data type is fine for sending the data, as it has the right count, stride and blocklength, but when it gets to the other end it'll need a very different derived datatype. I'm not sure how to calculate the parameters for this datatype - does anyone have any ideas?

Also, if I'm approaching this from completely the wrong angle then please let me know!

#include<stdio.h>
#include<array_alloc.h>
#include<math.h>
#include<mpi.h>

int main(int argc, char ** argv)
{
    int size, rank;
    int dim_size[2];
    int periods[2];
    int A = 2;
    int B = 3;
    MPI_Comm cart_comm;
    MPI_Datatype block_type;
    int coords[2];

    float **array;
    float **whole_array;

    int n = 10;
    int rows_per_core;
    int cols_per_core;
    int i, j;

    int x_start, x_finish;
    int y_start, y_finish;

    /* Initialise MPI */
    MPI_Init(&argc, &argv);

    /* Get the rank for this process, and the number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
        /* If we're the master process */
        whole_array = alloc_2d_float(n, n);

        /* Initialise whole array to silly values */
        for (i = 0; i < n; i++)
        {
            for (j = 0; j < n; j++)
            {
                whole_array[i][j] = 9999.99;
            }
        }

        for (j = 0; j < n; j ++)
        {
            for (i = 0; i < n; i++)
            {
                printf("%f ", whole_array[j][i]);
            }
            printf("\n");
        }
    }

    /* Create the cartesian communicator */
    dim_size[0] = B;
    dim_size[1] = A;
    periods[0] = 1;
    periods[1] = 1;

    MPI_Cart_create(MPI_COMM_WORLD, 2, dim_size, periods, 1, &cart_comm);

    /* Get our co-ordinates within that communicator */
    MPI_Cart_coords(cart_comm, rank, 2, coords);

    rows_per_core = ceil(n / (float) A);
    cols_per_core = ceil(n / (float) B);

    if (coords[0] == (B - 1))
    {
        /* We're at the far end of a row */
        cols_per_core = n - (cols_per_core * (B - 1));
    }
    if (coords[1] == (A - 1))
    {
        /* We're at the bottom of a col */
        rows_per_core = n - (rows_per_core * (A - 1));
    }

    printf("X: %d, Y: %d, RpC: %d, CpC: %d\n", coords[0], coords[1], rows_per_core, cols_per_core);

    MPI_Type_vector(rows_per_core, cols_per_core, cols_per_core + 1, MPI_FLOAT, &block_type);
    MPI_Type_commit(&block_type);

    array = alloc_2d_float(rows_per_core, cols_per_core);

    if (array == NULL)
    {
        printf("Problem with array allocation.\nExiting\n");
        return 1;
    }

    for (j = 0; j < rows_per_core; j++)
    {
        for (i = 0; i < cols_per_core; i++)
        {
            array[j][i] = (float) (i + 1);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Gather(array, 1, block_type, whole_array, 1, block_type, 0, MPI_COMM_WORLD);

    /*
    if (rank == 0)
    {
        for (j = 0; j < n; j ++)
        {
            for (i = 0; i < n; i++)
            {
                printf("%f ", whole_array[j][i]);
            }
            printf("\n");
        }
    }
    */
    /* Close down the MPI environment */
    MPI_Finalize();
}

The 2D array allocation routine I have used above is implemented as:

float **alloc_2d_float( int ndim1, int ndim2 ) {

  float **array2 = malloc( ndim1 * sizeof( float * ) );

  int i;

  if( array2 != NULL ){

    array2[0] = malloc( ndim1 * ndim2 * sizeof( float ) );

    if( array2[ 0 ] != NULL ) {

      for( i = 1; i < ndim1; i++ )
        array2[i] = array2[0] + i * ndim2;

    }

    else {
      free( array2 );
      array2 = NULL;
    }

  }

  return array2;

}

Solution

It looks like the first argument to your MPI_Gather call should probably be array[0], and not array.
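In the context of the program above, that one change would look roughly like the sketch below. Only the send buffer is corrected here; the receive buffer is only significant at the root (so non-root ranks may pass NULL), and the mismatched receive datatype is a separate issue.

MPI_Gather(array[0], 1, block_type,              /* start of the contiguous data, not the row-pointer array */
           rank == 0 ? whole_array[0] : NULL,    /* receive buffer only matters at the root */
           1, block_type,
           0, MPI_COMM_WORLD);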

Also, if you need to get different amounts of data from each rank, you might be better off using MPI_Gatherv.
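For reference, here is a minimal, self-contained sketch of the MPI_Gatherv pattern: each rank contributes a different number of floats ((rank + 1) elements, made up purely for illustration), and the root supplies per-rank counts and displacements. It deliberately treats the data as 1D; placing 2D blocks into whole_array would additionally need carefully computed displacements and a resized receive type, which is not shown here.

/* Minimal sketch of gathering a different number of floats from each rank
   with MPI_Gatherv. The per-rank counts are invented for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes (rank + 1) floats, just as an example. */
    int mycount = rank + 1;
    float *mydata = malloc(mycount * sizeof(float));
    for (int i = 0; i < mycount; i++)
        mydata[i] = (float) rank;

    int *counts = NULL, *displs = NULL;
    float *recvbuf = NULL;
    if (rank == 0) {
        counts = malloc(size * sizeof(int));
        displs = malloc(size * sizeof(int));
        int total = 0;
        for (int r = 0; r < size; r++) {
            counts[r] = r + 1;   /* how many floats rank r sends          */
            displs[r] = total;   /* where rank r's data lands in recvbuf  */
            total += counts[r];
        }
        recvbuf = malloc(total * sizeof(float));
    }

    /* counts, displs and recvbuf are only significant at the root. */
    MPI_Gatherv(mydata, mycount, MPI_FLOAT,
                recvbuf, counts, displs, MPI_FLOAT,
                0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int r = 0, k = 0; r < size; r++) {
            printf("from rank %d:", r);
            for (int i = 0; i < counts[r]; i++, k++)
                printf(" %.1f", recvbuf[k]);
            printf("\n");
        }
        free(counts); free(displs); free(recvbuf);
    }
    free(mydata);

    MPI_Finalize();
    return 0;
}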

Finally, note that gathering all your data in one place to do output is not scalable in many circumstances. As the amount of data grows, it will eventually exceed the memory available to rank 0. You might be much better off distributing the output work (if you are writing to a file, using MPI IO or other library calls), or doing point-to-point sends to rank 0 one at a time to limit the total memory consumption.
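As a rough illustration of the point-to-point alternative, the sketch below has rank 0 receive and print one rank's block at a time, so it never holds more than one remote block in memory. The block sizes ((rank + 1) floats per rank) and contents are invented for the example.

/* Sketch: rank 0 receives and prints one block at a time instead of
   gathering everything into a single large array. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int mycount = rank + 1;   /* hypothetical per-rank block size */
    float *mydata = malloc(mycount * sizeof(float));
    for (int i = 0; i < mycount; i++)
        mydata[i] = (float) rank;

    if (rank == 0) {
        /* Print our own block first. */
        for (int i = 0; i < mycount; i++)
            printf("%.1f ", mydata[i]);
        printf("\n");

        /* Then receive and print one remote block at a time. */
        for (int r = 1; r < size; r++) {
            MPI_Status status;
            int count;
            /* Probe first so we only allocate as much as this block needs. */
            MPI_Probe(r, 0, MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_FLOAT, &count);
            float *block = malloc(count * sizeof(float));
            MPI_Recv(block, count, MPI_FLOAT, r, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            for (int i = 0; i < count; i++)
                printf("%.1f ", block[i]);
            printf("\n");
            free(block);
        }
    } else {
        MPI_Send(mydata, mycount, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    }

    free(mydata);
    MPI_Finalize();
    return 0;
}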

On the other hand, I would not recommend coordinating each of your ranks printing to standard output, one after another, because some major MPI implementations don't guarantee that standard output will be produced in order. Cray's MPI, in particular, jumbles up standard output pretty thoroughly if multiple ranks print.
