Inconsistent rows allocation in scalapack

Question

Consider the following simple Fortran program:

program test_vec_allocation
    use mpi
    implicit none
    integer(kind=8)             :: N
    ! =========================BLACS and MPI=======================
    integer                     :: ierr, size, rank,dims(2)
    ! -------------------------------------------------------------
    integer, parameter          :: block_size = 100
    integer                     :: context, nprow, npcol, local_nprow, local_npcol
    integer                     :: numroc, indxl2g, descmat(9),descvec(9)
    integer                     :: mloc_mat ,nloc_mat ,mloc_vec ,nloc_vec

    call blacs_pinfo(rank,size)
    dims=0
    call MPI_Dims_create(size, 2, dims, ierr)
    nprow = dims(1);npcol = dims(2)
    call blacs_get(0,0,context)
    call blacs_gridinit(context, 'R', nprow, npcol)
    call blacs_gridinfo(context, nprow, npcol, local_nprow,local_npcol)

    N = 700

    mloc_vec = numroc(N,block_size,local_nprow,0, nprow)
    nloc_vec = numroc(1,block_size,local_npcol,0, npcol)
    print *,"Rank", rank, mloc_vec, nloc_vec

    call blacs_gridexit(context)
    call blacs_exit(0)

end program test_vec_allocation

When I run it with 11 MPI ranks I get:

 Rank           0         100           1
 Rank           4         100           1
 Rank           2         100           1
 Rank           1         100           1
 Rank           3         100           1
 Rank          10           0           1
 Rank           6         100           1
 Rank           5         100           1
 Rank           9           0           1
 Rank           8           0           1
 Rank           7           0           1

which is how I would expect ScaLAPACK to divide this array. However, for an even number of ranks I get:

 Rank           0         200           1
 Rank           8         200           0
 Rank           9         100           1
 Rank          10         100           0
 Rank           1         200           0
 Rank           6         200           1
 Rank          11         100           0
 Rank           3         200           1
 Rank           4         200           0
 Rank           2         200           0
 Rank           7         200           0
 Rank           5         200           0

which makes no sense: why would rank 0 get 200 elements for a block size of 100, when ranks * block size > N? Because of this my program works for 1, 2, 3, 5, 7 and 11 MPI ranks, but fails for 4, 6, 8, 9, 10, 12, etc. (I don't know why it fails for 9 ranks!). Can anyone explain what is wrong in my approach?

GFortran version: 6.1.0

ScaLAPACK version: 2.1.0

MacOS version: 10.11

Answer

You have a number of problems in your code:

1) Firstly, don't use integer(8). As Vladimir put it, please unlearn this. Not only is it not portable and therefore very bad practice (please see many examples here, e.g. Fortran 90 kind parameter), here it is also wrong, because numroc expects an integer of default kind as its first argument (see e.g. https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-fortran/top/scalapack-routines/scalapack-utility-functions-and-routines/numroc.html).
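
As a minimal sketch, reusing the names from the question's code, the declaration just needs to be of default kind (the external attribute on numroc is optional but makes the intent explicit):

    integer            :: N            ! default kind, as numroc expects
    integer, external  :: numroc       ! ScaLAPACK tool function

    N = 700
    mloc_vec = numroc(N, block_size, local_nprow, 0, nprow)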

2) You call an MPI routine before you call MPI_Init. With a handful of exceptions (and this isn't one of them), that results in undefined behaviour. Note that the description at https://www.netlib.org/blacs/BLACS/QRef.html#BLACS_PINFO makes no reference to actually calling MPI_Init. As such, I also prefer to call MPI_Finalize myself.
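
For what it's worth, a sketch of the ordering I mean (names follow the question's code), with the BLACS work bracketed by MPI_Init and MPI_Finalize; passing a non-zero argument to blacs_exit tells BLACS to leave MPI running so we can finalise it ourselves:

    call MPI_Init(ierr)
    call blacs_pinfo(rank, size)      ! MPI is now guaranteed to be initialised
    ! ... grid setup, numroc calls, actual work ...
    call blacs_gridexit(context)
    call blacs_exit(1)                ! non-zero: do not let BLACS finalise MPI
    call MPI_Finalize(ierr)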

3) You have misunderstood MPI_Dims_create. You seem to assume you will get a one-dimensional distribution, but you actually ask it for a two-dimensional one. Quoting from the standard at https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf:

The entries in the array dims are set to describe a Cartesian grid with ndims dimensions and a total of nnodes nodes. The dimensions are set to be as close to each other as possible, using an appropriate divisibility algorithm. The caller may further constrain the operation of this routine by specifying elements of array dims. If dims[i] is set to a positive number, the routine will not modify the number of nodes in dimension i; only those entries where dims[i] = 0 are modified by the call.

You set dims equal to zero, so the routine is free to set both dimensions. Thus for 11 processes you will get a 1x11 or 11x1 grid, which is what you seem to expect. However, for 12 processes, as "the dimensions are set to be as close to each other as possible", you will get either a 3x4 or a 4x3 grid, NOT 12x1. If it is 3x4, along each row you expect numroc to return 3 processes with 200 elements (2 blocks) and 1 with 100. As there are 3 rows, you therefore expect 3x3=9 processes returning 200 and 3x1=3 returning 100. This is what you see. Also try 15 procs - you will see an odd number of processes that according to you "does not work"; this is because (advanced maths alert) 15=3x5. Incidentally, on my machine 9 processes does NOT return a 3x3 grid - this looks like a bug in OpenMPI to me.
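
If what you actually want is a one-dimensional, size x 1 grid (so the 700 elements are split purely over process rows, as in the 11-rank run), one way - a sketch, reusing the question's variable names - is to pin the second dimension before calling MPI_Dims_create, which the paragraph quoted above permits:

    dims    = 0
    dims(2) = 1                                ! a positive entry is left untouched
    call MPI_Dims_create(size, 2, dims, ierr)  ! yields dims = (/ size, 1 /)
    nprow = dims(1); npcol = dims(2)

Alternatively, simply set nprow = size and npcol = 1 and skip MPI_Dims_create altogether.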
