Inconsistent rows allocation in scalapack
Question
Consider the following simple Fortran program:
program test_vec_allocation
use mpi
implicit none
integer(kind=8) :: N
! =========================BLACS and MPI=======================
integer :: ierr, size, rank,dims(2)
! -------------------------------------------------------------
integer, parameter :: block_size = 100
integer :: context, nprow, npcol, local_nprow, local_npcol
integer :: numroc, indxl2g, descmat(9),descvec(9)
integer :: mloc_mat ,nloc_mat ,mloc_vec ,nloc_vec
call blacs_pinfo(rank,size)
dims=0
call MPI_Dims_create(size, 2, dims, ierr)
nprow = dims(1);npcol = dims(2)
call blacs_get(0,0,context)
call blacs_gridinit(context, 'R', nprow, npcol)
call blacs_gridinfo(context, nprow, npcol, local_nprow,local_npcol)
N = 700
mloc_vec = numroc(N,block_size,local_nprow,0, nprow)
nloc_vec = numroc(1,block_size,local_npcol,0, npcol)
print *,"Rank", rank, mloc_vec, nloc_vec
call blacs_gridexit(context)
call blacs_exit(0)
end program test_vec_allocation
When I run it with 11 MPI ranks I get:
Rank 0 100 1
Rank 4 100 1
Rank 2 100 1
Rank 1 100 1
Rank 3 100 1
Rank 10 0 1
Rank 6 100 1
Rank 5 100 1
Rank 9 0 1
Rank 8 0 1
Rank 7 0 1
which is how I would expect ScaLAPACK to divide this array. However, for an even number of ranks I get:
Rank 0 200 1
Rank 8 200 0
Rank 9 100 1
Rank 10 100 0
Rank 1 200 0
Rank 6 200 1
Rank 11 100 0
Rank 3 200 1
Rank 4 200 0
Rank 2 200 0
Rank 7 200 0
Rank 5 200 0
which makes no sense: why would rank 0 get 200 elements for a block size of 100, when ranks * block size > N? Because of this my program works for 1, 2, 3, 5, 7, and 11 MPI ranks, but fails for 4, 6, 8, 9, 10, 12, etc. (I don't know why it is failing for 9 ranks!). Can anyone explain what is wrong in my approach?
GFortran version: 6.1.0
ScaLAPACK version: 2.1.0
MacOS version: 10.11
Answer
There are a number of problems with your code:
1) Firstly, don't use integer( 8 ). As Vladimir put it, please unlearn this. Not only is it not portable and therefore very bad practice (please see the many examples here, e.g. Fortran 90 kind parameter), here it is actually wrong, because numroc expects an integer of default kind as its first argument (see e.g. https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-fortran/top/scalapack-routines/scalapack-utility-functions-and-routines/numroc.html).
2) You call an MPI routine before you call MPI_Init. With a handful of exceptions (and this isn't one of them) this results in undefined behaviour. Note that the description at https://www.netlib.org/blacs/BLACS/QRef.html#BLACS_PINFO makes no reference to actually calling MPI_Init; as such I also prefer to call MPI_Finalise explicitly.
3) You have misunderstood MPI_Dims_create. You seem to assume you will get a one-dimensional distribution, but you actually ask it for a two-dimensional one. Quoting from the standard at https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf:
The entries in the array dims are set to describe a Cartesian grid with ndims dimensions and a total of nnodes nodes. The dimensions are set to be as close to each other as possible, using an appropriate divisibility algorithm. The caller may further constrain the operation of this routine by specifying elements of array dims. If dims[i] is set to a positive number, the routine will not modify the number of nodes in dimension i; only those entries where dims[i] = 0 are modified by the call.
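To make the quoted behaviour concrete, here is a rough Python sketch of the "as close to each other as possible" factorisation for the two-dimensional case. This is only an illustration of what the standard describes, not the actual algorithm any particular MPI implementation uses:

```python
import math

def dims_create_2d(nnodes):
    """Pick the most 'square' factorisation of nnodes into two
    dimensions, largest factor first, mimicking what the MPI standard
    describes for MPI_Dims_create with ndims = 2 and dims = [0, 0]."""
    for d in range(math.isqrt(nnodes), 0, -1):
        if nnodes % d == 0:
            return [nnodes // d, d]

print(dims_create_2d(11))  # [11, 1] - 11 is prime, so a 1D grid
print(dims_create_2d(12))  # [4, 3]  - NOT [12, 1]
print(dims_create_2d(15))  # [5, 3]  - why 15 ranks also "fails"
```

Note that only prime process counts (1, 2, 3, 5, 7, 11, ...) can yield the one-dimensional grid the question's code assumes.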
You set dims equal to zero, so the routine is free to set both dimensions. Thus for 11 processes you will get a 1x11 or 11x1 grid, which is what you seem to expect. However, for 12 processes, as "the dimensions are set to be as close to each other as possible", you will get either a 3x4 or 4x3 grid, NOT 12x1. If it is 3x4, along each row you expect numroc to return 3 processes with 200 elements (2 blocks) and 1 with 100. As there are 3 rows you therefore expect 3x3 = 9 processes returning 200 and 3x1 = 3 returning 100. This is what you see. Also try 15 procs - you will see an odd number of processes that, according to you, "does not work"; this is because (advanced maths alert) 15 = 3x5. Incidentally, on my machine 9 processes does NOT return a 3x3 grid - this looks like a bug in OpenMPI to me.