MPI IO Reading and Writing Block Cyclic Matrix


Question

I have a school project to do matrix multiplication on an HPC distributed system.

I need to read in a matrix from a parallel IO system and use PBLACS to perform the matrix multiplication in parallel on many compute nodes (processors). The data must be read in using MPI IO commands. I know PBLACS uses block-cyclic distributions to perform the multiplication.

The professor has not given us much info on MPI IO, and I am having trouble finding much information/resources on it. Specifically, are there ways to read in a matrix from a parallel IO system in a block-cyclic manner and easily plug that into pblacs pdgemm?

Any pointers to useful resources would be much appreciated. I am a bit short on time, and getting frustrated with the lack of direction on this project.

Answer

This is actually relatively straightforward to do (if you already know something about BLACS/ScaLAPACK and MPI-IO!), but even then the documentation - even online - is, as you've discovered, somewhat poor.

The first thing to know about MPI-IO is that it lets you use normal MPI data types to specify each process' "view" of the file, and then read only the data that falls into that view. At our centre we have slides and source code for a half-day course on parallel IO; the first third or so is about the basics of MPI-IO. There are slides here and sample source code here.
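
To make the "view" idea concrete, here is a minimal sketch (the subroutine, the file name 'matrix.dat', and the argument list are placeholders for illustration, not from the course materials): every rank opens the file collectively, installs a view built from some committed datatype, and a collective read then delivers only the elements that fall into that rank's view.

! Minimal sketch of an MPI-IO file view: "filetype" is any committed MPI
! datatype describing which elements of the file this rank should see.
! File name and argument list are placeholders for illustration.
subroutine read_my_view(filetype, localbuf, count)
    use mpi
    implicit none
    integer, intent(in) :: filetype, count
    real, intent(out) :: localbuf(count)
    integer :: fh, ierr
    integer(kind=MPI_OFFSET_KIND) :: disp
    integer, dimension(MPI_STATUS_SIZE) :: mpistatus

    call MPI_File_open(MPI_COMM_WORLD, 'matrix.dat', MPI_MODE_RDONLY, &
                       MPI_INFO_NULL, fh, ierr)

    ! The view: start disp bytes into the file, measure data in units of
    ! MPI_REAL, and let "filetype" select which elements this rank sees.
    disp = 0
    call MPI_File_set_view(fh, disp, MPI_REAL, filetype, "native", &
                           MPI_INFO_NULL, ierr)

    ! Collective read: each rank receives exactly what its view selects.
    call MPI_File_read_all(fh, localbuf, count, MPI_REAL, mpistatus, ierr)
    call MPI_File_close(fh, ierr)
end subroutine read_my_view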

The second thing to know is that MPI has a built-in way to create "distributed array" data types, one combination of which lets you lay out a block-cyclic data distribution; that's discussed in general terms in my answer to this question: What is the difference between darray and subarray in mpi? (http://stackoverflow.com/questions/5716677/what-is-the-difference-between-darray-and-subarray-in-mpi).

So that means if you have a binary file containing a big matrix, you can read it in with MPI-IO using MPI_Type_create_darray and it'll be distributed among the tasks in a block-cyclic way. Then it's just a matter of doing the BLACS or ScaLAPACK call. An example program using ScaLAPACK's psgemv for matrix-vector multiplication, rather than psgemm, is listed in my answer to a question on the Computational Science Stack Exchange (http://scicomp.stackexchange.com/questions/1688/how-do-i-use-scalapack-pblas-for-matrix-vector-multiplication/1713#1713).
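
To tie this back to the pdgemm in your question: once each task has its block-cyclic piece in a local buffer, you describe that buffer with descinit and hand it straight to the PBLAS call. Below is a rough sketch under a few assumptions: double-precision n x n matrices A, B, and C with the same nb x nb blocking on the same process grid, read in with a darray built on MPI_DOUBLE_PRECISION (the single-precision program further down would use psgemm instead), and myArows/myAcols taken from numroc exactly as in the full program.

! Sketch: multiply two block-cyclically distributed matrices with pdgemm.
! Assumes myA and myB hold the local pieces of n x n matrices distributed
! with nb x nb blocks on the BLACS grid behind icontxt; myC gets the product.
! The subroutine and its argument list are just for illustration.
subroutine multiply_distributed(n, nb, icontxt, myArows, myAcols, myA, myB, myC)
    implicit none
    integer, intent(in) :: n, nb, icontxt, myArows, myAcols
    double precision, intent(in)  :: myA(myArows*myAcols), myB(myArows*myAcols)
    double precision, intent(out) :: myC(myArows*myAcols)
    integer, dimension(9) :: desc_a, desc_b, desc_c
    integer :: info

    ! Descriptors: global size n x n, block size nb x nb, first block on
    ! process (0,0), local leading dimension = number of local rows.
    call descinit(desc_a, n, n, nb, nb, 0, 0, icontxt, max(1, myArows), info)
    call descinit(desc_b, n, n, nb, nb, 0, 0, icontxt, max(1, myArows), info)
    call descinit(desc_c, n, n, nb, nb, 0, 0, icontxt, max(1, myArows), info)

    ! C := 1.0 * A * B + 0.0 * C over the whole distributed matrices.
    call pdgemm('N', 'N', n, n, n, 1.0d0, myA, 1, 1, desc_a, &
                myB, 1, 1, desc_b, 0.0d0, myC, 1, 1, desc_c)
end subroutine multiply_distributed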

To give you an idea of how the pieces fit together, the following is a simple program which reads in a binary file containing a matrix (first the size of the square matrix N, then the N^2 elements) and then calculates the eigenvalues and eigenvectors using ScaLAPACK's (new) pssyevr routine. It combines the MPI-IO, darray, and ScaLAPACK pieces. It's in Fortran, but the function calls are the same in C-based languages.

!
! Use MPI-IO to read a diagonal matrix distributed block-cyclically,
! use Scalapack to calculate its eigenvalues, and compare
! to expected results.
!
program darray
      use mpi
      implicit none

      integer :: n, nb    ! problem size and block size
      integer :: myArows, myAcols   ! size of local subset of global array
      real :: p
      real, dimension(:), allocatable :: myA, myZ
      real, dimension(:), allocatable :: work
      integer, dimension(:), allocatable :: iwork
      real, dimension(:), allocatable :: eigenvals
      real, dimension(:), allocatable :: expected
      integer :: worksize, totwork, iworksize

      integer, external :: numroc   ! blacs routine
      integer :: me, procs, icontxt, prow, pcol, myrow, mycol  ! blacs data
      integer :: info    ! scalapack return value
      integer, dimension(9)   :: ides_a, ides_z ! scalapack array desc
      integer :: clock
      real :: calctime, iotime

      character(len=128) :: filename
      integer :: mpirank
      integer :: ierr
      integer, dimension(2) :: pdims, dims, distribs, dargs
      integer :: infile
      integer, dimension(MPI_STATUS_SIZE) :: mpistatus
      integer :: darray
      integer :: locsize, nelements
      integer(kind=MPI_ADDRESS_KIND) :: lb, locextent
      integer(kind=MPI_OFFSET_KIND) :: disp
      integer :: nargs
      integer :: m, nz

! Initialize MPI (for MPI-IO)

      call MPI_Init(ierr)
      call MPI_Comm_size(MPI_COMM_WORLD,procs,ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD,mpirank,ierr)

! May as well get the process grid from MPI_Dims_create
      pdims = 0
      call MPI_Dims_create(procs, 2, pdims, ierr)
      prow = pdims(1)
      pcol = pdims(2)

! get filename
      nargs = command_argument_count()
      if (nargs /= 1) then
          print *,'Usage: darray filename'
          print *,'       Where filename = name of binary matrix file.'
          call MPI_Abort(MPI_COMM_WORLD,1,ierr)
      endif
      call get_command_argument(1, filename)

! find the size of the array we'll be using

      call tick(clock)
      call MPI_File_open(MPI_COMM_WORLD, trim(filename), MPI_MODE_RDONLY, MPI_INFO_NULL, infile, ierr)
      call MPI_File_read_all(infile,n,1,MPI_INTEGER,mpistatus,ierr)
      call MPI_File_close(infile,ierr)

! create the darray that will read in the data.

      nb = 64
      if (nb > (N/prow)) nb = N/prow
      if (nb > (N/pcol)) nb = N/pcol
      dims = [n,n]
      distribs = [MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC]
      dargs = [nb, nb]

      call MPI_Type_create_darray(procs, mpirank, 2, dims, distribs, dargs, &
                                  pdims, MPI_ORDER_FORTRAN, MPI_REAL, darray, ierr)
      call MPI_Type_commit(darray,ierr)

      call MPI_Type_size(darray, locsize, ierr)
      nelements = locsize/4     ! number of local elements (assumes 4-byte reals)
      call MPI_Type_get_extent(darray, lb, locextent, ierr)

! Initialize local arrays    

      allocate(myA(nelements))
      allocate(myZ(nelements))
      allocate(eigenvals(n), expected(n))

! read in the data
      call MPI_File_open(MPI_COMM_WORLD, trim(filename), MPI_MODE_RDONLY, MPI_INFO_NULL, infile, ierr)
      disp = 4   ! skip the leading integer N (4 bytes)
      call MPI_File_set_view(infile, disp, MPI_REAL, darray, "native", MPI_INFO_NULL, ierr)
      call MPI_File_read_all(infile, myA, nelements, MPI_REAL, mpistatus, ierr)
      call MPI_File_close(infile,ierr)

      iotime = tock(clock)
      if (mpirank == 0) then
          print *,'I/O time      = ', iotime
      endif

! Initialize blacs processor grid

      call tick(clock)
      call blacs_pinfo   (me,procs)

      call blacs_get     (-1, 0, icontxt)
      call blacs_gridinit(icontxt, 'R', prow, pcol)
      call blacs_gridinfo(icontxt, prow, pcol, myrow, mycol)

      myArows = numroc(n, nb, myrow, 0, prow)
      myAcols = numroc(n, nb, mycol, 0, pcol)

! Construct local arrays
! Global structure:  matrix A of n rows and n columns

! Prepare array descriptors for ScaLAPACK 
      call descinit( ides_a, n, n, nb, nb, 0, 0, icontxt, myArows, info )
      call descinit( ides_z, n, n, nb, nb, 0, 0, icontxt, myArows, info )

! Call ScaLAPACK library routine

      allocate(work(1), iwork(1))
      iwork(1) = -1
      work(1)  = -1.
      call pssyevr( 'V', 'A', 'U', n, myA, 1, 1, ides_a, -1.e20, +1.e20, 1, n, &
                     m,  nz, eigenvals, myZ, 1, 1, ides_z, work, -1,           &
                     iwork, -1, info )
      worksize  = int(work(1))/2*3
      iworksize = iwork(1)/2*3
      print *, 'Local workspace ', worksize
      call MPI_Reduce(worksize, totwork, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
      if (mpirank == 0) print *, ' total work space ', totwork
      call MPI_Reduce(iworksize, totwork, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD, ierr)
      if (mpirank == 0) print *, ' total iwork space ', totwork
      deallocate(work,iwork)
      allocate(work(worksize),iwork(iworksize))
      call pssyevr( 'V', 'A', 'U', n, myA, 1, 1, ides_a, -1.e20, +1.e20, 1, n, &
                     m,  nz, eigenvals, myZ, 1, 1, ides_z, work, worksize,     &
                     iwork, iworksize, info )
      if (info /= 0) then
         print *, 'Error: info = ', info
      else if (mpirank == 0) then
         print *, 'Calculated ', m, ' eigenvalues and ', nz, ' eigenvectors.'
      endif

! Deallocate the local arrays

      deallocate(myA, myZ)
      deallocate(work, iwork)

! End blacs for processors that are used

      call blacs_gridexit(icontxt)
      calctime = tock(clock)

! calculated the expected eigenvalues for a particular test matrix

      p = 3. + sqrt((4. * n - 3.) * (n - 1.)*3./(n+1.))
      expected(1) = p/(n*(5.-2.*n))
      expected(2) = 6./(p*(n + 1.))
      expected(3:n) = 1.

! Print results

      if (me == 0) then
        if (info /= 0) then
             print *, 'Error -- info = ', info
        endif
        print *,'Eigenvalues L_infty err = ', &
          maxval(abs(eigenvals-expected))
        print *,'Compute time = ', calctime
      endif

      deallocate(eigenvals, expected)

      call MPI_Finalize(ierr)


contains
    subroutine tick(t)
        integer, intent(OUT) :: t

        call system_clock(t)
    end subroutine tick

    ! returns time in seconds from now to time described by t
    real function tock(t)
        integer, intent(in) :: t
        integer :: now, clock_rate

        call system_clock(now,clock_rate)

        tock = real(now - t)/real(clock_rate)
    end function tock

end program darray
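
Since the question title also asks about writing: the same darray type works in the other direction. Here is a rough sketch (the subroutine name and the handling of the leading 4-byte N header are assumptions that mirror the read in the program above) which writes a distributed matrix back out with a collective MPI_File_write_all:

! Sketch: write a block-cyclically distributed matrix back out using the same
! darray type as for reading. Names and header layout are assumptions that
! mirror the read above.
subroutine write_distributed(filename, n, darray, myC, nelements, mpirank)
    use mpi
    implicit none
    character(len=*), intent(in) :: filename
    integer, intent(in) :: n, darray, nelements, mpirank
    real, intent(in) :: myC(nelements)
    integer :: outfile, ierr
    integer(kind=MPI_OFFSET_KIND) :: disp
    integer, dimension(MPI_STATUS_SIZE) :: mpistatus

    call MPI_File_open(MPI_COMM_WORLD, filename, &
                       ior(MPI_MODE_WRONLY, MPI_MODE_CREATE), &
                       MPI_INFO_NULL, outfile, ierr)

    ! Rank 0 writes the leading integer N, mirroring the read in the program.
    if (mpirank == 0) then
        call MPI_File_write(outfile, n, 1, MPI_INTEGER, mpistatus, ierr)
    endif

    ! Everyone installs the same block-cyclic view past the 4-byte header
    ! and writes its local piece collectively.
    disp = 4
    call MPI_File_set_view(outfile, disp, MPI_REAL, darray, "native", &
                           MPI_INFO_NULL, ierr)
    call MPI_File_write_all(outfile, myC, nelements, MPI_REAL, mpistatus, ierr)
    call MPI_File_close(outfile, ierr)
end subroutine write_distributed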
