使用DGEMM的BLAS LDB [英] BLAS LDB using DGEMM

查看:73
本文介绍了使用DGEMM的BLAS LDB的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想将矩阵乘以D * W',其中W'是W的转置版本.

I want to multiply matrices by D*W', where W' is a transposed version of W.

虽然我将使用DGEMM,但在@IanBush的帮助下,我发现在这种情况下LDB应该是矩阵W的行数而不是列数.这种情况的代码是

While I'll use DGEMM I'l figured out with the help of @IanBush that the LDB in this case should be number of rows of matrix W instead of number of columns. The code for this case is

  Call dgemm('n', 't', N1, M1, N1, 1.0_wp, D, N1, W, M1, 0.0_wp, c, n1)

其中n1和m1是矩阵的维数

where n1 and m1 are dimensions of my matrices

Dimensions of the matrices:
W = M1*N1
D = N1*N1

如官方文档中所述

      LDB is INTEGER
       On entry, LDB specifies the first dimension of B as declared
       in the calling (sub) program. When  TRANSB = 'N' or 'n' then
       LDB must be at least  max( 1, k ), otherwise  LDB must be at
       least  max( 1, n ).

为什么会这样?

推荐答案

对dgemm的调用包含两组整数.首先是定义数学描述的矩阵维数.在 http://www.netlib上查看BLAS文档. .org/lapack/explore-html/d7/d2b/dgemm_8f.html ,我们可以看到该例程的自变量定义为(并以一种有助于我记住其工作方式的方式对其进行了格式化):

The call to dgemm contains two sets of integers. The first is to define the dimensions of the matrices as described by the maths. Looking at the documentation for BLAS at http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html we can see the arguments to the routine defined as (and formatting it in a way that helps me remember how it works):

         Subroutine dgemm (TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, &
                                                           B, LDB, & 
                                                     BETA, C, LDC)

用于定义数学尺寸的整数集为M, N, K在此集合中

The set of integers to define the mathematical dimensions is M, N, K In this set

  • M是结果矩阵的第一个数学维,C
  • N是结果矩阵C
  • 的第二个数学维
  • K是产生结果矩阵C的单个元素所需的点积的长度.
  • M is the first mathematical dimension of the result matrix, C
  • N is the second mathematical dimension of the result matrix C
  • K is the length of the dot product required to produce an individual element of the result matrix C

因此,M, N, K完全与问题的数学有关,仅此而已.但是我们在计算机上,所以还有第二个问题-计算机内存中固有的一维对象的二维矩阵分别如何布置?现在,Fortran将其对象按主要列"顺序存储.这意味着在内存中,如果您具有元素A(i,j),则下一个内存位置将在我们不是列尾的情况下保持A(i + 1,j);如果在列的末尾,则将保持A(1,j + 1)我们是.因此,矩阵的布局是第一个索引移动最快".要定义二维对象的布局,我们需要告诉程序的是,从A(i,j)跳到A(i,j + 1)时需要跳过A的多少个元素-就是这样文档中的"c5>"是领先尺寸"的编号.因此,这仅仅是声明或分配的矩阵中的行数.

So M, N, K are purely to do with the maths of the problem, nothing else. But we are on a computer, so there is a second issue - how is each of the two dimensional matrices laid out in the computer's memory, which is an inherently one dimensional object? Now Fortran stores its objects in Column Major order. This means that in memory if you have element A( i, j ) the next memory location will hold A( i + 1, j ) if we are not an the end of a column, or A( 1, j + 1 ) if we are. So the matrix is laid out with the 'first index moving fastest'. To define this layout of a 2 dimensional object all we need to tell the program is how many elements of A you need to jump over to go from A( i, j ) to A( i, j + 1 ) - and it is this number which is the 'leading dimension', LDA, LDB, LDC in the documentation. And thus it is simply the number of rows in the matrix as declared or allocated.

现在让我们看看您引用的文档

Now let's look at the documentation you quote

  LDB is INTEGER
   On entry, LDB specifies the first dimension of B as declared
   in the calling (sub) program. When  TRANSB = 'N' or 'n' then
   LDB must be at least  max( 1, k ), otherwise  LDB must be at
   least  max( 1, n ).

LDB是声明的矩阵B中的行数.因此

LDB is the number of rows in the matrix B is it is declared. Thus

  1. 如果不对B进行转置,则B的每一列都将包含点积,该点积具有A的行或列,具体取决于A的转置选项.数学尺寸表示点积为k长.因此,B中的行数必须至少为k才能使数学正确,因此LDB必须至少为k
  2. 如果B被换位,则B的数学第一维也将成为结果矩阵的数学第二维,这由N给出.因此,在这种情况下LDB必须至少为N

这样可以回答大部分问题,并且如果您始终将一个声明的所有矩阵乘以另一个第二个矩阵的所有前缀,则前导维将只是声明的每个矩阵的最脏维.但是至少"意味着您可以将数组的位相乘.请考虑以下内容,并通过仔细分离定义数学的整数和定义内存布局的整数,您能否弄清楚为什么它能做到这一点?请注意,参数列表中的a(2,2)意味着我们从该元素开始"矩阵-现在仔细考虑前导维告诉您的内容,您应该能够弄清楚其工作原理.

So that answers most of it, and if you are always multiplying all of one matrix as declared by all of a second matrix the leading dimension will simply be the dirst dimension of each matrix as declared. But the 'at least' means you can multiply bits of arrays together. Consider the following, and by carefully separating those integers which define the maths, and those which define the memory layout, can you work out why it does what it does? Note that a( 2, 2 ) in the argument list means we "start" the matrix at this element - now think carefully what the leading dimension tells you and you should be able to sort out how this works.

ian@eris:~/work/stack$ cat matmul2.f90
Program sirt

  Integer, Parameter :: wp = Kind( 1.0d0 )

  Real( wp ), Dimension( 1:5, 1:5 ) :: A
  Real( wp ), Dimension( 1:3, 1:3 ) :: B
  Real( wp ), Dimension( 1:4, 1:4 ) :: C

  Integer :: i

  A = 0.0_wp
  Do i = 1, 5
     A( i, i ) = Real( i, wp )
  End Do
  Write( *, * ) 'A = '
  Do i = 1, Size( A, Dim = 1 )
     Write( *, '( 100( f4.1, 1x ) )' ) A( i, : )
  End Do

  B = 0.0_wp
  B( 1, 1 ) = 1.0_wp
  B( 3, 2 ) = 1.0_wp
  B( 2, 3 ) = 1.0_wp
  Write( *, * ) 'B = '
  Do i = 1, Size( B, Dim = 1 )
     Write( *, '( 100( f4.1, 1x ) )' ) B( i, : )
  End Do

  ! Lazy - should really just initialise only the bits of C that are NOT touched
  ! by the dgemm
  C = 0.0_wp

  Call dgemm('N', 'N', 3, 3, 3, 1.0_wp, A( 2, 2 ), Size( A, Dim = 1 ), &
                                        B        , Size( B, Dim = 1 ), &
                                0.0_wp, C        , Size( C, Dim = 1 ) )

  Write( *, * ) 'C after dgemm'
  Write( *, * ) 'B = '
  Do i = 1, Size( C, Dim = 1 )
     Write( *, '( 100( f4.1, 1x ) )' ) C( i, : )
  End Do


End Program sirt
ian@eris:~/work/stack$ gfortran-8  -std=f2008 -fcheck=all -Wall -Wextra -pedantic -O matmul2.f90 -lblas
/usr/bin/ld: warning: libgfortran.so.4, needed by //usr/lib/x86_64-linux-gnu/libopenblas.so.0, may conflict with libgfortran.so.5
ian@eris:~/work/stack$ ./a.out
 A = 
 1.0  0.0  0.0  0.0  0.0
 0.0  2.0  0.0  0.0  0.0
 0.0  0.0  3.0  0.0  0.0
 0.0  0.0  0.0  4.0  0.0
 0.0  0.0  0.0  0.0  5.0
 B = 
 1.0  0.0  0.0
 0.0  0.0  1.0
 0.0  1.0  0.0
 C after dgemm
 B = 
 2.0  0.0  0.0  0.0
 0.0  0.0  3.0  0.0
 0.0  4.0  0.0  0.0
 0.0  0.0  0.0  0.0

这种将两个较大矩阵的一部分相乘的用法非常常见,特别是在分块线性代数算法中,数学和布局维的分离使您可以做到这一点

This use for multiplying a part of two larger matrices together is surprisingly common, especially in blocked linear algebra algorithms, and the separation of the mathematical and layout dimensions allows you to do it

这篇关于使用DGEMM的BLAS LDB的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆