Most efficient way to weight and sum a number of matrices in Fortran

Question

I am trying to write a function in Fortran that multiplies a number of matrices with different weights and then adds them together to form a single matrix. I have identified that this process is the bottleneck in my program (this weighting will be made many times for a single run of the program, with different weights). Right now I'm trying to make it run faster by switching from Matlab to Fortran. I am a newbie at Fortran so I appreciate all help.

In Matlab the fastest way I have found to make such a computation looks like this:

function B = weight_matrices()
n = 46;
m = 1800;
A = rand(n,m,m);
w = rand(n,1);
tic;
B = squeeze(sum(bsxfun(@times,w,A),1));
toc;

The line where B is assigned runs in about 0.9 seconds on my machine (Matlab R2012b, MacBook Pro 13" retina, 2.5 GHz Intel Core i5, 8 GB 1600 MHz DDR3). It should be noted that for my problem, the tensor A will be the same (constant) for the whole run of the program (after initialization), but w can take any values. Also, typical values of n and m are used here, meaning that the tensor A will have a size of about 1 GB in memory.

The clearest way I can think of writing this in Fortran is something like this:

pure function weight_matrices(w,A) result(B)
    implicit none
    integer, parameter :: n = 46      ! number of matrices (and weights)
    integer, parameter :: m = 1800    ! each matrix is m-by-m
    double precision, dimension(n), intent(in) :: w
    double precision, dimension(n,m,m), intent(in) :: A
    double precision, dimension(m,m) :: B
    integer :: i
    B = 0.0d0
    ! Accumulate one weighted matrix per iteration
    do i = 1,n
        B = B + w(i)*A(i,:,:)
    end do
end function weight_matrices

This function runs in about 1.4 seconds when compiled with gfortran 4.7.2, using -O3 (function call timed with "call cpu_time(t)"). If I manually unwrap the loop into

B = w(1)*A(1,:,:)+w(2)*A(2,:,:)+ ... + w(46)*A(46,:,:)

the function takes about 0.11 seconds to run instead. This is great and means that I get a speedup of about 8 times compared to the Matlab version. However, I still have some questions on readability and performance.

First, I wonder if there is an even faster way to perform this weighting and summing of matrices. I have looked through BLAS and LAPACK, but can't find any function that seems to fit. I have also tried to put the dimension in A that enumerates the matrices as the last dimension (i.e. switching from (i,j,k) to (k,i,j) for the elements), but this resulted in slower code.
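On the BLAS point: if A is viewed as an n-by-(m*m) matrix, the weighted sum is a vector-matrix product, B = w^T A, which is the operation the BLAS routine dgemv performs in its transposed mode. The following is only a minimal sketch of that view using the intrinsic matmul, so that it needs no external library; the module and function names are hypothetical, the sizes are taken from the arguments rather than fixed, and no timings were measured for it.

```fortran
module weighting_matmul
    implicit none
    private
    public :: weight_matrices_matmul
contains
    ! Hypothetical helper (not from the original post).
    ! Computes B(j,k) = sum over i of w(i)*A(i,j,k) as a vector-matrix product.
    ! Assumes size(w) == size(A,1).
    pure function weight_matrices_matmul(w, A) result(B)
        double precision, intent(in) :: w(:)
        double precision, intent(in) :: A(:,:,:)
        double precision :: B(size(A,2), size(A,3))
        integer :: n, p
        n = size(A,1)
        p = size(A,2)*size(A,3)
        ! Flatten the last two dimensions, multiply, then restore the m-by-m shape.
        B = reshape(matmul(w, reshape(A, [n, p])), shape(B))
    end function weight_matrices_matmul
end module weighting_matmul
```

Note that reshape may create a temporary copy of A, which matters for a tensor of about 1 GB; storing A as a rank-2 array from the start would avoid that.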

Second, this fast version is not very flexible, and actually looks quite ugly, since it is so much text for such a simple computation. For the tests I am running I would like to try different numbers of weights, so that the length of w will vary, to see how it affects the rest of my algorithm. However, that means a quite tedious rewrite of the assignment of B every time. Is there any way to make this more flexible, while keeping the performance the same (or better)?

Third, the tensor A will, as mentioned before, be constant during the run of the program. I have set constant scalar values in my program using the "parameter" attribute in their own module, importing them with the "use" expression into the functions/subroutines that need them. What is the best way to do the equivalent thing for the tensor A? I want to tell the compiler that this tensor will be constant, after init., so that any corresponding optimizations can be done. Note that A is typically ~1 GB in size, so it is not practical to enter it directly in the source file.
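One possible approach to a large, read-only array is a module variable with the Fortran 2003 protected attribute, which lets only procedures inside the module modify it. The sketch below is an assumption-laden illustration, not something taken from the thread: the names tensor_data and init_tensor are hypothetical, and random values stand in for however A is really initialized. The attribute enforces read-only access outside the module; whether a compiler uses it for optimization is not guaranteed.

```fortran
module tensor_data
    implicit none
    private
    public :: A, init_tensor
    ! Readable everywhere the module is used, writable only inside this module.
    double precision, allocatable, protected :: A(:,:,:)
contains
    ! Hypothetical initializer: allocates A(n,m,m) and fills it once.
    ! Random numbers are a placeholder for the real data source.
    subroutine init_tensor(n, m)
        integer, intent(in) :: n, m
        if (allocated(A)) deallocate(A)
        allocate(A(n, m, m))
        call random_number(A)
    end subroutine init_tensor
end module tensor_data
```

A routine that needs the tensor would then write "use tensor_data, only: A" and could read A, while an assignment to A outside the module would be rejected at compile time.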

Thank you in advance for any input! :)

Recommended answer

Maybe you could try something like this:

    do k=1,m
       do j=1,m
          B(j,k)=sum( [ ( (w(i)*A(i,j,k)), i=1,n) ])
       enddo
    enddo

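The square brackets are the newer (Fortran 2003) form of (/ /), the array constructor for a rank-1 array (vector). The argument of sum is an array of size n, built by an implied-do loop, and sum adds up all of its elements. This is precisely what your unwrapped code does, and it is not exactly equivalent to the do loop you have: the do loop makes n passes over the whole of B, each time reading a strided slice A(i,:,:), whereas here each element B(j,k) is computed once from n elements of A that are adjacent in memory.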
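The loop above can be wrapped in a function whose sizes come from its arguments, so that the number of weights can change without rewriting the assignment. This is a minimal sketch rather than the answerer's code: the module name weighting and the kind parameter dp are assumptions, and the timings quoted in the question do not apply to it.

```fortran
module weighting
    implicit none
    private
    public :: weight_matrices
    integer, parameter :: dp = kind(1.0d0)   ! double precision kind
contains
    ! Returns B(j,k) = sum over i of w(i)*A(i,j,k).
    ! Assumed-shape arguments: n and m are taken from A, not hard-coded.
    ! Assumes size(w) == size(A,1).
    pure function weight_matrices(w, A) result(B)
        real(dp), intent(in) :: w(:)
        real(dp), intent(in) :: A(:,:,:)
        real(dp) :: B(size(A,2), size(A,3))
        integer :: i, j, k
        do k = 1, size(A,3)
            do j = 1, size(A,2)
                ! Implied-do array constructor, as in the loop above
                B(j,k) = sum([(w(i)*A(i,j,k), i = 1, size(w))])
            end do
        end do
    end function weight_matrices
end module weighting
```

A caller would write "use weighting" and then B = weight_matrices(w, A), with w of length n and A of shape (n,m,m). The inner expression could also be written as dot_product(w, A(:,j,k)), which computes the same sum.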
