Most efficient way to weight and sum a number of matrices in Fortran

Views: 178 · Posted: 2020/11/10 6:27:37 · Tags: performance, fortran

Problem description

I am trying to write a function in Fortran that multiplies a number of matrices with different weights and then adds them together to form a single matrix. I have identified that this process is the bottleneck in my program (this weighting will be made many times for a single run of the program, with different weights). Right now I'm trying to make it run faster by switching from Matlab to Fortran. I am a newbie at Fortran so I appreciate all help.

In Matlab the fastest way I have found to make such a computation looks like this:

function B = weight_matrices()
n = 46;
m = 1800;
A = rand(n,m,m);
w = rand(n,1);
tic;
B = squeeze(sum(bsxfun(@times,w,A),1));
toc;

The line where B is assigned runs in about 0.9 seconds on my machine (Matlab R2012b, MacBook Pro 13" retina, 2.5 GHz Intel Core i5, 8 GB 1600 MHz DDR3). It should be noted that for my problem, the tensor A will be the same (constant) for the whole run of the program (after initialization), but w can take any values. Also, typical values of n and m are used here, meaning that the tensor A will have a size of about 1 GB in memory.

The clearest way I can think of writing this in Fortran is something like this:

pure function weight_matrices(w,A) result(B)
    implicit none
    integer, parameter :: n = 46
    integer, parameter :: m = 1800
    double precision, dimension(n), intent(in) :: w
    double precision, dimension(n,m,m), intent(in) :: A
    double precision, dimension(m,m) :: B
    integer :: i
    B = 0
    do i = 1,n
        B = B + w(i)*A(i,:,:)
    end do
end function weight_matrices

This function runs in about 1.4 seconds when compiled with gfortran 4.7.2, using -O3 (function call timed with "call cpu_time(t)"). If I manually unroll the loop into

B = w(1)*A(1,:,:)+w(2)*A(2,:,:)+ ... + w(46)*A(46,:,:)

the function takes about 0.11 seconds to run instead. This is great and means that I get a speedup of about 8 times compared to the Matlab version. However, I still have some questions on readability and performance.
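For anyone wanting to reproduce the measurement, a timing harness might look roughly like the sketch below. It is only an illustration: the small sizes, the program name and the checksum print are assumptions, not part of the original code. Note that cpu_time reports CPU time, which can differ from wall-clock time.

```fortran
program time_weighting
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    ! Small illustrative sizes (assumption); the question uses n = 46, m = 1800.
    integer, parameter :: n = 8, m = 200
    real(dp), allocatable :: A(:,:,:), B(:,:)
    real(dp) :: w(n), t0, t1
    integer :: i

    allocate(A(n,m,m), B(m,m))
    call random_number(A)
    call random_number(w)

    call cpu_time(t0)
    ! Loop version from the question
    B = 0.0_dp
    do i = 1, n
        B = B + w(i)*A(i,:,:)
    end do
    call cpu_time(t1)

    print '(a,f10.6,a)', 'loop version: ', t1 - t0, ' s (CPU time)'
    ! Using the result afterwards keeps the compiler from discarding the loop.
    print '(a,es12.4)', 'checksum: ', sum(B)
end program time_weighting
```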

First, I wonder if there is an even faster way to perform this weighting and summing of matrices. I have looked through BLAS and LAPACK, but can't find any function that seems to fit. I have also tried to put the dimension in A that enumerates the matrices as the last dimension (i.e. switching from (i,j,k) to (k,i,j) for the elements), but this resulted in slower code.
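One way to relate this to BLAS: if A is viewed as an n x (m*m) matrix, the weighted sum is a vector-matrix product, which is what a level-2 routine such as DGEMV computes. The sketch below uses the intrinsic matmul so that it runs without an external library and checks the result against the loop version. The small sizes and the variable Bref are assumptions for illustration, and no performance claim is made. For a 1 GB tensor the reshape of A would likely create a large temporary, which the DGEMV call shown in the comment avoids.

```fortran
program weighted_sum_as_product
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: n = 5, m = 40   ! small sizes for illustration (assumption)
    real(dp), allocatable :: A(:,:,:), B(:,:), Bref(:,:)
    real(dp) :: w(n)
    integer :: i

    allocate(A(n,m,m), B(m,m), Bref(m,m))
    call random_number(A)
    call random_number(w)

    ! Reference result: the loop from the question
    Bref = 0.0_dp
    do i = 1, n
        Bref = Bref + w(i)*A(i,:,:)
    end do

    ! View A as an n x (m*m) matrix; the flattened B is then w^T times that matrix.
    B = reshape(matmul(w, reshape(A, [n, m*m])), [m, m])

    ! With a BLAS library linked, the same product could be requested as
    !   call dgemv('T', n, m*m, 1.0d0, A, n, w, 1, 0.0d0, B, 1)

    if (maxval(abs(B - Bref)) > 1.0e-12_dp) error stop 'results differ'
    print *, 'max abs difference:', maxval(abs(B - Bref))
end program weighted_sum_as_product
```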

Second, this fast version is not very flexible, and actually looks quite ugly, since it is so much text for such a simple computation. For the tests I am running I would like to try different numbers of weights, so that the length of w will vary, to see how it affects the rest of my algorithm. However, that means a quite tedious rewrite of the assignment of B every time. Is there any way to make this more flexible, while keeping the performance the same (or better)?
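A possible size-agnostic formulation is sketched below. It takes assumed-shape arguments, so the number of weights comes from the arrays themselves, and it computes each element of B as a dot product over the first index of A, which is contiguous in memory for the (n,m,m) layout. The module name weighting and the demo program are hypothetical, the code assumes size(w) equals size(A,1), and whether it matches the speed of the hand-unrolled version has not been measured here.

```fortran
module weighting
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    pure function weight_matrices(w, A) result(B)
        real(dp), intent(in) :: w(:)       ! weights, assumed length size(A,1)
        real(dp), intent(in) :: A(:,:,:)   ! tensor with the matrix index first
        real(dp) :: B(size(A,2), size(A,3))
        integer :: j, k
        do k = 1, size(A,3)
            do j = 1, size(A,2)
                ! A(:,j,k) is a contiguous column of length n
                B(j,k) = dot_product(w, A(:,j,k))
            end do
        end do
    end function weight_matrices
end module weighting

program demo
    use weighting
    implicit none
    real(dp) :: A(3,2,2), w(3), B(2,2)
    A = 1.0_dp
    w = [1.0_dp, 2.0_dp, 3.0_dp]
    B = weight_matrices(w, A)
    print *, B   ! every entry is 1 + 2 + 3 = 6
end program demo
```

Changing the number of weights then only requires passing arrays of a different size; the assignment of B does not need to be rewritten.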

Third, the tensor A will, as mentioned before, be constant during the run of the program. I have set constant scalar values in my program using the "parameter" attribute in their own module, importing them with the "use" statement into the functions/subroutines that need them. What is the best way to do the equivalent thing for the tensor A? I want to tell the compiler that this tensor will be constant after initialization, so that any corresponding optimizations can be done. Note that A is typically ~1 GB in size, so it is not practical to enter it directly in the source file.
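One option that comes close to this is a module variable with the protected attribute (Fortran 2003): it can only be modified by procedures inside its own module and is read-only everywhere else. The sketch below is an illustration under assumptions: the names tensor_data and init_tensor are hypothetical, and random numbers stand in for the real initialization. Whether a compiler uses protected for any optimization is not guaranteed; it mainly documents and enforces the intent.

```fortran
module tensor_data
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    ! Readable everywhere the module is used, writable only inside this module
    real(dp), allocatable, protected :: A(:,:,:)
contains
    subroutine init_tensor(n, m)
        integer, intent(in) :: n, m
        if (allocated(A)) deallocate(A)
        allocate(A(n,m,m))
        call random_number(A)   ! stand-in for the real initialization (assumption)
    end subroutine init_tensor
end module tensor_data

program use_tensor
    use tensor_data, only: A, init_tensor
    implicit none
    call init_tensor(4, 10)
    print *, 'shape of A:', shape(A)
    ! A(1,1,1) = 0.0d0   ! would be rejected at compile time: A is protected
end program use_tensor
```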

Thank you in advance for any input! :)
