Fortran matrix-vector multiplication optimization
Question
I tried to measure the difference between several matrix-vector multiplication schemes in Fortran. I wrote the following code: http://pastebin.com/dmKXdnX6
The "optimized version" is meant to respect the memory layout of the matrix by swapping the loops that access the matrix elements. The provided code should compile with gfortran, and it runs with the following rather surprising results:
Vectors match! Calculations are OK.
Optimized time: 0.34133333333333332
Naive time: 1.4133333333333331E-002
Ratio (t_optimized/t_naive): 24.150943396226417
I've probably made an embarrassing mistake, but I can't spot it. I hope someone else can help me.
I know that Fortran provides optimized versions, but I'm measuring this just out of curiosity.
Thanks in advance.
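For readers who cannot open the pastebin, the two loop orders being compared might look roughly like the sketch below. This is not the asker's actual code: the module name `matvec_sketch`, the `dp` kind parameter, the tiny 4x3 test size, and the assumption that "naive" means the row-by-row order are all illustrative. Fortran stores arrays column by column, so the version whose inner loop runs over the first index touches contiguous memory.

```fortran
module matvec_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed working precision
contains

  ! Row-by-row order: the inner loop walks along a row of A, so
  ! consecutive accesses are m elements apart in memory.
  subroutine naive(A, m, n, x, y)
    integer, intent(in) :: m, n
    real(dp), intent(in) :: A(m, n), x(n)
    real(dp), intent(out) :: y(m)
    integer :: i, j
    do j = 1, m
      y(j) = 0.0_dp
      do i = 1, n
        y(j) = y(j) + A(j, i) * x(i)
      end do
    end do
  end subroutine naive

  ! Swapped loops: the inner loop walks down a column of A, which is
  ! contiguous in Fortran's column-major layout.
  subroutine optimized(A, m, n, x, y)
    integer, intent(in) :: m, n
    real(dp), intent(in) :: A(m, n), x(n)
    real(dp), intent(out) :: y(m)
    integer :: i, j
    y = 0.0_dp
    do i = 1, n
      do j = 1, m
        y(j) = y(j) + A(j, i) * x(i)
      end do
    end do
  end subroutine optimized

end module matvec_sketch

program compare_orders
  use matvec_sketch
  implicit none
  integer, parameter :: m = 4, n = 3       ! hypothetical test size
  real(dp) :: A(m, n), x(n), y1(m), y2(m)
  integer :: i, j

  do i = 1, n
    do j = 1, m
      A(j, i) = (-1.0_dp)**(i - j)
    end do
  end do
  x = 1.0_dp

  call optimized(A, m, n, x, y1)
  call naive(A, m, n, x, y2)

  ! Both versions must produce the same vector.
  if (maxval(abs(y1 - y2)) > 1.0e-12_dp) error stop "Vectors differ"
  print *, "Vectors match"
end program compare_orders
```

At this size the timing difference is not measurable; the sketch only shows which loop is innermost in each variant.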
Recommended answer
Well, it's a simple matter of parentheses:
t_optimized = t2-t1/iterations
is most certainly wrong. You probably mean
t_optimized = (t2-t1)/iterations
With that fix I get a speed-up of ~2.
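To see why the missing parentheses matter, here is a minimal sketch with made-up numbers. The timestamps and iteration count are hypothetical and chosen only to make the arithmetic easy to follow. Division binds more tightly than subtraction, so the unparenthesized form divides only `t1`.

```fortran
program precedence_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: iterations = 100   ! hypothetical iteration count
  real(dp) :: t1, t2, wrong, right

  t1 = 2.0_dp   ! hypothetical start time in seconds
  t2 = 5.0_dp   ! hypothetical end time in seconds

  ! Parsed as t2 - (t1/iterations): roughly 5 - 0.02 = 4.98
  wrong = t2 - t1/iterations

  ! Elapsed time divided by the number of runs: roughly 3/100 = 0.03
  right = (t2 - t1)/iterations

  print *, "t2 - t1/iterations   =", wrong
  print *, "(t2 - t1)/iterations =", right
end program precedence_demo
```

Because `cpu_time` returns cumulative processor time, whichever benchmark runs first gets a larger `t2` and a larger bogus result, which would be consistent with the ratio of about 24 reported in the question if the optimized variant was timed second.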
A couple of other things I needed to correct/adjust:
- The first loop is wrong: you are trying to set elements outside the array bounds. It should read:
A(j,i) = (-1.0)**(i-j)
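- Modern compilers are quite smart. They may notice that you never change the inputs to the function call inside the loop body, and then they can optimize away the whole loop! To prevent this, I inserted the following line: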
do i = 1,iterations
call optimized(A, m, n, x, y1)
x(1:n) = y1
end do
(and the same for y2). Don't forget to re-initialize x at the beginning of each benchmark.
- Don't use ; that much. It is not required unless you want to put multiple statements on one line.
- Don't use tabs in Fortran. Some compilers don't like them, so use spaces instead.
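Putting the points above together, a benchmark harness could look roughly like the following sketch. It is not the corrected pastebin code: the module name, the square size `n = 512`, the iteration count, the `1/n` scaling of `A`, and the alternating initial `x` are assumptions made so that feeding `y1` back into `x` keeps the values bounded. It shows in-bounds initialization, a data dependence between iterations, and the parenthesized timing formula.

```fortran
module bench_kernels
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Column-oriented matrix-vector product y = A*x (inner loop contiguous).
  subroutine optimized(A, m, n, x, y)
    integer, intent(in) :: m, n
    real(dp), intent(in) :: A(m, n), x(n)
    real(dp), intent(out) :: y(m)
    integer :: i, j
    y = 0.0_dp
    do i = 1, n
      do j = 1, m
        y(j) = y(j) + A(j, i) * x(i)
      end do
    end do
  end subroutine optimized
end module bench_kernels

program bench_sketch
  use bench_kernels
  implicit none
  integer, parameter :: n = 512           ! hypothetical (square) size
  integer, parameter :: iterations = 100  ! hypothetical repeat count
  real(dp), allocatable :: A(:, :), x(:), y1(:)
  real(dp) :: t1, t2, t_optimized
  integer :: i, j

  allocate(A(n, n), x(n), y1(n))

  ! First index is the row, second the column; both stay within bounds.
  ! The 1/n scaling is an assumption that keeps repeated products bounded.
  do i = 1, n
    do j = 1, n
      A(j, i) = (-1.0_dp)**(i - j) / n
    end do
  end do

  ! Re-initialize x at the start of each benchmark.
  x = [((-1.0_dp)**i, i = 1, n)]

  call cpu_time(t1)
  do i = 1, iterations
    call optimized(A, n, n, x, y1)
    x(1:n) = y1   ! feed the result back so the calls cannot be dropped
  end do
  call cpu_time(t2)

  t_optimized = (t2 - t1) / iterations
  print *, "Optimized time:", t_optimized
  ! Printing a value derived from x keeps the result observable.
  print *, "Checksum:", sum(abs(x))
end program bench_sketch
```

The same pattern, with `x` reset first, would be repeated for the naive variant and `y2`. Feeding `y1` back into `x` requires a square matrix, which is why the sketch uses `m = n`.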