Performance of Fortran matrix operations

Question

I need to use Fortran instead of C somewhere, and I am very new to Fortran. I am trying to do some big calculations, but they are quite slow compared to C (maybe 10x or more, and I am using Intel's compilers for both). I think the reason is that Fortran stores the matrix in column-major format and I am doing operations like sum(matrix(i, j, :)); because the layout is column-major, this probably uses the cache very inefficiently (maybe not at all). However, I am not sure whether this is the actual reason, since I know so little about Fortran. My question is: is the convention in Fortran to operate on column vectors rather than row vectors?

(BTW: I have already checked Fortran's speed using Intel's LAPACK libraries, and it is quite fast, so this is not a compiler or build issue.)

Thanks.

Mete

Solution

Try changing the order of your loops when doing matrix operations, e.g. if you have something like this in C:

for (i = 0; i < M; ++i) // for each row
{
    for (j = 0; j < N; ++j) // for each col
    {
        // matrix operations on e.g. A[i][j]
    }
}

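then in Fortran you want the j (column) loop as the outer loop and the i (row) loop as the inner loop. Fortran stores arrays in column-major order, so A(i, j) and A(i+1, j) are adjacent in memory; letting i vary fastest walks memory with unit stride and uses every element of each cache line that is loaded.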

An alternative approach, which achieves the same thing, is to keep the loops as they are but change the definition of the array: e.g. if in C it's A[x][y][z][t], then in Fortran declare it as A(t, z, y, x), assuming that t is the fastest-varying loop index and x the slowest.
