Why is the performance of these matrix multiplications so different?


Question

I wrote two matrix classes in Java just to compare the performance of their matrix multiplications. One class (Mat1) stores a double[][] A member where row i of the matrix is A[i]. The other class (Mat2) stores A and T, where T is the transpose of A.

Let's say we have a square matrix M and we want the product M.mult(M). Call the product P.

When M is a Mat1 instance, the algorithm used is the straightforward one:

P[i][j] += M.A[i][k] * M.A[k][j]
    for k in range(0, M.A.length)

In the case where M is a Mat2, I used:

P[i][j] += M.A[i][k] * M.T[j][k]

which is the same algorithm because T[j][k] == A[k][j]. On 1000x1000 matrices the second algorithm takes about 1.2 seconds on my machine, while the first one takes at least 25 seconds. I was expecting the second one to be faster, but not by this much. The question is, why is it so much faster?
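
For reference, here is a minimal sketch in Java of the two inner loops as I understand them. This is my reconstruction, not the actual Mat1/Mat2 classes from the question; the class name MatMulSketch and the field names A and T just follow the description above.

class MatMulSketch {
    // Mat1-style product: P[i][j] = sum over k of A[i][k] * A[k][j].
    // Reading A[k][j] jumps to a different row of A on every step of k.
    static double[][] multMat1(double[][] A) {
        int n = A.length;
        double[][] P = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    P[i][j] += A[i][k] * A[k][j];
        return P;
    }

    // Mat2-style product: same result, but the second operand is read from
    // the transpose T, so both A[i] and T[j] are walked left to right.
    static double[][] multMat2(double[][] A, double[][] T) {
        int n = A.length;
        double[][] P = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    P[i][j] += A[i][k] * T[j][k]; // T[j][k] == A[k][j]
        return P;
    }
}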

My only guess is that the second one makes better use of the CPU caches, since data is pulled into the caches in chunks larger than one word. The second algorithm benefits from this by traversing only rows, while the first one ignores the data just pulled into the cache by immediately jumping to the row below (which is ~1000 words away in memory, because arrays are stored in row-major order), none of which is cached.

I asked someone, and he thought it was because of friendlier memory access patterns (i.e. that the second version would result in fewer TLB soft faults). I hadn't thought of this at all, but I can sort of see how it would result in fewer TLB faults.

So, which is it? Or is there some other reason for the performance difference?

Answer

This is because of the locality of your data.

In RAM a matrix, although two-dimensional from your point of view, is of course stored as a contiguous array of bytes. The only difference from a 1D array is that the offset is calculated by combining the two indices that you use.

This means that if you access the element at position x,y it will calculate x*row_length + y, and this will be the offset used to reference the element at the specified position.
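
As a small illustration of that offset formula, here is a self-contained sketch (the class name, sizes, and indices are made up for illustration; note also that Java's double[][] is really an array of separate row arrays, each contiguous on its own, so the same reasoning applies row by row):

class RowMajorOffset {
    public static void main(String[] args) {
        int rows = 1000, cols = 1000;
        double[] flat = new double[rows * cols]; // row-major: row x starts at x * cols

        int x = 3, y = 7;                        // element at row 3, column 7
        double v = flat[x * cols + y];           // offset = x * row_length + y = 3007

        System.out.println("offset = " + (x * cols + y) + ", value = " + v);
    }
}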

What happens is that a big matrix isn't stored in just one page of memory (this is how your OS manages RAM, by splitting it into chunks), so it has to load the correct page into the CPU cache if you try to access an element that is not already present.

As long as you traverse contiguously while doing your multiplication, you don't create any problems, since you mostly use all the coefficients of a page and then switch to the next one. But if you invert the indices, every single element may be located in a different memory page, so it has to ask RAM for a different page almost every single multiplication you do; this is why the difference is so pronounced.
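
To see this effect in isolation, here is a toy sketch of my own (not from the original answer) that traverses the same array row-by-row and then column-by-column. The absolute timings are machine-dependent and this is not a rigorous benchmark, but the row-wise pass is typically much faster:

class TraversalOrderDemo {
    public static void main(String[] args) {
        int n = 1000;
        double[][] a = new double[n][n];

        long t0 = System.nanoTime();
        double rowSum = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                rowSum += a[i][j];   // unit stride: each cache line is fully used

        long t1 = System.nanoTime();
        double colSum = 0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                colSum += a[i][j];   // jumps to a different row (often a different page) each step

        long t2 = System.nanoTime();
        System.out.println("row-wise:    " + (t1 - t0) / 1e6 + " ms (sum " + rowSum + ")");
        System.out.println("column-wise: " + (t2 - t1) / 1e6 + " ms (sum " + colSum + ")");
    }
}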

(I have rather simplified the whole explanation; this is just to give you the basic idea behind the problem.)

In any case I don't think this is caused by the JVM itself. It may be related to how your OS manages the memory of the Java process.

