计算矩阵乘以转置矩阵的高效算法 [英] Efficient Algorithms for Computing a matrix times its transpose

查看:903
本文介绍了计算矩阵乘以转置矩阵的高效算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于一堂课,我的老师提出的一个问题是矩阵乘以其转置的算法成本.使用标准的3循环矩阵乘法算法,效率为O(N ^ 3),我想知道是否有一种方法可以操纵或利用矩阵*矩阵转置来获得更快的算法.我知道当矩阵乘以转置矩阵时,由于矩阵对称,因此必须减少矩阵的计算量,但是我想不出如何处理可能花费不到O(n ^ 3)的算法.

For a class, a question that was posed by my teacher was the algorithmic cost of multiplying a matrix times its transpose. With the standard 3 loop matrix multiplication algorithm, the efficiency is O(N^3), and I wonder if there was a way to manipulate or take advantage of matrix * matrix transpose to get a faster algorithm. I understand that when you multiply a matrix by its transpose you have to calculate less of the matrix because its symmetrical, but I can't think of how to manipulate an algorithm that could take less than O(n^3).

我知道有像Coppensmith和Straussen这样的算法,它们是更快的通用矩阵乘法算法,但是谁能对如何利用转置运算提供任何提示或见解?

i know there's algorithms like Coppensmith and Straussen that are faster general matrix multiplication algorithms but could anyone give any hints or insights on how to computationally take advantage of the transpose?

谢谢

推荐答案

截至目前,这种特殊乘法没有渐近性突破屏障的性质.

As of right now there aren't any aymptotic barrier-breaking properties of this particular multiplication.

明显的优化是利用产品的对称性.也就是说,第[i][j]个条目等于第[j][i]个条目.

The obvious optimization is to take advantage of the symmetry of the product. That is to say, the [i][j]th entry is equal to the [j][i]th entry.

对于特定于实现的优化,您可以进行大量的缓存.在大型矩阵乘法中,要花费大量的时间来往于存储器和CPU之间的数据传输.因此,CPU设计人员实现了一个智能缓存系统,该系统将最近使用的内存存储在一个称为缓存"的小内存部分中.除此之外,他们还做到了附近的内存也被缓存.这是因为大量的内存IO是由于对数组的读/写而导致的,这些数组是顺序存储的.

For implementation-specific optimizations, there is a significant amount of caching that you can do. A very significant amount of time in the multiplication of large matrices is spent transferring data to and from memory and CPU. So CPU designers implemented a smart caching system whereby recently used memory is stored in a small memory section called the cache. In addition to that, they also made it so that nearby memory is also cached. This is because a lot of the memory IO is due to reading/writing from/to arrays, which are stored sequentially.

由于矩阵的转置就是与交换索引的同一矩阵,因此在矩阵中缓存值的影响可能会超过两倍.

Since the transpose of a matrix is simply the same matrix with the indices swapped, caching a value in the matrix can have over twice the impact.

这篇关于计算矩阵乘以转置矩阵的高效算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆