What is the best way to compute the trace of a matrix product in numpy?
Question
If I have numpy arrays A and B, then I can compute the trace of their matrix product with:
tr = numpy.trace(A.dot(B))
However, the matrix multiplication A.dot(B) unnecessarily computes all of the off-diagonal entries in the matrix product, when only the diagonal elements are used in the trace. Instead, I could do something like:
n = A.shape[0]
tr = 0.0
for i in range(n):
    tr += A[i, :].dot(B[:, i])
but this performs the loop in Python code and isn't as obvious as numpy.trace.
Is there a better way to compute the trace of a matrix product of numpy arrays? What is the fastest or most idiomatic way to do this?
Recommended answer
You can improve on @Bill's solution by reducing intermediate storage to the diagonal elements only:
import numpy as np
# inner1d comes from an internal NumPy module and may be unavailable in newer releases
from numpy.core.umath_tests import inner1d
m, n = 1000, 500
a = np.random.rand(m, n)
b = np.random.rand(n, m)
# They all should give the same result
print(np.trace(a.dot(b)))
print(np.sum(a*b.T))
print(np.sum(inner1d(a, b.T)))
%timeit np.trace(a.dot(b))
10 loops, best of 3: 34.7 ms per loop
%timeit np.sum(a*b.T)
100 loops, best of 3: 4.85 ms per loop
%timeit np.sum(inner1d(a, b.T))
1000 loops, best of 3: 1.83 ms per loop
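To see why the cheaper expressions agree with the full product, note that diagonal entry i of a.dot(b) is the inner product of row i of a with column i of b, so only those m inner products are needed. The following is a minimal sketch using plain NumPy only; the helper name trace_of_product is hypothetical, and the small shapes are chosen just for illustration. As far as I understand it, inner1d(a, b.T) returns the same length-m vector that diag_rows holds here, without building the elementwise product first.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((4, 3))
b = rng.random((3, 4))

# Diagonal of the full product: entry i is the inner product of
# row i of a with column i of b.
diag_full = np.diag(a.dot(b))

# The same values without forming the 4x4 product: multiply a by b.T
# elementwise and sum along each row.
diag_rows = np.sum(a * b.T, axis=1)


def trace_of_product(a, b):
    """Hypothetical helper: trace of a.dot(b) without the full product."""
    return np.sum(a * b.T)


print(np.allclose(diag_full, diag_rows))
print(np.isclose(trace_of_product(a, b), np.trace(a.dot(b))))
```

The elementwise version still allocates an m-by-n temporary, which is the intermediate storage the answer is trying to shrink.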
Another option is to use np.einsum and have no explicit intermediate storage at all:
# Will print the same as the others:
print(np.einsum('ij,ji->', a, b))
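As a small worked example of the subscript string, 'ij,ji->' multiplies A[i, j] by B[j, i] and sums over both indices, which is exactly the sum of the diagonal of the product. The 2x2 matrices below are made up for illustration and small enough to check by hand: 1*5 + 2*7 + 3*6 + 4*8 = 69.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Sum over i and j of A[i, j] * B[j, i]; the empty output after '->'
# means both indices are summed away, leaving a scalar.
tr_einsum = np.einsum('ij,ji->', A, B)

# Reference value: trace of the explicitly formed product.
tr_full = np.trace(A.dot(B))

print(tr_einsum, tr_full)  # 69 69
```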
On my system the einsum version runs slightly slower than the inner1d version, but that may not hold on all systems, see this question:
%timeit np.einsum('ij,ji->', a, b)
100 loops, best of 3: 1.91 ms per loop
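Since the relative timings depend on the machine and the NumPy build, it may be worth re-measuring locally. Below is a rough sketch that does so with the standard timeit module instead of the IPython %timeit magic. The function names trace_candidates and benchmark are hypothetical, and inner1d is left out on the assumption that it may not be importable in your NumPy version.

```python
import timeit

import numpy as np


def trace_candidates(a, b):
    """Return the candidate implementations as zero-argument callables."""
    return {
        "trace(dot)": lambda: np.trace(a.dot(b)),
        "sum(a*b.T)": lambda: np.sum(a * b.T),
        "einsum": lambda: np.einsum('ij,ji->', a, b),
    }


def benchmark(m=1000, n=500, number=20):
    """Average seconds per call for each candidate on random (m, n) data."""
    rng = np.random.default_rng(0)
    a = rng.random((m, n))
    b = rng.random((n, m))
    timings = {}
    for name, fn in trace_candidates(a, b).items():
        # timeit returns the total time for `number` calls.
        timings[name] = timeit.timeit(fn, number=number) / number
    return timings


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:12s} {seconds * 1e3:.3f} ms per call")
```

The default shapes match the ones used in the answer, so the numbers can be compared loosely against the timings quoted above.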