Write double (triple) sum as inner product?
Problem description
Since my np.dot is accelerated by OpenBLAS and OpenMPI, I am wondering whether it is possible to write the double sum
for i in range(N):
    for j in range(N):
        B[k,l] += A[i,j,k,l] * X[i,j]
as an inner product. At the moment I am using
B = np.einsum("ijkl,ij->kl",A,X)
but unfortunately it is quite slow and only uses one processor. Any ideas?
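One way to see the double sum as an inner product is to flatten the index pair (i, j) into a single axis, so the contraction becomes a vector-matrix product that np.dot can handle. The following is a minimal sketch under assumed small sizes (N, K, L and the variable names B_loop and B_dot are illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, L = 4, 3, 5                      # small, arbitrary sizes for illustration
A = rng.random((N, N, K, L))
X = rng.random((N, N))

# Reference: the explicit double sum over i and j.
B_loop = np.zeros((K, L))
for i in range(N):
    for j in range(N):
        B_loop += A[i, j] * X[i, j]    # A[i, j] is the (K, L) slice

# Same contraction as a vector-matrix product:
# flatten (i, j) into one axis and (k, l) into the other.
B_dot = np.dot(X.ravel(), A.reshape(N * N, K * L)).reshape(K, L)

print(np.allclose(B_loop, B_dot))
```

Whether this actually runs on several cores depends on the BLAS library that NumPy is linked against.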
Edit:
I benchmarked the answers given so far with a simple example. They all seem to be within the same order of magnitude:
import numpy as np

A = np.random.random([200, 200, 100, 100])
X = np.random.random([200, 200])

def B1():
    return np.einsum("ijkl,ij->kl", A, X)

def B2():
    return np.tensordot(A, X, [[0, 1], [0, 1]])

def B3():
    shp = A.shape
    # Flatten (i, j) into one axis and (k, l) into the other, then use np.dot.
    return np.dot(X.ravel(), A.reshape(shp[0]*shp[1], -1)).reshape(shp[2], shp[3])

%timeit B1()
%timeit B2()
%timeit B3()
1 loops, best of 3: 300 ms per loop
10 loops, best of 3: 149 ms per loop
10 loops, best of 3: 150 ms per loop
Concluding from these results, I would choose np.einsum, since its syntax is still the most readable and the other two are only about a factor of 2 faster. I guess the next step would be to move the code into C or Fortran.
Recommended answer
You can use np.tensordot():
np.tensordot(A, X, [[0,1], [0, 1]])
which does use multiple cores.
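As a quick sanity check, the sketch below compares the tensordot call with the einsum expression from the question on small random arrays (the shapes and the names B_td and B_es are illustrative assumptions). The axes argument lists which axes of A are contracted with which axes of X:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 4, 5, 6))
X = rng.random((3, 4))

# axes=[[0, 1], [0, 1]] contracts axes 0 and 1 of A with axes 0 and 1 of X,
# leaving the last two axes of A as the result.
B_td = np.tensordot(A, X, axes=[[0, 1], [0, 1]])
B_es = np.einsum("ijkl,ij->kl", A, X)

same = np.allclose(B_td, B_es)
print(B_td.shape, same)
```

Internally, tensordot reshapes its operands to 2-D and calls np.dot, which is why it can benefit from a multithreaded BLAS.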
EDIT: it is interesting to see how np.einsum and np.tensordot scale when increasing the size of the input arrays:
In [18]: for n in range(1, 31):
....: A = np.random.rand(n, n+1, n+2, n+3)
....: X = np.random.rand(n, n+1)
....: print(n)
....: %timeit np.einsum('ijkl,ij->kl', A, X)
....: %timeit np.tensordot(A, X, [[0, 1], [0, 1]])
....:
1
1000000 loops, best of 3: 1.55 µs per loop
100000 loops, best of 3: 8.36 µs per loop
...
11
100000 loops, best of 3: 15.9 µs per loop
100000 loops, best of 3: 17.2 µs per loop
12
10000 loops, best of 3: 23.6 µs per loop
100000 loops, best of 3: 18.9 µs per loop
...
21
10000 loops, best of 3: 153 µs per loop
10000 loops, best of 3: 44.4 µs per loop
and the advantage of using tensordot for larger arrays becomes clear.
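To repeat this comparison outside IPython, something like the following sketch with the standard timeit module could be used. The helper name time_contractions and the chosen sizes are hypothetical, and the absolute numbers will depend on the machine and on the BLAS library NumPy is linked against:

```python
import timeit
import numpy as np

def time_contractions(n, repeat=3, number=10):
    """Return the best per-call time in seconds for (einsum, tensordot)."""
    A = np.random.rand(n, n + 1, n + 2, n + 3)
    X = np.random.rand(n, n + 1)
    t_es = min(timeit.repeat(lambda: np.einsum("ijkl,ij->kl", A, X),
                             repeat=repeat, number=number)) / number
    t_td = min(timeit.repeat(lambda: np.tensordot(A, X, [[0, 1], [0, 1]]),
                             repeat=repeat, number=number)) / number
    return t_es, t_td

for n in (5, 15, 25):
    t_es, t_td = time_contractions(n)
    print(f"n={n:2d}  einsum {t_es * 1e6:8.1f} µs   tensordot {t_td * 1e6:8.1f} µs")
```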