Write double (triple) sum as inner product?

Question

Since my np.dot is accelerated by OpenBLAS and OpenMPI, I am wondering whether it is possible to write the double sum

for i in range(N):
     for j in range(N):
         B[k,l] += A[i,j,k,l] * X[i,j]

as an inner product. At the moment I am using

B = np.einsum("ijkl,ij->kl",A,X)

but unfortunately it is quite slow and only uses one processor. Any ideas?

Edit: I benchmarked the answers given so far with a simple example. They all seem to be within the same order of magnitude:

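import numpy as np

A = np.random.random([200,200,100,100])
X = np.random.random([200,200])

def B1():
    # einsum contraction over the shared indices i and j
    return np.einsum("ijkl,ij->kl",A,X)
def B2():
    # tensordot over the first two axes of A and both axes of X
    return np.tensordot(A, X, [[0,1], [0, 1]])
def B3():
    # flatten (i, j) into one axis, then a single vector-matrix product
    shp = A.shape
    return np.dot(X.ravel(),A.reshape(shp[0]*shp[1],-1)).reshape(shp[2],shp[3])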

%timeit B1()
%timeit B2()
%timeit B3()

1 loops, best of 3: 300 ms per loop
10 loops, best of 3: 149 ms per loop
10 loops, best of 3: 150 ms per loop

Based on these results I would choose np.einsum, since its syntax is still the most readable and the other two are only about a factor of 2 faster. I guess the next step would be to move the code into C or Fortran.

Recommended answer

You can use np.tensordot():

np.tensordot(A, X, [[0,1], [0, 1]])

which does use multiple cores.
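
The equivalence of the double sum, the einsum call, tensordot, and a reshaped np.dot can be checked on small arrays. The following is a minimal sketch, not part of the original post: the array shapes and the fixed seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative only
A = rng.random((4, 5, 3, 2))     # assumed small shape, axes (i, j, k, l)
X = rng.random((4, 5))           # axes (i, j)

# Reference: the explicit double sum over i and j, for all (k, l) at once.
B_loop = np.zeros(A.shape[2:])
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        B_loop += A[i, j] * X[i, j]

B_einsum = np.einsum("ijkl,ij->kl", A, X)
B_tensordot = np.tensordot(A, X, axes=[[0, 1], [0, 1]])

# Flatten (i, j) into one axis so the sum becomes a vector-matrix product.
n_ij = A.shape[0] * A.shape[1]
B_dot = np.dot(X.ravel(), A.reshape(n_ij, -1)).reshape(A.shape[2:])

assert np.allclose(B_loop, B_einsum)
assert np.allclose(B_loop, B_tensordot)
assert np.allclose(B_loop, B_dot)
```

The reshape works because both A and X are C-contiguous here, so the flattened index i*5 + j addresses matching entries in X.ravel() and in the rows of the reshaped A.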

EDIT: it is interesting to see how np.einsum and np.tensordot scale as the size of the input arrays increases:

In [18]: for n in range(1, 31):
   ....:     A = np.random.rand(n, n+1, n+2, n+3)
   ....:     X = np.random.rand(n, n+1)
   ....:     print(n)
   ....:     %timeit np.einsum('ijkl,ij->kl', A, X)
   ....:     %timeit np.tensordot(A, X, [[0, 1], [0, 1]])
   ....:
1
1000000 loops, best of 3: 1.55 µs per loop
100000 loops, best of 3: 8.36 µs per loop
...
11
100000 loops, best of 3: 15.9 µs per loop
100000 loops, best of 3: 17.2 µs per loop
12
10000 loops, best of 3: 23.6 µs per loop
100000 loops, best of 3: 18.9 µs per loop
...
21
10000 loops, best of 3: 153 µs per loop
10000 loops, best of 3: 44.4 µs per loop

and the advantage of using tensordot for larger arrays becomes clear.
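
The same comparison can be run outside IPython with the standard timeit module. This is only a sketch: the helper name time_contractions, the repeat counts, and the chosen sizes are assumptions for illustration, and the timings will depend on the NumPy version and the BLAS library it is linked against.

```python
import timeit
import numpy as np

def time_contractions(n, repeats=5, number=10):
    """Best observed seconds per call for einsum and tensordot (hypothetical helper)."""
    rng = np.random.default_rng(0)
    A = rng.random((n, n + 1, n + 2, n + 3))
    X = rng.random((n, n + 1))
    calls = {
        "einsum": lambda: np.einsum("ijkl,ij->kl", A, X),
        "tensordot": lambda: np.tensordot(A, X, [[0, 1], [0, 1]]),
    }
    # timeit.repeat returns total seconds for `number` calls, once per repeat.
    return {
        name: min(timeit.repeat(fn, number=number, repeat=repeats)) / number
        for name, fn in calls.items()
    }

for n in (5, 10, 20):
    timings = time_contractions(n)
    print(n, {name: f"{t * 1e6:.1f} us" for name, t in timings.items()})
```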
