Computing expectations of an array, suggestions for speed improvements
Problem description
I have an array V of shape (N0, N1, N2, N3) and a matrix M of shape (N1, N1). N1 is typically around 30-50, and N0 x N1 x N2 x N3 is around 1,000,000. I want a new array EV whose (i0, i1, i2, i3) entry is given by
np.sum(V[i0, :, i2, i3] * M[i1, :])
My current code to achieve that is:
V_exp = np.tile(V[:, :, :, :, None], (1, 1, 1, 1, N1))
M_exp = np.tile(M.T[None, :, None, None, :], (N0, 1, N2, N3, 1))
EV = np.sum(V_exp * M_exp, axis = 1)
EV = np.rollaxis(EV, 3, 1)
I have to perform this operation many times, and it is the absolute bottleneck of my code. I wonder if there is any potential to speed it up. I appreciate suggestions!
Recommended answer
A single call to np.einsum would perform all those operations in one go, after extending V to a 5D shape with np.newaxis/None, like so:
EV = np.einsum('ijklm,mj->imkl',V[...,None],M)
Thus, we avoid creating the large tiled intermediate arrays, which makes this a memory-efficient solution.
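As a sanity check, the einsum result can be compared with the original tile-based code on small random inputs. This is only a minimal sketch: the sizes, the random seed, and the helper names ev_tile and ev_einsum are arbitrary assumptions, and the einsum call is written in an equivalent 4D form ('ijkl,mj->imkl') that does not need the extra trailing axis on V.

```python
import numpy as np

# Small, arbitrary sizes chosen only for illustration (assumption).
N0, N1, N2, N3 = 3, 4, 5, 6
rng = np.random.default_rng(0)
V = rng.standard_normal((N0, N1, N2, N3))
M = rng.standard_normal((N1, N1))

def ev_tile(V, M):
    """Approach from the question: tile, multiply, sum, roll."""
    N0, N1, N2, N3 = V.shape
    V_exp = np.tile(V[:, :, :, :, None], (1, 1, 1, 1, N1))
    M_exp = np.tile(M.T[None, :, None, None, :], (N0, 1, N2, N3, 1))
    EV = np.sum(V_exp * M_exp, axis=1)
    return np.rollaxis(EV, 3, 1)

def ev_einsum(V, M):
    """4D einsum form: EV[i, m, k, l] = sum_j V[i, j, k, l] * M[m, j]."""
    return np.einsum('ijkl,mj->imkl', V, M)

EV_tile = ev_tile(V, M)
EV_einsum = ev_einsum(V, M)

print(EV_einsum.shape)
print(np.allclose(EV_tile, EV_einsum))
```

Both functions should agree up to floating-point rounding. For the real problem sizes, timing each variant (for example with timeit) on representative data is the only reliable way to see how much is gained.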
Explanation
(1) Starting code that involved extending dimensions:
V_exp = np.tile(V[:, :, :, :, None], (1, 1, 1, 1, N1))
M_exp = np.tile(M.T[None, :, None, None, :], (N0, 1, N2, N3, 1))
output1 = V_exp * M_exp
In np.einsum terms, it would be translated as:
np.einsum('ijklm,mj->ijklm',V[...,None],M)
Please notice that we have used mj instead of the usual jm for M, so that it corresponds to M.T[None, :, None, None, :].
(2) Next up, we have:
EV = np.sum(V_exp * M_exp, axis = 1)
Here we are summing along axis=1, which is the j index, so the output string specifier of the einsum call needs to change from ->ijklm to ->iklm.
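The effect of dropping an index from the output specifier can be seen on a small standalone example. The array A below is hypothetical and unrelated to V and M; it only illustrates that an index missing from the output string is summed over.

```python
import numpy as np

# Hypothetical 3D array, used only to illustrate the index rule.
A = np.arange(24.0).reshape(2, 3, 4)

# All indices kept in the output: nothing is summed.
kept = np.einsum('ijk->ijk', A)

# 'j' dropped from the output: einsum sums over axis 1.
summed = np.einsum('ijk->ik', A)

print(np.array_equal(kept, A))
print(np.allclose(summed, A.sum(axis=1)))
```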
(3) Finally:
EV = np.rollaxis(EV, 3, 1)
To port this rollaxis call, we just need to move axis=3 to the position of axis=1 and push each of axes 1 and 2 one position to the right. Thus, the output string specifier would change from ->iklm to ->imkl, giving us:
np.einsum('ijklm,mj->imkl',V[...,None],M)
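The correspondence between np.rollaxis(EV, 3, 1) and reordering the output indices can also be checked in isolation. In this sketch the array X is a hypothetical stand-in for the summed intermediate, with its axes labelled i, k, l, m.

```python
import numpy as np

# Hypothetical stand-in for the summed array, axes labelled (i, k, l, m).
X = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# Move axis 3 to position 1, as in the question's code.
rolled = np.rollaxis(X, 3, 1)

# Same permutation expressed through einsum's output specifier.
permuted = np.einsum('iklm->imkl', X)

print(rolled.shape)
print(np.array_equal(rolled, permuted))
```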