Vectorising an equation using numpy
Question
I am trying to implement the above formula in vectorised form.
Here K = 3 and N = 150. X is a 150x4 numpy array, mu is a 3x4 numpy array, Gamma is a 150x3 numpy array, and Sigma is a Kx4x4 numpy array, so Sigma[k] is a 4x4 numpy array.
N_k = np.sum(Gamma, axis=0)
for k in range(K):  # Correct
    x_new = X - mu[k]  # Correct
    a = np.dot(x_new.T, x_new)  # Incorrect from here I feel
    for i in range(len(data)):
        sigma[k] = Gamma[i][k] * a
    sigma[k] = sigma[k] / N_k  # totally incorrect
How can this be fixed?
Recommended answer
A sum of products? Sounds like a job for np.einsum:
import numpy as np
N = 150
K = 3
M = 4
x = np.random.random((N,M))
mu = np.random.random((K,M))
gamma = np.random.random((N,K))
xbar = x-mu[:,None,:] # shape (3, 150, 4)
sigma = np.einsum('nk,knm,kno->kmo', gamma, xbar, xbar)
sigma /= gamma.sum(axis=0)[:,None,None]
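The formula from the question is not reproduced in this text. Reading the code above term by term, it appears to compute the weighted covariance below; treat this as a reconstruction from the code (with the symbols chosen here as assumptions), not as the asker's exact notation.

```latex
\Sigma_k \;=\; \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk}\,
  (x_n - \mu_k)\,(x_n - \mu_k)^{\mathsf{T}},
\qquad
N_k \;=\; \sum_{n=1}^{N} \gamma_{nk}
```

Here x_n is row n of x, mu_k is row k of mu, and gamma_nk is gamma[n, k], so each Sigma_k is a 4x4 matrix.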
Decoding 'nk,knm,kno->kmo':
This subscript specification has three components to the left of the arrow (->) followed by one component to the right.
The three components on the left correspond to the subscripts for gamma, xbar and xbar, the operands passed to np.einsum.
gamma has subscript nk, just as in the formula you posted.
xbar has shape (3, 150, 4). You can think of it as having subscript knm, where k and n have the same meaning as in the formula you posted, and m is a subscript for the axis of length 4. That axis is not explicitly mentioned in your formula, but it is implied by your description of the array shapes.
Now the third subscript component is kno. The o subscript plays the same role as the m subscript, but reusing m would tie the two length-4 axes together. We want the m and o subscripts to be iterated over independently, not in lock-step, so the third component gets a different letter.
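To make the difference between independent and lock-step subscripts concrete, here is a small sketch for a single component. The array xbar_k is a hypothetical stand-in for one slice xbar[k], filled with random data purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
xbar_k = rng.random((150, 4))  # hypothetical centred data for one component k

# Different letters (m, o): the two length-4 axes vary independently and n is
# summed out, giving a 4x4 matrix of all pairwise products.
independent = np.einsum('nm,no->mo', xbar_k, xbar_k)

# Same letter (m) in both operands: the axes advance in lock-step, so only
# matching positions are multiplied, leaving a length-4 vector.
lock_step = np.einsum('nm,nm->m', xbar_k, xbar_k)

print(independent.shape)  # (4, 4)
print(lock_step.shape)    # (4,)
```

The lock-step result is just the diagonal of the independent one, which is why the answer needs two different letters to get a full 4x4 matrix per component.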
Notice that n appears in the subscripts on the left (nk, knm, kno) but does not appear on the right (in kmo). That tells np.einsum to sum over n.
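The same rule can be seen in isolation with a simpler call. In this sketch (random data, names chosen for illustration), dropping n from the output reproduces the column sums that the answer uses as the normaliser.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = rng.random((150, 3))  # illustrative weights, shape (N, K)

# n is on the left but not on the right, so einsum sums over it.
n_k = np.einsum('nk->k', gamma)

# Equivalent to summing down the rows.
print(np.allclose(n_k, gamma.sum(axis=0)))  # True
```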
The k appears in the subscripts on the left and on the right. This tells np.einsum that we wish to advance the k subscript in lock-step across the operands, but (since it appears on the right) we don't want to sum over k.
Since kmo appears on the right, these subscripts remain in the result. This results in sigma having shape (K,M,M) (i.e. (3,4,4)).
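One way to gain confidence in the subscript string is to compare it with a plain double loop. The sketch below assumes the intended computation is a gamma-weighted sum of outer products divided by the column sums of gamma, which is what the einsum code above does; the loop is written here only as a reference check and is not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 150, 3, 4
x = rng.random((N, M))
mu = rng.random((K, M))
gamma = rng.random((N, K))

# Vectorised version, as in the answer.
xbar = x - mu[:, None, :]  # shape (K, N, M)
sigma = np.einsum('nk,knm,kno->kmo', gamma, xbar, xbar)
sigma /= gamma.sum(axis=0)[:, None, None]

# Explicit-loop reference: accumulate weighted outer products per component.
sigma_loop = np.zeros((K, M, M))
for k in range(K):
    for n in range(N):
        d = x[n] - mu[k]
        sigma_loop[k] += gamma[n, k] * np.outer(d, d)
    sigma_loop[k] /= gamma[:, k].sum()

print(sigma.shape)                     # (3, 4, 4)
print(np.allclose(sigma, sigma_loop))  # True
```

The loop also shows what went wrong in the question's attempt: the weight has to be applied to each sample's outer product before summing, and the divisor is the k-th column sum rather than the whole N_k vector.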