numpy covariance between each column of a matrix and a vector
Question
Based on this post, I can get the covariance between two vectors using np.cov((x, y), rowvar=0). I have an M x N matrix and an M x 1 vector, and I want to find the covariance between each column of the matrix and the given vector. I know I can write this with a for loop, but I was wondering whether np.cov() can give me the result directly.
Recommended answer
As Warren Weckesser said, numpy.cov(X, Y) is a poor fit for this job: it simply joins the two arrays into one M by (N+1) array and computes the full (N+1) by (N+1) covariance matrix. But the definition of covariance is always available, and it is easy to apply directly:
import numpy as np
A = np.sqrt(np.arange(12).reshape(3, 4)) # some 3 by 4 array
b = np.array([[2], [4], [5]]) # some 3 by 1 vector
# center b and each column of A, then take the dot product
cov = np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0] - 1)
This returns the covariance of each column of A with b:
array([[ 2.21895142, 1.53934466, 1.3379221 , 1.20866607]])
The formula above is the sample covariance (which is also what numpy.cov computes by default), hence the division by (b.shape[0] - 1). If you divide by b.shape[0] instead, you get the unadjusted population covariance.
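To make the sample versus population distinction concrete, here is a minimal sketch that wraps the formula in a small helper. The name column_cov and its ddof argument are illustrative assumptions, not part of NumPy; ddof=1 gives the sample covariance and ddof=0 the population covariance.

```python
import numpy as np

def column_cov(A, b, ddof=1):
    """Covariance of each column of A with the vector b.

    Hypothetical helper for illustration only: ddof=1 divides by (M - 1)
    (sample covariance), ddof=0 divides by M (population covariance).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1)  # accept (M,) or (M, 1)
    n = b.shape[0]
    # center b and every column of A, then sum the products per column
    return (b - b.mean()) @ (A - A.mean(axis=0)) / (n - ddof)

A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])

sample_cov = column_cov(A, b, ddof=1)      # divides by M - 1
population_cov = column_cov(A, b, ddof=0)  # divides by M
print(sample_cov)
print(population_cov)
```

With M = 3 rows, the population values are the sample values scaled by (M - 1) / M = 2/3.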
For comparison, the same computation using np.cov:
import numpy as np
A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])
np.cov(A, b, rowvar=False)[-1, :-1]
Same values (returned as a 1-D array rather than a 1 by 4 array), but it takes about twice as long, and for large matrices the difference will be much larger. The slicing at the end is needed because np.cov computes a 5 by 5 matrix, in which only the first 4 entries of the last row are what you want. The rest is the covariance of the columns of A with each other, or of b with itself.
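The timing claim can be checked on a larger input with a rough sketch like the one below. The array sizes, the random seed and the repetition count are arbitrary choices for illustration, and the measured times will vary by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice
A = rng.normal(size=(200, 50))   # M = 200 rows, N = 50 columns
b = rng.normal(size=(200, 1))    # M by 1 vector

def direct():
    # definition of sample covariance, applied column-wise
    return np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0] - 1)

def via_np_cov():
    # builds the full (N+1) by (N+1) matrix, then keeps the last row
    return np.cov(A, b, rowvar=False)[-1, :-1]

direct_result = direct()
np_cov_result = via_np_cov()
print(np.allclose(direct_result.ravel(), np_cov_result))

# rough timings in seconds for 20 calls each
print("direct:", timeit.timeit(direct, number=20))
print("np.cov:", timeit.timeit(via_np_cov, number=20))
```

The direct formula does work proportional to M times N, while np.cov also computes all N by N covariances between the columns of A, which is why the gap is expected to widen as N grows.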
The correlation coefficient is obtained by dividing the covariance by the square roots of the variances. Watch out for the -1 adjustment mentioned earlier: numpy.var does not apply it by default; to get it you need the ddof=1 parameter.
corr = cov / np.sqrt(np.var(b, ddof=1) * np.var(A, axis=0, ddof=1))
Check that the output is the same as the less efficient version:
np.corrcoef(A, b, rowvar=False)[-1, :-1]
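The comparison can also be done programmatically rather than by eye. The following self-contained sketch repeats the setup from above and uses np.allclose for the check; the variable name reference is introduced here for illustration only.

```python
import numpy as np

A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])

# sample covariance of each column of A with b, shape (1, 4)
cov = np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0] - 1)

# divide by the product of the sample standard deviations (ddof=1)
corr = cov / np.sqrt(np.var(b, ddof=1) * np.var(A, axis=0, ddof=1))

# last row of the full 5 by 5 correlation matrix, without the b-b entry
reference = np.corrcoef(A, b, rowvar=False)[-1, :-1]

print(np.allclose(corr.ravel(), reference))
```

Because the same ddof is used for both the covariance and the variances, the adjustment cancels out in the ratio; mixing ddof=1 in one place with the default ddof=0 in the other is what produces a mismatch.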