numpy covariance between each column of a matrix and a vector

Question

Based on this post, I can get the covariance between two vectors using np.cov((x,y), rowvar=0). I have an M x N matrix and an M x 1 vector. I want to find the covariance between each column of the matrix and the given vector. I know I can write this with a for loop, but I was wondering whether np.cov() can give me the result directly.

Recommended answer

As Warren Weckesser said, numpy.cov(X, Y) is a poor fit for the job because it simply joins the arrays into one M by (N+1) array and computes the huge (N+1) by (N+1) covariance matrix. But we always have the definition of covariance, and it's easy to use:

import numpy as np
A = np.sqrt(np.arange(12).reshape(3, 4))   # some 3 by 4 array
b = np.array([[2], [4], [5]])              # some 3 by 1 vector
# centered b (1 by M) times column-centered A (M by N), divided by M - 1
cov = np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0]-1)

This returns the covariances of each column of A with b.

array([[ 2.21895142,  1.53934466,  1.3379221 ,  1.20866607]])
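As a sanity check, the for loop mentioned in the question can be written out and compared with the one-line version. This is only a sketch: the name cov_loop is made up for illustration, and it calls np.cov once per column, keeping the off-diagonal entry of each 2 by 2 result.

```python
import numpy as np

A = np.sqrt(np.arange(12).reshape(3, 4))   # same 3 by 4 array as above
b = np.array([[2], [4], [5]])              # same 3 by 1 vector as above

# Vectorized version from the answer (shape 1 by 4).
cov = np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0] - 1)

# Loop version: one 2 by 2 covariance matrix per column of A;
# entry [0, 1] is the covariance of that column with b.
cov_loop = np.array([np.cov(A[:, j], b[:, 0])[0, 1] for j in range(A.shape[1])])

print(np.allclose(cov[0], cov_loop))
```

Both versions compute the same quantity; the loop just pays for N separate np.cov calls.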

The formula I used is for sample covariance (which is what numpy.cov computes, too), hence the division by (b.shape[0] - 1). If you divide by b.shape[0] you get the unadjusted population covariance.
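To make the choice of divisor explicit, the formula can be wrapped in a small helper. This is a sketch only: column_cov is a hypothetical name (not a NumPy function), and its ddof argument is modeled on the ddof parameter that numpy.cov and numpy.var accept.

```python
import numpy as np

def column_cov(A, b, ddof=1):
    """Covariance of each column of A (M by N) with b (M by 1 or length M).

    Hypothetical helper, not part of NumPy.
    ddof=1 divides by M - 1 (sample), ddof=0 divides by M (population).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1)   # flatten b to length M
    return (b - b.mean()) @ (A - A.mean(axis=0)) / (b.shape[0] - ddof)

A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])

sample_cov = column_cov(A, b)               # divides by M - 1
population_cov = column_cov(A, b, ddof=0)   # divides by M
```

Unlike the one-liner above, this returns a flat array of length N rather than a 1 by N array, which is a design choice of the sketch and not something the original answer prescribes.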

For comparison, the same computation using np.cov:

import numpy as np
A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])
np.cov(A, b, rowvar=False)[-1, :-1]

Same output, but it takes about twice as long (and for large matrices, the difference will be much larger). The slicing at the end is needed because np.cov computes a 5 by 5 matrix, in which only the first 4 entries of the last row are what you want. The rest is the covariance of the columns of A with each other, or of b with itself.
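If you want to measure the difference yourself, something along these lines should work. It is a rough sketch: the sizes, the seed, and the function names direct and via_np_cov are arbitrary choices for illustration, and the timings will depend on your machine and NumPy build.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed
M, N = 2000, 300                   # arbitrary sizes, adjust as needed
A = rng.standard_normal((M, N))
b = rng.standard_normal((M, 1))

def direct():
    # Definition of covariance: only the N values we need.
    return np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0] - 1)

def via_np_cov():
    # Full (N+1) by (N+1) matrix, then keep the last row without its last entry.
    return np.cov(A, b, rowvar=False)[-1, :-1]

print(np.allclose(direct()[0], via_np_cov()))
print("direct:", timeit.timeit(direct, number=20))
print("np.cov:", timeit.timeit(via_np_cov, number=20))
```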

The correlation coefficient is obtained by dividing by the square roots of the variances. Watch out for the -1 adjustment mentioned earlier: numpy.var does not make it by default; to get it you need the ddof=1 parameter.

corr = cov / np.sqrt(np.var(b, ddof=1) * np.var(A, axis=0, ddof=1)) 

Check that the output is the same as the less efficient version:

np.corrcoef(A, b, rowvar=False)[-1, :-1]
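Put together as a self-contained check, with the effect of forgetting ddof=1 shown at the end. The names corr_reference and corr_wrong are only for illustration; with M = 3 rows, leaving np.var at its default ddof=0 inflates the result by a factor of M / (M - 1) = 1.5.

```python
import numpy as np

A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])

# Sample covariance of each column of A with b, then normalize.
cov = np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0] - 1)
corr = cov / np.sqrt(np.var(b, ddof=1) * np.var(A, axis=0, ddof=1))

# Reference: last row of the full correlation matrix, minus the b-with-b entry.
corr_reference = np.corrcoef(A, b, rowvar=False)[-1, :-1]
print(np.allclose(corr[0], corr_reference))

# Mixing a sample covariance with default (ddof=0) variances gives a wrong scale.
corr_wrong = cov / np.sqrt(np.var(b) * np.var(A, axis=0))
print(np.allclose(corr_wrong, corr * 1.5))
```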
