自行计算协方差矩阵(不使用`cov`) [英] Compute covariance matrix on our own (without using `cov`)

查看:336
本文介绍了自行计算协方差矩阵(不使用`cov`)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在跟踪有关协方差矩阵的教程,该教程可以在这里找到:

I am following a tutorial about covariance matrices that could be found here: http://stats.seandolinar.com/making-a-covariance-matrix-in-r/

它包括以下步骤:

#create a dataframe
a <- c(1,2,3,4,5,6)
b <- c(2,3,5,6,1,9)
c <- c(3,5,5,5,10,8)     
d <- c(10,20,30,40,50,55)
e <- c(7,8,9,4,6,10)

#create matrix from vectors
M <- cbind(a,b,c,d,e)
M_mean <- matrix(data=1, nrow=n) %*% cbind(mean(a),mean(b),mean(c),mean(d),mean(e)) 

k <- ncol(M) #number of variables
n <- nrow(M) #number of subjects

然后创建一个像这样的差异矩阵:

And then creating a difference matrix like this:

D <- M - M_mean

这一切对我来说都是直截了当的.但是下一步是创建协方差矩阵:

This is all pretty straighforward to me. But the next step does this to create a covariance matrix:

C <- (n-1)^-1 t(D) %*% D

我得到部分t(D)%%D除以(n-1)^ 1 =6.但是我不知道t(D)%%D到底有多精确积累.

I get that the part t(D) %% D is divided by (n-1)^1 = 6. But I do not get how exactly t(D) %% D is build up.

有人可以向我解释吗?

推荐答案

但我不知道t(D)%% D的精确构建方式.

But I do not get how exactly t(D) %% D is built up.

这是矩阵叉积,是矩阵乘法的一种特殊形式.如果您不了解它在做什么,请考虑以下R循环以帮助您吸收它:

This is matrix cross product, a special form of matrix multiplication. If you don't understand what it is doing, consider the following R loop to help you absorb this:

DtD <- matrix(0, nrow = ncol(D), ncol = ncol(D))
for (j in 1:ncol(D)) 
  for (i in 1:ncol(D))
    DtD[i, j] <- sum(D[, i] * D[, j])

注意,实际上没有人会为此编写R循环;这只是为了帮助您了解算法.

Note, nobody is actually going to write R loop for this; this is just to help you understand the algorithm.

原始答案

假设我们有一个矩阵X,其中每一列都给出了对特定随机变量的观察结果,通常我们只使用R基本函数cov(X)来获得协方差矩阵.

Suppose we have a matrix X, where each column gives observations for a specific random variable, normally we just use R base function cov(X) to get covariance matrix.

现在,您想自己编写一个协方差函数;这也不难(我很久以前就曾做过练习).这需要3个步骤:

Now you want to write a covariance function yourself; that is also not difficult (I did this a long time ago as an exercise). It takes 3 steps:

  • 列居中(即所有变量的均值);
  • 矩阵叉积;
  • 求平均值(进行偏置调整时,nrow(X) - 1而非nrow(X)).
  • column centring (i.e., de-mean for all variables);
  • matrix cross product;
  • averaging (over nrow(X) - 1 not nrow(X) for bias adjustment).

此短代码可以做到:

crossprod(sweep(X, 2L, colMeans(X))) / (nrow(X) - 1L)

考虑一个小例子

set.seed(0)
## 3 variable, each with 10 observations
X <- matrix(rnorm(30), nrow = 10, ncol = 3)

## reference computation by `cov`
cov(X)
#           [,1]        [,2]        [,3]
#[1,]  1.4528358 -0.20093966 -0.10432388
#[2,] -0.2009397  0.46086672 -0.05828058
#[3,] -0.1043239 -0.05828058  0.48606879

## own implementation
crossprod(sweep(X, 2L, colMeans(X))) / (nrow(X) - 1L)
#           [,1]        [,2]        [,3]
#[1,]  1.4528358 -0.20093966 -0.10432388
#[2,] -0.2009397  0.46086672 -0.05828058
#[3,] -0.1043239 -0.05828058  0.48606879


如果要获取相关矩阵怎么办?

有很多方法.如果我们想直接获得它,请执行以下操作:

There are many ways. If we want to get it directly, do:

crossprod(scale(X)) / (nrow(X) - 1L)
#           [,1]       [,2]       [,3]
#[1,]  1.0000000 -0.2455668 -0.1241443
#[2,] -0.2455668  1.0000000 -0.1231367
#[3,] -0.1241443 -0.1231367  1.0000000

如果我们要先获得协方差,然后通过对角线对角线(对称)重新缩放比例以获得相关性,则可以执行以下操作:

If we want to first get covariance, then (symmetrically) rescale it by root diagonal to get correlation, we can do:

## covariance first
V <- crossprod(sweep(X, 2L, colMeans(X))) / (nrow(X) - 1L)

## symmetric rescaling
V / tcrossprod(diag(V) ^ 0.5)
#           [,1]       [,2]       [,3]
#[1,]  1.0000000 -0.2455668 -0.1241443
#[2,] -0.2455668  1.0000000 -0.1231367
#[3,] -0.1241443 -0.1231367  1.0000000

我们还可以使用服务R函数cov2cor将协方差转换为相关性:

We can also use a service R function cov2cor to convert covariance to correlation:

cov2cor(V)
#           [,1]       [,2]       [,3]
#[1,]  1.0000000 -0.2455668 -0.1241443
#[2,] -0.2455668  1.0000000 -0.1231367
#[3,] -0.1241443 -0.1231367  1.0000000

这篇关于自行计算协方差矩阵(不使用`cov`)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆