Weighted Pearson's Correlation?

Problem description

I have a 2396x34 double matrix named y, in which each of the 2396 rows represents a separate situation consisting of 34 consecutive time segments.

I also have a numeric[34] named x that represents a single situation of 34 consecutive time segments.

Currently I am calculating the correlation between each row in y and x like this:

crs[, 2] <- cor(t(y), x)

What I need now is to replace the cor function in the above statement with a weighted correlation. The weight vector xy.wt is 34 elements long, so that a different weight can be assigned to each of the 34 consecutive time segments.

I found the weighted covariance matrix function cov.wt and thought that if I first scale the data it should work just like the cor function. In fact, you can ask the function to return a correlation matrix as well. Unfortunately, it does not seem that I can use it in the same manner, because I cannot supply my two variables (x and y) separately.
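For a single pair of series, cov.wt can still give a weighted correlation if x and one row of y are bound into a two-column matrix. The sketch below uses simulated stand-in data (an assumption, with fewer rows than the real matrix) and only shows the shape of the call; covering all 2396 rows this way would need a loop.

```r
# Simulated stand-ins for the real data (assumption: same layout, fewer rows)
set.seed(1)
x <- cumsum(rnorm(34))
y <- t(sapply(1:5, function(u) cumsum(rnorm(34))))
xy.wt <- 1 / (34:1)

# cov.wt expects a single matrix: observations in rows, variables in columns
pair <- cbind(x = x, y1 = y[1, ])
res <- cov.wt(pair, wt = xy.wt / sum(xy.wt), cor = TRUE)

# Off-diagonal entry of the 2 x 2 weighted correlation matrix
r1 <- res$cor[1, 2]
r1
```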

Does anyone know of a way I can get a weighted correlation in the manner I described without sacrificing much speed?

Edit: Perhaps some mathematical function could be applied to y prior to the cor function in order to get the same results that I'm looking for. Maybe if I multiply each element by xy.wt/sum(xy.wt)?
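A quick check on made-up numbers suggests that pre-multiplying y by the weights is not equivalent to a weighted correlation. cor still centers with unweighted means, so it returns the ordinary correlation of the rescaled values. The toy values below are assumptions chosen only to make the difference visible.

```r
# Toy data (assumption: illustrative values, not the real series)
x  <- c(1, 2, 3, 4)
y1 <- c(2, 1, 4, 3)
w  <- c(1, 2, 3, 4)
w  <- w / sum(w)

# Idea from the edit: scale y by the normalized weights, then call cor
r_scaled <- cor(x, y1 * w)

# Weighted correlation from the definition, centering at the weighted means
xc <- x - sum(w * x)
yc <- y1 - sum(w * y1)
r_weighted <- sum(w * xc * yc) / sqrt(sum(w * xc^2) * sum(w * yc^2))

c(scaled = r_scaled, weighted = r_weighted)  # the two values differ
```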

Edit #2: I found another function, corr, in the boot package.

corr(d, w = rep(1, nrow(d))/nrow(d))

d   
A matrix with two columns corresponding to the two variables whose correlation we wish to calculate.

w   
A vector of weights to be applied to each pair of observations. The default is equal weights for each pair. Normalization takes place within the function so sum(w) need not equal 1.

This is also not what I need, but it is closer.
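One way corr could be applied here is row by row, binding x to each row of y. This yields the weighted correlations but calls corr once per row, which is presumably where the speed concern comes from. A minimal sketch on simulated data, assuming the boot package is installed:

```r
library(boot)  # provides corr(d, w)

# Simulated stand-ins for the real data (assumption: fewer rows than 2396)
set.seed(1)
x <- cumsum(rnorm(34))
y <- t(sapply(1:5, function(u) cumsum(rnorm(34))))
xy.wt <- 1 / (34:1)

# One call to corr per row of y; each call receives a 34 x 2 matrix
crs.wt <- apply(y, 1, function(row) corr(cbind(x, row), w = xy.wt))
crs.wt
```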

Edit #3: Here is some code to generate the type of data I am working with:

x<-cumsum(rnorm(34))
y<- t(sapply(1:2396,function(u) cumsum(rnorm(34))))
xy.wt<-1/(34:1)

crs<-cor(t(y),x) #this works but I want to use xy.wt as weight


Recommended answer

You can go back to the definition of the correlation.
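Written out for weights normalized to sum to 1, the definition in question would look like the following. The symbols w_i, m_x and m_y are notation introduced here for illustration, and y stands for a single row of the matrix.

```latex
\[
\sum_{i=1}^{34} w_i = 1, \qquad
m_x = \sum_{i=1}^{34} w_i x_i, \qquad
m_y = \sum_{i=1}^{34} w_i y_i
\]
\[
r_w(x, y) =
\frac{\sum_{i=1}^{34} w_i (x_i - m_x)(y_i - m_y)}
     {\sqrt{\sum_{i=1}^{34} w_i (x_i - m_x)^2 \;\sum_{i=1}^{34} w_i (y_i - m_y)^2}}
\]
```

With equal weights, any common normalizing factor cancels between numerator and denominator, so this reduces to the ordinary Pearson correlation returned by cor.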

f <- function( x, y, w = rep(1,length(x))) {
  stopifnot( length(x) == dim(y)[2], length(w) == length(x) )
  w <- w / sum(w)
  # Center x and each row of y, using the weighted means
  x <- x - sum(x*w)
  y <- y - colSums( t(y) * w )
  # Compute the weighted variances
  vx <- sum( w * x * x )
  # Work on t(y) (34 x 2396) so that w is recycled along the time segments;
  # rowSums( w * y * y ) would recycle w down the rows of y instead
  vy <- colSums( w * t(y)^2 )
  # Compute the weighted covariance of x with each row of y
  vxy <- colSums( t(y) * x * w )
  # Compute the correlation
  vxy / sqrt(vx * vy)
}
f(x,y)[1]
cor(x,y[1,]) # Same value, up to floating-point error
f(x, y, xy.wt) # Weighted correlations, one per row of y
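Since speed matters with 2396 rows, the same computation can also be phrased with matrix products instead of transposes and recycling. This is a sketch of an alternative formulation rather than the answer's own code; the name wcor_rows is hypothetical, and the data are simulated as in the question.

```r
# Hypothetical helper: weighted correlation of x with every row of y
wcor_rows <- function(x, y, w = rep(1, length(x))) {
  stopifnot(length(x) == ncol(y), length(w) == length(x))
  w   <- w / sum(w)
  xc  <- x - sum(w * x)               # center x at its weighted mean
  yc  <- y - as.vector(y %*% w)       # center each row of y at its weighted mean
  vxy <- as.vector(yc %*% (w * xc))   # weighted covariances, one per row
  vy  <- as.vector(yc^2 %*% w)        # weighted variances of the rows
  vxy / sqrt(sum(w * xc^2) * vy)
}

set.seed(1)
x <- cumsum(rnorm(34))
y <- t(sapply(1:2396, function(u) cumsum(rnorm(34))))
xy.wt <- 1 / (34:1)

crs.wt <- wcor_rows(x, y, xy.wt)

# Spot check against cov.wt for the first row
ref <- cov.wt(cbind(x, y[1, ]), wt = xy.wt / sum(xy.wt), cor = TRUE)$cor[1, 2]
all.equal(crs.wt[1], ref)
```

The spot check against cov.wt is a cheap way to confirm that the weights are being applied along the 34 time segments and not down the rows.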
