Weighted Pearson's Correlation?
Problem description
I have a 2396x34 double matrix named y, in which each of the 2396 rows represents a separate situation consisting of 34 consecutive time segments.
I also have a numeric vector of length 34, named x, that represents a single situation of 34 consecutive time segments.
Currently I am calculating the correlation between x and each row of y like this:
crs[,2] <- cor(t(y), x)
What I need now is to replace the cor function in the statement above with a weighted correlation. The weight vector xy.wt is 34 elements long, so that a different weight can be assigned to each of the 34 consecutive time segments.
I found the weighted covariance matrix function cov.wt and thought that if I first scale the data, it should work just like the cor function. In fact, you can tell the function to return a correlation matrix as well (cor = TRUE). Unfortunately, it does not seem like I can use it in the same manner, because I cannot supply my two variables (x and y) separately.
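One way around the "two separate variables" limitation, sketched here as an illustration rather than a tuned solution, is to pair x with one row of y at a time via cbind() and call cov.wt once per row. The helper name wcor_covwt is hypothetical. This gives the weighted correlation, but it loops over all 2396 rows in R, so it is likely much slower than a vectorized version:

```r
# Sketch: weighted correlation of x with every row of y using stats::cov.wt().
# cov.wt() takes a single data matrix, so each row of y is paired with x via cbind().
# wcor_covwt is a hypothetical helper name, not an existing function.
wcor_covwt <- function(x, y, w) {
  stopifnot(length(x) == ncol(y), length(w) == length(x))
  w <- w / sum(w)                      # normalize the weights to sum to 1
  apply(y, 1, function(row) {
    # cor = TRUE adds a correlation matrix; entry [1, 2] is x versus this row
    cov.wt(cbind(x, row), wt = w, cor = TRUE)$cor[1, 2]
  })
}

# Data shaped like the question's (2396 situations x 34 time segments)
x     <- cumsum(rnorm(34))
y     <- t(sapply(1:2396, function(u) cumsum(rnorm(34))))
xy.wt <- 1 / (34:1)
crs_w <- wcor_covwt(x, y, xy.wt)
```

With equal weights this reduces to the ordinary Pearson correlation, so wcor_covwt(x, y, rep(1, 34)) should match cor(t(y), x) up to floating-point error.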
Does anyone know of a way I can get a weighted correlation in the manner I described without sacrificing much speed?
Edit: Perhaps some mathematical function could be applied to y before the cor function in order to get the same results I'm looking for. Maybe if I multiply each element by xy.wt/sum(xy.wt)?
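Multiplying by xy.wt/sum(xy.wt) alone will not, in general, reproduce a weighted correlation, because cor() re-centers its inputs with unweighted means. A transformation that does work, sketched below under the assumption that the weights are non-negative: center x and each row of y by their weighted means, scale time segment j by sqrt(w[j]), and then take the plain (uncentered) cosine between the transformed vectors instead of calling cor(). The function name wcor_transform is hypothetical:

```r
# Sketch: transform the data so an uncentered cosine equals the weighted Pearson correlation.
# wcor_transform is a hypothetical helper name; weights are assumed non-negative.
wcor_transform <- function(x, y, w) {
  stopifnot(length(x) == ncol(y), length(w) == length(x))
  w  <- w / sum(w)                    # normalize the weights to sum to 1
  xt <- sqrt(w) * (x - sum(w * x))    # weighted-center x, then scale by sqrt(w)
  yc <- y - as.vector(y %*% w)        # subtract each row's weighted mean
  yt <- sweep(yc, 2, sqrt(w), `*`)    # scale column j (time segment j) by sqrt(w[j])
  # Cosine between each transformed row and xt; unlike cor(), no re-centering happens here
  as.vector(yt %*% xt) / sqrt(sum(xt^2) * rowSums(yt^2))
}
```

With the data from Edit #3, the call would be wcor_transform(x, y, xy.wt). Everything is done with matrix products, so it should stay close to the speed of the original cor(t(y), x) call.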
Edit #2: I found another function, corr, in the boot package:
corr(d, w = rep(1, nrow(d))/nrow(d))
d: A matrix with two columns corresponding to the two variables whose correlation we wish to calculate.
w: A vector of weights to be applied to each pair of observations. The default is equal weights for each pair. Normalization takes place within the function, so sum(w) need not equal 1.
This is not exactly what I need either, but it is closer.
Edit #3: Here is some code to generate the type of data I am working with:
x<-cumsum(rnorm(34))
y<- t(sapply(1:2396,function(u) cumsum(rnorm(34))))
xy.wt<-1/(34:1)
crs<-cor(t(y),x) #this works but I want to use xy.wt as weight
Recommended answer
You can go back to the definition of the correlation.
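Written out, the weighted definition that the function below implements is as follows, with the weights normalized to sum to 1, w_j the weight of time segment j, and y_{ij} the value of row i of y at segment j:

$$\bar{x}_w = \sum_{j=1}^{34} w_j x_j, \qquad \bar{y}_{i,w} = \sum_{j=1}^{34} w_j y_{ij}, \qquad \sum_{j=1}^{34} w_j = 1$$

$$r_i = \frac{\sum_{j} w_j (x_j - \bar{x}_w)(y_{ij} - \bar{y}_{i,w})}{\sqrt{\sum_{j} w_j (x_j - \bar{x}_w)^2 \;\sum_{j} w_j (y_{ij} - \bar{y}_{i,w})^2}}$$

With all w_j equal to 1/34, this reduces to the ordinary Pearson correlation returned by cor().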
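# Weighted correlation of the vector x with each row of the matrix y.
# w holds one weight per time segment, i.e. per column of y.
f <- function(x, y, w = rep(1, length(x))) {
  stopifnot(length(x) == ncol(y), length(w) == length(x))
  w <- w / sum(w)
  # Center x and each row of y, using the weighted means
  x <- x - sum(x * w)
  y <- y - colSums(t(y) * w)   # colSums() is a faster equivalent of apply(t(y) * w, 2, sum)
  # Compute the weighted variances
  vx <- sum(w * x * x)
  # Use t(y) so that w is recycled down the 34 time segments; the earlier
  # rowSums(w * y * y) recycled w down the columns of y and misaligned the weights
  vy <- colSums(w * t(y) * t(y))
  # Compute the weighted covariances
  vxy <- colSums(t(y) * x * w)
  # Compute the correlations
  vxy / sqrt(vx * vy)
}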
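f(x, y)[1]        # equal weights (the default)
cor(x, y[1, ])    # identical to the previous line
f(x, y, xy.wt)    # weighted correlation of x with each row of y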
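The weighted variance of each row of y is the one place in f where the orientation of w matters. R recycles a short vector down the columns of a matrix (column-major order), so w, which holds one weight per time segment (per column of y), lines up with the rows of t(y) but not with the columns of y. The small example below uses hypothetical toy data (2 situations x 3 time segments, made-up weights) and compares both orientations against an explicit matrix product:

```r
# Hypothetical toy data: 2 situations (rows) x 3 time segments (columns)
set.seed(1)
y <- matrix(rnorm(6), nrow = 2)
w <- c(0.5, 0.3, 0.2)                 # one weight per time segment (column of y)

ref   <- as.vector(y^2 %*% w)         # sum over j of w[j] * y[i, j]^2, for each row i
right <- colSums(w * t(y)^2)          # w recycled down the rows of t(y): aligned
wrong <- rowSums(w * y^2)             # w recycled down the columns of y: misaligned

print(rbind(ref, right, wrong))
```

Only the t(y) form matches the matrix product. Equivalently, as.vector(y^2 %*% w) could replace the colSums() call.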