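Correlation between groups in R data.table
Problem description
Is there a way of elegantly calculating the correlations between values if those values are stored by group in a single column of a data.table (other than converting the data.table to a matrix)?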
library(data.table)
set.seed(1) # reproducibility
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))
setkey(dt, group)
#    id group      value
# 1:  1     a -0.6264538
# 2:  2     a  0.1836433
# 3:  3     a -0.8356286
# 4:  4     a  1.5952808
# 5:  1     b  0.3295078
# 6:  2     b -0.8204684
# 7:  3     b  0.4874291
# 8:  4     b  0.7383247
Something that works, but requires the group names as input:
cor(dt["a"]$value, dt["b"]$value)
# [1] 0.1556371
I'm looking for something more like:
dt[, cor(value, value), by = "group"]
But that does not give me the correlation(s) I'm after.
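To make the failure concrete, here is a minimal sketch of what the grouped call actually computes, assuming the same `dt` as above. Within each group, `value` is correlated with itself, so every group yields 1 and no between-group correlation appears. The name `within_group` is only illustrative.

```r
library(data.table)

set.seed(1) # reproducibility
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))

# Inside by = "group", `value` holds only the current group's values,
# so this is the correlation of each group with itself.
within_group <- dt[, cor(value, value), by = "group"]
within_group
```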
Here's the same problem for a matrix with the correct results.
set.seed(1) # reproducibility
m <- matrix(rnorm(8), ncol = 2)
dimnames(m) <- list(id = 1:4, group = letters[1:2])
#    group
# id           a          b
#   1 -0.6264538  0.3295078
#   2  0.1836433 -0.8204684
#   3 -0.8356286  0.4874291
#   4  1.5952808  0.7383247
cor(m) # correlations between groups
#           a         b
# a 1.0000000 0.1556371
# b 0.1556371 1.0000000
Any comments or help greatly appreciated.
Solution
There is no simple way to do this with data.table. The first way you've provided:
cor(dt["a"]$value, dt["b"]$value)
is probably the simplest.
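If typing the group names is the main annoyance, the same pairwise call can be looped over every pair of groups. The following is only a sketch, not part of the original answer: it assumes each group contains the same `id`s, and the names `group_pairs` and `pair_cor` are made up for illustration.

```r
library(data.table)

set.seed(1) # reproducibility
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))

# All unordered pairs of group labels, taken from the data itself
group_pairs <- combn(unique(dt$group), 2, simplify = FALSE)

# One row per pair; values are sorted by id so observations line up
# (assumption: every group has the same set of ids)
pair_cor <- rbindlist(lapply(group_pairs, function(p) {
  x <- dt[group == p[1]][order(id)]$value
  y <- dt[group == p[2]][order(id)]$value
  data.table(group1 = p[1], group2 = p[2], r = cor(x, y))
}))
pair_cor
```

With only two groups this produces a single row for the pair a and b; with more groups it grows to one row per pair.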
An alternative is to reshape your data.table from "long" format to "wide" format:
> dtw <- reshape(dt, timevar = "group", idvar = "id", direction = "wide")
> dtw
   id    value.a    value.b
1:  1 -0.6264538  0.3295078
2:  2  0.1836433 -0.8204684
3:  3 -0.8356286  0.4874291
4:  4  1.5952808  0.7383247
> cor(dtw[, list(value.a, value.b)])
          value.a   value.b
value.a 1.0000000 0.1556371
value.b 0.1556371 1.0000000
Update: If you're using data.table version >= 1.9.0, you can use dcast.data.table instead, which will be much faster. Check this post for more info.
dcast.data.table(dt, id ~ group)
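For completeness, here is a sketch of the cast-then-correlate route from start to finish. It assumes a data.table version recent enough that `dcast()` dispatches on a data.table directly; on older versions, `dcast.data.table()` as shown in the answer should take the same arguments. The names `wide`, `value_cols`, and `cm` are illustrative, and `value.var = "value"` is spelled out rather than left to be guessed.

```r
library(data.table)

set.seed(1) # reproducibility
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))

# Long to wide: one row per id, one column per group
wide <- dcast(dt, id ~ group, value.var = "value")

# Correlate every group column without naming the groups by hand
value_cols <- setdiff(names(wide), "id")
cm <- cor(wide[, value_cols, with = FALSE])
cm
```

The off-diagonal entry of `cm` should match the 0.1556371 obtained from the matrix version in the question.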