Correlation between groups in R data.table


Problem description

Is there a way of elegantly calculating the correlations between values if those values are stored by group in a single column of a data.table (other than converting the data.table to a matrix)?

library(data.table)
set.seed(1)             # reproducibility
dt <- data.table(id=1:4, group=rep(letters[1:2], c(4,4)), value=rnorm(8))
setkey(dt, group)

#    id group      value
# 1:  1     a -0.6264538
# 2:  2     a  0.1836433
# 3:  3     a -0.8356286
# 4:  4     a  1.5952808
# 5:  1     b  0.3295078
# 6:  2     b -0.8204684
# 7:  3     b  0.4874291
# 8:  4     b  0.7383247

Something that works, but requires the group names as input:

cor(dt["a"]$value, dt["b"]$value)
# [1] 0.1556371

I'm looking more for something like:

dt[, cor(value, value), by="group"]

But that does not give me the correlation(s) I'm after.
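
For reference, a minimal sketch of what the grouped call returns. With `by="group"`, `cor()` only ever sees one group's values at a time, so each group is correlated with itself. The name `within_group` is illustrative only.

```r
library(data.table)

set.seed(1)  # same data as above
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))

# Each group's values are correlated with themselves, so every result is 1
within_group <- dt[, cor(value, value), by = "group"]
within_group
```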

Here's the same problem for a matrix with the correct results.

set.seed(1)             # reproducibility
m <- matrix(rnorm(8), ncol=2)
dimnames(m) <- list(id=1:4, group=letters[1:2])

#        group
# id           a          b
#   1 -0.6264538  0.3295078
#   2  0.1836433 -0.8204684
#   3 -0.8356286  0.4874291
#   4  1.5952808  0.7383247

cor(m)                  # correlations between groups

#           a         b
# a 1.0000000 0.1556371
# b 0.1556371 1.0000000

Any comments or help greatly appreciated.

Solution

There is no simple way to do this with data.table. The first way you've provided:

cor(dt["a"]$value, dt["b"]$value)

is probably the simplest.
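
If typing the group names by hand is the main objection, the same pairwise call can be generated for every pair of groups. The following is only a sketch under two assumptions: every group contains the same ids, and sorting by `id` is enough to line the values up. The names `group_pairs` and `pair_cor` are hypothetical, not part of data.table.

```r
library(data.table)

set.seed(1)  # same data as in the question
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))

# All unordered pairs of group labels
group_pairs <- combn(unique(dt$group), 2, simplify = FALSE)

# One row per pair, values aligned by id before correlating
pair_cor <- rbindlist(lapply(group_pairs, function(p) {
  first_group  <- p[1]
  second_group <- p[2]
  x <- dt[group == first_group][order(id), value]
  y <- dt[group == second_group][order(id), value]
  data.table(group1 = first_group, group2 = second_group, cor = cor(x, y))
}))
pair_cor
```

With only two groups this returns a single row; with more groups it returns one row per pair, which can be easier to filter than a full correlation matrix.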

An alternative is to reshape your data.table from "long" format to "wide" format:

> dtw <- reshape(dt, timevar="group", idvar="id", direction="wide")
> dtw
   id    value.a    value.b
1:  1 -0.6264538  0.3295078
2:  2  0.1836433 -0.8204684
3:  3 -0.8356286  0.4874291
4:  4  1.5952808  0.7383247
> cor(dtw[,list(value.a, value.b)])
          value.a   value.b
value.a 1.0000000 0.1556371
value.b 0.1556371 1.0000000


Update: If you're using data.table version >= 1.9.0, you can use dcast.data.table instead, which will be much faster. Check this post for more info.

dcast.data.table(dt, id ~ group)
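
Putting the cast and the correlation together, a minimal sketch might look like the following. It assumes a data.table release recent enough to export its own `dcast` generic (later than the 1.9.0 mentioned above); `wide`, `group_cols` and `cor_mat` are illustrative names, not from the original answer.

```r
library(data.table)

set.seed(1)  # same data as in the question
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))

# Long to wide: one row per id, one column per group
wide <- dcast(dt, id ~ group, value.var = "value")

# Correlate every group column with every other, leaving out the id column
group_cols <- setdiff(names(wide), "id")
cor_mat <- cor(wide[, group_cols, with = FALSE])
cor_mat
```

Because the group columns are picked up from `names(wide)`, no group name has to be typed, and the same code should carry over to more than two groups.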
