Correlation between groups in R data.table

Published: 2017/3/12 11:31:35 | Tags: r, data.table, correlation


Problem description

Is there an elegant way to calculate the correlations between values when those values are stored by group in a single column of a data.table (other than converting the data.table to a matrix)?

library(data.table)
set.seed(1)             # reproducibility
dt <- data.table(id=1:4, group=rep(letters[1:2], c(4,4)), value=rnorm(8))
setkey(dt, group)

#    id group      value
# 1:  1     a -0.6264538
# 2:  2     a  0.1836433
# 3:  3     a -0.8356286
# 4:  4     a  1.5952808
# 5:  1     b  0.3295078
# 6:  2     b -0.8204684
# 7:  3     b  0.4874291
# 8:  4     b  0.7383247

Something that works, but requires the group names as input:

cor(dt["a"]$value, dt["b"]$value)
# [1] 0.1556371

I'm looking for something more like:

dt[, cor(value, value), by="group"]

But that does not give me the correlation(s) I'm after.
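
To illustrate why the grouped call above falls short, here is a minimal sketch of what it computes. It rebuilds the same `dt` as in the question; the column name `r` and the variable `self_cor` are only illustrative. Within each group, `value` is correlated with itself, so the result should be 1 for every group (up to floating-point rounding) rather than a correlation between groups.

```r
library(data.table)

set.seed(1)  # same seed as in the question
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))

# by = "group" evaluates cor() once per group, and inside each group
# both arguments are the same vector, so this is cor(x, x).
self_cor <- dt[, .(r = cor(value, value)), by = "group"]
print(self_cor)
```

The grouping splits the rows, but it never places values from two different groups side by side, which is what a between-group correlation needs.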

Here's the same problem for a matrix, with the correct results:

set.seed(1)             # reproducibility
m <- matrix(rnorm(8), ncol=2)
dimnames(m) <- list(id=1:4, group=letters[1:2])

#        group
# id           a          b
#   1 -0.6264538  0.3295078
#   2  0.1836433 -0.8204684
#   3 -0.8356286  0.4874291
#   4  1.5952808  0.7383247

cor(m)                  # correlations between groups

#           a         b
# a 1.0000000 0.1556371
# b 0.1556371 1.0000000

Any comments or help would be greatly appreciated.

Solution

There is no simple way to do this with data.table. The first approach you've provided (which relies on dt being keyed by group):

cor(dt["a"]$value, dt["b"]$value)

is probably the simplest.
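
If typing the group names by hand is the main objection, the same pairwise idea could be looped over all group pairs. The following is only a sketch under the assumption that every group contains the same ids, so that sorting by group and id lines the values up pairwise; `vals`, `pairs`, `pair_cor`, and the column names are illustrative and not part of the original answer.

```r
library(data.table)

set.seed(1)  # same seed as in the question
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))
setkey(dt, group, id)  # sort by group, then id, so values align across groups

# One numeric vector per group, named by group.
vals <- split(dt$value, dt$group)

# All unordered pairs of group names: a 2 x n_pairs character matrix.
pairs <- combn(names(vals), 2)

# One row per pair, with the correlation between the two groups' values.
pair_cor <- data.table(
  g1 = pairs[1, ],
  g2 = pairs[2, ],
  r  = apply(pairs, 2, function(p) cor(vals[[p[1]]], vals[[p[2]]]))
)
print(pair_cor)
```

With the two groups in the example this gives a single row for the pair (a, b). The result stays in long form, which may be convenient when there are many groups.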


An alternative is to reshape your data.table from "long" format to "wide" format:

> dtw <- reshape(dt, timevar="group", idvar="id", direction="wide")
> dtw
   id    value.a    value.b
1:  1 -0.6264538  0.3295078
2:  2  0.1836433 -0.8204684
3:  3 -0.8356286  0.4874291
4:  4  1.5952808  0.7383247
> cor(dtw[,list(value.a, value.b)])
          value.a   value.b
value.a 1.0000000 0.1556371
value.b 0.1556371 1.0000000




Update: If you're using data.table version >= 1.9.0, you can use dcast.data.table instead, which will be much faster. Check this post for more info.

dcast.data.table(dt, id ~ group)
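
As a sketch of how the cast result might be fed into `cor()`: the example below assumes a newer data.table release in which the same operation is available through `dcast()` (the answer itself names `dcast.data.table`), and it passes `value.var` explicitly rather than relying on it being guessed. `group_cols` and `cor_mat` are illustrative names.

```r
library(data.table)

set.seed(1)  # same seed as in the question
dt <- data.table(id = 1:4, group = rep(letters[1:2], c(4, 4)), value = rnorm(8))

# Long -> wide: one row per id, one column per group.
dtw <- dcast(dt, id ~ group, value.var = "value")

# Correlate every group column with every other, leaving out the id column.
group_cols <- setdiff(names(dtw), "id")
cor_mat <- cor(dtw[, .SD, .SDcols = group_cols])
print(cor_mat)
```

Selecting the columns by name avoids hard-coding the group labels, so the same lines should work for any number of groups as long as each id appears once per group.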


                        