How do I self join a data.table in a manner like dcast


Problem description

Suppose I have a data.table in "melted" form, with a key, an identifier, and a value:

library(data.table)
library(reshape2)
DT = data.table(X = c(1:5, 1:4), Y = c(rep("A", 5), rep("B", 4)), Z = rnorm(9))
DT2 = data.table(dcast(DT, X~Y))

How can I perform that sort of self join inside data.table?

> DT
   X Y           Z
1: 1 A -0.19790449
2: 2 A  0.17906116
3: 3 A  0.01821837
4: 4 A  0.17309716
5: 5 A  0.05962474
6: 1 B -0.24629468
7: 2 B  0.92285734
8: 3 B  0.66002573
9: 4 B -1.01403880
> DT2
   X           A          B
1: 1 -0.19790449 -0.2462947
2: 2  0.17906116  0.9228573
3: 3  0.01821837  0.6600257
4: 4  0.17309716 -1.0140388
5: 5  0.05962474         NA

Aside (mostly for Arun): Here is a solution I already use for melt (it was written with help from Matthew D, so he should have this code). I think it replicates melt completely and is pretty efficient. dcast, on the other hand (or should that be dtcast?), is much harder!

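melt.data.table = function(data, id.vars, measure.vars,
                           variable.name = "variable",
                           ..., na.rm = FALSE, value.name = "value") {
  # Whichever of id.vars / measure.vars is missing defaults to "all other columns"
  if(missing(id.vars)){
    id.vars = setdiff(names(data), measure.vars)
  }
  if(missing(measure.vars)){
    measure.vars = setdiff(names(data), id.vars)
  }

  # One piece per measure variable: the id columns, that variable's values,
  # and a column holding the variable's name
  dtlist = lapply(measure.vars, function(..colname) {
    data[, c(id.vars, ..colname), with = FALSE][, (variable.name) := ..colname]
  })

  # Stack the pieces; the value column keeps the first measure variable's
  # name, so rename it to value.name
  dt = rbindlist(dtlist)
  setnames(dt, measure.vars[1], value.name)
  if(na.rm){
    return(na.omit(dt))
  } else {
    return(dt)
  }
}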

Solution

Update: faster versions of melt and dcast are now implemented (in C) in data.table versions >= 1.9.0. Check this post for more info.

Now you can just do:

dcast.data.table(DT, X~Y)

In the case of dcast alone, at the moment, the function name has to be written out in full (as dcast is not yet an S3 generic in reshape2). We'll try to fix this as soon as possible. For melt, you can just use melt(.) as you normally would.
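As a rough illustration of the cast and melt round trip described above, here is a minimal sketch. It assumes a data.table release recent enough to export `dcast` and `melt` as generics (an assumption about the installed version, not something this post states), so the `.data.table` suffix is dropped. The seed and the names `wide` and `long` are introduced only for this example.

```r
library(data.table)

# Same shape as the question's DT; the seed is an assumption for reproducibility
set.seed(42)
DT <- data.table(X = c(1:5, 1:4),
                 Y = c(rep("A", 5), rep("B", 4)),
                 Z = rnorm(9))

# Long to wide: one row per X, one column per level of Y.
# value.var is named explicitly so dcast does not have to guess it.
wide <- dcast(DT, X ~ Y, value.var = "Z")
print(wide)

# Wide back to long; na.rm = TRUE drops the missing (X = 5, Y = "B") cell
long <- melt(wide, id.vars = "X", variable.name = "Y",
             value.name = "Z", na.rm = TRUE)
print(long)
```

On older versions where only `dcast.data.table` is available, the first call would need the full name, as in the answer.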


The general idea is this:

setkey(DT, X, Y)
DT[CJ(1:5, c("A", "B"))][, as.list(Z), by=X]

You can rename the columns V1 and V2 to A and B using setnames.
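Put together, the keyed join and the rename might look like the sketch below. The seed and the name `wide` are assumptions added for the example; the rest follows the two lines above.

```r
library(data.table)

# Same shape as the question's DT; the seed is an assumption for reproducibility
set.seed(1)
DT <- data.table(X = c(1:5, 1:4),
                 Y = c(rep("A", 5), rep("B", 4)),
                 Z = rnorm(9))

setkey(DT, X, Y)

# CJ() builds every (X, Y) combination; joining on the key leaves Z = NA
# for combinations absent from DT, here (5, "B")
full <- DT[CJ(1:5, c("A", "B"))]

# Within each X the rows are ordered A then B, so as.list(Z) spreads
# them into two columns, V1 and V2
wide <- full[, as.list(Z), by = X]

# Give the generated columns their identifier names
setnames(wide, c("V1", "V2"), c("A", "B"))
print(wide)
```

The hard-coded `1:5` and `c("A", "B")` match the example data; for other data the two vectors passed to `CJ()` would need to list the key values actually present.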

But this may not be efficient on large data or when the cast formula is complex. Or rather, I should say, it could be much more efficient. We're in the process of finding such an implementation to integrate melt and cast into data.table. Until then, you could get around this as above.

I'll update this post once we've made significant progress with melt/cast.
