Collapse rows in data.table

Question

I have a data.table with 1M rows and 2 columns.

Dummy data:

require(data.table)
ID <- c(1,2,3)
variable <- c("a,b","a,c","c,d")
dt <- data.table(ID,variable)
dt
# ID variable
#  1      a,b
#  2      a,c
#  3      c,d

Now I want to split the column "variable" into separate rows by "ID", similar to what the "melt" function in reshape2 or melt.data.table in data.table does.

This is what I want:


ID variable
1  a
1  b
2  a
2  c
3  c
3  d 

PS: Given the desired result, I know how to do the reverse step:

dt2 <- data.table(ID = c(1,1,2,2,3,3), variable = c("a","b","a","c","c","d"))
dt3 <- dt2[, list(variables = paste(variable, collapse = ",")), by = ID]

Any hints or suggestions?

Recommended answer

Since strsplit is vectorised, and it is going to be the time-consuming operation here, I'd avoid calling it once per group. Instead, one can first split the entire column on "," and then reconstruct the data.table as follows:

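# split the whole column in one vectorised call; returns a list of character vectors
var = strsplit(dt$variable, ",", fixed=TRUE)
# number of pieces per original row
len = vapply(var, length, 0L)
# repeat each ID once per piece and flatten the pieces into one column
ans = data.table(ID=rep(dt$ID, len), variable=unlist(var))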

#    ID variable
# 1:  1        a
# 2:  1        b
# 3:  2        a
# 4:  2        c
# 5:  3        c
# 6:  3        d
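
For comparison, here is a minimal sketch that wraps the approach above in a reusable function and sets it next to the per-group form the answer advises against. The helper name split_to_rows and its arguments are hypothetical and not part of data.table. It uses base R's lengths() in place of vapply(var, length, 0L), and it assumes each ID appears only once in the input.

```r
library(data.table)

dt <- data.table(ID = c(1, 2, 3), variable = c("a,b", "a,c", "c,d"))

# Hypothetical helper (not part of data.table): split one delimited
# character column into rows, repeating the ID for each piece.
split_to_rows <- function(x, id_col, value_col, sep = ",") {
  # one vectorised strsplit() call over the whole column
  pieces <- strsplit(x[[value_col]], sep, fixed = TRUE)
  # lengths() gives the number of pieces per original row
  out <- data.table(rep(x[[id_col]], lengths(pieces)), unlist(pieces))
  setnames(out, c(id_col, value_col))
  out
}

vectorised <- split_to_rows(dt, "ID", "variable")

# Per-group form, shown only for contrast: strsplit() runs once per ID.
per_group <- dt[, .(variable = unlist(strsplit(variable, ",", fixed = TRUE))), by = ID]

# Collapsing the long table again should give back the original strings.
round_trip <- vectorised[, .(variable = paste(variable, collapse = ",")), by = ID]
```

Both forms should return the same rows on this dummy data. The per-group version is shorter to write, but it calls strsplit once per ID, which is the cost the answer suggests avoiding on a 1M-row table; no timings were measured here. Note that an empty string splits into zero pieces, so such a row would be dropped from the result.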
