Expand list column of data.tables
Question
I have a data.table with a list column, where each element is a data.table:
library(data.table)
dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))
dt
#    id          var
# 1:  1 <data.table>
# 2:  1 <data.table>
# 3:  2 <data.table>
Now I want to "unlist" this structure to:
   a  b id
1: 1  3  1
2: 2  4  1
3: 5  7  1
4: 6  8  1
5: 9 10  2
I know how to expand the embedded data.table part with rbindlist, but I have no idea how to bind the flattened data.table with the variable "id".
The original dataset has 30 million rows and dozens of variables, so I would really appreciate a solution that is not only workable but also memory efficient.
Recommended answer
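When each id appears on only one row, dt[, var[[1]], by=id] works. With repeated ids, as in the example above, var[[1]] keeps only the first nested table of each group, so the by= form would have to be dt[, rbindlist(var), by=id]. However, I would use rbindlist directly, as the OP mentioned: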
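# Tag each row of dt with its row number, stored as a character key
dt[, r := as.character(.I)]
# Stack the nested tables; the list names become the "r" column
res <- dt[, rbindlist(setNames(var, r), idcol = "r")]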
Then merge on r (the row number in dt) if you really need any variables from there:
res[dt, on=.(r), `:=`(id = i.id)]
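As a minimal sketch, the steps above can be run end to end on the example data from the question. It spells out idcol, the full name of the rbindlist argument, and drops the helper column r at the end; that last step is an addition for tidiness, not part of the answer.

```r
library(data.table)

# Example data from the question
dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

# Row number of dt as a character key
dt[, r := as.character(.I)]

# Stack the nested tables; the names of the list fill the "r" column
res <- dt[, rbindlist(setNames(var, r), idcol = "r")]

# Update join: copy id from dt into res by matching on r
res[dt, on = .(r), id := i.id]

# Drop the helper key once the needed columns are attached (optional)
res[, r := NULL]

print(res)
```

The update join modifies res by reference, so no second copy of the flattened table is made at that step.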
This is better than dt[, var[[1]], by=id] in a few ways:
- rbindlist should be faster than something with a lot of by= groups.
- If there are more vars in dt, all of them will have to end up in by=.
- Probably, it is not necessary to carry over vars from dt at all, since they can always be grabbed from that table later and they take up a lot less memory there.
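The second and third points can be illustrated with a small sketch. The extra columns grp and w are hypothetical and only stand in for the "dozens of variables" mentioned in the question. The by= version uses rbindlist(var) so that it also handles groups with more than one row.

```r
library(data.table)

# grp and w are hypothetical extra columns, not part of the original question
dt <- data.table(id  = c(1, 1, 2),
                 grp = c("x", "y", "y"),
                 w   = c(0.5, 1.5, 2.5),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

# by= approach: every column to keep must be listed in by=,
# and each one is repeated for every flattened row
res_by <- dt[, rbindlist(var), by = .(id, grp, w)]

# rbindlist approach: flatten once with a row key ...
dt[, r := as.character(.I)]
res <- dt[, rbindlist(setNames(var, r), idcol = "r")]

# ... then pull in only the columns actually needed (w stays in dt)
res[dt, on = .(r), `:=`(id = i.id, grp = i.grp)]
```

Here res carries only id and grp alongside a and b, while w remains available in dt and can be joined on r later if it turns out to be needed.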