Expand list column of data.tables


Question

I have a data.table with a list column, where each element is a data.table:

library(data.table)

# id repeats across rows; each element of var is itself a data.table
dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

dt
#    id          var
# 1:  1 <data.table>
# 2:  1 <data.table>
# 3:  2 <data.table>

Now I want to "unlist" this structure to:

   a  b id
1: 1  3  1
2: 2  4  1
3: 5  7  1
4: 6  8  1
5: 9 10  2

I know how to expand the embedded data.table part with rbindlist, but I have no idea how to attach the variable "id" to the flattened data.table.
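For reference, here is a minimal sketch of the step that already works, using the toy data from the question. It only illustrates the gap: the stacked result has no column saying which row of dt each record came from. The name flat is just an illustrative choice.

```r
library(data.table)

# Same toy data as in the question
dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

# Stacking the nested tables works, but id is not carried along
flat <- rbindlist(dt$var)
names(flat)  # "a" "b"
nrow(flat)   # 5
```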

The original dataset has 30 million rows and dozens of variables, so I would really appreciate a solution that is not only workable but also memory efficient.

Recommended answer

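In this case dt[, var[[1]], by=id] works as long as each id sits on a single row; with repeated ids, as in the example above, var[[1]] returns only the first table of each group, so dt[, rbindlist(var), by=id] is the safer by= form. However, I use rbindlist as the OP mentioned: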

dt[, r := as.character(.I)]  # row number of dt as a character key
res <- dt[, rbindlist(setNames(var, r), idcol = "r")]  # idcol stores each source table's name in r

Then merge on r (rows of dt) if you really need any vars from there:

res[dt, on=.(r), `:=`(id = i.id)]
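Put together on the toy data from the question, the steps above can be sketched end to end as follows. The by = id variant at the end (by_res) is my own addition for comparison, not part of the answer's recommended route.

```r
library(data.table)

dt <- data.table(id = c(1, 1, 2),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))

# Tag each row of dt with a character key (list names must be character)
dt[, r := as.character(.I)]

# Stack the nested tables; idcol records which list element each row came from
res <- rbindlist(setNames(dt$var, dt$r), idcol = "r")

# Update join: copy id from dt into res by reference, matching on r
res[dt, on = .(r), id := i.id]

res[, .(a, b, id)]

# For comparison only: the grouped variant, which evaluates j once per id
by_res <- dt[, rbindlist(var), by = id]
```

Because the join uses `:=`, res is modified in place rather than copied, which matters at the 30 million row scale mentioned in the question.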

This is better than the by=id approach in a few ways:


  • A single rbindlist call should be faster than evaluating j once per group when there are a lot of by= groups.
  • If there are more vars in dt that you want to keep, all of them will have to end up in by=.
  • Probably, it is not necessary to carry over vars from dt at all, since they can always be grabbed from that table later and they take up a lot less memory there.
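The last two points can be illustrated with a small sketch. The extra columns grp and score are hypothetical stand-ins for the "dozens of variables" in the real data, and totals is just an illustrative name. The idea is to keep res slim (nested columns plus the key r) and pull a parent column from dt only when a computation needs it.

```r
library(data.table)

# Toy data with two extra, hypothetical columns (grp, score) besides id
dt <- data.table(id = c(1, 1, 2),
                 grp = c("x", "y", "y"),
                 score = c(0.1, 0.2, 0.3),
                 var = list(data.table(a = c(1, 2), b = c(3, 4)),
                            data.table(a = c(5, 6), b = c(7, 8)),
                            data.table(a = 9, b = 10)))
dt[, r := as.character(.I)]

# Slim result: only the nested columns plus the row key r
res <- rbindlist(setNames(dt$var, dt$r), idcol = "r")

# Pull in just the one parent column this computation needs
res[dt, on = .(r), grp := i.grp]
totals <- res[, .(total_a = sum(a)), by = grp]
```

Each parent value is stored once per row of dt instead of once per expanded row, which is where the memory saving comes from.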
