Add a row by reference at the end of a data.table object

Question

In this question, the data.table package creator explains why rows cannot yet be inserted (or removed) by reference in the middle of a data.table. He also points out that such operations could be possible at the end of the table. Could you show code that performs this action? It would be the "by reference" version of

> library(data.table)
> a <- data.table(id=letters[1:2], var=1:2)
> a
   id var
1:  a   1
2:  b   2
> rbind(a, data.table(id="c", var=3))
   id var
1:  a   1
2:  b   2
3:  c   3

Thanks.

Since a proper solution is not possible yet, which of the following is better in terms of speed and memory usage (I am not sure whether they differ internally)?

rbind(a, data.table(id="c", var=3))

rbindlist(list(a,  data.table(id="c", var=3)))

Are there possibly other (better) methods?

Recommended answer

To answer your edit, just run a benchmark:

library(data.table)
library(microbenchmark)

a = data.table(id=letters[1:2], var=1:2)
b = copy(a)
c = copy(b) # let's also just try modifying same value in place
            # to see how well changing existing values does
microbenchmark(a <- rbind(a, data.table(id="c", var=3)),
               b <- rbindlist(list(b,  data.table(id="c", var=3))),
               c[1, var := 3L],
               set(c, 1L, 2L, 3L))
#Unit: microseconds
#                                                  expr     min        lq    median        uq      max neval
#          a <- rbind(a, data.table(id = "c", var = 3)) 865.460 1141.2585 1357.1230 1539.4300 6814.492   100
#b <- rbindlist(list(b, data.table(id = "c", var = 3))) 260.440  325.3835  445.4190  522.8825 1143.930   100
#                                   c[1, `:=`(var, 3L)] 482.147  626.5570  778.3135  904.3595 1109.539   100
#                                    set(c, 1L, 2L, 3L)   2.339    5.677    7.5140    9.5170   19.033   100

rbindlist is clearly better than rbind: its median time is about a third of rbind's (445 vs. 1357 microseconds). Thanks to Matthew Dowle for pointing out the problems with using [ in a loop; I added another benchmark with set, which is roughly 100 times faster than the := call through [ (median 7.5 vs. 778 microseconds).
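As a rough illustration of why set matters in a loop, the sketch below updates every row of a small table one cell at a time. The table dt and its values are made up for this example and do not come from the benchmark above.

```r
library(data.table)

# Hypothetical example table (not from the original post).
dt <- data.table(id = letters[1:5], var = 0L)

# Update one cell per iteration with set(), which modifies dt by reference
# and avoids the per-call overhead of going through `[.data.table`.
for (i in seq_len(nrow(dt))) {
  set(dt, i = i, j = "var", value = i * 10L)
}

print(dt)
```

Each set call here should have the same effect as dt[i, var := i * 10L], just with less overhead per call.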

From the above, your best options are to use rbindlist, or to size the data.table up front and then just populate the values. If you don't know the size of the data to begin with, you can use a strategy similar to std::vector in C++: double the size every time you run out of space, and delete the extra rows once you're done filling it in.
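The doubling strategy could look something like the following. This is only a minimal sketch: the helper append_row, the counter n_used, and the starting capacity of 2 are assumptions made for illustration, not part of data.table or of the original answer.

```r
library(data.table)

# Hypothetical helper: write one row into the next free slot, doubling the
# table's capacity first if every row is already in use.
append_row <- function(dt, n_used, id, var) {
  if (n_used == nrow(dt)) {
    # Out of space: bind a block of NA rows as large as the current table.
    # This step copies the table, but it happens only O(log n) times.
    grow <- max(nrow(dt), 1L)
    spare <- data.table(id = rep(NA_character_, grow),
                        var = rep(NA_integer_, grow))
    dt <- rbindlist(list(dt, spare))
  }
  n_used <- n_used + 1L
  # Fill the next free row by reference.
  set(dt, i = n_used, j = "id", value = id)
  set(dt, i = n_used, j = "var", value = var)
  list(dt = dt, n_used = n_used)
}

# Start with an assumed capacity of 2 placeholder rows.
dt <- data.table(id = rep(NA_character_, 2L), var = rep(NA_integer_, 2L))
n_used <- 0L

for (k in 1:5) {
  state <- append_row(dt, n_used, letters[k], k)
  dt <- state$dt
  n_used <- state$n_used
}

# Done filling: drop the unused placeholder rows.
dt <- dt[seq_len(n_used)]
print(dt)
```

Only the growth step copies the table; every other append is a pair of set calls on rows that already exist. The final trim also makes a copy, so this is still not a true append by reference, only a way to make the copies rare.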
