Add a row by reference at the end of a data.table object

Question

In this question the data.table package creator explains why rows cannot yet be inserted (or removed) by reference in the middle of a data.table. He also points out that such operations could be possible at the end of the table. Could you show code that performs this action? It would be the "by reference" version of

> library(data.table)
> a <- data.table(id=letters[1:2], var=1:2)
> a
   id var
1:  a   1
2:  b   2
> rbind(a, data.table(id="c", var=3))
   id var
1:  a   1
2:  b   2
3:  c   3

thanks.

EDIT:

Since a proper solution is not possible yet, which of the following is better in terms of speed and memory usage (assuming they differ internally, which I am not sure about)?

rbind(a, data.table(id="c", var=3))

rbindlist(list(a,  data.table(id="c", var=3)))

Are there possibly other (better) methods?

Solution

To answer your edit, just run a benchmark:

library(data.table)
library(microbenchmark)
a = data.table(id=letters[1:2], var=1:2)
b = copy(a)
c = copy(b) # let's also just try modifying same value in place
            # to see how well changing existing values does
microbenchmark(a <- rbind(a, data.table(id="c", var=3)),
               b <- rbindlist(list(b,  data.table(id="c", var=3))),
               c[1, var := 3L],
               set(c, 1L, 2L, 3L))
#Unit: microseconds
#                                                  expr     min        lq    median        uq      max neval
#          a <- rbind(a, data.table(id = "c", var = 3)) 865.460 1141.2585 1357.1230 1539.4300 6814.492   100
#b <- rbindlist(list(b, data.table(id = "c", var = 3))) 260.440  325.3835  445.4190  522.8825 1143.930   100
#                                   c[1, `:=`(var, 3L)] 482.147  626.5570  778.3135  904.3595 1109.539   100
#                                    set(c, 1L, 2L, 3L)   2.339    5.677    7.5140    9.5170   19.033   100

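rbindlist is clearly better than rbind: its median time here is roughly a third of rbind's (about 445 vs. 1357 microseconds). After Matthew Dowle pointed out the problems with using [ in a loop, I also added a benchmark with set, which is around 100 times faster than := inside [ for changing an existing value (median about 7.5 vs. 778 microseconds).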
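To make the "by reference" distinction concrete, here is a minimal sketch (not part of the original answer) that uses data.table's address() to check whether the table object survives an update. The variable names addr_before, same_object and dt2 are illustrative only.

```r
library(data.table)

dt <- data.table(id = letters[1:2], var = 1:2)
addr_before <- address(dt)

# set() overwrites cells of the existing table in place,
# without the per-call overhead of `[.data.table`.
for (i in seq_len(nrow(dt))) {
  set(dt, i = i, j = "var", value = 10L * i)
}

# The table is still the same object in memory.
same_object <- identical(address(dt), addr_before)

# rbindlist() does not append in place: it builds a new, longer table.
dt2 <- rbindlist(list(dt, data.table(id = "c", var = 3L)))
new_object <- !identical(address(dt2), address(dt))
```

In a loop that only changes existing values, set() is therefore the cheaper choice, while every rbindlist() call allocates a fresh table.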

From the above, your best options are to use rbindlist, or to size the data.table up front and then just populate the values. If you don't know the size of the data to begin with, you can use a strategy similar to std::vector in C++: double the size every time you run out of space, and once you're done filling it in, delete the extra rows.
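The preallocate-and-double strategy could be sketched roughly as follows. This is only an illustration under stated assumptions: append_row, n_used and capacity are hypothetical names, not data.table API, and the column layout (id, var) is taken from the question.

```r
library(data.table)

# Hypothetical helper: write one row into the next free slot,
# doubling the table's capacity first if it is full.
# `n_used` counts how many rows currently hold real data.
append_row <- function(dt, n_used, id, var) {
  if (n_used == nrow(dt)) {
    # Out of space: add as many empty rows as we already have.
    # This step copies the table, but it happens only rarely.
    padding <- data.table(id  = rep(NA_character_, nrow(dt)),
                          var = rep(NA_integer_, nrow(dt)))
    dt <- rbindlist(list(dt, padding))
  }
  n_used <- n_used + 1L
  # Fill the slot by reference with set().
  set(dt, i = n_used, j = "id", value = id)
  set(dt, i = n_used, j = "var", value = var)
  list(dt = dt, n_used = n_used)
}

capacity <- 2L  # assumed starting size
dt <- data.table(id  = rep(NA_character_, capacity),
                 var = rep(NA_integer_, capacity))
n_used <- 0L

for (k in 1:5) {
  state <- append_row(dt, n_used, letters[k], k)
  dt <- state$dt
  n_used <- state$n_used
}

# Drop the unused rows at the end (this subset makes one final copy).
dt <- dt[seq_len(n_used)]
```

With doubling, the table is copied only a logarithmic number of times relative to the number of rows appended, instead of once per row as with repeated rbindlist calls.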
