Add a row by reference at the end of a data.table object
Question
In this question the data.table package creator explains why rows cannot yet be inserted (or removed) by reference in the middle of a data.table. He also points out that such operations could be possible at the end of the table. Could you show code that performs this action? It would be the "by reference" version of:

    a <- data.table(id=letters[1:2], var=1:2)
    > a
       id var
    1:  a   1
    2:  b   2
    > rbind(a, data.table(id="c", var=3))
       id var
    1:  a   1
    2:  b   2
    3:  c   3

Thanks.
EDIT:
Since a proper solution is not possible yet, which of the following is better in terms of speed and memory usage (assuming they differ internally, which I am not sure about)?

    rbind(a, data.table(id="c", var=3))
    rbindlist(list(a, data.table(id="c", var=3)))

Are there other (better) methods?
Solution
To answer your edit, just run a benchmark:
    library(data.table)
    library(microbenchmark)

    a = data.table(id=letters[1:2], var=1:2)
    b = copy(a)
    c = copy(b) # let's also just try modifying same value in place
                # to see how well changing existing values does

    microbenchmark(a <- rbind(a, data.table(id="c", var=3)),
                   b <- rbindlist(list(b, data.table(id="c", var=3))),
                   c[1, var := 3L],
                   set(c, 1L, 2L, 3L))
    #Unit: microseconds
    #                                                   expr     min        lq    median        uq      max neval
    #           a <- rbind(a, data.table(id = "c", var = 3)) 865.460 1141.2585 1357.1230 1539.4300 6814.492   100
    # b <- rbindlist(list(b, data.table(id = "c", var = 3))) 260.440  325.3835  445.4190  522.8825 1143.930   100
    #                                    c[1, `:=`(var, 3L)] 482.147  626.5570  778.3135  904.3595 1109.539   100
    #                                     set(c, 1L, 2L, 3L)   2.339    5.677     7.5140    9.5170   19.033   100
rbindlist is clearly better than rbind. Thanks to Matthew Doyle for pointing out the problems with using [ in a loop; I added another benchmark with set.

From the above, your best options are either using rbindlist, or sizing the data.table up front and then just populating the values. If you don't know the size of the data to begin with, you can use a strategy similar to std::vector in C++: double the size every time you run out of space, and once you're done filling it in, delete the extra rows.
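The pre-sizing strategy described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a built-in data.table feature: the starting capacity of 2, the NA placeholder rows, and the variable names (capacity, n, new_ids, padding) are all made up for the example. Slots are filled with set(), which assigns by reference, and the table is doubled with rbindlist whenever it runs out of room.

```r
library(data.table)

# Assumed starting capacity; the rows are NA placeholders until filled.
capacity <- 2L
dt <- data.table(id  = rep(NA_character_, capacity),
                 var = rep(NA_integer_, capacity))
n <- 0L  # number of rows actually in use

new_ids <- letters[1:5]  # illustrative data to append
for (k in seq_along(new_ids)) {
  if (n == nrow(dt)) {
    # Out of space: double the capacity. This step copies the table,
    # but it happens only a logarithmic number of times.
    padding <- data.table(id  = rep(NA_character_, nrow(dt)),
                          var = rep(NA_integer_, nrow(dt)))
    dt <- rbindlist(list(dt, padding))
  }
  n <- n + 1L
  # set() modifies the existing columns in place, without copying the table.
  set(dt, i = n, j = "id",  value = new_ids[k])
  set(dt, i = n, j = "var", value = k)
}

# Drop the unused placeholder rows once filling is done (this copies once).
dt <- dt[seq_len(n)]
```

The appends themselves are cheap because set() only touches the target cells; the cost of growing is paid in the occasional rbindlist call and in the final trim.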