Add a row by reference at the end of a data.table object
Question
In this question, the creator of the data.table package explains why rows cannot yet be inserted (or removed) by reference in the middle of a data.table. He also points out that such operations could be possible at the end of the table. Could you show code that performs this action? It would be the "by reference" version of:
> library(data.table)
> a <- data.table(id=letters[1:2], var=1:2)
> a
   id var
1:  a   1
2:  b   2
> rbind(a, data.table(id="c", var=3))
   id var
1:  a   1
2:  b   2
3:  c   3
Thanks.
Edit: since a proper solution is not possible yet, which of the following is better in terms of speed and memory usage (I am not sure whether they differ internally)?
rbind(a, data.table(id="c", var=3))
rbindlist(list(a, data.table(id="c", var=3)))
Are there any other (better) methods?
Accepted answer
To answer your edit, just run a benchmark:
library(data.table)
library(microbenchmark)

a = data.table(id=letters[1:2], var=1:2)
b = copy(a)
c = copy(b) # let's also just try modifying the same value in place
            # to see how well changing existing values does
microbenchmark(a <- rbind(a, data.table(id="c", var=3)),
               b <- rbindlist(list(b, data.table(id="c", var=3))),
               c[1, var := 3L],
               set(c, 1L, 2L, 3L))
# Unit: microseconds
# expr                                                        min         lq     median         uq       max  neval
# a <- rbind(a, data.table(id = "c", var = 3))            865.460  1141.2585  1357.1230  1539.4300  6814.492    100
# b <- rbindlist(list(b, data.table(id = "c", var = 3)))  260.440   325.3835   445.4190   522.8825  1143.930    100
# c[1, `:=`(var, 3L)]                                     482.147   626.5570   778.3135   904.3595  1109.539    100
# set(c, 1L, 2L, 3L)                                        2.339     5.677      7.5140     9.5170    19.033    100
rbindlist is clearly better than rbind. Thanks to Matthew Dowle for pointing out the problems with using [ in a loop; I added another benchmark with set.
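To illustrate the difference between [ and set in a loop, here is a minimal sketch. The table and the values written are made up for illustration; only set() and its i, j, value arguments come from data.table. set() writes by reference like :=, but skips the overhead of calling [.data.table on every iteration.

library(data.table)

dt <- data.table(id = letters[1:5], var = 0L)
var_col <- which(names(dt) == "var")   # column position, looked up once

for (i in seq_len(nrow(dt))) {
  # Same effect as dt[i, var := i * 10L], without the per-call cost of `[`
  set(dt, i = i, j = var_col, value = i * 10L)
}

The gap in the benchmark above (microseconds for set versus hundreds of microseconds for [) adds up quickly when the loop runs many times.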
Based on the above, your best options are either to use rbindlist, or to size the data.table up front and then just populate the values. If you don't know the size of the data in advance, you can use a strategy similar to std::vector in C++: double the size every time you run out of space and, once you're done filling it in, delete the extra rows.
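The pre-allocate-and-fill strategy could be sketched as follows. This is only an illustration under stated assumptions: the helper append_row and the counter n_used are hypothetical names, not part of data.table, and the columns id and var are taken from the question. Growing the table still copies it (through rbindlist), but only when capacity runs out; every other write goes through set() by reference.

library(data.table)

# Hypothetical helper (not a data.table function): writes one row into the
# first unused slot, doubling the capacity first if the table is full.
append_row <- function(dt, n_used, id, var) {
  if (n_used == nrow(dt)) {
    # Out of space: double the capacity by binding a block of NA rows.
    # This copies the table, but it only happens when capacity is exhausted.
    extra <- max(nrow(dt), 1L)
    spare <- data.table(id  = rep(NA_character_, extra),
                        var = rep(NA_integer_, extra))
    dt <- rbindlist(list(dt, spare))
  }
  i <- as.integer(n_used + 1L)
  set(dt, i, "id", id)                 # by-reference write into a spare row
  set(dt, i, "var", as.integer(var))
  dt
}

# Start with a small pre-allocated table and fill it row by row.
capacity <- 2L
dt <- data.table(id  = rep(NA_character_, capacity),
                 var = rep(NA_integer_, capacity))
n_used <- 0L
for (k in 1:5) {
  dt <- append_row(dt, n_used, letters[k], k)
  n_used <- n_used + 1L
}
dt <- dt[seq_len(n_used)]              # drop the unused rows at the end

The caller has to keep track of n_used, because nrow(dt) now reports the capacity rather than the number of filled rows. The final subset makes one more copy, so this pays off mainly when many rows are appended.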