Why is rbindlist "better" than rbind?
Question
I am going through the documentation of data.table and also noticed from some of the conversations here on SO that rbindlist is supposed to be better than rbind.
I would like to know why rbindlist is better than rbind, and in which scenarios rbindlist really excels over rbind.
Is there any advantage in terms of memory utilization?
Recommended answer
rbindlist is an optimized version of do.call(rbind, list(...)), which is known for being slow when using rbind.data.frame.
Some questions that show where rbindlist shines are:
how to merge a list of data.frames by row
Trouble converting a long list of data.frames (~1 million) to a single data.frame using do.call and ldply
These have benchmarks that show how fast it can be.
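A rough way to reproduce that kind of comparison yourself is sketched below. The test data (1,000 small data.frames) and the variable names are made up for illustration, and the timings depend heavily on your machine and package versions, so no expected numbers are shown.

```r
library(data.table)

# Hypothetical test data: 1,000 small data.frames with identical columns.
set.seed(1)
pieces <- lapply(1:1000, function(i) data.frame(id = i, value = rnorm(5)))

# Base R: do.call() dispatches to rbind.data.frame.
t_base <- system.time(res_base <- do.call(rbind, pieces))

# data.table: the whole list is bound in a single call.
t_dt <- system.time(res_dt <- rbindlist(pieces))

# Elapsed seconds for each approach (values vary between runs).
print(c(base = t_base[["elapsed"]], data.table = t_dt[["elapsed"]]))
```

Both results hold the same 5,000 rows; only the time taken to build them should differ.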
rbind.data.frame does lots of checking, and will match by name (i.e. rbind.data.frame will account for the fact that columns may be in different orders, and match them up by name). rbindlist doesn't do this kind of checking, and will join by position.
For example:
do.call(rbind, list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
## a b
## 1 1 2
## 2 2 3
## 3 2 1
## 4 3 2
rbindlist(list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
## a b
## 1: 1 2
## 2: 2 3
## 3: 1 2
## 4: 2 3
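If you need name matching, newer data.table versions expose a use.names argument on rbindlist. The sketch below assumes a version that has it (check ?rbindlist for your installation) and reuses the two data.frames from the example above.

```r
library(data.table)

l <- list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3))

# use.names = FALSE binds strictly by position and ignores column names.
by_position <- rbindlist(l, use.names = FALSE)

# use.names = TRUE matches columns by name, as rbind.data.frame does.
by_name <- rbindlist(l, use.names = TRUE)

by_position$a   # 1 2 1 2
by_name$a       # 1 2 2 3
```

Passing use.names explicitly also makes the intent clear to a reader, whichever behaviour you want.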
Some other limitations of rbindlist
It used to struggle to deal with factors, due to a bug that has since been fixed:
rbindlist two data.tables where one has factor and other has character type for a column (Bug #2650)
It has problems with duplicate column names; see:
Warning message: in rbindlist(allargs): NAs introduced by coercion: possible bug in data.table? (Bug #2384)
rbindlist can handle lists of data.frames and data.tables, and will return a data.table without rownames.
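A minimal sketch of that behaviour, using made-up objects (df, dt) that mix a data.frame carrying custom row names with a data.table:

```r
library(data.table)

# Hypothetical inputs: one data.frame with custom row names, one data.table.
df <- data.frame(x = 1:2, y = c(10, 20), row.names = c("first", "second"))
dt <- data.table(x = 3:4, y = c(30, 40))

combined <- rbindlist(list(df, dt))

class(combined)      # "data.table" "data.frame"
rownames(combined)   # "1" "2" "3" "4": the custom row names are dropped
```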
You can get in a muddle of rownames using do.call(rbind, list(...)).
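One common form of that muddle appears when the list is named: rbind.data.frame builds row names from the list names. The sketch below uses only base R, and the list and its names are invented for illustration.

```r
# Hypothetical named list of data.frames.
pieces <- list(a = data.frame(x = 1:2), b = data.frame(x = 3:4))

# The list names leak into the row names of the result.
stacked <- do.call(rbind, pieces)
rownames(stacked)   # "a.1" "a.2" "b.1" "b.2"

# One workaround: drop the list names before binding.
clean <- do.call(rbind, unname(pieces))
rownames(clean)     # "1" "2" "3" "4"
```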
In terms of memory, rbindlist is implemented in C, so it is memory efficient; it uses setattr to set attributes by reference.
rbind.data.frame is implemented in R; it does lots of assigning, and uses attr<- (and class<- and rownames<-), all of which will (internally) create copies of the created data.frame.
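To see what "by reference" means here, the sketch below uses data.table's address() helper to check that setattr changes an attribute without moving the object in memory. It illustrates setattr on a plain vector only; it is not a trace of what rbindlist does internally.

```r
library(data.table)

x <- c(1, 2, 3)
addr_before <- address(x)

# setattr() modifies the attribute in place; no copy of x is made.
setattr(x, "names", c("a", "b", "c"))

names(x)                             # "a" "b" "c"
identical(address(x), addr_before)   # TRUE: still the same object in memory
```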