Why is rbindlist "better" than rbind?


Question

I am going through the documentation of data.table and also noticed from some of the conversations here on SO that rbindlist is supposed to be better than rbind.

I would like to know why rbindlist is better than rbind, and in which scenarios rbindlist really excels over rbind.

Is there any advantage in terms of memory utilization?

Recommended answer

rbindlist is an optimized version of do.call(rbind, list(...)), which is known for being slow when using rbind.data.frame.

Some questions that show where rbindlist shines are:

How to merge a list of data.frames by row

Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

These have benchmarks that show how fast it can be.
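
A rough way to reproduce this kind of comparison is sketched below. It assumes the data.table package is installed; the list size and the helper name make_piece are arbitrary choices for illustration, and the timings will vary by machine and package version.

```r
library(data.table)

# Hypothetical helper: build one small data.frame per list element.
make_piece <- function(i) data.frame(id = i, value = i * 2)

pieces <- lapply(1:2000, make_piece)

# Base R: do.call() hands every piece to rbind, which dispatches to rbind.data.frame.
base_time <- system.time(base_result <- do.call(rbind, pieces))

# data.table: binds the whole list in a single call.
dt_time <- system.time(dt_result <- rbindlist(pieces))

print(base_time)
print(dt_time)

# Both approaches should produce the same rows and columns.
same_dims <- identical(dim(base_result), dim(dt_result))
same_values <- isTRUE(all.equal(base_result$value, dt_result$value))
```

Only the relative difference between the two timings is meaningful here; a dedicated benchmarking package would give more stable numbers.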

rbind.data.frame does lots of checking, and will match by name (i.e. rbind.data.frame will account for the fact that columns may be in different orders, and match them up by name). rbindlist doesn't do this kind of checking, and will join by position.

For example:

do.call(rbind, list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##    a b
## 1  1 2
## 2  2 3
## 3  2 1
## 4  3 2

library(data.table)
rbindlist(list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##     a b
##  1: 1 2
##  2: 2 3
##  3: 1 2
##  4: 2 3
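
Later data.table releases added a use.names argument to rbindlist, so the by-position behaviour can be made explicit or switched to by-name matching. The following is a minimal sketch, assuming a data.table version that supports use.names. The default value, and any message emitted when names are out of order, differ between versions, so the argument is passed explicitly here.

```r
library(data.table)

dfs <- list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3))

# Bind by position: the second table's b values land in column a.
by_position <- rbindlist(dfs, use.names = FALSE)

# Bind by name: columns are matched up the way rbind.data.frame matches them.
by_name <- rbindlist(dfs, use.names = TRUE)

print(by_position)
print(by_name)
```

With use.names = TRUE the result has the same values as the do.call(rbind, ...) call, but it is returned as a data.table.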


Some other limitations of rbindlist

It used to struggle to deal with factors, due to a bug that has since been fixed:

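rbindlist two data.tables where one has factor and other has character type for a column (Bug #2650)

It has problems with duplicate column names; see:

Warning message: in rbindlist(allargs) : NAs introduced by coercion: possible bug in data.table? (Bug #2384)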

rbindlist can handle lists of data.frames and data.tables, and will return a data.table without rownames.

You can get in a muddle of rownames using do.call(rbind, list(...)); see:

How to avoid renaming of rows when using rbind inside do.call?
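
The rowname difference can be seen with a small named list. This is only an illustrative sketch; the list names x and y and the column v are made up for the example.

```r
library(data.table)

# A named list, as lapply() or split() often produces.
pieces <- list(x = data.frame(v = 1:2), y = data.frame(v = 3:4))

base_result <- do.call(rbind, pieces)
dt_result <- rbindlist(pieces)

# rbind builds row names from the list names: "x.1" "x.2" "y.1" "y.2"
print(rownames(base_result))

# The data.table result only has the default row numbering.
print(rownames(dt_result))
```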

In terms of memory, rbindlist is implemented in C, so it is memory efficient; it uses setattr to set attributes by reference.

rbind.data.frame is implemented in R; it does lots of assigning, and uses attr<- (and class<- and rownames<-), all of which will (internally) create copies of the created data.frame.
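
What "by reference" means for setattr can be checked with data.table's address function. The sketch below only shows that setattr leaves the object where it is in memory. The attribute name "units" is an arbitrary choice, and whether the base replacement functions copy in a given situation depends on the R version, so that side is not demonstrated.

```r
library(data.table)

x <- c(10, 20, 30)
before <- address(x)

# setattr() modifies x in place and returns it invisibly.
setattr(x, "units", "ms")

after <- address(x)

# Same memory address before and after, and the attribute is set.
print(identical(before, after))
print(attr(x, "units"))
```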
