Why is rbindlist "better" than rbind?

Question

I am going through the documentation of data.table and have also noticed, from some of the conversations here on SO, that rbindlist is supposed to be better than rbind.

I would like to know why rbindlist is better than rbind, and in which scenarios rbindlist really excels over rbind.

Is there any advantage in terms of memory utilization?

Recommended answer

rbindlist is an optimized version of do.call(rbind, list(...)), which is known for being slow when it dispatches to rbind.data.frame.

Some questions that show where rbindlist shines are:

Fast vectorized merge of list of data.frames by row

Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

These have benchmarks that show how fast it can be.
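A rough way to see the difference on your own machine is sketched below. The list `dfs` and its size are made up for illustration, and the timings depend on hardware and package versions, so no expected numbers are shown.

```r
library(data.table)

# Hypothetical test data: 2,000 small data.frames with identical columns
set.seed(1)
dfs <- lapply(seq_len(2000), function(i) {
  data.frame(id = i, value = rnorm(5))
})

# Base R: do.call(rbind, ...) dispatches to rbind.data.frame
base_time <- system.time(res_base <- do.call(rbind, dfs))

# data.table: one call on the whole list
dt_time <- system.time(res_dt <- rbindlist(dfs))

# Elapsed seconds for each approach
print(c(base = base_time[["elapsed"]], data.table = dt_time[["elapsed"]]))

# Both approaches stack the same rows in the same order
stopifnot(nrow(res_base) == nrow(res_dt))
```

For serious measurements, a dedicated benchmarking package with repeated runs would be a better choice than a single system.time call.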

rbind.data.frame does lots of checking and matches columns by name (that is, it accounts for the fact that columns may be in different orders and lines them up by name). rbindlist doesn't do this kind of checking and joins by position.

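For example:

library(data.table)

# rbind.data.frame matches columns by name
do.call(rbind, list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##   a b
## 1 1 2
## 2 2 3
## 3 2 1
## 4 3 2

# rbindlist binds by position, ignoring the differing column order
rbindlist(list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##    a b
## 1: 1 2
## 2: 2 3
## 3: 1 2
## 4: 2 3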
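If matching by name is what you want, the following sketch may help. It assumes a data.table version in which rbindlist has the use.names argument (this is not part of the answer above), and the names `df1`, `df2`, `by_position`, and `by_name` are only illustrative.

```r
library(data.table)

df1 <- data.frame(a = 1:2, b = 2:3)
df2 <- data.frame(b = 1:2, a = 2:3)

# Bind strictly by position: df2's first column (b) lands under a
by_position <- rbindlist(list(df1, df2), use.names = FALSE)

# Match columns by name, similar to what rbind.data.frame does
by_name <- rbindlist(list(df1, df2), use.names = TRUE)

print(by_position$a)  # values 1 2 1 2
print(by_name$a)      # values 1 2 2 3
```

Being explicit about use.names makes the intent clear whenever the inputs may not share a column order.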

Some other limitations of rbindlist

It used to struggle to deal with factors, due to a bug that has since been fixed:

rbindlist two data.tables where one has factor and other has character type for a column (Bug #2650)

It has problems with duplicate column names:

Warning message: in rbindlist(allargs) : NAs introduced by coercion: possible bug in data.table? (Bug #2384)

rbindlist can handle lists, data.frames and data.tables, and will return a data.table without rownames.

You can get into a muddle with rownames using do.call(rbind, list(...)); see:

How to avoid renaming of rows when using rbind inside do.call?
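As a small illustration of the rowname issue, the sketch below binds a hypothetical named list called `parts`. The idcol argument used at the end is an assumption about the installed data.table version and is not mentioned in the answer above.

```r
library(data.table)

# Hypothetical named list of data.frames
parts <- list(first = data.frame(x = 1:2), second = data.frame(x = 3:4))

# Base R pastes the list names onto the row names
base_res <- do.call(rbind, parts)
print(rownames(base_res))  # "first.1" "first.2" "second.1" "second.2"

# rbindlist returns a data.table with no row names; idcol keeps the
# list names in an ordinary column (here called "source") instead
dt_res <- rbindlist(parts, idcol = "source")
print(dt_res$source)       # "first" "first" "second" "second"
```

Keeping the origin of each row in a regular column tends to be easier to work with than parsing it back out of row names.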

In terms of memory, rbindlist is implemented in C, so it is memory efficient; it uses setattr to set attributes by reference.

rbind.data.frame is implemented in R; it does lots of assigning, and uses attr<- (and class<- and rownames<-), all of which will (internally) create copies of the created data.frame.
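The difference between setting an attribute by reference and setting it with attr<- can be sketched on a plain vector. This is only an illustration of copy semantics using data.table's setattr and address helpers, not a trace of what rbindlist or rbind.data.frame do internally; the variable names are made up.

```r
library(data.table)

x <- c(1, 2, 3)
y <- x  # y and x now refer to the same vector

# setattr modifies the object in place, so no copy is made
setattr(y, "note", "set by reference")
same_after_setattr <- address(x) == address(y)
x_has_note <- !is.null(attr(x, "note"))    # x sees the change too

# attr<- follows copy-on-modify: z becomes a separate object
z <- x
attr(z, "other") <- "set with attr<-"
copied <- address(z) != address(x)
x_has_other <- !is.null(attr(x, "other"))  # x is left untouched

print(c(same_after_setattr = same_after_setattr, x_has_note = x_has_note,
        copied = copied, x_has_other = x_has_other))
```

On a large data.frame each such copy costs time and memory, which is the overhead the paragraph above refers to.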
