Convert a list of data frames into one data frame
Problem description
At one point my code ends up with a list of data frames, which I want to convert into a single big data frame.
I got some pointers from an earlier question which was trying to do something similar but more complex.
Here's an example of what I am starting with (this is grossly simplified for illustration):
listOfDataFrames <- vector(mode = "list", length = 100)
for (i in 1:100) {
  listOfDataFrames[[i]] <- data.frame(a = sample(letters, 500, replace = TRUE),
                                      b = rnorm(500), c = rnorm(500))
}
I am currently using this:
df <- do.call("rbind", listOfDataFrames)
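For readers unfamiliar with do.call, here is a minimal sketch of what that line does: it passes each element of the list to rbind as a separate argument, so it should be equivalent to writing out rbind(x[[1]], x[[2]], ...) by hand. The list smallList and its contents are made up for illustration and are not part of the original question.

```r
# Hypothetical three-element list, much smaller than the one in the question
smallList <- list(
  data.frame(a = c("x", "y"), b = c(1, 2)),
  data.frame(a = c("z", "w"), b = c(3, 4)),
  data.frame(a = "v", b = 5)
)

# do.call() spreads the list elements out as individual arguments to rbind()
viaDoCall <- do.call("rbind", smallList)

# The same call written out by hand
byHand <- rbind(smallList[[1]], smallList[[2]], smallList[[3]])

# Both routes should produce the same data frame
stopifnot(identical(viaDoCall, byHand))
```

The advantage of do.call is that it works for a list of any length, whereas the hand-written call has to name every element.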
One other option is to use a plyr function:
library(plyr)  # provides ldply() and rbind.fill()
df <- ldply(listOfDataFrames, data.frame)
This is a little slower than the original:
> system.time({ df <- do.call("rbind", listOfDataFrames) })
user system elapsed
0.25 0.00 0.25
> system.time({ df2 <- ldply(listOfDataFrames, data.frame) })
user system elapsed
0.30 0.00 0.29
> identical(df, df2)
[1] TRUE
My guess is that do.call("rbind", ...) is going to be the fastest approach you will find, unless you can (a) use matrices instead of data frames and (b) preallocate the final matrix and assign into it rather than growing it.
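A rough sketch of what options (a) and (b) could look like, under some assumptions that are not in the question: every piece is a numeric matrix with the same two columns and the same number of rows. A matrix holds a single type, so the character column a from the example is left out here. The names listOfMatrices, nPieces, rowsPerPiece and result are hypothetical.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical input: a list of numeric matrices with identical columns
nPieces <- 100
rowsPerPiece <- 500
listOfMatrices <- lapply(seq_len(nPieces), function(i) {
  cbind(b = rnorm(rowsPerPiece), c = rnorm(rowsPerPiece))
})

# (b) Preallocate the full result once instead of growing it
result <- matrix(NA_real_, nrow = nPieces * rowsPerPiece, ncol = 2,
                 dimnames = list(NULL, c("b", "c")))

# Copy each piece into its own block of rows
for (i in seq_len(nPieces)) {
  rows <- ((i - 1) * rowsPerPiece + 1):(i * rowsPerPiece)
  result[rows, ] <- listOfMatrices[[i]]
}

# Sanity check against the do.call("rbind", ...) route
stopifnot(isTRUE(all.equal(result, do.call("rbind", listOfMatrices))))
```

Whether this actually beats do.call("rbind", ...) on a given machine is something to measure with system.time rather than assume.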
Edit 1:
Based on Hadley's comment, here's the latest version of rbind.fill from CRAN:
> system.time({ df3 <- rbind.fill(listOfDataFrames) })
user system elapsed
0.24 0.00 0.23
> identical(df, df3)
[1] TRUE
This is easier than rbind, and marginally faster (these timings hold up over multiple runs). As far as I understand it, the version of plyr on GitHub is even faster than this.