Convert a list of data frames into one data frame

Question

I have code that, at one point, ends up with a list of data frames, which I want to convert into a single big data frame.

I got some pointers from an earlier question that was trying to do something similar but more complex.

Here's an example of what I am starting with (this is grossly simplified for illustration):

listOfDataFrames <- vector(mode = "list", length = 100)

for (i in 1:100) {
    listOfDataFrames[[i]] <- data.frame(a = sample(letters, 500, replace = TRUE),
                                        b = rnorm(500), c = rnorm(500))
}

I am currently using this:

  df <- do.call("rbind", listOfDataFrames)
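
For readers unfamiliar with do.call(): it calls the named function once, passing each element of the list as a separate argument. A minimal sketch of that equivalence on a small hypothetical list (`small`, `combined` and `by_hand` are illustrative names, not from the question):

```r
# A small hypothetical list of data frames with identical columns
small <- list(
  data.frame(a = c("x", "y"), b = 1:2),
  data.frame(a = "z", b = 3L),
  data.frame(a = c("u", "v"), b = 4:5)
)

# do.call() unpacks the list into separate arguments to rbind()...
combined <- do.call("rbind", small)

# ...so it is equivalent to writing out one rbind() call by hand
by_hand <- rbind(small[[1]], small[[2]], small[[3]])

identical(combined, by_hand)  # TRUE
nrow(combined)                # 5
```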

Solution

Another option is to use a plyr function:

library(plyr)
df <- ldply(listOfDataFrames, data.frame)

This is a little slower than the original:

> system.time({ df <- do.call("rbind", listOfDataFrames) })
   user  system elapsed 
   0.25    0.00    0.25 
> system.time({ df2 <- ldply(listOfDataFrames, data.frame) })
   user  system elapsed 
   0.30    0.00    0.29
> identical(df, df2)
[1] TRUE

My guess is that do.call("rbind", ...) is going to be the fastest approach you will find, unless you can (a) use matrices instead of data frames and (b) preallocate the final matrix and assign into it rather than growing it.
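
A minimal sketch of the matrix approach from the previous paragraph, assuming every piece is numeric and has the same columns. The character column `a` from the question's example cannot go into a numeric matrix, so it is left out here; `pieces`, `n_pieces`, `rows_per_piece` and `df_fast` are hypothetical names used only for illustration.

```r
set.seed(1)

n_pieces <- 100
rows_per_piece <- 500

# Hypothetical input: a list of numeric matrices with the same two columns
pieces <- lapply(seq_len(n_pieces), function(i) {
  cbind(b = rnorm(rows_per_piece), c = rnorm(rows_per_piece))
})

# (b) Preallocate the final matrix once, at its full size
total_rows <- sum(vapply(pieces, nrow, integer(1)))
result <- matrix(NA_real_, nrow = total_rows, ncol = 2,
                 dimnames = list(NULL, c("b", "c")))

# Fill it block by block instead of growing it with repeated rbind() calls
start <- 1L
for (p in pieces) {
  end <- start + nrow(p) - 1L
  result[start:end, ] <- p
  start <- end + 1L
}

# Convert once at the end if a data frame is needed downstream
df_fast <- as.data.frame(result)
```

Whether this actually beats do.call("rbind", ...) depends on the data and the R version, so it is worth timing with system.time() on your own data before switching.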

Edit 1:

Based on Hadley's comment, here's a timing of rbind.fill from the latest version of plyr on CRAN:

> system.time({ df3 <- rbind.fill(listOfDataFrames) })
   user  system elapsed 
   0.24    0.00    0.23 
> identical(df, df3)
[1] TRUE

This is easier than rbind and marginally faster (these timings hold up over multiple runs). As far as I understand it, the version of plyr on GitHub is even faster than this.
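
One reason rbind.fill() is easier to use than base rbind() is that it does not require every data frame to have the same columns: columns missing from a piece are filled with NA. A small sketch with two hypothetical data frames `x` and `y`, assuming plyr is installed:

```r
library(plyr)

# Two hypothetical pieces that do not share all of their columns
x <- data.frame(a = c("p", "q"), b = c(1, 2))
y <- data.frame(a = "r", c = 10)

# Base rbind() stops with an error because the column names differ
res <- tryCatch(rbind(x, y), error = function(e) conditionMessage(e))
print(res)

# rbind.fill() takes the union of the columns and pads the gaps with NA
filled <- rbind.fill(x, y)
print(filled)
```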
