Merge several data.frames into one data.frame with a loop


Problem description


I am trying to merge several data.frames into one data.frame. Since I have a whole list of files I am trying to do it with a loop structure.

So far the loop approach works fine. However, it looks pretty inefficient and I am wondering if there is a faster and easier approach.

Here is the scenario: I have a directory with several .csv files. Each file contains the same identifier columns, which can be used as the merge keys. Since the files are rather large, I thought I would read one file at a time into R instead of reading them all at once. So I get all the files in the directory with list.files and read in the first two. Afterwards I use merge to combine them into one data.frame.

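# List the files in the data folder (names only, without the path)
FileNames <- list.files(path=".../tempDataFolder/")
# Read the first two files; the string "NULL" marks missing values
FirstFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[1], sep=""),
             header=TRUE, na.strings="NULL")
SecondFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[2], sep=""),
              header=TRUE, na.strings="NULL")
# Full outer join on the three identifier columns
dataMerge <- merge(FirstFile, SecondFile, by=c("COUNTRYNAME", "COUNTRYCODE", "Year"),
             all=TRUE)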

Now I use a for loop to get all the remaining .csv files and merge them into the already existing data.frame:

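# seq_along(FileNames)[-(1:2)] skips the two files already merged; unlike
# 3:length(FileNames), it gives an empty loop when there are fewer than three files
for(i in seq_along(FileNames)[-(1:2)]){
  ReadInMerge <- read.csv(file=paste(".../tempDataFolder/", FileNames[i], sep=""),
                 header=TRUE, na.strings="NULL")
  dataMerge <- merge(dataMerge, ReadInMerge, by=c("COUNTRYNAME", "COUNTRYCODE", "Year"),
               all=TRUE)
}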

Even though it works just fine, I was wondering whether there is a more elegant way to get the job done.

Solution

You may want to look at the closely related question on Stack Overflow.

I would approach this in two steps: import all the data (with plyr), then merge it together:

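# full.names=TRUE returns complete paths, so no paste() is needed
filenames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
library(plyr)
# Extra arguments to llply are passed on to read.csv;
# na.strings="NULL" matches the setting used in the question
import.list <- llply(filenames, read.csv, na.strings="NULL")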
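
The import step can also be tried without plyr. The sketch below swaps in base R's lapply for llply and runs on a throwaway folder created with tempfile(), so the file names (gdp.csv, pop.csv) and their values are made up for illustration and are not from the question:

```r
# Hypothetical demo data: two small CSV files in a fresh temporary folder.
demo_dir <- tempfile("import_demo_")
dir.create(demo_dir)
write.csv(data.frame(COUNTRYNAME = c("Chile", "Peru"),
                     COUNTRYCODE = c("CHL", "PER"),
                     Year = c(2000, 2000),
                     gdp = c(1.5, NA)),
          file.path(demo_dir, "gdp.csv"), row.names = FALSE, na = "NULL")
write.csv(data.frame(COUNTRYNAME = c("Chile", "Peru"),
                     COUNTRYCODE = c("CHL", "PER"),
                     Year = c(2000, 2000),
                     pop = c(15, 26)),
          file.path(demo_dir, "pop.csv"), row.names = FALSE, na = "NULL")

# Same two steps as in the answer, with lapply standing in for llply.
filenames <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)
import.list <- lapply(filenames, read.csv, header = TRUE, na.strings = "NULL")

# Naming the list elements makes it easier to see which file is which.
names(import.list) <- basename(filenames)
str(import.list)
```

The pattern argument is an addition here; it keeps non-CSV files in the folder from being passed to read.csv.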

That will give you a list of all the files that you now need to merge together. There are many ways to do this, but here's one approach (with Reduce):

# Reduce merges the first two data frames, then merges the result with the third, and so on
data <- Reduce(function(x, y) merge(x, y, all=TRUE,
    by=c("COUNTRYNAME", "COUNTRYCODE", "Year")), import.list, accumulate=FALSE)
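
To see what the Reduce call does, here is a small worked sketch on three in-memory data frames. The data frames (gdp, pop, area) and their values are invented for illustration; only the key columns come from the question:

```r
keys <- c("COUNTRYNAME", "COUNTRYCODE", "Year")

# Hypothetical inputs: each one has the key columns plus one value column,
# and not every country appears in every data frame.
gdp <- data.frame(COUNTRYNAME = c("Chile", "Peru"),
                  COUNTRYCODE = c("CHL", "PER"),
                  Year = c(2000, 2000),
                  gdp = c(1.5, 2.5))
pop <- data.frame(COUNTRYNAME = "Chile", COUNTRYCODE = "CHL",
                  Year = 2000, pop = 15)
area <- data.frame(COUNTRYNAME = "Peru", COUNTRYCODE = "PER",
                   Year = 2000, area = 1285)

demo.list <- list(gdp, pop, area)

# Equivalent to merge(merge(gdp, pop, ...), area, ...).
# accumulate = FALSE is the default, so it is left out here.
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), demo.list)
print(merged)
```

Because all = TRUE requests a full outer join, both countries are kept and the cells with no matching row are filled with NA.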

Alternatively, you can do this with the reshape package if you aren't comfortable with Reduce:

library(reshape)
data <- merge_recurse(import.list)
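
The question mentions that the files are large and that reading them one at a time was deliberate. Both approaches above load every file first. A possible variation, sketched here under that assumption, reads each file inside the Reduce step so that only the running result and one new file are held at a time. The helper name merge_next_file and the demo files are hypothetical:

```r
keys <- c("COUNTRYNAME", "COUNTRYCODE", "Year")

# Hypothetical demo data: three small CSV files in a fresh temporary folder.
demo_dir <- tempfile("merge_demo_")
dir.create(demo_dir)
for (v in c("gdp", "pop", "area")) {
  df <- data.frame(COUNTRYNAME = c("Chile", "Peru"),
                   COUNTRYCODE = c("CHL", "PER"),
                   Year = c(2000, 2000))
  df[[v]] <- c(1, 2)
  write.csv(df, file.path(demo_dir, paste0(v, ".csv")), row.names = FALSE)
}
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and merge it into the result accumulated so far.
merge_next_file <- function(merged, path) {
  next_file <- read.csv(path, header = TRUE, na.strings = "NULL")
  merge(merged, next_file, by = keys, all = TRUE)
}

# The first file is the starting value; Reduce then walks the remaining paths.
first_file <- read.csv(paths[1], header = TRUE, na.strings = "NULL")
merged_incremental <- Reduce(merge_next_file, paths[-1], first_file)
print(merged_incremental)
```

This is the same computation as the loop in the question, written without the index bookkeeping. The merged result itself still grows with every file, so this only avoids keeping all the raw inputs in memory together.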
