Merge several data.frames into one data.frame with a loop
Question
I am trying to merge several data.frames into one data.frame. Since I have a whole list of files, I am trying to do it with a loop structure.
So far the loop approach works fine. However, it looks pretty inefficient, and I am wondering whether there is a faster and easier approach.
Here is the scenario: I have a directory with several .csv files. Each file contains the same identifier, which can be used as the merge variable. Since the files are rather large, I thought I would read them into R one at a time instead of all at once. So I get all the files in the directory with list.files and read in the first two files. Afterwards I use merge to get one data.frame.
FileNames <- list.files(path=".../tempDataFolder/")
# Read the first two files; the string "NULL" marks missing values
FirstFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[1], sep=""),
                      header=TRUE, na.strings="NULL")
SecondFile <- read.csv(file=paste(".../tempDataFolder/", FileNames[2], sep=""),
                       header=TRUE, na.strings="NULL")
# Full outer join on the shared identifier columns
dataMerge <- merge(FirstFile, SecondFile, by=c("COUNTRYNAME", "COUNTRYCODE", "Year"),
                   all=TRUE)
Now I use a for loop to get all the remaining .csv files and merge them into the already existing data.frame:
# Assumes the folder holds at least three files
for(i in 3:length(FileNames)){
  ReadInMerge <- read.csv(file=paste(".../tempDataFolder/", FileNames[i], sep=""),
                          header=TRUE, na.strings="NULL")
  dataMerge <- merge(dataMerge, ReadInMerge, by=c("COUNTRYNAME", "COUNTRYCODE", "Year"),
                     all=TRUE)
}
Even though it works just fine, I was wondering whether there is a more elegant way to get the job done.
Recommended answer
You may want to look closely at the related questions on Stack Overflow.
I would approach this in two steps: import all the data (with plyr), then merge it together:
filenames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
library(plyr)
import.list <- llply(filenames, read.csv)
That will give you a list of all the files, which you now need to merge together. There are many ways to do this, but here is one approach (with Reduce):
data <- Reduce(function(x, y) merge(x, y, all=TRUE,
                                    by=c("COUNTRYNAME", "COUNTRYCODE", "Year")),
               import.list, accumulate=FALSE)
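To see what the Reduce call does, here is a minimal sketch on three small made-up data frames. The names df1, df2, df3 and all values are invented for illustration; only the key column names come from the question.

```r
# Key columns shared by every data frame (taken from the question).
keys <- c("COUNTRYNAME", "COUNTRYCODE", "Year")

# Three small, made-up data frames (hypothetical values).
df1 <- data.frame(COUNTRYNAME = c("Austria", "Belgium"),
                  COUNTRYCODE = c("AUT", "BEL"),
                  Year = c(2000, 2000),
                  gdp = c(1.0, 2.0))
df2 <- data.frame(COUNTRYNAME = c("Austria", "Chile"),
                  COUNTRYCODE = c("AUT", "CHL"),
                  Year = c(2000, 2000),
                  pop = c(8.0, 15.0))
df3 <- data.frame(COUNTRYNAME = "Belgium",
                  COUNTRYCODE = "BEL",
                  Year = 2000,
                  area = 30.5)

# Reduce folds the list from left to right, so this is equivalent to
# merge(merge(df1, df2, ...), df3, ...). all = TRUE keeps unmatched rows
# and fills the missing columns with NA (a full outer join).
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE),
                 list(df1, df2, df3))
print(merged)
```

Each country/year combination ends up in a single row, with NA wherever a file had no value for it.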
Alternatively, you can do this with the reshape package if you aren't comfortable with Reduce:
library(reshape)
data <- merge_recurse(import.list)
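If you would rather not load any extra packages, the same two steps can probably be sketched in base R with lapply and Reduce. The temporary folder, file names and values below are made up so that the sketch runs on its own; na.strings="NULL" is carried over from the question.

```r
# Write two small made-up CSV files to a temporary folder (hypothetical data).
tmp_dir <- file.path(tempdir(), "merge_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(COUNTRYNAME = "Austria", COUNTRYCODE = "AUT",
                     Year = 2000, gdp = 1),
          file.path(tmp_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(COUNTRYNAME = "Austria", COUNTRYCODE = "AUT",
                     Year = 2000, pop = 8),
          file.path(tmp_dir, "b.csv"), row.names = FALSE)

# full.names = TRUE returns complete paths, so no paste() is needed.
filenames <- list.files(path = tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Step 1: read every file into a list (base-R counterpart of llply).
import.list <- lapply(filenames, read.csv, na.strings = "NULL")

# Step 2: merge the list pairwise on the shared key columns.
keys <- c("COUNTRYNAME", "COUNTRYCODE", "Year")
data <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), import.list)
print(data)
```

Note that this still holds every file in memory at once, just like the plyr version.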