How to merge csv files from nested folders in R
Question
I have a large collection of csv files that are in different folders and in folders within folders that I need to merge into one file. It would be easy if they were all in one directory but I don't know of a simple way to pull them all out of the different folders. I could combine them one by one but there are A LOT of them.
For example:
+ working directory
|
+-- · data.csv
+-- · data2.csv
+-- + NewFolder
    |
    +-- · data3.csv
    +-- + NewFolder2
        |
        +-- · data4.csv
I want one file that combines all of the data csv files.
Recommended answer
You can use dir() with recursive set to TRUE to list all files in the folder tree, and you can use pattern to define a regular expression to filter the .csv files. An example:
csv_files <- dir(pattern = '.*[.]csv$', recursive = TRUE)
or even better and simpler (thanks to speendo for his comment):
csv_files <- dir(pattern = '\\.csv$', recursive = TRUE)
Explanation:
- pattern='\\.csv$': The pattern argument must be a regular expression that filters the file names. This regular expression keeps only the file names that end with .csv; the dot is escaped so that it matches a literal dot rather than any character. If you want only the files whose names start with data, try a pattern like this: pattern='^data.*\\.csv$'
- recursive=TRUE: Forces dir() to traverse recursively through all folders below the working directory.
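To see how the two arguments behave together, here is a minimal sketch that rebuilds the folder tree from the question in a temporary directory. The extra files summary.csv and notes.txt, and the use of path = root in place of the working directory, are assumptions added for illustration. The sketch uses an escaped dot in the pattern so that the dot is matched literally.

```r
# Build a throwaway folder tree that mirrors the example in the question
# (summary.csv and notes.txt are illustrative additions).
root <- file.path(tempdir(), "csv_tree_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "NewFolder", "NewFolder2"), recursive = TRUE)

paths <- c("data.csv", "data2.csv",
           file.path("NewFolder", "data3.csv"),
           file.path("NewFolder", "summary.csv"),
           file.path("NewFolder", "NewFolder2", "data4.csv"))
for (p in paths) {
  write.csv(data.frame(id = 1:2, value = c(0.5, 1.5)),
            file.path(root, p), row.names = FALSE)
}
# A non-csv file that the pattern should skip
writeLines("not a csv", file.path(root, "NewFolder", "notes.txt"))

# path = root plays the role of the working directory here
csv_files  <- dir(path = root, pattern = "\\.csv$", recursive = TRUE)
data_files <- dir(path = root, pattern = "^data.*\\.csv$", recursive = TRUE)

print(csv_files)   # every csv file, with paths relative to root
print(data_files)  # only the csv files whose names start with "data"
```

The pattern is applied to each file name rather than to its relative path, which is why '^data' still selects data3.csv and data4.csv inside the subfolders.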
After you get the file list, and assuming all of them have the same structure (i.e. all the files have the same columns), you can merge them with read.csv() and rbind():
for (i in seq_along(csv_files)) {
  if (i == 1) {
    df <- read.csv(csv_files[i])             # the first file initialises df
  } else {
    df <- rbind(df, read.csv(csv_files[i]))  # later files are appended row-wise
  }
}
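As a variation on the loop, the same merge can be sketched in base R with lapply() and a single rbind() call, which avoids copying a growing data frame on every iteration. This is an alternative technique, not the one from the answer. The temporary folder, the file names a.csv and b.csv, and the column check are assumptions added for illustration.

```r
# Two small illustrative files with identical columns, in a temporary folder
root <- file.path(tempdir(), "rbind_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "sub"), recursive = TRUE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(root, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:5, value = c(30, 40, 50)),
          file.path(root, "sub", "b.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv() can open from any directory
csv_files <- dir(path = root, pattern = "\\.csv$", recursive = TRUE,
                 full.names = TRUE)

# Read every file once into a list of data frames
tables <- lapply(csv_files, read.csv)

# Check the "same columns" assumption before binding
same_cols <- vapply(tables,
                    function(x) identical(names(x), names(tables[[1]])),
                    logical(1))
stopifnot(all(same_cols))

# Bind all data frames in one call instead of growing df inside a loop
df <- do.call(rbind, tables)
print(dim(df))
```

The explicit column check makes a structural mismatch easier to locate, because same_cols shows which files differ from the first one.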
Ramnath suggests in his comment a faster way to merge the .csv files (again, assuming all of them have the same structure):
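library(dplyr)  # provides bind_rows()
library(readr)  # provides read_csv()
# bind_rows() is used here because rbind_all() has been removed from dplyr
df <- bind_rows(lapply(csv_files, read_csv))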
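Since the question asks for one combined file, the merged data frame still has to be written back to disk, whichever approach produced it. The following is a minimal sketch in base R. The small stand-in data frame and the output name combined.csv are hypothetical and only serve the illustration.

```r
# Illustrative stand-in for the merged data frame df
df <- data.frame(id = 1:3, value = c(10, 20, 30))

# Hypothetical output location, kept outside the folder tree being scanned
out_file <- file.path(tempdir(), "combined.csv")

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(df, out_file, row.names = FALSE)

# Read the file back to confirm that it round-trips
check <- read.csv(out_file)
print(check)
```

If the combined file is saved inside the working directory with a .csv extension, a later run of dir() with recursive = TRUE will list it along with the source files, so it is worth writing it elsewhere or excluding it from csv_files.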