How can I read multiple (Excel) files into R?
Question
I have hundreds of medium-sized Excel files (between 5000 and 50.0000 rows with about 100 columns) to load into R. They have a well-defined naming pattern, like x_1.xlsx, x_2.xlsx, etc.
How can I load these files into R in the fastest, most straightforward way?
Recommended answer
With list.files you can create a list of all the filenames in your working directory. Next you can use lapply to loop over that list and read each file with the read_excel function from the readxl package:
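library(readxl)
# 'pattern' is a regular expression, not a glob: escape the dot and anchor the end
file.list <- list.files(pattern = "\\.xlsx$")
# read every workbook into a list of data frames
df.list <- lapply(file.list, read_excel)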
This method can of course also be used with other file-reading functions like read.csv or read.table. Just replace read_excel with the appropriate file-reading function and make sure you use the correct pattern in list.files.
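As a small self-contained illustration of the same pattern with read.csv, here is a sketch that writes two tiny CSV files to a temporary directory and reads them back. The directory, file contents and the names demo_dir, csv_files and csv_list are made up for the example; only list.files, lapply and read.csv come from the answer.

```r
# Create a temporary directory with two small CSV files (hypothetical demo data)
demo_dir <- tempfile("multi_read_demo")
dir.create(demo_dir)
write.csv(data.frame(a = 1:2, b = c("u", "v")),
          file.path(demo_dir, "x_1.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:4, b = c("w", "z")),
          file.path(demo_dir, "x_2.csv"), row.names = FALSE)

# 'pattern' is a regular expression, so escape the dot and anchor the end;
# full.names = TRUE returns paths that can be read from any working directory
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames
csv_list <- lapply(csv_files, read.csv)
```

Passing the directory explicitly together with full.names = TRUE avoids having to change the working directory first.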
If you also want to include the files in subdirectories, use:
file.list <- list.files(pattern = "\\.xlsx$", recursive = TRUE)
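To see what recursive = TRUE changes, here is a minimal sketch on a throwaway directory tree. The folder layout and the names root, top_only and with_subdirs are hypothetical; the files are empty placeholders, so nothing is actually read.

```r
# Build a temporary tree: one file at the top level, one in a subdirectory
root <- tempfile("recursive_demo")
dir.create(file.path(root, "sub"), recursive = TRUE)
file.create(file.path(root, "x_1.xlsx"),
            file.path(root, "sub", "x_2.xlsx"),
            file.path(root, "notes.txt"))

# Without recursive = TRUE only the top-level match is returned
top_only <- list.files(root, pattern = "\\.xlsx$")

# With recursive = TRUE the result also contains paths relative to 'root',
# including the subdirectory part
with_subdirs <- list.files(root, pattern = "\\.xlsx$", recursive = TRUE)
```

Because the returned paths are relative to the directory that was searched, they can be passed straight to the reading function only when that directory is the working directory; otherwise add full.names = TRUE.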
Other possible packages for reading Excel files: openxlsx and xlsx.
Supposing the columns are the same for each file, you can bind them together in one data frame with bind_rows from dplyr:
library(dplyr)
df <- bind_rows(df.list, .id = "id")
Or with rbindlist from data.table:
library(data.table)
df <- rbindlist(df.list, idcol = "id")
Both have the option to add an id column for identifying the separate datasets.
Update: If you don't want a numeric identifier, just use sapply with simplify = FALSE to read the files in file.list:
df.list <- sapply(file.list, read.csv, simplify=FALSE)
When using bind_rows from dplyr or rbindlist from data.table, the id column now contains the filenames.
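The reason this works is that sapply names its result after a character input, while lapply does not. A quick base R check, using nchar as a stand-in for the reading function and two hypothetical file names:

```r
file_names <- c("x_1.csv", "x_2.csv")  # hypothetical file names

# lapply returns an unnamed list when the input vector has no names
unnamed <- lapply(file_names, nchar)

# sapply uses the character values themselves as names (USE.NAMES = TRUE
# is the default), and simplify = FALSE keeps the result a list
named <- sapply(file_names, nchar, simplify = FALSE)

names(unnamed)
names(named)
```

The binding functions then pick up those list names for the id column instead of falling back to positions.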
Yet another approach is to use the purrr package:
library(purrr)
file.list <- list.files(pattern = "\\.csv$")
file.list <- setNames(file.list, file.list) # only needed when you need an id column with the file names
df <- map_df(file.list, read.csv, .id = "id")
Other approaches to getting a named list: If you don't want just a numeric identifier, then you can assign the filenames to the data frames in the list before you bind them together. There are several ways to do this:
# with the 'attr' function from base R
attr(df.list, "names") <- file.list
# with the 'names' function from base R
names(df.list) <- file.list
# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)
Now you can bind the list of data frames together in one data frame with rbindlist from data.table or bind_rows from dplyr. The id column will now contain the filenames instead of a numeric identifier.
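If neither dplyr nor data.table is available, roughly the same result can be sketched in base R. This is not the approach used in the answer, just an illustration of what the id column ends up holding; df_list, with_id and combined are hypothetical names, and the two data frames stand in for files that have already been read.

```r
# Hypothetical stand-in for df.list: a named list of two small data frames
df_list <- list(
  "x_1.csv" = data.frame(a = 1:2),
  "x_2.csv" = data.frame(a = 3:4)
)

# Prepend an 'id' column holding the list name to each data frame
with_id <- Map(function(d, nm) cbind(id = nm, d), df_list, names(df_list))

# Stack the pieces; unname() keeps the list names out of the row names
combined <- do.call(rbind, unname(with_id))
```

For hundreds of files, rbindlist or bind_rows is likely to be noticeably faster than do.call(rbind, ...), so treat this as a fallback rather than a recommendation.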