How can I read multiple (Excel) files into R?

Question

I have hundreds of medium-sized Excel files (between 5000 and 50.0000 rows, with about 100 columns) to load into R. They follow a well-defined naming pattern, like x_1.xlsx, x_2.xlsx, etc.

How can I load these files into R in the fastest, most straightforward way?

Recommended answer

With list.files you can create a character vector of all the filenames in your working directory. Next, you can use lapply to loop over that vector and read each file with the read_excel function from the readxl package:

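library(readxl)
# pattern is a regular expression, so match names ending in ".xlsx"
file.list <- list.files(pattern = "\\.xlsx$")
# read each file; the result is a list with one data frame per file
df.list <- lapply(file.list, read_excel)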

This method can of course also be used with other file-reading functions such as read.csv or read.table. Just replace read_excel with the appropriate function and make sure the pattern in list.files matches the right file extension.
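One detail worth checking when adapting the pattern: list.files treats pattern as a regular expression, not a shell glob. The following is a minimal sketch under stated assumptions: the temporary directory and the file names in it are made up purely for illustration, and the files are empty placeholders rather than real workbooks.

```r
# Hypothetical demo directory with illustrative (empty) files
demo_dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir,
                      c("x_1.xlsx", "x_2.xlsx", "x_1.xlsx.bak", "myxlsx.txt")))

# Anchored regular expression: only names that end in ".xlsx"
strict <- list.files(demo_dir, pattern = "\\.xlsx$")

# glob2rx() (utils package) converts a shell-style glob into a regex
from_glob <- list.files(demo_dir, pattern = glob2rx("*.xlsx"))

# full.names = TRUE returns paths that work from any working directory
paths <- list.files(demo_dir, pattern = "\\.xlsx$", full.names = TRUE)
```

Passing full.names = TRUE is a design choice that matters once the files live outside the working directory, because the reading function then receives usable paths instead of bare names.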

If you also want to include files in subdirectories, use:

file.list <- list.files(pattern = "\\.xlsx$", recursive = TRUE)
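As a rough end-to-end illustration of the recursive variant, here is a sketch that runs with base R only. It uses small CSV files instead of Excel workbooks so that no extra package is needed; the directory layout, file names, and column name are assumptions made up for the demo.

```r
# Hypothetical directory tree: one file at the top level, one in a subfolder
root <- file.path(tempdir(), "recursive_demo")
dir.create(file.path(root, "sub"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(a = 1:2), file.path(root, "x_1.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:4), file.path(root, "sub", "x_2.csv"), row.names = FALSE)

# recursive = TRUE descends into subdirectories;
# full.names = TRUE keeps the directory part so each file can be opened
file.list <- list.files(root, pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)

# Read every file found; the result is a list of data frames
df.list <- lapply(file.list, read.csv)
```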

Other packages that can read Excel files: openxlsx and xlsx.

Supposing the columns are the same for each file, you can bind them together into one data frame with bind_rows from dplyr:

library(dplyr)
df <- bind_rows(df.list, .id = "id")

Or with rbindlist from data.table:

library(data.table)
df <- rbindlist(df.list, idcol = "id")

Both have the option to add an id column that identifies the separate datasets.
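Since binding relies on the columns being the same in every file, it may be worth verifying that first. The sketch below is not part of the original answer: it swaps in base R's rbind for the binding step so that it runs without dplyr or data.table, and the two small data frames are illustrative stand-ins for the contents of df.list.

```r
# Illustrative stand-ins for the data frames read from the files
df.list <- list(
  data.frame(a = 1:2, b = c("u", "v")),
  data.frame(a = 3:4, b = c("w", "x"))
)

# Check the assumption: every data frame has the same column names, in order
same_cols <- all(vapply(
  df.list,
  function(d) identical(names(d), names(df.list[[1]])),
  logical(1)
))
stopifnot(same_cols)

# Base R binding step, followed by a numeric id marking the source data frame
df <- do.call(rbind, df.list)
df$id <- rep(seq_along(df.list), times = vapply(df.list, nrow, integer(1)))
```

For hundreds of files, repeated base rbind is likely slower than the dedicated functions shown above, so this is mainly useful as a dependency-free check.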

Update: If you don't want a numeric identifier, use sapply with simplify = FALSE to read the files in file.list (shown here with read.csv; use read_excel for Excel files):

df.list <- sapply(file.list, read.csv, simplify=FALSE)

When using bind_rows from dplyr or rbindlist from data.table, the id column now contains the filenames.
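The reason this works is that sapply, unlike lapply, uses a character input vector as the names of its result when the result has none. A minimal sketch, assuming a temporary directory with two made-up CSV files, shows the difference:

```r
# Hypothetical demo files
demo_dir <- file.path(tempdir(), "names_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1), file.path(demo_dir, "x_1.csv"), row.names = FALSE)
write.csv(data.frame(a = 2), file.path(demo_dir, "x_2.csv"), row.names = FALSE)

file.list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# lapply over an unnamed character vector returns an unnamed list
unnamed <- lapply(file.list, read.csv)

# sapply with simplify = FALSE returns a list named after its input
named <- sapply(file.list, read.csv, simplify = FALSE)

names(unnamed)  # NULL
names(named)    # the file paths in file.list
```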

Yet another approach uses the purrr package:

library(purrr)
file.list <- list.files(pattern = "\\.csv$")
file.list <- setNames(file.list, file.list) # only needed when you want an id column with the filenames

df <- map_df(file.list, read.csv, .id = "id")

Other approaches to getting a named list: If you don't want just a numeric identifier, then you can assign the filenames to the data frames in the list before you bind them together. There are several ways to do this:

# with the 'attr' function from base R
attr(df.list, "names") <- file.list
# with the 'names' function from base R
names(df.list) <- file.list
# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)

Now you can bind the list of data frames together into one data frame with rbindlist from data.table or bind_rows from dplyr. The id column will now contain the filenames instead of a numeric identifier.
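If the filenames include directories or extensions, the id values can get unwieldy. One possible refinement, which is an assumption on top of the original answer, is to shorten the names before binding. The paths and data frames below are illustrative stand-ins.

```r
# Illustrative paths and stand-in data frames
file.list <- c("data/x_1.xlsx", "data/sub/x_2.xlsx")
df.list <- list(data.frame(a = 1), data.frame(a = 2))

# basename() drops the directory part;
# tools::file_path_sans_ext() drops the file extension
names(df.list) <- tools::file_path_sans_ext(basename(file.list))

names(df.list)
```

Note that files with the same base name in different subdirectories would end up with identical names, so this only suits file sets where the base names are unique.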
