Reading multiple (Excel) files into R - Best practice

Views: 174 Published: 2018/8/1 10:45:11 Tags: r excel import import-from-excel

Question

I have hundreds of medium-sized Excel files (between 5000 and 50.0000 rows, with about 100 columns) to load into R. They have a well-defined naming pattern, like x_1.xlsx, x_2.xlsx, etc.

I know there are many ways to load these files into R, such as for-loops or lapply-type solutions. Hence, my questions are:

What do you consider the best (fastest, most straightforward) approach to reading multiple files?

What tricks or functions do you use?

Recommended answer

With list.files you can create a character vector of all the filenames in your working directory. Next, you can use lapply to loop over that vector and read each file with the read_excel function from the readxl package:

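library(readxl)
# 'pattern' is a regular expression, so escape the dot and anchor the extension
file.list <- list.files(pattern = "\\.xlsx$")
# read each workbook into a list of data frames
df.list <- lapply(file.list, read_excel)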

This method can of course also be used with other file-reading functions like read.csv or read.table. Just replace read_excel with the appropriate file-reading function and make sure you use the correct pattern in list.files.
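As a minimal sketch of the same pattern with read.csv, the following writes three small CSV files to a temporary directory and reads them back. The directory name, file names and data are made up for illustration. It also passes full.names = TRUE, so the returned paths work regardless of the current working directory.

```r
# Hypothetical demo data: three small CSV files in a temporary directory
demo.dir <- file.path(tempdir(), "csv_demo")
dir.create(demo.dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(a = i, b = letters[i]),
            file.path(demo.dir, paste0("x_", i, ".csv")),
            row.names = FALSE)
}

# 'pattern' is a regular expression; full.names = TRUE keeps the directory in each path
file.list <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file; the result is a list with one data frame per file
df.list <- lapply(file.list, read.csv)
length(df.list)
```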

If you also want to include the files in subdirectories, use:

file.list <- list.files(pattern = "\\.xlsx$", recursive = TRUE)
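To see what recursive = TRUE changes, here is a small sketch on a made-up directory layout. It uses CSV files only so that it runs without extra packages; the directory and file names are assumptions for illustration.

```r
# Hypothetical layout: one file at the top level, one in a subdirectory
top.dir <- file.path(tempdir(), "recursive_demo")
dir.create(file.path(top.dir, "sub"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(a = 1), file.path(top.dir, "x_1.csv"), row.names = FALSE)
write.csv(data.frame(a = 2), file.path(top.dir, "sub", "x_2.csv"), row.names = FALSE)

# Without recursive = TRUE only the top-level file is found
flat <- list.files(top.dir, pattern = "\\.csv$")

# With recursive = TRUE the result also contains files in subdirectories,
# given as paths relative to top.dir
nested <- list.files(top.dir, pattern = "\\.csv$", recursive = TRUE)
```

Because the recursive result holds relative paths, either read the files from within that directory or add full.names = TRUE.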

Other possible packages for reading Excel files: openxlsx and xlsx

Supposing the columns are the same for each file, you can bind them together in one dataframe with bind_rows from dplyr:

library(dplyr)
df <- bind_rows(df.list, .id = "id")

or with rbindlist from data.table:

library(data.table)
df <- rbindlist(df.list, idcol = "id")

Both have the option to add an id column for identifying the separate datasets.
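For comparison, the same idea can be sketched in base R without dplyr or data.table. This is not the approach recommended above and is likely slower on hundreds of files; the two small data frames are a made-up stand-in for df.list.

```r
# Hypothetical stand-in for df.list: two data frames with identical columns
df.list <- list(data.frame(a = 1:2, b = c("u", "v")),
                data.frame(a = 3:4, b = c("w", "x")))

# Add a numeric id column to each data frame, then stack them with rbind
with.id <- Map(function(d, i) cbind(id = i, d), df.list, seq_along(df.list))
df <- do.call(rbind, with.id)
```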

Update: If you don't want a numeric identifier, just use sapply with simplify = FALSE to read the files in file.list:

df.list <- sapply(file.list, read.csv, simplify=FALSE)

When using bind_rows from dplyr or rbindlist from data.table, the id column now contains the filenames.
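The reason this works is that sapply names its result after a character input (USE.NAMES = TRUE by default), whereas lapply returns an unnamed list for an unnamed input. A small sketch with made-up demo files:

```r
# Hypothetical demo files in a temporary directory
demo.dir <- file.path(tempdir(), "names_demo")
dir.create(demo.dir, showWarnings = FALSE)
write.csv(data.frame(a = 1), file.path(demo.dir, "x_1.csv"), row.names = FALSE)
write.csv(data.frame(a = 2), file.path(demo.dir, "x_2.csv"), row.names = FALSE)
file.list <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)

# lapply: the list has no names, so a later id column would be numeric
unnamed <- lapply(file.list, read.csv)

# sapply with simplify = FALSE: still a list, but named after the elements of file.list
named <- sapply(file.list, read.csv, simplify = FALSE)

names(unnamed)
names(named)
```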

Yet another approach is to use the purrr package:
library(purrr)
file.list <- list.files(pattern = "\\.csv$")
file.list <- setNames(file.list, file.list) # only needed when you want an id column with the filenames

df <- map_df(file.list, read.csv, .id = "id")

Other approaches to getting a named list: If you don't want just a numeric identifier, then you can assign the filenames to the dataframes in the list before you bind them together. There are several ways to do this:

# with the 'attr' function from base R
attr(df.list, "names") <- file.list
# with the 'names' function from base R
names(df.list) <- file.list
# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)

Now you can bind the list of dataframes together in one dataframe with rbindlist from data.table or bind_rows from dplyr. The id column will now contain the filenames instead of a numeric identifier.
