Read multiple xlsx files with multiple sheets into one R data frame


Problem description

I have been reading up on how to read and combine multiple xlsx files into one R data frame and have come across some very good suggestions, such as "How to read multiple xlsx file in R using loop with specific rows and columns", but none fits my data set so far.

I would like R to read in multiple xlsx files that each have multiple sheets. All sheets and files have the same columns but not the same number of rows, and NAs should be excluded. I want to skip the first 3 rows and only take in columns 1:6, 8:10, 12:17, 19.

So far I tried:

file.list <- list.files(recursive=T,pattern='*.xlsx')

dat = lapply(file.list, function(i){
    x = read.xlsx(i, sheetIndex=1, sheetName=NULL, startRow=4,
              endRow=NULL, as.data.frame=TRUE, header=F)
# Column select 
    x = x[, c(1:6,8:10,12:17,19)]
# Create column with file name  
    x$file = i
# Return data
    x
  })

  dat = do.call("rbind.data.frame", dat)

But this only reads the first sheet of every file.

Does anyone know how to get all the sheets and files together in one R data frame?

Also, what packages would you recommend for large sets of data? So far I tried readxl and XLConnect.

Recommended answer

I would use a nested loop like this to go through each sheet of each file. It might not be the fastest solution but it is the simplest.

library(xlsx)
# list.files() takes a regular expression, so match the extension explicitly
file.list <- list.files(recursive=TRUE, pattern='\\.xlsx$')  #get files list from folder

for (i in seq_along(file.list)){
  wb <- loadWorkbook(file.list[i])            #select a file & load workbook
  sheet <- getSheets(wb)                      #get sheet list

  for (j in seq_along(sheet)){
    tmp <- read.xlsx(file.list[i], sheetIndex=j, colIndex=c(1:6,8:10,12:17,19),
                     sheetName=NULL, startRow=4, endRow=NULL,
                     as.data.frame=TRUE, header=FALSE)
    #the first sheet of the first file starts the data set; all others are appended
    if (i==1 && j==1) dataset <- tmp else dataset <- rbind(dataset, tmp)

  }
}
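
Since the question mentions readxl, here is a rough sketch of the same idea using that package instead of xlsx. The helper name read_all_sheets and the extra file and sheet columns are assumptions made for illustration; they are not part of readxl. The sketch also assumes every sheet has at least 19 columns after skipping the first three rows, and that column types are compatible across sheets so that rbind can stack them.

```r
# Sketch only: read_all_sheets() is a hypothetical helper, not a readxl function.
read_all_sheets <- function(files, cols = c(1:6, 8:10, 12:17, 19), skip = 3) {
  per_file <- lapply(files, function(f) {
    sheets <- readxl::excel_sheets(f)              # sheet names in this workbook
    per_sheet <- lapply(sheets, function(s) {
      x <- readxl::read_excel(f, sheet = s, skip = skip, col_names = FALSE)
      x <- as.data.frame(x)[, cols, drop = FALSE]  # keep only the wanted columns
      x$file  <- f                                 # record where each row came from
      x$sheet <- s
      x
    })
    do.call(rbind, per_sheet)                      # stack the sheets of one file
  })
  do.call(rbind, per_file)                         # stack all files
}

files <- list.files(recursive = TRUE, pattern = "\\.xlsx$")
dataset <- read_all_sheets(files)                  # NULL if no files were found
```

Collecting the pieces in lists and binding them once avoids growing dataset with rbind inside the loop, which tends to get slow as the number of sheets increases.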

You can clean NA values after the loading phase.
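
As a minimal sketch of that cleanup step in base R, the toy data frame below stands in for the combined dataset; its column names and values are made up for illustration. Which rule is appropriate (dropping only fully blank rows, or every row that contains any NA) depends on the data.

```r
# Toy stand-in for the combined data frame; the values are invented for illustration.
dataset <- data.frame(
  X1 = c(1, NA, 3, NA),
  X2 = c("a", NA, NA, "d"),
  stringsAsFactors = FALSE
)

# Drop rows in which every column is NA (for example, blank rows in shorter sheets).
all_na <- rowSums(is.na(dataset)) == ncol(dataset)
no_blank_rows <- dataset[!all_na, , drop = FALSE]

# Stricter alternative: keep only rows that contain no NA at all.
complete_rows <- dataset[complete.cases(dataset), , drop = FALSE]
```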
