Reading an Excel file into R and merging all of its sheets into a single data frame


Question

I have an Excel workbook with a varying number of sheets in it (two, three, or more, depending on the user). The headers of all the sheets are the same, and the first two rows of each sheet are header rows.

I want to merge all of these sheets into a single data frame.

The files are in .xlsx format, and each sheet contains a large number of rows (30 columns and 8000 rows).

I am a beginner at reading Excel files in R. I am working through the options; in the meantime, if anyone knows how to implement this, please let me know.

An example of the Excel file looks something like this: Data

PS: I want to implement all of this in Shiny, so please mention it if there is an efficient method for Shiny.

ui.R

fileInput('file2', h5('Choose Your Observation Data'), accept=c('text/csv','text/comma-separated-values,text/plain','.xlsx'))

server.R

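library(xlsx)        # loadWorkbook, read.xlsx
library(pbapply)     # pblapply
library(data.table)  # rbindlist

b <- reactive({
   fileinput2 <- input$file2
   if (is.null(fileinput2))
     return(NULL)
   # fileInput returns a data frame; datapath holds the uploaded file's temp path
   xl_file <- fileinput2$datapath
   wb <- loadWorkbook(xl_file)
   sheet_ct <- wb$getNumberOfSheets()
   dat <- rbindlist(pblapply(1:sheet_ct, function(x) {
     read.xlsx(xl_file, x)
   }), fill=TRUE)
   dat <- dat[-1, ]   # drop the first row of the combined table
   print(dat)
   })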

Solution

You can start with "A million ways to connect R and Excel". This is a basic snippet to get you a bit further along using the xlsx package. It's Java-based and it's s l o w, hence the use of progress bars.

This snippet takes a very naive approach, since every real-world Excel spreadsheet I've ever seen is usually a wretched pile of numerical madness, and one can rarely guarantee that the columns are all lined up properly and consistently named. To that end, I wantonly use rbindlist from data.table with the fill option to deal with any column inconsistencies.
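
To see what the fill option does, here is a minimal sketch with two small, made-up data frames standing in for sheets whose columns do not line up. The names a, b, and combined and the values in them are illustrative assumptions, not taken from the workbook.

```r
library(data.table)  # rbindlist

# Two made-up "sheets": the second one names its temperature column differently.
a <- data.frame(ED = 1:2, TM = c(9.9, 9.59))
b <- data.frame(ED = 3L, Temperature = 9.65)

# fill = TRUE matches columns by name and pads whatever is missing with NA,
# instead of stopping with an error about mismatched columns.
combined <- rbindlist(list(a, b), fill = TRUE)
print(combined)
```

The combined table keeps every column name that appears in any input, which is why inconsistent sheets produce wide results with blocks of NA.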

The result is not perfect (you'll need to take care of the extra header row), but you're reading from Excel, which is also far from perfect.

library(xlsx)        # excel reading
library(pbapply)     # free progress bars
library(data.table)  # rbindlist

xl_file <- "Data.xlsx"

wb <- loadWorkbook(xl_file)
sheet_ct <- wb$getNumberOfSheets()

dat <- rbindlist(pblapply(1:sheet_ct, function(x) {
  res <- read.xlsx(xl_file, x)
}), fill=TRUE)

head(dat)
##    EN MN  ED   HO    TM    SL   PH        DI      TA   DI.1     CH     PI
## 1:  #  # day hour  degC    NA   NA    µmol/L µmol/kg µmol/L   µg/L µmol/L
## 2:  1  1   1   12   9.9 31.23 7.82 2126.1575    2151   15.3   0.93     NA
## 3:  1  1   2   36  9.59 31.17 7.84 2120.4175    2150   14.2 1.2044   0.69
## 4:  1  1   3   60  9.65 31.13 7.84  2110.885    2143   14.3 0.9137   2.85
## 5:  1  1   4   84 10.36 31.16 7.83 2105.4525    2137   13.8 0.7189   7.29
## 6:  1  1   5  108 10.06 31.13 7.84 2106.4775    2139   13.7  0.317   5.24
##        PO     PN     PP     DC     DN     DP     TP Exp.num Mesocosm
## 1: µmol/L µmol/L µmol/L µmol/L µmol/L µmol/L µmol/L      NA       NA
## 2:     NA  2.319  0.032  100.4     NA     NA 5.6306      NA       NA
## 3:  24.16  2.598  0.048  104.5     NA     NA 2.3034      NA       NA
## 4: 34.815  2.095  0.059     NA     NA     NA 2.5594      NA       NA
## 5: 40.999  2.186  0.056   97.5     NA     NA 5.8865      NA       NA
## 6: 37.751  2.173  0.081     NA     NA     NA 6.1425      NA       NA
##    Exp.day Hour Temperature Salinity pH DIC DIN Chl.a PIC POC PON POP DOC
## 1:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 2:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 3:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 4:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 5:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
## 6:      NA   NA          NA       NA NA  NA  NA    NA  NA  NA  NA  NA  NA
##    DON DOP TEP
## 1:  NA  NA  NA
## 2:  NA  NA  NA
## 3:  NA  NA  NA
## 4:  NA  NA  NA
## 5:  NA  NA  NA
## 6:  NA  NA  NA
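
Row 1 of the output above is the units row, that is, the second header row from the spreadsheet, and it forces every column to be read as text. One possible way to deal with it is sketched below: drop that row from each sheet before binding and let R re-guess the column types. The helper drop_units_row and the stand-in data frames sheet1 and sheet2 are hypothetical, and the sketch assumes every sheet carries the units as its first data row.

```r
library(data.table)  # rbindlist

# Hypothetical stand-ins for two sheets as read.xlsx might return them:
# the first data row holds the units, so all columns arrive as character.
sheet1 <- data.frame(ED = c("day", "1", "2"), TM = c("degC", "9.9", "9.59"),
                     stringsAsFactors = FALSE)
sheet2 <- data.frame(ED = c("day", "3"), TM = c("degC", "9.65"),
                     stringsAsFactors = FALSE)

# Drop the units row from one sheet, then convert each column back to
# the most specific type its remaining values allow.
drop_units_row <- function(df) {
  df <- df[-1, , drop = FALSE]
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

dat <- rbindlist(lapply(list(sheet1, sheet2), drop_units_row), fill = TRUE)
str(dat)
```

In the snippet above, the same helper could be applied to the result of read.xlsx inside the function passed to pblapply.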

An alternative approach would be to batch-convert your Excel files to CSV, which may be more effective in the long run.
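
As a rough sketch of that conversion, staying with the xlsx package used above: the helper sheets_to_csv is a hypothetical name, and writing one CSV per sheet named after the sheet is an assumption about the desired layout, not something the original answer specifies.

```r
library(xlsx)  # loadWorkbook, getSheets, read.xlsx

# Hypothetical helper: write every sheet of one workbook to
# <out_dir>/<sheet name>.csv and return the paths that were written.
sheets_to_csv <- function(xl_file, out_dir = tempdir()) {
  sheet_names <- names(getSheets(loadWorkbook(xl_file)))
  vapply(sheet_names, function(s) {
    out <- file.path(out_dir, paste0(s, ".csv"))
    write.csv(read.xlsx(xl_file, sheetName = s), out, row.names = FALSE)
    out
  }, character(1))
}
```

Calling sheets_to_csv("Data.xlsx", "csv") would then write into an existing csv directory, and the resulting files can be read back with read.csv or data.table's fread without going through Java again.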
