Use R and Openxlsx to output a list of dataframes as worksheets in a single Excel file


Problem description

I have a set of CSV files. I want to package them up and export the data to a single Excel file that contains multiple worksheets. I read the CSV files in as a set of data frames.

My problem is how to construct the command in openxlsx. I can do it manually, but I am having trouble constructing the list. Specifically, how do I add each data frame as an element of a named list and then pass that list as a parameter to write.xlsx()?

OK, so I first list the CSV files on disk and generate a set of data frames in memory...

# Generate a list of csv files on disk and shorten names... 
filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "\\.csv$"  # list.files() expects a regular expression, not a glob
fileListwithPath = list.files(path = filePath, pattern = filePattern, full.names = TRUE)
fileList = list.files(path = filePath, pattern = filePattern, full.names = FALSE)

datasets <- gsub("\\.csv$", "", fileList)
datasets <- gsub("sample_", "S", datasets)
datasets

# Now generate the dataframes for each csv file...
list2env(
  lapply(setNames(fileListwithPath, make.names(datasets)),
         read.csv), envir = .GlobalEnv)

Example output:

dput(datasets)
c("S10000_R3.3.2_201703301839", "S10000_T4.3.0_201703301843", 
"S20000_R3.3.2_201703301826", "S20000_T4.3.0_201703301832", "S280000_R3.3.2_201704020847", 
"S280000_T4.3.0_201704021100", "S290000_R3.3.2_201704020447", 
"S290000_T4.3.0_201704020702", "S30000_R3.3.2_201703301803", 
"S30000_T4.3.0_201703301817", "S310000_R3.3.2_201704012331", 
"S310000_T4.3.0_201704020242", "S320000_R3.3.2_201704011827", 
"S320000_T4.3.0_201704012128", "S330000_R3.3.2_201704011304", 
"S330000_T4.3.0_201704011546", "S340000_R3.3.2_201704010652", 
"S340000_T4.3.0_201704011010", "S350000_R3.3.2_201704010020", 
"S350000_T4.3.0_201704010404", "S360000_R3.3.2_201703311819", 
"S360000_T4.3.0_201703312134", "S370000_R3.3.2_201703310914", 
"S370000_T4.3.0_201703311301", "S380000_R3.3.2_201703310134", 
"S380000_T4.3.0_201703310509", "S390000_R3.3.2_201703301846", 
"S390000_T4.3.0_201703302252", "S40000_R3.3.2_201703301738", 
"S40000_T4.3.0_201703301752", "S50000_R3.3.2_201703301707", "S50000_T4.3.0_201703301724", 
"S60000_R3.3.2_201703301624", "S60000_T4.3.0_201703301647", "S70000_R3.3.2_201703301535", 
"S70000_T4.3.0_201703301602", "S80000_R3.3.2_201703301430", "S80000_T4.3.0_201703301508", 
"S90000_R3.3.2_201703301324", "S90000_T4.3.0_201703301400")

Now that we have a set of data frames, we want to create an Excel file with multiple worksheets...

wb <- createWorkbook()
saveWorkbook(wb, 'output.xlsx')

lapply(names(myList), function(x) write.xlsx(myList[[x]], 'output.xlsx', sheetName=x, append=TRUE))

Problem:

The problem is that I can create the list structure manually and can confirm it works, BUT I cannot seem to construct the list automatically.

myList <- sapply(datasets,function(x) NULL)
names(myList)
str(myList)
myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)

Thus:

> str(myList)
List of 40
 $ S10000_R3.3.2_201703301839 :'data.frame':    43 obs. of  4 variables:
  ..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
  ..$ user     : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
  ..$ system   : num [1:43] 0.63 0.065 0.001 0.004 0 ...
  ..$ elapsed  : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
 $ S10000_T4.3.0_201703301843 : NULL
 $ S20000_R3.3.2_201703301826 : NULL
 ...

Specific question: how do I append each data frame to the list?

myList <- lapply( myList, function(x) eval(x) )

What am I doing wrong with lapply() here? The lapply() call above does not iterate through the list and assign each data frame to the list entry of the same name.

i.e. myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)
> str(myList)
    List of 40
     $ S10000_R3.3.2_201703301839 :'data.frame':    43 obs. of  4 variables:
      ..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
      ..$ user     : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
      ..$ system   : num [1:43] 0.63 0.065 0.001 0.004 0 ...
      ..$ elapsed  : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
     $ S10000_T4.3.0_201703301843 : NULL
     $ S20000_R3.3.2_201703301826 : NULL
     ...

What am I missing? All help gratefully appreciated. Yes, I am pretty certain I am missing something obvious... but I am stumped.

Recommended answer

I don't have your data frames, so I can't test this, but the code below is similar to the approach I use when I need to read and write Excel files. It uses the xlsx package, as that's what I'm familiar with, but hopefully you can adapt it if you need to use openxlsx.

library(xlsx)

First, read the files into a list. Something like this:

filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "\\.csv$"  # list.files() expects a regular expression, not a glob
fileListwithPath = list.files(path = filePath, 
                              pattern = filePattern, 
                              full.names = TRUE)
# Name each path by its bare file name so the list elements inherit those names
fileListwithPath = setNames( fileListwithPath, 
                             basename(fileListwithPath))
df.list = lapply(fileListwithPath, read.csv)

# Now we rename the list names for use as worksheet names...
# Remove the .csv extension and replace the sample_ prefix used in the filenames...
# Resulting worksheet names: S<size>_<R version>_<date>
names(df.list) <- gsub("\\.csv$","", names(df.list))
names(df.list) <- gsub("sample_","S", names(df.list))
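If the data frames already exist as separate objects in the global environment, as they do after the list2env() call in the question, the list can also be built from the object names. The lapply() attempt in the question returns NULL for every entry because lapply() passes each element's value (here NULL) to the function, not its name, so there is nothing to look up. A minimal sketch using base R's mget(); the two small data frames and their names are made-up stand-ins for the real benchmark results:

```r
# Hypothetical stand-ins for the data frames created by list2env() in the question
S1_R <- data.frame(user = c(2.1, 0.3), elapsed = c(12.3, 1.1))
S1_T <- data.frame(user = c(1.9, 0.2), elapsed = c(10.8, 0.9))

# Character vector of object names, playing the role of `datasets`
datasets <- c("S1_R", "S1_T")

# mget() looks up each name and returns a named list of the objects found
myList <- mget(datasets)

# Every entry is now a data frame instead of NULL
str(myList, max.level = 1)
```

An equivalent spelling is lapply(setNames(datasets, datasets), get), which iterates over the names rather than over the empty list entries.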

You now have a list in which each element is a data frame and each element's name is derived from the name of its file. Now, let's write each data frame to a different worksheet in the same Excel workbook and then save the workbook as an xlsx file:

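# Create an empty workbook (xlsx package)
wb = createWorkbook()

# Add one sheet per data frame, named after the list element;
# invisible() suppresses the list of NULLs that lapply() would otherwise print
invisible(lapply( names(df.list), 
        function(df) {
          sheet = createSheet(wb, df)
          addDataFrame(df.list[[df]], sheet = sheet, row.names = FALSE)
          } ))

saveWorkbook(wb, "My_workbook.xlsx")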
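Since the question asks about openxlsx, here is a rough equivalent of the loop above using that package instead of xlsx. Treat it as a sketch rather than a drop-in replacement: the two data frames and the temporary output path are placeholders, and it assumes the sheet names stay within Excel's 31-character limit.

```r
library(openxlsx)

# Placeholder list standing in for df.list from the previous step
df.list <- list(
  S1_R = data.frame(user = c(2.1, 0.3), elapsed = c(12.3, 1.1)),
  S1_T = data.frame(user = c(1.9, 0.2), elapsed = c(10.8, 0.9))
)

wb <- createWorkbook()

# One worksheet per list element, named after the element
for (nm in names(df.list)) {
  addWorksheet(wb, sheetName = nm)
  writeData(wb, sheet = nm, x = df.list[[nm]])
}

# Written to a temporary location for illustration only
out_file <- file.path(tempdir(), "My_workbook.xlsx")
saveWorkbook(wb, out_file, overwrite = TRUE)
```

Unlike the attempt in the question, nothing is appended to a file on disk: the workbook is assembled in memory and saved once at the end.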

I've separated reading and writing the csv files for illustration, but you can combine them into a single function that reads each individual csv file and writes it to a new sheet in a single Excel workbook.
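With openxlsx, the combined version could be quite short, because write.xlsx() accepts a named list of data frames and writes one worksheet per element, using the list names as sheet names. The sketch below is untested against the original data; the function name csv_dir_to_xlsx, the demo directory, and the sample file names are all hypothetical.

```r
library(openxlsx)

# Hypothetical helper: read every CSV in a directory and write one sheet per file
csv_dir_to_xlsx <- function(csv_dir, out_file) {
  files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

  # Derive sheet names the same way as above: drop .csv, shorten sample_ to S
  sheet_names <- sub("\\.csv$", "", basename(files))
  sheet_names <- sub("^sample_", "S", sheet_names)

  df.list <- lapply(setNames(files, sheet_names), read.csv)

  # A named list becomes one worksheet per element
  write.xlsx(df.list, file = out_file)
  invisible(df.list)
}

# Usage with two made-up CSV files in a temporary directory
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(user = 1:2), file.path(csv_dir, "sample_10_a.csv"), row.names = FALSE)
write.csv(data.frame(user = 3:4), file.path(csv_dir, "sample_20_b.csv"), row.names = FALSE)

out_file <- file.path(tempdir(), "combined_demo.xlsx")
csv_dir_to_xlsx(csv_dir, out_file)
```

Returning the list invisibly keeps the data frames available for further checks without printing them.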
