Read all files in a folder and apply a function to each data frame

Problem description

I am running a relatively simple piece of analysis, which I have put into a function, on all the files in a particular folder. Does anyone have any tips to help me automate the process across a number of different folders?

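  1. Firstly, I was wondering whether there is a way of reading all the files in a particular folder straight into R. I believe the following command will list all the files: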

files <- Sys.glob("*.csv")

...which I found in "Using R to list all files with a specified extension".

And then the following code reads all those files into R.

listOfFiles <- lapply(files, function(x) read.table(x, header = FALSE)) 

...which is from "Manipulating multiple files in R".

But the files seem to be read in as one continuous list and not as individual files... How can I change the script to open all the csv files in a particular folder as individual dataframes?

  2. Secondly, assuming that I can read all the files in separately, how do I run a function on all these dataframes in one go? For example, I have created four small dataframes so I can illustrate what I want:

Df.1 <- data.frame(A = c(5,4,7,6,8,4),B = (c(1,5,2,4,9,1)))
Df.2 <- data.frame(A = c(1:6),B = (c(2,3,4,5,1,1)))
Df.3 <- data.frame(A = c(4,6,8,0,1,11),B = (c(7,6,5,9,1,15)))
Df.4 <- data.frame(A = c(4,2,6,8,1,0),B = (c(3,1,9,11,2,16)))

I have also made up an example function:

# Summarise columns A and B of a data frame: one row per column.
Summary <- function(dfile) {
  sumA <- sum(dfile$A)
  MinA <- min(dfile$A)
  MeanA <- mean(dfile$A)
  MedianA <- median(dfile$A)
  MaxA <- max(dfile$A)

  sumB <- sum(dfile$B)
  MinB <- min(dfile$B)
  MeanB <- mean(dfile$B)
  MedianB <- median(dfile$B)
  MaxB <- max(dfile$B)

  # Combine the per-column statistics into one vector per statistic.
  Sum <- c(sumA, sumB)
  Min <- c(MinA, MinB)
  Mean <- c(MeanA, MeanB)
  Median <- c(MedianA, MedianB)
  Max <- c(MaxA, MaxB)

  Label <- c("A", "B")
  dfile_summary <- data.frame(Label, Sum, Min, Mean, Median, Max)
  return(dfile_summary)
}

I would ordinarily use the following command to apply the function to each individual dataframe.

Df1.summary<-Summary(dfile)

Is there a way instead to apply the function to all the dataframes at once, and to use the titles of the dataframes in the names of the summary tables (i.e. Df1.summary)?

Many thanks,

Katie

Recommended answer

On the contrary, I do think working with a list makes it easy to automate such things.

Here is one solution (I stored your four dataframes in the folder temp/).

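# pattern is a regular expression, so match the extension explicitly
filenames <- list.files("temp", pattern="\\.csv$", full.names=TRUE)
ldf <- lapply(filenames, read.csv)      # one data frame per file
res <- lapply(ldf, summary)             # apply the function to each data frame
names(res) <- substr(filenames, 6, 30)  # drop the leading "temp/"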

It is important to store the full path to your files (as I did with full.names); otherwise you have to paste the directory onto each file name yourself, e.g.

filenames <- list.files("temp", pattern="\\.csv$")
filenames <- paste("temp", filenames, sep="/")

will work too. Note that I used substr to extract the file names while discarding the full path.
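The substr call above depends on the length of the folder name, so it needs adjusting for every new folder. A minimal sketch of a path-independent alternative, assuming base R's basename() is acceptable for the naming step; the temporary folder and the two small CSV files are made up so the example runs on its own:

```r
# Hypothetical setup: write two small CSV files into a temporary folder
# (folder name, file names and contents are illustrative only).
tmp_dir <- file.path(tempdir(), "temp_csv_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(A = c(5, 4, 7), B = c(1, 5, 2)),
          file.path(tmp_dir, "df1.csv"), row.names = FALSE)
write.csv(data.frame(A = 1:3, B = c(2, 3, 4)),
          file.path(tmp_dir, "df2.csv"), row.names = FALSE)

# list.files() takes a regular expression, so anchor the extension explicitly.
filenames <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into its own data frame; the result is a list of data frames.
ldf <- lapply(filenames, read.csv)

# basename() drops the directory part however long the path is.
names(ldf) <- basename(filenames)
```

Each element of ldf is an individual data frame, reachable by file name, e.g. ldf[["df1.csv"]].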

You can access your summary tables as follows:

> res$`df4.csv`
       A              B        
 Min.   :0.00   Min.   : 1.00  
 1st Qu.:1.25   1st Qu.: 2.25  
 Median :3.00   Median : 6.00  
 Mean   :3.50   Mean   : 7.00  
 3rd Qu.:5.50   3rd Qu.:10.50  
 Max.   :8.00   Max.   :16.00  
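The output above comes from base R's summary(). The same lapply pattern should work with a user-defined function such as the one in the question. A minimal sketch, where summarise_df is a hypothetical compact stand-in for that function (not the asker's exact code) and the list holds two of the question's example data frames:

```r
# Two of the example data frames from the question, held in a named list.
ldf <- list(
  df1 = data.frame(A = c(5, 4, 7, 6, 8, 4), B = c(1, 5, 2, 4, 9, 1)),
  df2 = data.frame(A = 1:6, B = c(2, 3, 4, 5, 1, 1))
)

# Hypothetical stand-in for the question's Summary function:
# one row per column, one column per statistic.
summarise_df <- function(dfile) {
  # Apply one statistic to every column and return a numeric vector.
  stats <- function(f) vapply(dfile, f, numeric(1))
  data.frame(Label = names(dfile), Sum = stats(sum), Min = stats(min),
             Mean = stats(mean), Median = stats(median), Max = stats(max),
             row.names = NULL)
}

# Apply the function to every data frame; the result keeps the list names.
res <- lapply(ldf, summarise_df)
res$df1
```

Because the input list is named, each summary table can be looked up by the name of the data frame it came from.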

If you really want to get individual summary tables, you can extract them afterwards. E.g.,

# seq_along() is safe even when res is empty
for (i in seq_along(res))
  assign(paste(paste("df", i, sep=""), "summary", sep="."), res[[i]])
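As an alternative to calling assign in a loop, base R's list2env() can create one object per list element in a single call. A minimal sketch under stated assumptions: res here is a small made-up stand-in for the list of summary tables, and demo_env is a hypothetical environment used to avoid cluttering the workspace.

```r
# Made-up stand-in for the list of summary tables, named by file.
res <- list(`df1.csv` = data.frame(x = 1), `df2.csv` = data.frame(x = 2))

# Build object names such as "df1.summary" from the file names.
names(res) <- paste(tools::file_path_sans_ext(names(res)), "summary", sep = ".")

# Create one object per list element inside a dedicated environment.
demo_env <- new.env()
list2env(res, envir = demo_env)
ls(demo_env)
```

Passing envir = globalenv() instead would create the objects in the workspace, which matches what the assign loop does at top level.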
