Read all files in a folder and apply a function to each data frame

Problem description

I am doing a relatively simple piece of analysis which I have put into a function, on all the files in a particular folder. I was wondering whether anyone had any tips to help me automate the process on a number of different folders.

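  1. Firstly, I was wondering whether there is a way to read all the files in a particular folder directly into R. I believe the following command will list all the files: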

files <- Sys.glob("*.csv")

...which I found in "Using R to list all files with a specified extension".
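Sys.glob() expands shell-style wildcards, whereas list.files() takes a folder plus a regular expression, so the two need differently written patterns. A minimal sketch comparing them, assuming a throwaway folder created with tempfile(); the file names are hypothetical:

```r
# Create a scratch folder holding two CSV files and one non-CSV file.
dir <- tempfile("csvdemo")
dir.create(dir)
for (f in c("a.csv", "b.csv", "notes.txt")) writeLines("A,B", file.path(dir, f))

# Shell-style glob: "*" is a wildcard matching any characters.
globbed <- Sys.glob(file.path(dir, "*.csv"))

# Regular expression: escape the dot and anchor the match at the end of the name.
listed <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

print(basename(globbed))
print(basename(listed))
```

With list.files() the folder is an argument, so the working directory does not have to be changed first.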

And then the following code reads all those files into R.

listOfFiles <- lapply(files, function(x) read.table(x, header = FALSE)) 

...taken from "Processing multiple files in R".

But the files seem to be read in as one continuous list and not individual files… how can I change the script to open all the csv files in a particular folder as individual dataframes?
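The list returned by lapply() is only a container: each element is already a separate data frame and can be pulled out with [[ ]]. A small sketch, assuming two hypothetical CSV files written to a temporary folder. It uses read.csv() rather than read.table(x, header = FALSE), because read.table() splits on whitespace by default and would not separate comma-delimited fields:

```r
# Write two small CSV files into a scratch folder.
dir <- tempfile("csvdemo")
dir.create(dir)
write.csv(data.frame(A = 1:3, B = 4:6), file.path(dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(A = 7:8, B = 9:10), file.path(dir, "second.csv"), row.names = FALSE)

# Read every CSV file; read.csv() uses header = TRUE and sep = "," by default.
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
listOfFiles <- lapply(files, read.csv)

print(length(listOfFiles))       # 2: one element per file
print(class(listOfFiles[[1]]))   # "data.frame"
print(nrow(listOfFiles[[2]]))    # 2: rows of second.csv
```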

  2. Secondly, assuming that I can read all the files in separately, how do I apply a function to all these dataframes in one go? For example, I have created four small dataframes so I can illustrate what I want:

Df.1 <- data.frame(A = c(5,4,7,6,8,4),B = (c(1,5,2,4,9,1)))
Df.2 <- data.frame(A = c(1:6),B = (c(2,3,4,5,1,1)))
Df.3 <- data.frame(A = c(4,6,8,0,1,11),B = (c(7,6,5,9,1,15)))
Df.4 <- data.frame(A = c(4,2,6,8,1,0),B = (c(3,1,9,11,2,16)))

I have also made up an example function:

# Summarise columns A and B of a data frame: sum, min, mean, median and max
Summary <- function(dfile) {
  SumA    <- sum(dfile$A)
  MinA    <- min(dfile$A)
  MeanA   <- mean(dfile$A)
  MedianA <- median(dfile$A)
  MaxA    <- max(dfile$A)

  SumB    <- sum(dfile$B)
  MinB    <- min(dfile$B)
  MeanB   <- mean(dfile$B)
  MedianB <- median(dfile$B)
  MaxB    <- max(dfile$B)

  # Pair up the statistics: first element for A, second for B
  # (local variables are discarded on return, so no rm() is needed)
  Sum    <- c(SumA, SumB)
  Min    <- c(MinA, MinB)
  Mean   <- c(MeanA, MeanB)
  Median <- c(MedianA, MedianB)
  Max    <- c(MaxA, MaxB)

  Label <- c("A", "B")
  dfile_summary <- data.frame(Label, Sum, Min, Mean, Median, Max)
  return(dfile_summary)
}
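Because the function hard-codes columns A and B, each extra column costs five more lines. A hedged alternative sketch: summarise_columns() is a hypothetical name, not part of the question, and it computes the same five statistics for every column by looping over the columns with sapply():

```r
summarise_columns <- function(dfile) {
  # sapply() calls the inner function once per column; the result is a matrix
  # with one row per statistic and one column per data-frame column.
  stats <- sapply(dfile, function(col) {
    c(Sum = sum(col), Min = min(col), Mean = mean(col),
      Median = median(col), Max = max(col))
  })
  # Transpose so each data-frame column becomes one row, as in Summary().
  out <- data.frame(Label = colnames(stats), t(stats))
  rownames(out) <- NULL
  out
}

Df.1 <- data.frame(A = c(5, 4, 7, 6, 8, 4), B = c(1, 5, 2, 4, 9, 1))
print(summarise_columns(Df.1))
```

The output has the same columns as Summary() (Label, Sum, Min, Mean, Median, Max), but it also works unchanged for data frames with more or differently named numeric columns.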

I would ordinarily use the following command to apply the function to each individual dataframe.

Df1.summary <- Summary(Df.1)

Is there a way to apply the function to all the dataframes at once instead, and to use the names of the dataframes in the summary tables (i.e. Df1.summary)?

Many thanks,

凯蒂
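A minimal sketch of the "all at once, keep the names" idea, assuming the data frames already exist in the workspace: mget() gathers them into a list named after the objects, and lapply() keeps those names on its result. col_sums() is a hypothetical stand-in for Summary(); any function that takes one data frame works the same way:

```r
Df.1 <- data.frame(A = c(5, 4, 7, 6, 8, 4), B = c(1, 5, 2, 4, 9, 1))
Df.2 <- data.frame(A = c(1:6), B = c(2, 3, 4, 5, 1, 1))

# Collect the data frames into a list whose names are the object names.
dfs <- mget(c("Df.1", "Df.2"), envir = environment())

# Hypothetical stand-in for Summary(): column sums of one data frame.
col_sums <- function(dfile) colSums(dfile)

# Apply the function to every data frame; names(summaries) is c("Df.1", "Df.2").
summaries <- lapply(dfs, col_sums)
print(names(summaries))
print(summaries[["Df.2"]])
```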

Recommended answer

On the contrary, I think working with a list makes it easy to automate such things.

Here is one solution (I stored your four dataframes as CSV files in the folder temp/).

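filenames <- list.files("temp", pattern = "\\.csv$", full.names = TRUE)  # full paths, e.g. "temp/df1.csv"
ldf <- lapply(filenames, read.csv)      # a list with one data frame per file
res <- lapply(ldf, summary)             # base summary(); Summary() from the question can be passed the same way
names(res) <- substr(filenames, 6, 30)  # drop the leading "temp/" to name each result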
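substr(filenames, 6, 30) relies on the folder prefix being exactly five characters ("temp/"). A sketch of a more robust way to name the results, using basename() to drop the directory and tools::file_path_sans_ext() to drop the extension; the folder and the two files below are hypothetical stand-ins for the real data:

```r
# Recreate a small temp/ folder inside the session's temporary directory.
dir <- file.path(tempdir(), "temp")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(A = c(4, 2, 6, 8, 1, 0), B = c(3, 1, 9, 11, 2, 16)),
          file.path(dir, "df4.csv"), row.names = FALSE)
write.csv(data.frame(A = c(1:6), B = c(2, 3, 4, 5, 1, 1)),
          file.path(dir, "df2.csv"), row.names = FALSE)

filenames <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
ldf <- lapply(filenames, read.csv)
res <- lapply(ldf, summary)

# Name by file, independent of how long the folder path is.
names(res) <- basename(filenames)                              # "df2.csv" "df4.csv"
names(ldf) <- tools::file_path_sans_ext(basename(filenames))   # "df2" "df4"
print(res[["df4.csv"]])
```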

It is important to store the full paths of your files (as I did with full.names = TRUE); otherwise you have to prepend the folder name yourself, e.g.

filenames <- list.files("temp", pattern = "\\.csv$")
paste("temp", filenames, sep = "/")

will work too. Note that I used substr to strip the leading "temp/" from each path, keeping only the file name.
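For the second variant, file.path() builds the same string as paste(..., sep = "/") and is the usual base R helper for joining path components; basename() then strips the directory again. A tiny sketch with hypothetical file names:

```r
filenames <- c("df1.csv", "df2.csv")      # as returned by list.files() without full.names
full <- file.path("temp", filenames)      # "temp/df1.csv" "temp/df2.csv"
print(full)
print(basename(full))                     # "df1.csv" "df2.csv"
```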

You can access your summary tables as follows:

> res$`df4.csv`
       A              B        
 Min.   :0.00   Min.   : 1.00  
 1st Qu.:1.25   1st Qu.: 2.25  
 Median :3.00   Median : 6.00  
 Mean   :3.50   Mean   : 7.00  
 3rd Qu.:5.50   3rd Qu.:10.50  
 Max.   :8.00   Max.   :16.00  

If you really want to get individual summary tables, you can extract them afterwards. E.g.,

for (i in seq_along(res))  # seq_along() is safe even if res is empty
  assign(paste0("df", i, ".summary"), res[[i]])
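An alternative to the assign() loop is base R's list2env(), which copies every element of a named list into an environment in one call. A minimal sketch, assuming a small hypothetical res list in place of the real summaries:

```r
# Hypothetical stand-in for the list of summary tables.
res <- list(summary(data.frame(A = 1:3)), summary(data.frame(A = 4:6)))

# Give each element the name the loop would use: df1.summary, df2.summary, ...
names(res) <- paste0("df", seq_along(res), ".summary")

# Create one object per list element in the global environment.
invisible(list2env(res, envir = globalenv()))

print(exists("df2.summary"))
```

Keeping the results in the named list is usually easier to loop over later; list2env() is mainly useful when separate objects are really needed.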
