Calculate the mean of one column from several CSV files

Question

I have over 300 CSV files in a folder (named 001.csv, 002.csv, and so on). Each contains a data frame with a header. I am writing a function that takes three arguments: the location of the files, the name of the column whose mean should be calculated, and the IDs of the files to use in the calculation.

Here is my function:

pollutantmean2 <- function(directory = getwd(), pollutant, id = 1:332) {

    # add one or two zeros to ID so that they match the CSV file names
    filenames <- sprintf("%03d.csv", id)

    # path to specdata folder
    # if no path is provided, default is working directory
    filedir <- file.path(directory, filenames)

    # get the data from selected ID or IDs from the specified path
    dataset <- read.csv(filedir, header = TRUE)

    # calculate mean removing all NAs
    polmean <- mean(dataset$pollutant, na.rm = TRUE)

    # return mean
    polmean

}

It appears there are two things wrong with my code. To isolate them, I split the function into two separate functions, one per task: 1) get the required files and 2) calculate the mean of the desired column (here, the pollutant).

Task 1: Getting the appropriate files. This works as long as I only want one file. If I select a range of files, such as 1:25, I get the error message Error in file(file, "rt") : invalid 'description' argument. I have Googled this error but still have no idea how to fix it.

# function that obtains csv files and stores them
getfile <- function(directory = getwd(), id) {
    filenames <- sprintf("%03d.csv", id)
    filedir <- file.path(directory, filenames)
    dataset <- read.csv(filedir, header = TRUE)
    dataset
}

If I run getfile("specdata", 1), it works fine, but if I run getfile("specdata", 1:10), I get the following error: Error in file(file, "rt") : invalid 'description' argument.

Task 2: Calculating the mean of a specified named column. Assuming I have a usable data frame, I then try to calculate the mean with the following function:

calcMean <- function(dataset, pollutant) {
    polmean <- mean(dataset$pollutant, na.rm = TRUE)
    polmean
}

But if I run calcMean(mydata, "sulfate") (where mydata is a data frame I loaded manually), I get a warning: Warning message: In mean.default(dataset$pollutant, na.rm = TRUE) : argument is not numeric or logical: returning NA

The odd thing is that if I run mean(mydata$sulfate, na.rm = TRUE) in the console, it works fine.

I have been researching this for several days, and after endless tweaking I have run out of ideas.

Recommended answer

You do not need more functions. As I understand it, the solution can be written more simply, in about six lines:

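library(plyr)  # provides ldply()

pollutantmean <- function(directory, pollutant, id = 1:10) {
    # build zero-padded file names such as "001.csv"
    filenames <- sprintf("%03d.csv", id)
    filenames <- paste(directory, filenames, sep = "/")
    # ldf is a list of data frames, one per file
    ldf <- lapply(filenames, read.csv)
    # df is the single data frame obtained by stacking that list
    df <- ldply(ldf)
    # index by the string held in pollutant; df$pollutant would not use it
    mean(df[, pollutant], na.rm = TRUE)
}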
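
For reference, the two problems in the question appear to come from read.csv() accepting a single file rather than a vector of paths, and from dataset$pollutant looking for a column literally named "pollutant" instead of using the value of the argument. Note also that ldply() comes from the plyr package, which has to be installed and loaded. The following is a minimal base-R sketch of the same idea, not a definitive implementation. The function name pollutantmean_base, the demo directory, and the demo data are hypothetical and made up purely for illustration.

```r
# Hypothetical base-R variant; the name and the demo data are illustrative only.
pollutantmean_base <- function(directory, pollutant, id = 1:10) {
    # one path per id, e.g. "specdata/001.csv"
    paths <- file.path(directory, sprintf("%03d.csv", id))
    # read.csv() reads one file at a time, so apply it to each path
    frames <- lapply(paths, read.csv)
    # stack the data frames row-wise (assumes all files share the same columns)
    combined <- do.call(rbind, frames)
    # [[ ]] looks up the column named by the string in `pollutant`;
    # combined$pollutant would look for a column called "pollutant"
    mean(combined[[pollutant]], na.rm = TRUE)
}

# Small demo with two made-up files written to a temporary directory
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

result <- pollutantmean_base(demo_dir, "sulfate", id = 1:2)
print(result)
```

Because the mean is taken over the stacked rows, every non-missing observation counts equally. This is generally not the same as averaging the per-file means when the files have different numbers of rows.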
