Which is the best method to apply a script repetitively to n .csv files in R?

Question

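My situation:

  1. I have a number of csv files whose names all end the same way, but whose first two characters differ (i.e. AA01.csv, AB01.csv, AC01.csv, etc.)
  2. I have an R script which I would like to run on each file. The script essentially extracts the data from the .csv and assigns them to vectors / converts them into time series objects (for example, an AA01 xts time series object, an AB01 xts object).

What I would like to achieve:

  1. Embed the script within a larger loop (or as appropriate) to run over each file in sequence and apply the script
  2. Remove the intermediate objects created (see the code snippet below)
  3. Leave me with the final xts objects created from each raw data file (i.e. AA01 to AC01 etc. as values/vectors etc.)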

What would be the right way to embed this script in R? Sorry, but I am a programming noob!

My script code below...heading of each column in each CSV is DATE, TIME, VALUE

# Pull in data from the file system and attach it
AA01raw<-read.csv("AA01.csv")
attach(AA01raw)
#format the data for timeseries work
cdt<-as.character(Date)
ctm<-as.character(Time)
tfrm<-timeDate(paste(cdt,ctm),format ="%Y/%m/%d %H:%M:%S")
val<-as.matrix(Value)
aa01tsobj<-timeSeries(val,tfrm)
#convert the timeSeries object to an xts Object
aa01xtsobj<-as.xts(aa01tsobj)
#remove all the intermediate objects to leave the final xts object
rm(cdt)
rm(ctm)
rm(aa01tsobj)
rm(tfrm)
gc()

Then repeat on each .csv file until all xts objects are extracted.

That is, what we would end up with in R, ready for further use, is:

aa01xtsobj, ab01xtsobj, ac01xtsobj....etc

Any help on how to do this would be very much appreciated.

Recommended answer

I find that a for loop and lists are good enough for stuff like this. Once you have a working set of code it's easy enough to move from a loop into a function which can be sapply'd or similar, but that kind of vectorization is idiosyncratic anyway and probably not useful outside of private one-liners.
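
As a rough illustration of that move from a loop to a function, the sketch below wraps the per-file work in a function and applies it with lapply. The name read_one and the sample files are made up for the example, and base R's as.POSIXct stands in for the timeDate/timeSeries steps so that the snippet runs without extra packages; the conversion to xts would go where the comment indicates.

```r
# Hypothetical helper: read one CSV and parse its Date/Time columns.
read_one <- function(path) {
    tmp <- read.csv(path, stringsAsFactors = FALSE)
    stamp <- as.POSIXct(paste(tmp$Date, tmp$Time),
                        format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
    ## the timeSeries / as.xts conversion would replace this data frame
    data.frame(stamp = stamp, Value = tmp$Value)
}

# Made-up sample files in a temporary directory, so the example is self-contained.
demo_dir <- file.path(tempdir(), "apply_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo <- data.frame(Date  = c("2011/01/01", "2011/01/01"),
                   Time  = c("09:00:00", "09:05:00"),
                   Value = c(1.5, 2.5))
paths <- file.path(demo_dir, c("AA01.csv", "AB01.csv"))
for (p in paths) write.csv(demo, p, row.names = FALSE)

# One call replaces the loop; the result is a list named after the files.
results <- lapply(paths, read_one)
names(results) <- basename(paths)
```

Each element can then be pulled out by name, e.g. results[["AB01.csv"]].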

You probably want to avoid assigning to multiple objects with different names in the workspace (this is a FAQ which usually comes up as "how do I assign() . . .").

Please beware my untested code.

A vector of file names, and a list with a named element for each file.

files <- c("AA01.csv", "AA02.csv")
lst <- vector("list", length(files))
names(lst) <- files
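
If the files all sit in one directory, the vector of names does not have to be typed by hand. A minimal sketch, assuming the files live in a directory referred to here as data_dir (a hypothetical name; a temporary directory with empty placeholder files stands in so the snippet runs on its own):

```r
# Hypothetical setup: a temporary directory with placeholder files.
data_dir <- file.path(tempdir(), "csv_demo")
dir.create(data_dir, showWarnings = FALSE)
invisible(file.create(file.path(data_dir,
    c("AA01.csv", "AB01.csv", "AC01.csv", "notes.txt"))))

# Collect every file whose name ends in .csv (returned in alphabetical order).
files <- list.files(data_dir, pattern = "\\.csv$")

# Pre-allocate a list with one named element per file.
lst <- vector("list", length(files))
names(lst) <- files
```

When the directory is not the working directory, passing full.names = TRUE to list.files returns paths that read.csv can open directly.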

Loop over each file.

library(timeSeries)
library(xts)  ## provides as.xts()

for (i in seq_along(files)) {
    ## read strings as character
    tmp <- read.csv(files[i], stringsAsFactors = FALSE)
    ## convert to 'timeDate' (columns are named Date and Time in the question's script)
    tfrm <- timeDate(paste(tmp$Date, tmp$Time), format = "%Y/%m/%d %H:%M:%S")
    ## create timeSeries object
    obj <- timeSeries(as.matrix(tmp$Value), tfrm)
    ## store object in the list, by name
    lst[[files[i]]] <- as.xts(obj)
}

## clean up the intermediates; 'files' is kept because it is used for indexing below
rm(tmp, tfrm, obj, i)

Now all the read objects are in lst, but you'll want to test that the file is available, that it was read correctly, and you may want to modify the names to be more sensible than just the file name.
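
Those checks could look something like the following sketch. The helper names safe_read and tidy_name, the naming scheme and the sample file are assumptions for illustration, not part of the code above.

```r
# Hypothetical helper: return NULL (with a warning) instead of stopping
# when a file is missing or lacks the expected columns.
safe_read <- function(path, required = c("Date", "Time", "Value")) {
    if (!file.exists(path)) {
        warning("file not found: ", path)
        return(NULL)
    }
    tmp <- tryCatch(read.csv(path, stringsAsFactors = FALSE),
                    error = function(e) NULL)
    if (is.null(tmp) || !all(required %in% names(tmp))) {
        warning("could not read expected columns from: ", path)
        return(NULL)
    }
    tmp
}

# Turn "AA01.csv" into a tidier list name such as "aa01".
tidy_name <- function(path) tolower(tools::file_path_sans_ext(basename(path)))

# Made-up sample file so the example is self-contained.
demo_file <- file.path(tempdir(), "AA01.csv")
write.csv(data.frame(Date = "2011/01/01", Time = "09:00:00", Value = 1),
          demo_file, row.names = FALSE)

ok     <- safe_read(demo_file)                       # a data frame
absent <- suppressWarnings(
    safe_read(file.path(tempdir(), "no_such_file.csv")))  # NULL
```

Inside the loop, a NULL result can be skipped with next, and tidy_name(files[i]) can be used as the list name instead of the raw file name.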

Print out the first object by name index from the list:

lst[[files[1]]]
