Import newest csv file in directory

Question

Goal:
- Import the newest file (.csv) from a local directory into R

Goal Details:
- A csv file is uploaded to a folder daily on my Mac. I would like to be able to incorporate a function in my R script that automatically imports the newest file into my workspace for further analysis. The file is uploaded daily around 4:30AM
- I would like this function to be run in the morning (no earlier than 6AM so there's plenty of time for leeway here)

Input Details:
- file type: .csv
- naming convention: example file name: "28 Jul 2014 04:37:47 -0400.csv"
- frequency: daily import @ ~ 04:30

What I've Tried:
- I know this may seem like a weak attempt but I'm really at a loss on how to amend this function below.
- My thought on paper is to 'grab' the id of the newest file, then paste() it after the directory name, then voilà! (but alas, my programming skills are not up to coding this)
- The code below is what I tried to run, but it just 'hangs' and doesn't finish. I got this code from an R forum.

Code:

lastChange = file.info(directory)$mtime 
while(TRUE){ 
  currentM = file.info(directory)$mtime 
  if(currentM != lastChange){ 
    lastChange = currentM 
    read.csv(directory) 
  } 
  # try again in 10 minutes 
  Sys.sleep(600) 
} 

My Environment:
- R 3.1
- Mac OS X 10.9.4 (Mavericks)

Thank you so much in advance for any help! :-)

Recommended answer

The following function uses a timestamp file to "keep track" of which files have already been processed: any file modified after the timestamp file counts as unprocessed. It can be run either continually in an R instance (as you first suggested), or by way of single-run instances, which suits @andrew's suggestion of a cron job. (The cat() command is included primarily for testing; feel free to remove it.)

## Note: list.files() takes a regular expression, not a shell glob, so the
## pattern is '\\.csv$' rather than '*.csv'.
processDir <- function(directory = '.', pattern = '\\.csv$', loop = FALSE, delay = 600,
                       stampFile = file.path(directory, '.csvProcessor')) {
    ## Note: a newly created stamp file carries the current time, so files
    ## already present before the very first run are not picked up.
    if (! file.exists(stampFile))
        file.create(stampFile)
    firstRun <- TRUE
    ## Run once, or repeat forever when loop = TRUE.
    while (firstRun || loop) {
        firstRun <- FALSE
        stampTime <- file.info(stampFile)$mtime
        allFilesDF <- file.info(list.files(path = directory, pattern = pattern,
                                           full.names = TRUE, no.. = TRUE))
        ## Keep regular files modified after the stamp file was last touched.
        unprocessedFiles <- allFilesDF[(! allFilesDF$isdir) &
                                       (allFilesDF$mtime > stampTime), ]
        if (nrow(unprocessedFiles)) {
            ## We need to update the timestamp on stampFile quickly so
            ## that files added while this is running will be found in the
            ## next loop.
            ## WARNING: this blindly truncates the stampFile.
            file.create(stampFile, showWarnings = FALSE)
            for (fn in rownames(unprocessedFiles)) {
                cat('Processing ', fn, '\n')
                ## read.csv(fn)
                ## ...
            }
        }
        if (loop) Sys.sleep(delay)
    }
}

As you initially suggested, running it in a continually-running R instance would simply be:

processDir(loop = TRUE)

To use @andrew's suggestion of a cron job, append the following line after the function definition:

processDir()

... and use a crontab file similar to the following:

# crontab
0 8 * * * path/to/Rscript path/to/processDir.R
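
If only the single newest file is wanted, as in the question's stated goal, a simpler approach may be enough. The following is a minimal sketch, not part of the original answer: the helper names newestCsv and importNewestCsv are hypothetical, and it assumes that "newest" means the most recent modification time reported by file.info() rather than the timestamp embedded in the file name.

```r
## Hypothetical helper (an assumption, not from the original answer):
## return the path of the most recently modified .csv file in `directory`,
## or NULL when the directory contains no .csv files.
newestCsv <- function(directory = '.') {
    files <- list.files(path = directory, pattern = '\\.csv$',
                        full.names = TRUE, ignore.case = TRUE)
    if (length(files) == 0) return(NULL)
    info <- file.info(files)
    ## Ignore any directories whose names happen to end in .csv.
    keep <- !info$isdir
    files <- files[keep]
    if (length(files) == 0) return(NULL)
    ## Pick the file with the largest modification time.
    files[which.max(as.numeric(info$mtime[keep]))]
}

## Hypothetical wrapper: read the newest .csv file into a data frame.
## Extra arguments are passed through to read.csv().
importNewestCsv <- function(directory = '.', ...) {
    fn <- newestCsv(directory)
    if (is.null(fn)) stop('No .csv files found in ', directory)
    read.csv(fn, ...)
}
```

Calling importNewestCsv('path/to/folder') from a script run by the cron job above would then load that day's upload, provided nothing else has modified an older file in the meantime.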

Hope this helps.
