Updating CSV files automatically

Problem description

I started learning Python recently (5 hours ago). Here's my scenario.

I get emails every 4 hours from a remote measurement site with measurement values. The files are in *.csv format, with names like XX-2011-00001.csv and YY-2011-00001.csv. They hold data from two instruments that run continuously with different sampling intervals. The files are stored in local folders.

I want to develop a script that reads a file (for example XX-2011-00001.csv) and writes a new csv file with the same data. After 4 hours the script should run again, read only the new file XX-2011-00002.csv, and append its data to the csv file created earlier. I want the script to run in an infinite loop, so that it keeps checking for new files and adding them to the combined CSV file.

The file contains ‘Date’, ‘Time’ and ‘value’ fields.

Can you please tell me which modules I should look into for writing this script? If you have any examples I would be really thankful.

Recommended answer

The csv module will help in reading/writing your files. You'll want to use an infinite loop with a sleep -- something like:

while True:
    process_new_file()     # does nothing if no new file
    time.sleep(60)

process_new_file will need to check for new files, which can be tricky -- you don't want to try using a file before it's finished being written to! Something like this should work:

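class NoneReady(Exception):
    """Raised when no incoming file has been idle long enough."""

def check_for_new_file(directory=INCOMING, files={}):
    # `files` is a deliberately shared mutable default: it remembers
    # {path: (time_of_last_change, last_size)} between calls.
    for name in os.listdir(directory):
        path = os.path.join(directory, name)   # os.listdir returns bare names
        if path not in files:
            files[path] = (time.time(), os.path.getsize(path))
    now = time.time()
    # Iterate over a snapshot so entries can be deleted safely.
    for path, (last_time, last_size) in list(files.items()):
        current_size = os.path.getsize(path)
        if current_size != last_size:
            files[path] = (now, current_size)  # still growing; restart the clock
        elif now - last_time >= TIME_WITH_NO_WRITES:
            del files[path]                    # the caller moves the file away
            return path
    raise NoneReady()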

Now that we have a function that will keep track of any files in the INCOMING directory, and return a filename when it's been dormant long enough to be reasonably sure it's complete, we need a function to actually process the file, then move it somewhere for safekeeping.

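def process_new_file():
    try:
        filename = check_for_new_file()   # raises NoneReady if no file is ready
    except NoneReady:
        return
    # newline='' lets the csv module handle line endings itself (Python 3).
    # Mode 'a' appends to the master file and creates it if it is missing.
    with open(filename, newline='') as in_file, \
         open(MASTER_CSV, 'a', newline='') as out_file:
        csv_file_in = csv.reader(in_file)
        csv_file_out = csv.writer(out_file)
        for row in csv_file_in:
            csv_file_out.writerow(row)
    # Both files are closed here, so the source can be moved safely.
    shutil.move(filename, PROCESSED)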
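
If each incoming file begins with a header row (an assumption; the question only says the files contain Date, Time and value fields), copying every row would repeat that header in the combined file. Here is a minimal sketch of one way to avoid that, keeping the header only when the master file is missing or empty. The name `append_csv` and its `has_header` parameter are hypothetical and not part of the code above:

```python
import csv
import os


def append_csv(src_path, master_path, has_header=True):
    """Append rows from src_path to master_path; return the number of rows written.

    Assumes (hypothetically) that every incoming file starts with the same
    header row when has_header is True.
    """
    # Decide this before opening the master file, because mode 'a' creates it.
    master_is_empty = (not os.path.exists(master_path)
                       or os.path.getsize(master_path) == 0)
    written = 0
    with open(src_path, newline='') as src, \
         open(master_path, 'a', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for index, row in enumerate(reader):
            if index == 0 and has_header and not master_is_empty:
                continue  # header is already present in the master file
            writer.writerow(row)
            written += 1
    return written
```

If the files turn out to have no header row, calling it with `has_header=False` copies every row unchanged.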



To put it all together, complete with imports and globals:

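import csv
import os
import shutil
import time

INCOMING = '/some/path/with/new/files/'
PROCESSED = '/some/path/for/processed/files/'
MASTER_CSV = '/some/path/to/master.csv'  # placeholder; keep it outside INCOMING
TIME_WITH_NO_WRITES = 600  # 10 minutes

class NoneReady(Exception):
    """Raised when no incoming file has been idle long enough."""

def check_for_new_file(directory=INCOMING, files={}):
    # `files` is a deliberately shared mutable default: it remembers
    # {path: (time_of_last_change, last_size)} between calls.
    for name in os.listdir(directory):
        path = os.path.join(directory, name)   # os.listdir returns bare names
        if path not in files:
            files[path] = (time.time(), os.path.getsize(path))
    now = time.time()
    # Iterate over a snapshot so entries can be deleted safely.
    for path, (last_time, last_size) in list(files.items()):
        current_size = os.path.getsize(path)
        if current_size != last_size:
            files[path] = (now, current_size)  # still growing; restart the clock
        elif now - last_time >= TIME_WITH_NO_WRITES:
            del files[path]                    # the caller moves the file away
            return path
    raise NoneReady()

def process_new_file():
    try:
        filename = check_for_new_file()   # raises NoneReady if no file is ready
    except NoneReady:
        return
    # newline='' lets the csv module handle line endings itself (Python 3).
    # Mode 'a' appends to the master file and creates it if it is missing.
    with open(filename, newline='') as in_file, \
         open(MASTER_CSV, 'a', newline='') as out_file:
        csv_file_in = csv.reader(in_file)
        csv_file_out = csv.writer(out_file)
        for row in csv_file_in:
            csv_file_out.writerow(row)
    # Both files are closed here, so the source can be moved safely.
    shutil.move(filename, PROCESSED)

if __name__ == '__main__':
    while True:
        process_new_file()     # does nothing if no new file
        time.sleep(60)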

This code is currently untested, so there may be a bug or two in it, and if there is an error somewhere it will stop running. Hopefully this will help get you going.
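
Since an unhandled error stops the loop, one option is to catch and log exceptions on each cycle so that a single bad file does not end the script. The following is a rough sketch rather than a tested solution; `run_forever` and its `max_runs` parameter are hypothetical names introduced here (`max_runs` exists only to make the loop easy to try out):

```python
import logging
import time


def run_forever(task, interval=60, max_runs=None):
    """Call task() every `interval` seconds, logging errors instead of exiting.

    max_runs is a hypothetical knob for testing; None means loop forever.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            task()
        except Exception:
            # Log the full traceback, then carry on with the next cycle.
            logging.exception("task failed; retrying in %s seconds", interval)
        runs += 1
        time.sleep(interval)
    return runs
```

The main loop could then become `run_forever(process_new_file)`. Because only `Exception` is caught, Ctrl-C (`KeyboardInterrupt`) still stops the script.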
