Delete rows by date and add a file name column for multiple CSV files


Problem description

I have multiple comma-delimited CSV files containing recorded water pipe pressure sensor data, already sorted by date from oldest to newest. In all of the original files, the first column always contains dates formatted as YYYYMMDD. I have looked at similar discussion threads but couldn't find what I need.

  1. A Python script that adds a new column to every CSV file in the directory. The new column is titled "Pipe", and each of its rows holds the file name without the file extension.

  2. The option of specifying a cutoff date as YYYYMMDD in order to delete rows from the original input file. For example, if a file has dates from 20140101 to 20140630, I would like to cut out the rows whose date is < 20140401.

  3. The option of either overwriting the original files after making these modifications, or saving each file to a different directory under the same file name as the original.

Input: PipeRed.csv; Headers: Date, Pressure1, Pressure2, Temperature1, Temperature2, etc.

Output: PipeRed.csv; Headers: Pipe, Date, Pressure1, Pressure2, Temperature1, Temperature2, etc.

I have found some code and modified it a little, but it doesn't delete rows as described above, and it adds the file name column last rather than first.

import csv
import sys
import glob
import re

for filename in glob.glob(sys.argv[1]):
#def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()

    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)

    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)

    # Process the header.
    writer = csv.writer(f)
    writer.writerow( ('Date','Pressure1','Pressure2','Pressure3','Pressure4','Pipe') )
    header = reader.next()
    header.append(filename.replace('.csv',""))
    writer.writerow(header)

    # Process each row of the body.
    for row in reader:
        row.append(filename.replace('.csv',""))
        writer.writerow(row)

    # Close the file and we're done.
    f.close()

Solution

This function should be very close to what you want. I've tested it in both Python 2.7.9 and 3.4.2. The initial version I posted had some problems because, as I mentioned at the time, it was untested. I'm not sure whether you're using Python 2 or 3, but this works properly in either one.

Another change from the previous version is that the optional keyword date argument's name has been changed from cutoff_date to start_date to better reflect what it is. A cutoff date usually means the last date on which something can be done, which is the opposite of the way you used it in your question. Also note that any date should be provided as a string, e.g. start_date='20140401', not as an integer.
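
Why a string works here: fixed-width YYYYMMDD values sort lexicographically in the same order as chronologically, so the `>=` comparison on row[0] needs no date parsing. The following is only a small illustration with made-up sample rows, not data from the question:

```python
# Fixed-width YYYYMMDD strings compare in chronological order,
# so a plain string comparison is enough to filter rows by date.
rows = [  # made-up sample rows for illustration
    ["20140101", "1.2"],
    ["20140331", "1.3"],
    ["20140401", "1.4"],
    ["20140630", "1.5"],
]
start_date = "20140401"  # a string, not the integer 20140401

kept_dates = [row[0] for row in rows if row[0] >= start_date]
print(kept_dates)  # ['20140401', '20140630']

# In Python 3, comparing the string column against an integer fails loudly.
try:
    "20140101" >= 20140401
    int_comparison_failed = False
except TypeError:
    int_comparison_failed = True
```

This relies on every date having exactly eight digits. Values such as 2014-4-1 or 201441 would not compare correctly as strings.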

One enhancement is that it will now create the output directory if one is specified but doesn't already exist.
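
If only Python 3 matters, the "create the directory if it is missing" step can be written more compactly. This is a sketch of an alternative, not the code used in the answer: the helper name ensure_dir and the temporary paths are made up for illustration, and exist_ok requires Python 3.2 or later.

```python
import os
import tempfile

def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)  # exist_ok is Python 3.2+ only
    return path

# Demonstrate on a throwaway location so nothing real is touched.
base = tempfile.mkdtemp()
target = os.path.join(base, "cleaned", "2014")

ensure_dir(target)
ensure_dir(target)  # a second call is a no-op rather than an error
created = os.path.isdir(target)
```

The explicit os.path.isdir check in the answer achieves the same result and also works on Python 2.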

import csv
import os
import sys

def open_csv(filename, mode='r'):
    """ Open a CSV file in the proper mode depending on the Python version. """
    return (open(filename, mode=mode+'b') if sys.version_info[0] == 2 else
            open(filename, mode=mode, newline=''))

def process_file(filename, start_date=None, new_dir=None):
    # Read the entire contents of the file into memory skipping rows before
    # any start_date given (assuming row[0] is a date column).
    with open_csv(filename, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)  # Save first row.
        contents = [row for row in reader
                    if not start_date or row[0] >= start_date]

    # Create different output file path if new_dir was specified.
    basename = os.path.basename(filename)  # Remove dir name from filename.
    output_filename = os.path.join(new_dir, basename) if new_dir else filename
    if new_dir and not os.path.isdir(new_dir):  # Create directory if necessary.
        os.makedirs(new_dir)

    # Open the output file and create a CSV writer for it.
    with open_csv(output_filename, 'w') as f:
        writer = csv.writer(f)

        # Add name of new column to header.
        header = ['Pipe'] + header  # Prepend new column name.
        writer.writerow(header)

        # Data for new column is the base filename without extension.
        new_column = [os.path.splitext(basename)[0]]

        # Process each row of the body by prepending data for new column to it.
        writer.writerows((new_column+row for row in contents))
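
The function handles one file at a time, so the question's "every CSV file in the directory" requirement still needs a small driver. Below is a minimal end-to-end sketch under several assumptions: it is Python 3 only, process_file is a condensed restatement of the function above, process_all is a hypothetical helper that is not part of the original answer, and the sample file is generated in a temporary directory.

```python
import csv
import glob
import os
import tempfile

def process_file(filename, start_date=None, new_dir=None):
    """Python 3 condensation of the answer's function; returns the output path."""
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)  # first row is the header
        contents = [row for row in reader
                    if not start_date or row[0] >= start_date]

    basename = os.path.basename(filename)
    output_filename = os.path.join(new_dir, basename) if new_dir else filename
    if new_dir:
        os.makedirs(new_dir, exist_ok=True)

    pipe = os.path.splitext(basename)[0]  # file name without extension
    with open(output_filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Pipe'] + header)
        writer.writerows([pipe] + row for row in contents)
    return output_filename

def process_all(pattern, start_date=None, new_dir=None):
    """Hypothetical driver: run process_file on every file matching pattern."""
    return [process_file(name, start_date, new_dir)
            for name in sorted(glob.glob(pattern))]

# Build a throwaway input file shaped like the one in the question.
src = tempfile.mkdtemp()
out = os.path.join(src, "out")
with open(os.path.join(src, "PipeRed.csv"), "w", newline="") as f:
    csv.writer(f).writerows([
        ["Date", "Pressure1"],
        ["20140101", "1.0"],
        ["20140401", "2.0"],
        ["20140630", "3.0"],
    ])

written = process_all(os.path.join(src, "*.csv"),
                      start_date="20140401", new_dir=out)
with open(written[0], newline="") as f:
    result = list(csv.reader(f))
print(result)
```

Omitting new_dir overwrites each input file in place, which matches the first option in requirement 3. Writing to a separate directory is the safer choice while testing.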
