Delete rows by date and add a file name column for multiple CSV files


Problem description

I have multiple comma-delimited CSV files containing recorded water pipe pressure sensor data, already sorted by date from oldest to newest. In all of the original files, the first column always contains dates formatted as YYYYMMDD. I have looked at similar discussion threads but couldn't find what I need.

  1. A Python script that adds a new column to every CSV file in a directory. The new column is titled "Pipe", and each of its rows holds the file name without the file extension.

  2. An option to specify a cutoff date as YYYYMMDD so that earlier rows are deleted from the original input file. For example, if a file has dates from 20140101 to 20140630, I would like to cut out the rows whose date is < 20140401.

  3. An option either to overwrite the original files after making these modifications or to save each file to a different directory under the same file name as the original.

Input: PipeRed.csv; Headers: Date,Pressure1,Pressure2,Temperature1,Temperature2, etc.

Output: PipeRed.csv; Headers: Pipe,Date,Pressure1,Pressure2,Temperature1,Temperature2, etc.

I found some code and modified it a little, but it doesn't delete rows as described above, and it adds the file name column last rather than first.

import csv
import sys
import glob
import re

for filename in glob.glob(sys.argv[1]):
#def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()

    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)

    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)

    # Process the header.
    writer = csv.writer(f)
    writer.writerow( ('Date','Pressure1','Pressure2','Pressure3','Pressure4','Pipe') )
    header = reader.next()
    header.append(filename.replace('.csv',""))
    writer.writerow(header)

    # Process each row of the body.
    for row in reader:
        row.append(filename.replace('.csv',""))
        writer.writerow(row)

    # Close the file and we're done.
    f.close()

Recommended answer

This function should be very close to what you want. I've tested it in both Python 2.7.9 and 3.4.2. The initial version I posted had some problems because, as I mentioned then, it was untested. I'm not sure whether you're using Python 2 or 3, but it worked properly in both.

Another change from the previous version is that the optional keyword date argument has been renamed from cutoff_date to start_date to better reflect what it is. A cutoff date usually means the last date on which it is possible to do something, which is the opposite of how you used it in your question. Also note that any date provided should be a string, i.e. start_date='20140401', not an integer.
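To illustrate why the date should be passed as a string, here is a minimal sketch. It relies on the fact that zero-padded YYYYMMDD strings sort lexicographically in the same order as the dates they represent. The helper name on_or_after and the sample dates are made up for this illustration and are not part of the answer's function.

```python
def on_or_after(date_str, start_date):
    """Return True if a YYYYMMDD string is on or after start_date.

    Both arguments are assumed to be zero-padded YYYYMMDD strings, so a
    plain string comparison orders them the same way as the dates.
    """
    return date_str >= start_date


# Hypothetical sample values taken from the date range in the question.
dates = ['20140101', '20140331', '20140401', '20140630']
kept = [d for d in dates if on_or_after(d, '20140401')]
print(kept)  # ['20140401', '20140630']
```

If an integer such as 20140401 were passed instead, Python 3 would raise a TypeError when comparing it with the string in the first column. As far as I know, Python 2 would not raise at all and would instead compare the two values by type, so the filter would silently give the wrong result.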

One enhancement is that it now creates the output directory if one is specified but doesn't already exist.

import csv
import os
import sys

def open_csv(filename, mode='r'):
    """ Open a csv file in proper mode depending on Python version. """
    return (open(filename, mode=mode+'b') if sys.version_info[0] == 2 else
            open(filename, mode=mode, newline=''))

def process_file(filename, start_date=None, new_dir=None):
    # Read the entire contents of the file into memory skipping rows before
    # any start_date given (assuming row[0] is a date column).
    with open_csv(filename, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)  # Save first row.
        contents = [row for row in reader
                    if not start_date or row[0] >= start_date]

    # Create different output file path if new_dir was specified.
    basename = os.path.basename(filename)  # Remove dir name from filename.
    output_filename = os.path.join(new_dir, basename) if new_dir else filename
    if new_dir and not os.path.isdir(new_dir):  # Create directory if necessary.
        os.makedirs(new_dir)

    # Open the output file and create a CSV writer for it.
    with open_csv(output_filename, 'w') as f:
        writer = csv.writer(f)

        # Add name of new column to header.
        header = ['Pipe'] + header  # Prepend new column name.
        writer.writerow(header)

        # Data for new column is the base filename without extension.
        new_column = [os.path.splitext(basename)[0]]

        # Process each row of the body by prepending data for new column to it.
        writer.writerows((new_column+row for row in contents))
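The function above handles a single file, while the question asks about every CSV file in a directory. One possible driver is sketched below. To keep the block self-contained it includes a condensed, Python 3 only variant of process_file, which is an assumption of this sketch rather than the answer's exact code. Unlike the original, it also skips blank rows. The name process_directory and the '*.csv' pattern are likewise hypothetical.

```python
import csv
import glob
import os


def process_file(filename, start_date=None, new_dir=None):
    """Condensed Python 3 only variant of the function above."""
    with open(filename, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)  # Save first row.
        # Keep non-blank rows dated on or after start_date (if one is given).
        rows = [row for row in reader
                if row and (not start_date or row[0] >= start_date)]

    if new_dir:
        os.makedirs(new_dir, exist_ok=True)  # Create directory if necessary.
        output_filename = os.path.join(new_dir, os.path.basename(filename))
    else:
        output_filename = filename  # Overwrite the original file.

    # Data for the new column is the base file name without extension.
    pipe = os.path.splitext(os.path.basename(filename))[0]
    with open(output_filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Pipe'] + header)
        writer.writerows([pipe] + row for row in rows)
    return output_filename


def process_directory(directory, start_date=None, new_dir=None):
    """Apply process_file to every .csv file directly inside directory."""
    paths = sorted(glob.glob(os.path.join(directory, '*.csv')))
    return [process_file(path, start_date, new_dir) for path in paths]
```

For example, process_directory('data', start_date='20140401', new_dir='filtered') would write trimmed copies into a filtered directory, and leaving out new_dir would overwrite the originals in place. The directory names here are placeholders.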
