Format date string in a file using Python

Question

I receive CSV files from my client that contain a variable number of columns. Some of these columns may contain date strings, but their position is not fixed. For example:

column1str|column2dt|column3str|column4int|column5int|column6dt
ab c1|10/20/2010|1234|10.02|530.55|30-01-2011
ab c2|10/10/2010|12346|11.03|531|05-05-2012
abc3|10/10/2010|122|12|532.44|11-09-2008
abc4|10/11/2010|110|13|533|01-11-2013
abc5|10/10/2010|11111|14|534|30-02-2012

The client also gives me the format of the date strings as input. The sample above uses two formats: MM/dd/yyyy and dd-MM-yyyy.

I want to convert all the dates to one particular format, dd-MM-yyyyTHH:mmZ, in the file itself. I know how to convert a date string to the desired date string when the input date format is given. The challenge I am facing is how to replace the date strings in particular columns of the file.

Recommended answer

First, read this as a reference for Python datetime.strptime() format strings: https://docs.python.org/3.5/library/datetime.html#strftime-strptime-behavior

And this for CSV parsing: https://docs.python.org/3.5/library/csv.html

My answer uses the Python standard library only. As a valid alternative, you could use a specialized data analysis library such as pandas, as already suggested.

Your MM/dd/yyyy is %m/%d/%Y in strptime format (which is actually the C standard format), and dd-MM-yyyy is %d-%m-%Y.
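To make the mapping concrete, here is a minimal sketch that converts one value from each sample column. The helper name reformat_date is hypothetical and only used for illustration. Note that the sample values carry no time of day, so the time part comes out as 00:00, and the trailing Z is emitted as a literal character rather than derived from any time zone information.

```python
from datetime import datetime

OUTPUT_DATE_FORMAT = '%d-%m-%YT%H:%MZ'


def reformat_date(value, input_format, output_format=OUTPUT_DATE_FORMAT):
    """Parse value with input_format and render it with output_format."""
    return datetime.strptime(value, input_format).strftime(output_format)


print(reformat_date('10/20/2010', '%m/%d/%Y'))  # 20-10-2010T00:00Z
print(reformat_date('30-01-2011', '%d-%m-%Y'))  # 30-01-2011T00:00Z
```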

Now, I'm not sure whether you want the dates to be "autodiscovered" by your Python script or whether you want to specify the appropriate columns and formats by hand, so I will suggest a script for each:

This one converts the dates in the columns named in the INPUT_DATE_FORMATS map, using the input format given for each column:

from datetime import datetime

import csv

# file that will be read as input
INPUT_FILENAME = 'yourfile.csv'
# file that will be produced as output (with properly formatted dates)
OUTPUT_FILENAME = 'newfile.csv'


INPUT_DATE_FORMATS = {'column2dt': '%m/%d/%Y', 'column6dt': '%d-%m-%Y'}

OUTPUT_DATE_FORMAT = '%d-%m-%YT%H:%MZ'

# newline='' lets the csv module handle line endings itself, as the csv docs recommend
with open(INPUT_FILENAME, 'rt', newline='') as finput:
    reader = csv.DictReader(finput, delimiter='|')
    with open(OUTPUT_FILENAME, 'wt', newline='') as foutput:
        writer = csv.DictWriter(foutput, fieldnames=reader.fieldnames, delimiter='|') # you can change delimiter if you want
        writer.writeheader() # DictWriter does not write the header row on its own
        for row in reader: # read each entry one by one
            for header, value in row.items(): # read each field one by one
                date_format = INPUT_DATE_FORMATS.get(header)
                if date_format:
                    parsed_date = datetime.strptime(value, date_format)
                    row[header] = parsed_date.strftime(OUTPUT_DATE_FORMAT)
            writer.writerow(row)
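One thing to watch for: the last sample row contains 30-02-2012, which is not a real calendar date, so datetime.strptime raises ValueError on it and the script above stops at that row. A minimal sketch of one way to cope, assuming you would rather keep such values unchanged than abort, is shown below. The helper name convert_or_keep is hypothetical.

```python
from datetime import datetime

OUTPUT_DATE_FORMAT = '%d-%m-%YT%H:%MZ'


def convert_or_keep(value, input_format, output_format=OUTPUT_DATE_FORMAT):
    """Return the reformatted date, or the original value if it does not parse."""
    try:
        return datetime.strptime(value, input_format).strftime(output_format)
    except ValueError:
        # Raised for malformed strings and impossible dates such as '30-02-2012'
        return value
```

In the script, `row[header] = convert_or_keep(value, date_format)` could then stand in for the two lines that parse and format the date. Whether to keep, blank out, or log the bad value depends on what your client expects.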

This one tries to parse each field in the input file with every format specified in INPUT_DATE_FORMATS, and writes a new file in which all the dates it recognized are formatted with OUTPUT_DATE_FORMAT:

from datetime import datetime

import csv

# file that will be read as input
INPUT_FILENAME = 'yourfile.csv'
# file that will be produced as output (with properly formatted dates)
OUTPUT_FILENAME = 'newfile.csv'


INPUT_DATE_FORMATS = ('%m/%d/%Y', '%d-%m-%Y')
OUTPUT_DATE_FORMAT = '%d-%m-%YT%H:%MZ'

# newline='' lets the csv module handle line endings itself, as the csv docs recommend
with open(INPUT_FILENAME, 'rt', newline='') as finput:
    reader = csv.DictReader(finput, delimiter='|')
    with open(OUTPUT_FILENAME, 'wt', newline='') as foutput:
        writer = csv.DictWriter(foutput, fieldnames=reader.fieldnames, delimiter='|') # you can change delimiter if you want
        writer.writeheader() # DictWriter does not write the header row on its own
        for row in reader: # read each entry one by one
            for header, value in row.items(): # read each field one by one
                for date_format in INPUT_DATE_FORMATS: # try to parse a date
                    try:
                        parsed_date = datetime.strptime(value, date_format)
                    except ValueError:
                        continue # not a date in this format, try the next one
                    row[header] = parsed_date.strftime(OUTPUT_DATE_FORMAT)
                    break # stop at the first format that matches
            writer.writerow(row)
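The question asks for the dates to be converted in the file itself, while both scripts above write to a separate OUTPUT_FILENAME. A minimal sketch of one way to close that gap, assuming the column-to-format map from the first script, is to write to a temporary file in the same directory and then swap it over the original with os.replace. The function name convert_file_in_place is hypothetical, and the sketch does not handle unparseable dates or empty files.

```python
import csv
import os
import tempfile
from datetime import datetime

OUTPUT_DATE_FORMAT = '%d-%m-%YT%H:%MZ'


def convert_file_in_place(path, input_date_formats, delimiter='|'):
    """Rewrite the CSV at path, reformatting the columns named in input_date_formats."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wt', newline='') as foutput, \
                open(path, 'rt', newline='') as finput:
            reader = csv.DictReader(finput, delimiter=delimiter)
            writer = csv.DictWriter(foutput, fieldnames=reader.fieldnames,
                                    delimiter=delimiter)
            writer.writeheader()
            for row in reader:
                for header, date_format in input_date_formats.items():
                    parsed = datetime.strptime(row[header], date_format)
                    row[header] = parsed.strftime(OUTPUT_DATE_FORMAT)
                writer.writerow(row)
        os.replace(tmp_path, path)  # swap the converted file over the original
    except BaseException:
        os.remove(tmp_path)  # on any failure, leave the original file untouched
        raise
```

It could be called as `convert_file_in_place('yourfile.csv', {'column2dt': '%m/%d/%Y', 'column6dt': '%d-%m-%Y'})`. Going through a temporary file means a failure halfway through does not leave a partially converted original behind.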
