pandas parse dates from csv


Problem description

I am trying to read a csv file that includes dates. The csv looks like this:

h1,h2,h3,h4,h5
A,B,C,D,E,20150420
A,B,C,D,E,20150420
A,B,C,D,E,20150420

To read the csv I use this code:

df = pd.read_csv(filen,
    index_col=None,
    header=0,
    parse_dates=[5],
    date_parser=lambda t:parse(t))

The parse function looks like this:

def parse(t):
    string_ = str(t)
    try:
        return datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
    except:
        return datetime.date(1900,1,1)

My strange problem is that inside the parse function, t looks like this:

ndarray: ['20150420' '20150420' '20150420']

As you can see, t is the whole array of the date column. I expected it to be only the first value when parsing the first row, the second value when parsing the second row, and so on. Right now, parse always ends up in the except block because string_[:4] contains a bracket, which obviously cannot be converted to an int. The parse function was built to parse only one date at a time (e.g. 20150420).
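The failure described above can be reproduced without pandas. The sketch below uses a hand-built NumPy array as a hypothetical stand-in for what the parser receives; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for the array handed to the date parser.
t = np.array(["20150420", "20150420", "20150420"])

# str() of an array is its printed form, which starts with "['".
string_ = str(t)
prefix = string_[:4]

try:
    year = int(prefix)
    failed = False
except ValueError:
    # The bracket and quote make the prefix non-numeric.
    failed = True

print(repr(prefix), failed)
```

Because the slice is taken from the text form of the whole array rather than from a single date string, the int() conversion raises and the fallback date is returned every time.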

What am I doing wrong?

EDIT:

Okay, I just read the pandas documentation on the date_parser argument, and it seems to work as expected (of course ;)). So I need to adapt my code to that. My example above was copied and pasted from somewhere else and I expected it to work, hence my question. I will report back once I have adapted my code.

EDIT2:

My parse function now looks like this, and I think the code works now. If I am still doing something wrong, please let me know:

def parse(t):
    ret = []
    for ts in t:
        string_ = str(ts)
        try:
            tsdt = datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
        except:
            tsdt = datetime.date(1900,1,1)
        ret.append(tsdt)
    return ret
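As a possible alternative to looping over the array, the conversion could be done in one call with pd.to_datetime. This is only a sketch: the helper name parse_dates_vectorized is hypothetical, and it returns pandas Timestamps rather than datetime.date objects, which may or may not suit the surrounding code. Note also that date_parser is, as far as I know, deprecated from pandas 2.0 onward, so converting after reading may age better.

```python
import pandas as pd

def parse_dates_vectorized(values):
    """Convert YYYYMMDD strings to timestamps in one pass.

    Hypothetical helper: values that do not match the format
    fall back to 1900-01-01, mirroring the loop-based parse().
    """
    as_text = pd.Series(values, dtype="object").astype(str)
    # errors="coerce" turns unparseable entries into NaT instead of raising.
    parsed = pd.to_datetime(as_text, format="%Y%m%d", errors="coerce")
    return parsed.fillna(pd.Timestamp("1900-01-01"))

result = parse_dates_vectorized(["20150420", "notadate", "20150421"])
print(result)
```

The explicit format string avoids any guessing about the date layout, and the fallback value is applied only where parsing failed.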


Recommended answer

There are six columns but only five headers in the first line. This is why parse_dates failed. You can skip the first line:

df = pd.read_csv("tmp.csv",  header=None, skiprows=1, parse_dates=[5])
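To try the call above without a file on disk, the sample rows from the question can be fed through io.StringIO. This is a minimal sketch; the in-memory csv_text stands in for tmp.csv and is an assumption for illustration.

```python
import io
import pandas as pd

# The sample data from the question: five headers, six data columns.
csv_text = (
    "h1,h2,h3,h4,h5\n"
    "A,B,C,D,E,20150420\n"
    "A,B,C,D,E,20150420\n"
    "A,B,C,D,E,20150420\n"
)

# Skip the short header row and let pandas number the columns 0..5,
# then parse column 5 as dates.
df = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1, parse_dates=[5])

print(df.dtypes)
print(df[5].iloc[0])
```

With the header row skipped, the columns are labelled by position, so the date column is addressed as df[5].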
