Can pandas automatically recognize dates?


Question

Today I was pleasantly surprised to find that, when reading data from a data file, pandas is able to recognize the types of the values:

df = pandas.read_csv('test.dat', delimiter=r"\s+", names=['col1','col2','col3'])

For example, this can be checked as follows:

for i, r in df.iterrows():
    # print() is a function in Python 3
    print(type(r['col1']), type(r['col2']), type(r['col3']))

In particular, integers, floats and strings were recognized correctly. However, I have a column that has dates in the following format: 2013-6-4. These dates were recognized as strings (not as Python date objects). Is there a way to "teach" pandas to recognize dates?

Answer

You should add parse_dates=True or parse_dates=['column name'] when reading; that is usually enough to parse the dates automatically. (parse_dates=True tries to parse the index, while a list of column names parses those columns.) There are always unusual formats that need to be defined manually, though. In such a case you can also pass a date parser function, which is the most flexible approach.
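As a rough illustration of the parse_dates=['column name'] form, here is a minimal sketch that mirrors the read_csv call from the question. The in-memory sample data and the choice of 'col3' as the date column are assumptions made for the example, not part of the original question.

```python
import io
import pandas as pd

# Hypothetical whitespace-delimited data; the third column holds
# dates in the unpadded format mentioned in the question (2013-6-4).
sample = "1 2.5 2013-6-4\n2 3.5 2013-12-25\n"

df = pd.read_csv(
    io.StringIO(sample),
    delimiter=r"\s+",
    names=["col1", "col2", "col3"],
    parse_dates=["col3"],  # ask pandas to parse this column as dates
)

# col3 should now be a datetime column rather than plain strings
print(df.dtypes)
```

If the column still shows up as object after this, the format is probably one that pandas could not infer, which is where an explicit format or parser comes in.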

Suppose your date strings are in a column named 'datetime'; then:

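import pandas as pd
from datetime import datetime  # the pd.datetime alias was deprecated in pandas 1.0 and later removed

dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)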
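Note that the date_parser argument has been deprecated since pandas 2.0, so on newer versions the call above may emit a warning or fail. One alternative that should behave the same across versions is to read the column as strings and convert it afterwards with pd.to_datetime and an explicit format. The following is a sketch under that assumption; the sample data is made up for the example.

```python
import io
import pandas as pd

# Hypothetical sample data; the column name 'datetime' follows the answer.
csv_text = (
    "datetime,value\n"
    "2013-06-04 10:30:00,1\n"
    "2013-06-05 11:45:30,2\n"
)

# Read the column as plain strings first...
df = pd.read_csv(io.StringIO(csv_text))

# ...then convert it with an explicit strptime-style format.
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%d %H:%M:%S")

print(df.dtypes)
```

Passing the format explicitly avoids per-row Python calls, so it is usually faster than a lambda-based parser on large files.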

In this way you can even combine multiple columns into a single datetime column. The following merges a 'date' and a 'time' column into a single 'datetime' column:

from datetime import datetime  # replaces the removed pd.datetime alias

dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)
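Combining columns through parse_dates is, as far as I know, also deprecated in recent pandas releases (since 2.2). A version-independent way to get the same result is to join the two string columns yourself and convert the result. This is only a sketch: the sample data is hypothetical, and the column names 'date', 'time' and 'datetime' follow the answer above.

```python
import io
import pandas as pd

# Hypothetical sample data with separate 'date' and 'time' columns.
csv_text = (
    "date,time,value\n"
    "2013-06-04,10:30:00,1\n"
    "2013-06-05,11:45:30,2\n"
)

# Force both columns to be read as strings so they can be concatenated.
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str, "time": str})

# Join the two columns with a space and parse the combined string.
df["datetime"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%Y-%m-%d %H:%M:%S"
)

# Drop the original columns, as the parse_dates dict form would.
df = df.drop(columns=["date", "time"])

print(df)
```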
