Parse dates when year, month, day and hour are in separate columns using pandas in Python
Question
After reading "Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python" and "Using python pandas to parse CSV with date in format Year, Day, Hour, Min, Sec",
I am still not able to parse dates stored in separate columns for year, month, day and hour. My data looks like this (the zeroth column is the ID, the first is the year, the second the month, the third the day, the fourth the hour and the fifth the value):
50136 2011 1 1 21 9792
50136 2011 1 1 22 9794
50136 2011 1 1 23 9796
50136 2011 1 1 0 9798
50136 2011 1 1 1 9799
50136 2011 1 1 2 9802
I've tried the following:
df = pd.read_csv(file, parse_dates={'date': [1, 2, 3, 4]}, index_col='date')
but then the index comes back not as timestamps but as unicode strings(?):
In [17]: print df.head()
Out [17]:
0 5
date
2011 1 1 21 50136 9792
2011 1 1 22 50136 9794
2011 1 1 23 50136 9796
2011 1 1 0 50136 9798
2011 1 1 1 50136 9799
In [18]: print df.index
Out [18]:
Index([u'2011 1 1 21', u'2011 1 1 22', u'2011 1 1 23', u'2011 1 1 0', u'2011 1 1 1', u'2011 1 1 2'], dtype=object)
I'm obviously doing something wrong, but I can't figure out what. Any advice is really appreciated.
Recommended answer
If the regular methods don't work, you can always fall back on writing your own parser. Write a function that accepts the combined columns from parse_dates and returns a datetime, then pass that function as date_parser.
Like this:
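from datetime import datetime
import pandas as pd
# Columns 1-4 (year, month, day, hour) are joined with spaces before reaching date_parser
df = pd.read_csv(file, header=None, index_col='datetime',
                 parse_dates={'datetime': [1, 2, 3, 4]},
                 date_parser=lambda x: datetime.strptime(x, '%Y %m %d %H'))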
This returns:
0 5
datetime
2011-01-01 21:00:00 50136 9792
2011-01-01 22:00:00 50136 9794
2011-01-01 23:00:00 50136 9796
2011-01-01 00:00:00 50136 9798
2011-01-01 01:00:00 50136 9799
2011-01-01 02:00:00 50136 9802
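The format string can be checked in isolation. The unicode index shown in the question suggests that the selected columns arrive joined by single spaces (for example '2011 1 1 21'), so a minimal sketch, assuming that input shape, is to run strptime on a couple of such strings directly. The name parse_hourly is hypothetical and used only for illustration.

```python
from datetime import datetime


def parse_hourly(date_string):
    """Parse a 'YEAR MONTH DAY HOUR' string such as '2011 1 1 21'."""
    # %m, %d and %H also accept values without a leading zero.
    return datetime.strptime(date_string, "%Y %m %d %H")


# Two strings in the shape seen in the question's index output.
samples = ["2011 1 1 21", "2011 1 1 0"]
parsed = [parse_hourly(s) for s in samples]
print(parsed)
```

If this works on the joined strings, the same function can be handed to date_parser.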
Edit:
Perhaps it's clearer if you write it as a normal function instead of a lambda:
from datetime import datetime

def dt_parse(date_string):
    # date_string is the space-joined columns, e.g. '2011 1 1 21'
    return datetime.strptime(date_string, '%Y %m %d %H')
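As a different route from the custom parser in this answer, newer pandas releases have dropped pd.datetime and deprecated date_parser, so it may be simpler to read the columns as plain integers and assemble the timestamps afterwards with pd.to_datetime, which accepts a DataFrame whose columns are named year, month, day and hour. The sketch below is only an illustration under assumptions: the column names are hypothetical (the file has no header), and the data is assumed to be whitespace-separated as displayed in the question.

```python
import io

import pandas as pd

# A few rows from the question; whitespace separation is an assumption.
raw = """50136 2011 1 1 21 9792
50136 2011 1 1 22 9794
50136 2011 1 1 23 9796
50136 2011 1 1 0 9798
"""

# Hypothetical column names, chosen so pd.to_datetime can recognise them.
names = ["id", "year", "month", "day", "hour", "value"]
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=names)

# Assemble one timestamp per row from the year/month/day/hour columns.
parts = ["year", "month", "day", "hour"]
df["datetime"] = pd.to_datetime(df[parts])

# Use the timestamps as the index and drop the now redundant parts.
df = df.set_index("datetime").drop(columns=parts)
print(df)
```

For a real file, the io.StringIO(raw) argument would be replaced by the file path, and sep adjusted to whatever delimiter the file actually uses.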