Parse dates when year, month, day and hour are in separate columns using pandas in Python


Question

After reading "Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python" and "Using python pandas to parse CSV with date in format Year, Day, Hour, Min, Sec", I am still not able to parse dates when year, month, day and hour are in separate columns. My data looks like this (column 0 is the ID, column 1 the year, column 2 the month, column 3 the day, column 4 the hour and column 5 the value):

50136   2011    1   1   21  9792    
50136   2011    1   1   22  9794    
50136   2011    1   1   23  9796    
50136   2011    1   1   0   9798    
50136   2011    1   1   1   9799    
50136   2011    1   1   2   9802

I've tried the following:

df = pd.read_csv(file, parse_dates={'date': [1, 2, 3, 4]}, index_col='date')

but then the index comes back not as timestamps but as unicode strings(?):

In  [17]: print df.head()
Out [17]:
                 0     5
date                    
2011 1 1 21  50136  9792
2011 1 1 22  50136  9794
2011 1 1 23  50136  9796
2011 1 1 0   50136  9798
2011 1 1 1   50136  9799

In  [18]: print df.index
Out [18]:
Index([u'2011 1 1 21', u'2011 1 1 22', u'2011 1 1 23', u'2011 1 1 0', u'2011 1 1 1', u'2011 1 1 2'], dtype=object)

I'm obviously doing something wrong, but I can't figure out what. Any advice is really appreciated.

Recommended answer

If the regular methods don't work, you can always fall back on writing your own parser. Write a function that accepts the string built from the columns listed in parse_dates and returns a datetime, then pass that function to read_csv as date_parser.

Like this:

from datetime import datetime
import pandas as pd

df = pd.read_csv(file, header=None, index_col='datetime',
                 parse_dates={'datetime': [1, 2, 3, 4]},
                 date_parser=lambda x: datetime.strptime(x, '%Y %m %d %H'))

This returns:

                         0     5
datetime                        
2011-01-01 21:00:00  50136  9792
2011-01-01 22:00:00  50136  9794
2011-01-01 23:00:00  50136  9796
2011-01-01 00:00:00  50136  9798
2011-01-01 01:00:00  50136  9799
2011-01-01 02:00:00  50136  9802
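
As an alternative that avoids a custom parser, the date parts can be read as ordinary integer columns and combined afterwards with pd.to_datetime, which assembles datetimes from a DataFrame whose columns are named year, month, day and hour. The following is only a sketch: the column names and the whitespace separator are assumptions that do not come from the original question, and the sample rows are copied from it.

```python
import io
import pandas as pd

# Sample rows from the question; a whitespace-separated file is an assumption.
raw = """50136   2011    1   1   21  9792
50136   2011    1   1   22  9794
50136   2011    1   1   23  9796
50136   2011    1   1   0   9798
"""

# Hypothetical column names; to_datetime requires 'year', 'month', 'day'
# and optionally 'hour' when assembling from a DataFrame.
names = ['id', 'year', 'month', 'day', 'hour', 'value']
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', header=None, names=names)

# Build one datetime column from the four parts, then use it as the index.
parts = ['year', 'month', 'day', 'hour']
df['datetime'] = pd.to_datetime(df[parts])
df = df.drop(columns=parts).set_index('datetime')

print(df)
print(df.index.dtype)
```

With this approach the index is a DatetimeIndex rather than an index of strings, and only the id and value columns remain as data.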



Edit:

Perhaps it's clearer if you write it as a normal function instead of a lambda:

from datetime import datetime

def dt_parse(date_string):
    # Parse a string such as '2011 1 1 21' (year month day hour) into a datetime.
    return datetime.strptime(date_string, '%Y %m %d %H')
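
To check the format string on its own, the function can be run against the strings that appeared in the question's index. This is a small sketch using only the standard library; the sample list is copied from the question and the variable names are illustrative. It relies on strptime accepting month, day and hour values that are not zero-padded.

```python
from datetime import datetime

def dt_parse(date_string):
    # Same parser as in the answer: year, month, day and hour separated by spaces.
    return datetime.strptime(date_string, '%Y %m %d %H')

# Index strings as they appeared in the question's output.
samples = ['2011 1 1 21', '2011 1 1 22', '2011 1 1 0']
parsed = [dt_parse(s) for s in samples]

for text, value in zip(samples, parsed):
    print(text, '->', value.isoformat())
```

The function would then be passed as date_parser=dt_parse in the read_csv call. Newer pandas releases deprecate date_parser, so this applies to the versions the answer was written for.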
