How to read datetime with timezone in pandas

Problem description

I am trying to create a DataFrame from a CSV file, and its first column looks like this:

"2013-08-25T00:00:00-0400";
"2013-08-25T01:00:00-0400";
"2013-08-25T02:00:00-0400";
"2013-08-25T03:00:00-0400";
"2013-08-25T04:00:00-0400";

It's a datetime with a timezone! I already used something like

df1 = DataFrame(pd.read_csv(PeriodC, sep=';', parse_dates=[0], index_col=0))

but the result was

2013-09-02 04:00:00
2013-09-03 04:00:00
2013-09-04 04:00:00
2013-09-05 04:00:00
2013-09-06 04:00:00
2013-09-07 04:00:00
2013-09-08 04:00:00

Can anyone explain to me how to separate the datetime from the timezone?

Solution

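The pandas parser takes the timezone information into account if it's available and gives you a naive Timestamp (naive == no timezone info) with the timezone offset already applied: the value is converted to UTC, so 2013-08-25T00:00:00-0400 becomes 2013-08-25 04:00:00. Newer pandas versions instead keep the offset and return timezone-aware timestamps.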
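To see why the question's output shows 04:00 rather than 00:00, the offset arithmetic can be checked with the standard library alone. This is a minimal sketch that mimics the behaviour described above, not pandas' own parsing code; it uses one value from the question's first column:

```python
from datetime import datetime, timezone

# One value from the question's first column.
raw = "2013-08-25T00:00:00-0400"

# %z accepts a "-0400"-style offset and yields a timezone-aware datetime.
local = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S%z")

# Converting to UTC adds the 4 hours back: 00:00 at UTC-04:00 is 04:00 UTC.
as_utc = local.astimezone(timezone.utc)

# Dropping tzinfo gives a naive value holding the UTC wall time,
# which is the kind of result the question reports.
naive_utc = as_utc.replace(tzinfo=None)

print(local.isoformat())      # 2013-08-25T00:00:00-04:00
print(naive_utc.isoformat())  # 2013-08-25T04:00:00
```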

To keep the timezone information in your DataFrame, first localize the Timestamps as UTC and then convert them to their timezone, which in this case is Etc/GMT+4 (the sign in Etc/GMT zone names is inverted, so Etc/GMT+4 means UTC-04:00):

>>> df = pd.read_csv(PeriodC, sep=';', parse_dates=[0], index_col=0)
>>> df.index[0]
Timestamp('2013-08-25 04:00:00', tz=None)
>>> df.index = df.index.tz_localize('UTC').tz_convert('Etc/GMT+4')
>>> df.index[0]
Timestamp('2013-08-25 00:00:00-0400', tz='Etc/GMT+4')
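In newer pandas releases the parser keeps the offset, so the index returned by read_csv can already be timezone-aware, and calling tz_localize('UTC') on an index that is already tz-aware raises a TypeError. A minimal sketch of a version-independent variant, assuming a small in-memory sample with made-up column names ("time", "value") in place of the question's PeriodC file, and parsing the strings with pd.to_datetime(..., utc=True) instead of parse_dates:

```python
import io

import pandas as pd

# Hypothetical sample shaped like the question's file: ';'-separated, with
# ISO 8601 timestamps carrying a -0400 offset in the first column.
# The header names "time" and "value" are made up for illustration.
CSV_TEXT = """time;value
2013-08-25T00:00:00-0400;1
2013-08-25T01:00:00-0400;2
2013-08-25T02:00:00-0400;3
"""

# Use the first column as the index but leave it as plain strings.
df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";", index_col=0)

# utc=True turns each offset-qualified string into a tz-aware UTC instant;
# tz_convert then expresses those instants in the source zone.
# 'Etc/GMT+4' uses the inverted POSIX sign, i.e. it means UTC-04:00.
df.index = pd.to_datetime(df.index, utc=True).tz_convert("Etc/GMT+4")

print(df.index[0])  # 2013-08-25 00:00:00-04:00
```

Going through UTC first means the same two lines work whether the strings were left unparsed, parsed to naive UTC, or parsed to a fixed offset.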

If you want to completely discard the timezone information, then just specify a date_parser that will split the string and pass only the datetime portion to the parser.

>>> df = pd.read_csv(file, sep=';', parse_dates=[0], index_col=[0],
...                  date_parser=lambda x: pd.to_datetime(x.rpartition('-')[0]))
>>> df.index[0]
Timestamp('2013-08-25 00:00:00', tz=None)
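The date_parser argument is deprecated as of pandas 2.0, and rpartition('-') would cut a value with a positive offset such as +0200 at the wrong hyphen. A minimal sketch of another way to drop the offset, assuming a hypothetical in-memory sample (made-up column names and values, with one positive offset added) and that every timestamp uses the fixed-width YYYY-MM-DDTHH:MM:SS layout followed by the offset:

```python
import io

import pandas as pd

# Hypothetical ';'-separated sample; header names and values are made up.
# The last row uses a positive offset to show both signs are handled.
CSV_TEXT = """time;value
2013-08-25T00:00:00-0400;1
2013-08-25T01:00:00-0400;2
2013-08-25T02:00:00+0200;3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";", index_col=0)

# Keep the first 19 characters ("YYYY-MM-DDTHH:MM:SS"), i.e. the local
# wall-clock time, and drop the trailing "+hhmm"/"-hhmm" offset.
wall_clock = df.index.str[:19]
df.index = pd.to_datetime(wall_clock, format="%Y-%m-%dT%H:%M:%S")

print(df.index[0])  # 2013-08-25 00:00:00
print(df.index.tz)  # None
```

Slicing by position avoids guessing which hyphen starts the offset, at the cost of assuming a fixed layout; values with fractional seconds or a "Z" suffix would need a different rule.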
