How to read datetime with timezone in pandas
Problem description
I am trying to create a DataFrame from a CSV file whose first column looks like this:
"2013-08-25T00:00:00-0400";
"2013-08-25T01:00:00-0400";
"2013-08-25T02:00:00-0400";
"2013-08-25T03:00:00-0400";
"2013-08-25T04:00:00-0400";
It is a datetime with a timezone offset. I already tried something like
df1 = DataFrame(pd.read_csv(PeriodC, sep=';', parse_dates=[0], index_col=0))
but the result was
2013-09-02 04:00:00
2013-09-03 04:00:00
2013-09-04 04:00:00
2013-09-05 04:00:00
2013-09-06 04:00:00
2013-09-07 04:00:00
2013-09-08 04:00:00
Can anyone explain how to separate the datetime from the timezone?
Solution
The pandas parser takes the timezone information into account when it is available and gives you a naive Timestamp (naive means it carries no timezone info). The timezone offset has already been applied to the value, so the result is expressed in UTC.
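To make the "offset taken into account" step concrete, here is a minimal sketch that parses two of the sample strings from the question. It assumes a reasonably recent pandas and passes utc=True to pd.to_datetime so that the result is explicit. Depending on the pandas version, parsing without that flag may give either naive UTC values, as described above, or timezone-aware values. The names raw, parsed and first_naive are only illustrative.

```python
import pandas as pd

# Two of the sample values from the question, each carrying a -04:00 offset.
raw = ["2013-08-25T00:00:00-0400", "2013-08-25T01:00:00-0400"]

# utc=True applies the offset and returns timezone-aware UTC timestamps.
parsed = pd.to_datetime(raw, utc=True)

# Dropping the timezone afterwards leaves the naive UTC wall-clock time,
# i.e. midnight at -04:00 becomes 04:00.
first_naive = parsed[0].tz_localize(None)
print(parsed[0], first_naive)
```

Midnight at an offset of -04:00 corresponds to 04:00 UTC, which matches the 04:00:00 values the asker saw.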
To keep the timezone information in your DataFrame, you should first localize the Timestamps as UTC and then convert them to their timezone (which in this case is Etc/GMT+4):
>>> df = pd.read_csv(PeriodC, sep=';', parse_dates=[0], index_col=0)
>>> df.index[0]
Timestamp('2013-08-25 04:00:00', tz=None)
>>> df.index = df.index.tz_localize('UTC').tz_convert('Etc/GMT+4')
>>> df.index[0]
Timestamp('2013-08-25 00:00:00-0400', tz='Etc/GMT+4')
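The localize-then-convert step can also be tried without a CSV file. The sketch below is a hedged illustration: it builds a small DatetimeIndex by hand (the values and the names naive_utc, local and offset_hours are assumptions for the example) and applies the same two calls as the answer.

```python
import pandas as pd

# Naive timestamps that are already expressed in UTC, like the parsed index above.
naive_utc = pd.DatetimeIndex(["2013-08-25 04:00:00", "2013-08-25 05:00:00"])

# Step 1: declare that the naive values are UTC (the clock time does not change).
# Step 2: convert to Etc/GMT+4. The Etc/ zones use the POSIX sign convention,
# so Etc/GMT+4 means four hours *behind* UTC.
local = naive_utc.tz_localize("UTC").tz_convert("Etc/GMT+4")

first = local[0]
offset_hours = first.utcoffset().total_seconds() / 3600
print(first, offset_hours)
```

Note that tz_localize only attaches a timezone, while tz_convert shifts the displayed wall-clock time; the instant in time stays the same in both steps.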
If you want to completely discard the timezone information, then just specify a date_parser that will split the string and pass only the datetime portion to the parser.
>>> df = pd.read_csv(file, sep=';', parse_dates=[0], index_col=[0],
...                  date_parser=lambda x: pd.to_datetime(x.rpartition('-')[0]))
>>> df.index[0]
Timestamp('2013-08-25 00:00:00', tz=None)
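The same idea can be sketched without date_parser, which may be useful because that argument is deprecated in newer pandas releases. This is only an illustration under stated assumptions: the in-memory CSV text, the column names when and value, and the variable names are made up for the example. As in the answer, rpartition('-') only works when the offset is negative (a "-0400" suffix); a "+0200" suffix would need different handling.

```python
import io
import pandas as pd

# Hypothetical two-row file in the same layout as the question (';' separated).
csv_text = "2013-08-25T00:00:00-0400;1\n2013-08-25T01:00:00-0400;2\n"

# Read the first column as plain text instead of letting pandas parse it.
df = pd.read_csv(io.StringIO(csv_text), sep=";", header=None,
                 names=["when", "value"])

# Keep only the text before the last '-', which drops the "-0400" suffix.
stripped = df["when"].map(lambda s: s.rpartition("-")[0])

# Parse the remaining local wall-clock time as naive datetimes and index by it.
df.index = pd.to_datetime(stripped)
df = df.drop(columns="when")
print(df.index[0])
```

The resulting index keeps the local wall-clock times (00:00, 01:00, ...) and carries no timezone.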