Pandas DatetimeIndex vs to_datetime discrepancies
Question
I'm trying to convert a Pandas Series of epoch timestamps to human-readable times. There are at least two obvious ways to do this: pd.DatetimeIndex and pd.to_datetime(). They seem to work in quite different ways:
In [1]: import pandas as pd
In [3]: nanos = pd.Series([1462282258000000000, 1462282258100000000, 1462282258200000000])
In [4]: pd.to_datetime(nanos)
Out[4]:
0 2016-05-03 13:30:58.000
1 2016-05-03 13:30:58.100
2 2016-05-03 13:30:58.200
dtype: datetime64[ns]
In [5]: pd.DatetimeIndex(nanos)
Out[5]:
DatetimeIndex([ '2016-05-03 13:30:58', '2016-05-03 13:30:58.100000',
'2016-05-03 13:30:58.200000'],
dtype='datetime64[ns]', freq=None)
With to_datetime(), the display resolution is milliseconds, and .000 is printed on whole seconds. With DatetimeIndex, the display resolution is microseconds (which I like), but the decimal part is omitted entirely on whole seconds.
Then I tried to convert the time zone:
In [12]: pd.DatetimeIndex(nanos).tz_localize('UTC')
Out[12]:
DatetimeIndex([ '2016-05-03 13:30:58+00:00',
'2016-05-03 13:30:58.100000+00:00',
'2016-05-03 13:30:58.200000+00:00'],
dtype='datetime64[ns, UTC]', freq=None)
In [13]: pd.to_datetime(nanos).tz_localize('UTC')
TypeError: index is not a valid DatetimeIndex or PeriodIndex
This is strange: the time zone functions don't work with a plain datetime Series, only with a DatetimeIndex. Why would that be? The tz_localize() method exists and is documented here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.tz_localize.html
I've tried Pandas 0.17.0 and 0.18.1 with the same results.
I'm not trying to make an actual index, so all else being equal I would have expected to use to_datetime(); I just can't get the time zone methods to work with it.
Recommended answer
There is one way to convert things: pd.to_datetime(). Yes, you can construct a DatetimeIndex directly, but it is restrictive on purpose, while to_datetime is quite flexible.
So to_datetime will give you an object similar to what you pass in: if you input an array-like, you get a DatetimeIndex; if you input a Series, you get a Series.
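As a quick check of that input-to-output mapping, here is a minimal sketch that reuses the timestamps from the question. The variable names from_list, from_series and same_values are only illustrative.

```python
import pandas as pd

nanos = [1462282258000000000, 1462282258100000000, 1462282258200000000]

from_list = pd.to_datetime(nanos)               # list in -> DatetimeIndex out
from_series = pd.to_datetime(pd.Series(nanos))  # Series in -> Series out

print(type(from_list).__name__)
print(type(from_series).__name__)

# The timestamps themselves are the same; only the container differs,
# which is also why the two reprs in the question look different.
same_values = list(from_list) == list(from_series)
print(same_values)
```

The different display resolutions in the question come from the two containers formatting their output differently, not from any difference in the stored values.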
In [5]: nanos = [1462282258000000000, 1462282258100000000, 1462282258200000000]
By default it converts with unit='ns', which lines up with the data here:
In [7]: pd.to_datetime(nanos)
Out[7]: DatetimeIndex(['2016-05-03 13:30:58', '2016-05-03 13:30:58.100000', '2016-05-03 13:30:58.200000'], dtype='datetime64[ns]', freq=None)
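If the epoch values were in seconds or milliseconds instead of nanoseconds, the unit argument of to_datetime would need to say so. A small sketch, assuming hypothetical inputs derived from the same instant as the question's data:

```python
import pandas as pd

# Hypothetical inputs: the same instant as above, expressed in coarser units.
seconds = [1462282258, 1462282259]
millis = [1462282258000, 1462282258100]

from_seconds = pd.to_datetime(seconds, unit="s")
from_millis = pd.to_datetime(millis, unit="ms")

print(from_seconds[0])
print(from_millis[1])
```

Without unit, these smaller integers would be read as nanoseconds since the epoch and land a few seconds after 1970-01-01.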
So one thing we could do is make a Series out of this. The index here is integer; the values are datetimes.
In [10]: s = pd.Series(pd.to_datetime(nanos))
In [11]: s
Out[11]:
0 2016-05-03 13:30:58.000
1 2016-05-03 13:30:58.100
2 2016-05-03 13:30:58.200
dtype: datetime64[ns]
You can then use the .dt accessor to operate on the values. Series.tz_localize operates on the index, which is why the call in the question raised a TypeError: that Series has an integer index, not a DatetimeIndex.
In [12]: s.dt.tz_localize('US/Eastern')
Out[12]:
0 2016-05-03 13:30:58-04:00
1 2016-05-03 13:30:58.100000-04:00
2 2016-05-03 13:30:58.200000-04:00
dtype: datetime64[ns, US/Eastern]
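To make the values-versus-index distinction concrete, here is a hedged sketch. It assumes the epoch values are UTC-based, as epoch timestamps usually are, so it first labels them as UTC and then converts with .dt.tz_convert. It uses "America/New_York", the canonical name for the zone called US/Eastern above. The by_time Series and its float values are made up for illustration.

```python
import pandas as pd

nanos = [1462282258000000000, 1462282258100000000, 1462282258200000000]
s = pd.Series(pd.to_datetime(nanos))

# .dt works on the values: label them as UTC, then convert to Eastern time.
eastern = s.dt.tz_localize("UTC").dt.tz_convert("America/New_York")

# Series.tz_localize works on the index, so it needs a DatetimeIndex there.
by_time = pd.Series([1.0, 2.0, 3.0], index=pd.to_datetime(nanos))
by_time_utc = by_time.tz_localize("UTC")

print(eastern.iloc[0])
print(by_time_utc.index.tz)
```

Localizing directly to an Eastern zone, as in the session above, instead treats 13:30:58 as Eastern wall-clock time; which of the two is wanted depends on what the raw numbers represent.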