Storing pure python datetime.datetime in pandas DataFrame

Problem description

Since matplotlib doesn't support either pandas.Timestamp or numpy.datetime64, and there are no simple workarounds, I decided to convert a native pandas date column into pure python datetime.datetime objects so that scatter plots are easier to make.

However:

import pandas as pd

t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31')]})
t.dtypes # date    datetime64[ns], as expected
pure_python_datetime_array = t.date.dt.to_pydatetime() # works fine
t['date'] = pure_python_datetime_array # doesn't do what I hoped
t.dtypes # date    datetime64[ns] as before, no luck changing it

I'm guessing pandas auto-converts the pure python datetime produced by to_pydatetime into its native format. I guess it's convenient behavior in general, but is there a way to override it?

Solution

The use of to_pydatetime() is correct.

In [87]: t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31'), pd.to_datetime('2013-12-31')]})

In [88]: t.date.dt.to_pydatetime()
Out[88]: 
array([datetime.datetime(2012, 12, 31, 0, 0),
       datetime.datetime(2013, 12, 31, 0, 0)], dtype=object)

When you assign the result back to t.date, pandas automatically converts it back to datetime64[ns].
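If an object column really is wanted, one possible way around the automatic conversion is to wrap the values in a Series with an explicit dtype=object before assigning. This is only a minimal sketch and not part of the original answer; the column name date_py is made up for illustration, and the behavior is assumed to hold for current pandas releases.

```python
import datetime

import pandas as pd

t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31'), pd.to_datetime('2013-12-31')]})

# Iterating a datetime64 column yields pandas.Timestamp objects;
# Timestamp.to_pydatetime() turns each one into a plain datetime.datetime.
py_dates = [ts.to_pydatetime() for ts in t['date']]

# dtype=object asks pandas to store the objects as they are
# instead of inferring datetime64 from them.
# 'date_py' is a hypothetical column name used only for this sketch.
t['date_py'] = pd.Series(py_dates, index=t.index, dtype=object)

print(t.dtypes)
print(type(t['date_py'].iloc[0]))
```

The trade-off is that an object column loses the vectorized .dt accessor and datetime arithmetic, so it is probably best kept as a separate column next to the native one.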

pandas.Timestamp is a datetime subclass anyway :)
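The subclass relationship can be checked directly. The short sketch below is an illustration added here, with variable names chosen for the example; it shows that a Timestamp already passes an isinstance check against datetime.datetime, and that to_pydatetime() returns the plain base type.

```python
import datetime

import pandas as pd

ts = pd.Timestamp('2012-12-31')

# A Timestamp is accepted wherever an isinstance check for datetime is made.
is_datetime = isinstance(ts, datetime.datetime)

# to_pydatetime() returns an object whose exact type is datetime.datetime.
plain = ts.to_pydatetime()

print(is_datetime, type(plain))
```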

One way to do the plot is to convert the datetime to int64 (for a datetime64[ns] column, nanoseconds since the Unix epoch):

In [117]: t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31'), pd.to_datetime('2013-12-31')], 'sample_data': [1, 2]})

In [118]: t['date_int'] = t.date.astype(np.int64)

In [119]: t
Out[119]: 
        date  sample_data             date_int
0 2012-12-31            1  1356912000000000000
1 2013-12-31            2  1388448000000000000

In [120]: t.plot(kind='scatter', x='date_int', y='sample_data')
Out[120]: <matplotlib.axes._subplots.AxesSubplot at 0x7f3c852662d0>

In [121]: plt.show()
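The integers on the axis are hard to read, so it can help to see what they mean and how to get back to dates. The following is a minimal sketch rather than part of the original answer; it casts to datetime64[ns] explicitly on the assumption that the column's resolution might differ across pandas versions, and the names ns and back are chosen for the example.

```python
import pandas as pd

t = pd.DataFrame({'date': pd.to_datetime(['2012-12-31', '2013-12-31']),
                  'sample_data': [1, 2]})

# Force nanosecond resolution first so the integers are
# nanoseconds since 1970-01-01 regardless of the column's stored unit.
ns = t['date'].astype('datetime64[ns]').astype('int64')

# The conversion is reversible: interpret the integers as nanoseconds again.
back = pd.to_datetime(ns, unit='ns')

print(ns.tolist())
print(back.tolist())
```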

Another workaround is to skip scatter and use a regular plot with a point marker style instead:

In [126]: t.plot(x='date', y='sample_data', style='.')
Out[126]: <matplotlib.axes._subplots.AxesSubplot at 0x7f3c850f5750>

And the last workaround is to pass the to_pydatetime() result straight to plt.scatter:

In [141]: import matplotlib.pyplot as plt

In [142]: t = pd.DataFrame({'date': [pd.to_datetime('2012-12-31'), pd.to_datetime('2013-12-31')], 'sample_data': [100, 20000]})

In [143]: t
Out[143]: 
        date  sample_data
0 2012-12-31          100
1 2013-12-31        20000

In [144]: plt.scatter(t.date.dt.to_pydatetime(), t.sample_data)
Out[144]: <matplotlib.collections.PathCollection at 0x7f3c84a10510>

In [145]: plt.show()

There is an issue for this on GitHub, which is still open as of this writing.
