pandas 0.21.0 Timestamp compatibility issue with matplotlib

Problem description

I just updated pandas from 0.17.1 to 0.21.0 to take advantage of some new functionality, and ran into a compatibility issue with matplotlib (which I also updated to the latest version, 2.1.0). In particular, the Timestamp object seems to have changed significantly.

I happen to have another machine still running the older versions of pandas (0.17.1) and matplotlib (1.5.1), which I used to compare the differences:

Both versions show my DataFrame index to be dtype='datetime64[ns]':

DatetimeIndex(['2017-03-13', '2017-03-14', ... '2017-11-17'], dtype='datetime64[ns]', name='dates', length=170, freq=None)

But when calling type(df.index[0]), 0.17.1 gives pandas.tslib.Timestamp and 0.21.0 gives pandas._libs.tslib.Timestamp.

When plotting with df.index as x-axis:

plt.plot(df.index, df['data'])

By default, matplotlib formats the x-axis labels as dates with pandas 0.17.1, but with pandas 0.21.0 it fails to recognize them as dates and simply shows the raw number 1.5e18 (epoch time in nanoseconds).

I also have a customized cursor that reports the clicked location on the graph by applying matplotlib.dates.DateFormatter to the x-value. With 0.21.0 this fails with:

OverflowError: signed integer is greater than maximum

In the debugger I can see that the reported x-value is around 736500 (i.e. a day count since year 0) for 0.17.1, but around 1.5e18 (i.e. epoch time in nanoseconds) for 0.21.0.
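The two magnitudes above can be reproduced with the standard library alone. This is only a sketch of the arithmetic, not of matplotlib's internals: it assumes the older matplotlib date numbers follow the proleptic Gregorian ordinal (strictly, day 1 is 0001-01-01 rather than "year 0"), and it uses 2017-11-17, the last date of the index in the question. The final step shows that treating the nanosecond value as a day count raises an OverflowError, which is plausibly how the DateFormatter failure arises.

```python
from datetime import date, datetime, timezone

day = date(2017, 11, 17)  # last date in the question's index

# Day-based scale: proleptic Gregorian ordinal, where 0001-01-01 is day 1.
ordinal_days = day.toordinal()

# Nanosecond scale: datetime64[ns] counts nanoseconds since 1970-01-01 UTC.
midnight_utc = datetime(2017, 11, 17, tzinfo=timezone.utc)
epoch_ns = int(midnight_utc.timestamp()) * 1_000_000_000

print(ordinal_days)  # 736650
print(epoch_ns)      # 1510876800000000000

# Interpreting the nanosecond value as a day count cannot work:
# it is far outside the range of representable dates.
try:
    date.fromordinal(epoch_ns)
    overflowed = False
except OverflowError as exc:
    overflowed = True
    print("OverflowError:", exc)
```

The exact wording of the exception message depends on the platform and Python version, so the sketch only relies on the exception type.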

I am surprised at this break of compatibility between matplotlib and pandas as they are obviously used together by most people. Am I missing something in the way I call the plot function above for the newer versions?

Update: as I mentioned above, I prefer calling plot directly on a given axes object, but just for the heck of it I tried calling the plot method of the DataFrame itself, df.plot(). As soon as this is done, all subsequent plots within the same Python session correctly recognize the Timestamp. It's as if an environment variable were set, because I can load another DataFrame or create another axes with subplots and the 1.5e18 shows up nowhere. This really smells like a bug, as the latest pandas documentation says:

The plot method on Series and DataFrame is just a simple wrapper around plt.plot()

But clearly it does something to the Python session such that subsequent plots deal with the Timestamp index properly.

In fact, this can be seen by simply running the example from the pandas documentation linked above:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

Depending on whether ts.plot() is called or not, the following plot either correctly formats x-axis as dates or not:

plt.plot(ts.index, ts)
plt.show()

Once the member plot method has been called, subsequent calls to plt.plot on a new Series or DataFrame format the dates correctly without needing to call the member plot method again.
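The session-wide effect described above is consistent with matplotlib keeping a global registry of unit converters (matplotlib.units.registry) to which pandas adds entries. The following is a minimal sketch for inspecting that state, assuming a recent pandas that exposes register_matplotlib_converters and deregister_matplotlib_converters in pandas.plotting. The helper name timestamp_converter_name is made up for illustration, and the values printed may differ between versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.units as munits
import pandas as pd
from pandas.plotting import (
    deregister_matplotlib_converters,
    register_matplotlib_converters,
)


def timestamp_converter_name():
    """Return the class name of the converter registered for pd.Timestamp, or None."""
    converter = munits.registry.get(pd.Timestamp)
    return type(converter).__name__ if converter is not None else None


# Start from a state without the pandas converters.
deregister_matplotlib_converters()
before = timestamp_converter_name()

# Explicit registration changes the global registry for the whole session.
register_matplotlib_converters()
after = timestamp_converter_name()

print("before:", before)
print("after: ", after)
```

Because the registry is module-level state, anything that registers the converters once affects every later plt.plot call in the same interpreter, which matches the behavior reported in the question.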

Solution

There is an issue with pandas datetimes and matplotlib stemming from the recent release of pandas 0.21, which no longer registers its converters at import. Once those converters have been used once (within pandas), they are registered and matplotlib uses them automatically as well.

A workaround would be to register them manually,

import pandas.plotting._converter as pandacnv
pandacnv.register()

In any case, the issue is well known on both the pandas and matplotlib sides, so there will be some kind of fix in the next releases. Pandas is considering re-adding the registration in an upcoming release, so this issue may be only temporary. Another option is to revert to pandas 0.20.x, where this should not occur.

Update: this is no longer an issue with current versions of matplotlib (2.2.2) and pandas (0.23.1), and likely with many of the versions released since roughly December 2017, when this was fixed.

Update 2: As of pandas 0.24, the recommended way to register the converters is

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

or if pandas is already imported as pd,

pd.plotting.register_matplotlib_converters()
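For code that has to run on both old and new installations, the two registration routes from this answer can be combined. This is a rough sketch under stated assumptions, not an official recipe: the helper name ensure_pandas_converters is hypothetical, the private import path is the one quoted above for pandas 0.21.0 and may not exist in other versions, and the Agg backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here for a display-free run

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def ensure_pandas_converters():
    """Register the pandas date converters, preferring the public API."""
    try:
        from pandas.plotting import register_matplotlib_converters
    except ImportError:
        # Fallback: the private module used in the pandas 0.21.0 workaround.
        from pandas.plotting._converter import register
        register()
        return "private"
    register_matplotlib_converters()
    return "public"


how = ensure_pandas_converters()

# Same series as in the question.
ts = pd.Series(np.random.randn(1000),
               index=pd.date_range("2000-01-01", periods=1000))

fig, ax = plt.subplots()
ax.plot(ts.index, ts.values)

# With a date converter active, the axis limits are day counts,
# not nanosecond values around 1e18.
x_max = ax.get_xlim()[1]
print(how, x_max)
plt.close(fig)
```

Calling the helper once near the top of a script, before any plt.plot call on datetime data, should be enough, since the registration is global to the session.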
