pandas 0.21.0 Timestamp compatibility issue with matplotlib

Question

I just updated pandas from 0.17.1 to 0.21.0 to take advantage of some new functionality, and ran into a compatibility issue with matplotlib (which I also updated to the latest version, 2.1.0). In particular, the Timestamp object seems to have changed significantly.

I happen to have another machine still running the older versions, pandas 0.17.1 and matplotlib 1.5.1, which I used to compare the differences:

Both versions show my DataFrame index to be dtype='datetime64[ns]':

DatetimeIndex(['2017-03-13', '2017-03-14', ... '2017-11-17'], dtype='datetime64[ns]', name='dates', length=170, freq=None)

But when calling type(df.index[0]), 0.17.1 gives pandas.tslib.Timestamp and 0.21.0 gives pandas._libs.tslib.Timestamp.

When plotting with df.index as the x-axis:

plt.plot(df.index, df['data'])

By default, matplotlib formats the x-axis labels as dates with pandas 0.17.1, but with pandas 0.21.0 it fails to recognize them as dates and simply shows raw numbers around 1.5e18 (epoch time in nanoseconds).

I also have a customized cursor that reports the clicked location on the graph by applying matplotlib.dates.DateFormatter to the x-value. With 0.21.0 it fails with:

OverflowError: signed integer is greater than maximum

In the debugger I can see that the reported x-value is around 736500 (i.e. a day count since year 0) for 0.17.1 but around 1.5e18 (i.e. epoch time in nanoseconds) for 0.21.0.
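
The two magnitudes can be reproduced with the standard library alone. This is a minimal sketch that is not part of the original question: the date 2017-06-01 is an arbitrary pick from the index range, and toordinal() (days since 0001-01-01) is only assumed to be a close stand-in for the day-count scale reported under the older versions. It also shows that treating the nanosecond value as a day count raises an OverflowError, which is presumably the kind of failure the formatter runs into.

```python
from datetime import datetime, timezone

# Arbitrary sample date inside the index range (an assumption for illustration).
sample = datetime(2017, 6, 1, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Day-count scale: proleptic Gregorian ordinal (days since 0001-01-01).
day_count = sample.toordinal()

# Nanosecond scale: whole seconds since the Unix epoch, times 10**9.
ns_since_epoch = int((sample - epoch).total_seconds()) * 10**9

print(day_count)       # on the order of 7.36e5
print(ns_since_epoch)  # on the order of 1.5e18

# Interpreting the nanosecond value as a day count overflows.
try:
    datetime.fromordinal(ns_since_epoch)
    error_name = None
except OverflowError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```

The two scales differ by roughly twelve orders of magnitude, so any code expecting one and receiving the other cannot produce a sensible date.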

I am surprised at this break in compatibility between matplotlib and pandas, as they are obviously used together by most people. Am I missing something in the way I call the plot function above with the newer versions?

Update: as I mentioned above, I prefer calling plot directly with a given axes object, but just for the heck of it I tried calling the plot method of the DataFrame itself, df.plot(). As soon as this is done, all subsequent plots within the same Python session correctly recognize the Timestamp. It's as if an environment variable were set, because I can load another DataFrame or create another axes with subplots and the 1.5e18 shows up nowhere. This really smells like a bug, as the latest pandas documentation says:

The plot method on Series and DataFrame is just a simple wrapper around plt.plot()

But clearly it does something to the Python session such that subsequent plots handle the Timestamp index properly.

In fact, simply running the example from the pandas documentation linked above:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

Depending on whether ts.plot() has been called or not, the following plot either formats the x-axis as dates correctly or does not:

plt.plot(ts.index,ts)
plt.show()

Once a member plot method has been called, subsequent calls to plt.plot on a new Series or DataFrame autoformat correctly, without needing to call the member plot method again.

Recommended answer

There is an issue with pandas datetimes and matplotlib that comes from the recent release of pandas 0.21, which no longer registers its converters at import. Once those converters have been used once (within pandas), they are registered and automatically used by matplotlib as well.
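
The "registered on first use" behaviour can be pictured with a toy model. The sketch below is purely illustrative and is not matplotlib's or pandas' actual code: registry, Stamp, stamp_to_days, to_axis_value and member_plot are all hypothetical names. It only shows how a process-wide type-to-converter table makes one call change the result of every later call in the same session.

```python
# Hypothetical stand-in for a global type -> converter registry.
registry = {}


class Stamp:
    """Hypothetical timestamp type storing nanoseconds since 1970-01-01."""

    def __init__(self, ns_since_epoch):
        self.ns_since_epoch = ns_since_epoch


def stamp_to_days(stamp):
    # Convert nanoseconds since the epoch to days since the epoch.
    return stamp.ns_since_epoch / (86400 * 10**9)


def to_axis_value(x):
    """Mimics a plain plot call: use a registered converter if there is one."""
    converter = registry.get(type(x))
    if converter is None:
        # No converter known for this type: fall back to the raw number.
        return x.ns_since_epoch
    return converter(x)


def member_plot(x):
    """Mimics a member plot call: registers the converter as a side effect."""
    registry.setdefault(Stamp, stamp_to_days)
    return to_axis_value(x)


s = Stamp(1496275200 * 10**9)
before = to_axis_value(s)  # raw nanoseconds, nothing registered yet
member_plot(s)             # side effect: converter is now registered
after = to_axis_value(s)   # the same plain call now returns a day count
print(before, after)
```

In this toy model the second plain call behaves differently only because the registry is shared state, which matches the symptom described in the question.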

A workaround would be to register them manually:

import pandas.plotting._converter as pandacnv
pandacnv.register()

In any case, the issue is well known on both the pandas and the matplotlib side, so there will be some kind of fix in the next releases. Pandas is considering re-adding the registration in an upcoming release, so this issue may be only temporary. Another option is to revert to pandas 0.20.x, where this should not occur.

Update: this is no longer an issue with current versions of matplotlib (2.2.2) and pandas (0.23.1), and likely not with many of the versions released since roughly December 2017, when this was fixed.

Update 2: As of pandas 0.24 or higher, the recommended way to register the converters is

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

or, if pandas is already imported as pd,

pd.plotting.register_matplotlib_converters()
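
For code that has to run against both older and newer installations, the two registration calls mentioned in this answer could be tried in order. The following is only a sketch under that assumption: register_pandas_converters is a hypothetical helper name, and it simply returns None when neither entry point can be imported or called (for example if pandas or matplotlib is not installed).

```python
import importlib


def register_pandas_converters():
    """Try the public entry point first, then the private 0.21.0 module.

    Returns the dotted name of the function that was called, or None.
    """
    candidates = [
        # Public function named in Update 2.
        ("pandas.plotting", "register_matplotlib_converters"),
        # Private module used in the original 0.21.0 workaround.
        ("pandas.plotting._converter", "register"),
    ]
    for module_name, func_name in candidates:
        try:
            module = importlib.import_module(module_name)
            func = getattr(module, func_name, None)
            if callable(func):
                func()
                return module_name + "." + func_name
        except ImportError:
            # Module (or one of its dependencies) is unavailable; try the next one.
            continue
    return None


used = register_pandas_converters()
print(used)
```

Calling the helper once, before the first plt.plot call, should be enough, since the registration is described above as lasting for the whole Python session.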
