Time-series plotting inconsistencies in Pandas

Question

Say I have a dataframe df where df.index consists of datetime objects, e.g.

> df.index[0]
datetime.date(2014, 5, 5)

If I plot it, Pandas nicely preserves the datetime type in the plot, which allows the user to change the time-series sampling as well as the formatting options of the plot:

  import numpy as np
  import pandas as pd
  import matplotlib.dates
  import matplotlib.pyplot as plt

  # Hypothetical example data: one value per day, indexed by dates
  df = pd.DataFrame({'A': np.random.random(60)},
                    index=pd.date_range('2014-05-05', periods=60))

  # Plot the dataframe:
  f     = plt.figure(figsize=(8,8))
  ax    = f.add_subplot(1,1,1)
  lines = df.plot(ax=ax)

  # Choose the sampling rate in terms of dates:
  ax.xaxis.set_major_locator(matplotlib.dates.WeekdayLocator(byweekday=(0,1,2,3,4,5,6),
                                                            interval=1))

  # We can also re-sample the X axis numerically if we want (e.g. every 4 steps):
  N = 4

  ticks      = ax.xaxis.get_ticklocs()
  ticklabels = [l.get_text() for l in ax.xaxis.get_ticklabels()]

  # Keep every N-th tick, counting back from the last one
  ax.xaxis.set_ticks(ticks[-1::-N][::-1])
  ax.xaxis.set_ticklabels(ticklabels[-1::-N][::-1])

  # Choose a date formatter using a date-friendly syntax:
  ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b\n%d'))

  plt.show()

However, the above does not work for a boxplot (the tick labels on the x axis are rendered empty):

df2.boxplot(column='A', by='created_dt',ax=ax, sym="k.")

# same code as above ...

It looks like in the last example Pandas converts the x-axis labels to strings, so the date formatters and locators no longer work.
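One way around this, sketched under assumptions: the boxes sit at categorical positions 1 to n rather than at date values, so instead of date locators you can pick every N-th position yourself and build the date labels from the group keys. The data below (20 days with 5 random values each) is hypothetical, and the sketch assumes the groups come out sorted by date, which groupby does by default.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical data: 20 consecutive days with 5 observations per day.
dates = pd.date_range("2014-05-05", periods=20, freq="D")
df2 = pd.DataFrame({
    "created_dt": dates.repeat(5),
    "A": np.random.default_rng(0).random(100),
})

fig, ax = plt.subplots(figsize=(8, 4))
df2.boxplot(column="A", by="created_dt", ax=ax, sym="k.")
fig.canvas.draw()  # make sure the tick labels are populated

# The boxes sit at positions 1..n and each tick label is just the text of
# a group key, so a date locator/formatter has no real dates to work with.
positions = ax.get_xticks()
labels = [t.get_text() for t in ax.get_xticklabels()]

# Workaround: keep every N-th position and format the matching dates ourselves.
N = 4
ax.set_xticks(positions[::N])
new_labels = list(dates[::N].strftime("%b\n%d"))
ax.set_xticklabels(new_labels)
```

The Agg backend is selected only so the sketch runs without a display; drop that line in an interactive session.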

This post re-uses solutions from the following threads:

  1. Accepted answer to Pandas timeseries plot setting x-axis major and minor ticks and labels
  2. Accepted answer to Pandas: bar plot xtick frequency

Why? How can I use boxplot in a way that allows me to use matplotlib date locators and formatters?

Answer

No. Actually, even the line plot does not work correctly. If you make the year show up in the tick labels, you will notice the problem: in the following example the xticks should be in 2000, but they are in 1989 instead.

In [49]:
import numpy as np
import pandas as pd
import matplotlib.dates
import matplotlib.pyplot as plt

df=pd.DataFrame({'Val': np.random.random(50)})
df.index=pd.date_range('2000-01-02', periods=50)
f     = plt.figure()
ax    = f.add_subplot(1,1,1)
lines = df.plot(ax=ax)
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%y%b\n%d'))
print(ax.get_xlim())
(10958.0, 11007.0)

In [50]:
matplotlib.dates.strpdate2num('%Y-%m-%d')('2000-01-02')
Out[50]:
730121.0
In [51]:
matplotlib.dates.num2date(730121.0)
Out[51]:
datetime.datetime(2000, 1, 2, 0, 0, tzinfo=<matplotlib.dates._UTC object at 0x051FA9F0>)

It turns out that pandas and matplotlib handle datetime data differently: in matplotlib (before version 3.3), 2000-01-02 is 730121.0, i.e. days since 0001-01-01 plus one, whereas pandas places it at 10958.0, i.e. days since 1970-01-01. (strpdate2num has since been removed from matplotlib; matplotlib.dates.datestr2num('2000-01-02') is the closest replacement, and since matplotlib 3.3 it uses the 1970-01-01 epoch by default.)
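Both numbers can be reproduced with the standard library and pandas alone. This is a minimal sketch that assumes pandas plots a regular daily index at daily Period ordinals (consistent with the xlim of 10958 shown above) and that the matplotlib in question predates 3.3, when a date number counted days from 0001-01-01 plus one, which is exactly what date.toordinal() returns.

```python
import datetime
import pandas as pd

day = datetime.date(2000, 1, 2)

# pandas: a daily Period ordinal counts days since 1970-01-01.
pandas_x = pd.Period("2000-01-02", freq="D").ordinal
unix_days = (day - datetime.date(1970, 1, 1)).days

# matplotlib before 3.3: days since 0001-01-01 plus one, i.e. the
# proleptic Gregorian ordinal.
old_mpl_x = day.toordinal()

print(pandas_x, unix_days, old_mpl_x)  # prints 10958 10958 730121
```

A value of 10958 read as an old-style matplotlib date number therefore lands in the first century AD rather than in 2000, which is why pandas' x values and matplotlib's date tools disagree.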

To get it right, we need to avoid pandas' plot method and call matplotlib directly:

In [52]:
# plt.plot handles datetime x values directly; plot_date is discouraged in recent matplotlib
plt.plot(df.index.to_pydatetime(), df.Val, '-')
ax=plt.gca()
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%y%b\n%d'))

Similarly for a bar plot:

In [53]:
plt.bar(df.index.to_pydatetime(), df.Val, width=0.4)
ax=plt.gca()
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%y%b\n%d'))
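To check that the plain-matplotlib route really puts dates on the axis, here is a sketch along the lines of In [52] that also sets a date locator, as the question wanted. It uses ax.plot (assumed here to be equivalent to plot_date with a line format), ax.margins(x=0) so the x limits equal the data range, and the Agg backend so it runs headless; the random data only stands in for df.Val.

```python
import datetime
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

df = pd.DataFrame({"Val": np.random.default_rng(0).random(50)},
                  index=pd.date_range("2000-01-02", periods=50))

fig, ax = plt.subplots()
# Plain matplotlib call: x values go through matplotlib's own date converter.
ax.plot(df.index.to_pydatetime(), df["Val"], "-")
ax.margins(x=0)  # x limits equal the data range exactly

# Date locators and formatters now operate on real dates.
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%y%b\n%d"))

left, right = ax.get_xlim()
print(mdates.num2date(left).date(), mdates.num2date(right).date())
# prints 2000-01-02 2000-02-20

# Weekday of every major tick (0 = Monday).
tick_days = [mdates.num2date(t).weekday() for t in ax.get_xticks()]
```

Because num2date and date2num share the same epoch, this check holds whether your matplotlib counts from 0001-01-01 or from 1970-01-01.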
