Pandas vs matplotlib datetime

Problem description

I've read a number of the questions on this site about datetime, Timestamp, matplotlib's date2num, etc. However, I'm curious about the "correct" way to plot some data. Say I have a DataFrame whose index is a pandas DatetimeIndex. I can plot the data with pandas directly or with matplotlib:

print(df.index)
# = DatetimeIndex(['2018-01-01 20:00:00', ..., '2018-01-03 04:00:00'],
#                 dtype='datetime64[ns]',
#                 name=u'DateTime',
#                 length=385,
#                 freq=None)

my_axis.plot(df)
print(my_axis.get_xlim())  # = (736695.72708333354, 736697.14791666681)

# vs 

df.plot(ax=my_axis)
print(my_axis.get_xlim())  # = (25247280.0, 25249200.0)

However, the range for the "x axis" is totally different between them. If I mix plotting (I need to use matplotlib directly for broken_barh), then I don't see all of the data since they have such different x coordinates. Is there an accepted best practice for this?

Edit: adding a working example below

I'm open to upgrading versions if needed. I've tried with:

# Python2 Versions:
Python: 2.7.14
Numpy: 1.13.3
Pandas: 0.20.3
Matplotlib: 2.0.0

# Python3 Version (same results)
Python: 3.6.3
Numpy: 1.12.1
Pandas: 0.19.2
Matplotlib: 2.0.0

If I only use pandas to plot x and y, then both of them show up correctly. If I only use matplotlib, then they both show up correctly. However, if I try to plot one with pandas and the other with matplotlib, then they don't work (See image at bottom). My preference would be to "normally" use pandas, so that I only have to edit the DateTime index when plotting with matplotlib. I included two commented attempts at this, neither of which worked.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

start = '2018-01-02 03:00:00'
end = '2018-01-02 11:00:00'

data = pd.DataFrame({'DateTime': pd.date_range(start=start, end=end, freq='1H'),
                     'x': [1,2,3,4,5,4,3,2,1],
                     'y': [5,4,3,2,1,2,3,4,5]})
data = data.set_index('DateTime')
#print(data)

ax0 = plt.subplot(211)
ax1 = plt.subplot(212, sharex=ax0)

# Pandas for both
data['x'].plot(ax=ax0)
#data['y'].plot(ax=ax1)

# Matplotlib for both
#ax0.plot(data.index, data['x'])
ax1.plot(data.index, data['y'])

# Other attempts to make matplotlib plot work with pandas
# (but they produce same image as below)
#ax1.plot([mdates.date2num(d) for d in data.index], data['y'])
#ax1.plot(data.index.to_pydatetime(), data['y'])

plt.savefig('test.png')

Recommended answer

The data units in matplotlib and pandas date plots are completely different. You can see this by not sharing any axes and printing the axis limits:

import pandas as pd
import matplotlib.pyplot as plt

start = '2018-01-02 03:00:00'
end = '2018-01-02 11:00:00'

data = pd.DataFrame({'DateTime': pd.date_range(start=start, end=end, freq='1H'),
                     'x': [1,2,3,4,5,4,3,2,1],
                     'y': [5,4,3,2,1,2,3,4,5]})
data = data.set_index('DateTime')

ax0 = plt.subplot(211)
ax1 = plt.subplot(212)

# Pandas
data['x'].plot(ax=ax0)
# Matplotlib
ax1.plot(data.index, data['y'])

print(ax0.get_xlim())  # (420795.0, 420803.0)
print(ax1.get_xlim())  # (736696.10833333328, 736696.47500000009); matplotlib >= 3.3 (1970 epoch) gives about (17533.108, 17533.475)

plt.show()
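To see where the two ranges come from, here is a small sketch (not part of the original answer) that reproduces them without plotting. It assumes that, for an hourly index, pandas places points at whole hours since 1970-01-01 (its hourly period ordinals), which matches the printed 420795.0. Matplotlib instead uses fractional days since its date epoch: 1970-01-01 from matplotlib 3.3 on, a much earlier epoch in older versions (hence the ~736696 values), so that absolute number is version dependent. The helper names are illustrative only.

```python
import pandas as pd
import matplotlib.dates as mdates

t_start = pd.Timestamp('2018-01-02 03:00:00')
t_end = pd.Timestamp('2018-01-02 11:00:00')
unix_epoch = pd.Timestamp('1970-01-01')

def pandas_hour_units(ts):
    """Whole hours since 1970-01-01 (assumed pandas x units for an hourly index)."""
    return (ts - unix_epoch) // pd.Timedelta(hours=1)

def matplotlib_day_units(ts):
    """Fractional days since matplotlib's (version-dependent) date epoch."""
    return mdates.date2num(ts.to_pydatetime())

print(pandas_hour_units(t_start), pandas_hour_units(t_end))  # 420795 420803
print(matplotlib_day_units(t_start), matplotlib_day_units(t_end))

# The same 8-hour span is 8 units wide for pandas but only 1/3 of a unit for matplotlib,
# so the two coordinate systems cannot be mixed on one shared axis.
span_pd = pandas_hour_units(t_end) - pandas_hour_units(t_start)
span_mpl = matplotlib_day_units(t_end) - matplotlib_day_units(t_start)
print(span_pd, span_mpl)
```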

It is therefore clear that you cannot share the x axis (sharex=ax0) if one axis holds values in the range (420795.0, 420803.0) and the other holds values in the range (736696.108, 736696.475).

So if for any reason you need to use a matplotlib plot on one of the shared axes, you need to use matplotlib for all other shared axes as well.
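Applied to the question's use case, a minimal sketch of that approach might look like the following: both line plots and the broken_barh bars go through matplotlib, so every shared axis uses matplotlib date units. The interval list is hypothetical, made up for illustration. broken_barh expects (xmin, width) pairs in data units, so the starts are converted with mdates.date2num and the durations to fractional days (the unit matplotlib dates use).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

data = pd.DataFrame(
    {'x': [1, 2, 3, 4, 5, 4, 3, 2, 1],
     'y': [5, 4, 3, 2, 1, 2, 3, 4, 5]},
    index=pd.date_range('2018-01-02 03:00:00', periods=9,
                        freq=pd.offsets.Hour(), name='DateTime'),
)

# Hypothetical intervals for the broken_barh row: (start, duration) pairs.
intervals = [
    (pd.Timestamp('2018-01-02 04:00:00'), pd.Timedelta(hours=2)),
    (pd.Timestamp('2018-01-02 08:30:00'), pd.Timedelta(minutes=90)),
]

fig, (ax0, ax1) = plt.subplots(2, 1, sharex=True)

# Both line plots go through matplotlib, so both axes use matplotlib date units.
ax0.plot(data.index.to_pydatetime(), data['x'])
ax1.plot(data.index.to_pydatetime(), data['y'])

# (xmin, width) in matplotlib date units: start via date2num, width in days.
xranges = [(mdates.date2num(s.to_pydatetime()), d / pd.Timedelta(days=1))
           for s, d in intervals]
ax1.broken_barh(xranges, (0.5, 1.0), alpha=0.4)

ax1.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
fig.canvas.draw()  # render once (use fig.savefig(...) or plt.show() in practice)
```

If you would rather keep pandas' .plot() for the line plots, its x_compat=True option (e.g. data['x'].plot(ax=ax0, x_compat=True)) tells pandas to use matplotlib's ordinary date conversion instead of its own time-series units; that should put the shared axes on one scale, but check it with get_xlim() on your pandas version.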