x-Axis ticks as dates

Question

I have some data I would like to plot, consisting of two columns: one is a count and the other is the date on which it was recorded. Since I have over 2000 dates, it makes sense not to show every single date as a tick on the x-axis; otherwise the axis would be unreadable. However, I am having a hard time getting the dates to show up on the x-axis in a sensible way. I have tried matplotlib's built-in tick locators, but they do not seem to work. Here is a preview of the data:

PatientTraffic = pd.DataFrame({'count' : CleanData.groupby("TimeStamp").size()}).reset_index()
display(PatientTraffic.head(3000))

TimeStamp   count
0   2016-03-13 12:20:00 1
1   2016-03-13 13:39:00 1
2   2016-03-13 13:43:00 1
3   2016-03-13 16:00:00 1
4   2016-03-14 13:27:00 1
... ... ...
2088    2020-02-18 16:00:00 8
2089    2020-02-19 16:00:00 8
2090    2020-02-20 16:00:00 8
2091    2020-02-21 16:00:00 8
2092    2020-02-22 16:00:00 8
2093 rows × 2 columns

When I plot it with these settings:

PatientTrafficPerTimeStamp = PatientTraffic.plot.bar(
        x='TimeStamp', 
        y='count',
        figsize=(20,3),
        title = "Patient Traffic over Time"
        
    )
PatientTrafficPerTimeStamp.xaxis.set_major_locator(plt.MaxNLocator(3))

I expect to get a bar chart whose x-axis has three ticks: one at the beginning, one in the middle, and one at the end. Maybe I'm using this wrong. Also, the ticks that appear seem to be simply the first three values in the column, which is not what I want. Any help would be appreciated!

Solution

You probably think that you have one problem, but you actually have two, and both stem from your use of convenience functions. The problem you are most likely not aware of is that pandas plots bars as categorical data. This makes sense in most situations, but obviously not when the x-axis holds TimeStamp data. Let's check that I didn't make that up:

import matplotlib.pyplot as plt
import pandas as pd

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df = pd.read_csv("test.txt", sep=r"\s{2,}", engine="python")
# convert TS from strings into datetime objects
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")

# plot it the way you do, directly from pandas, which hands the data to matplotlib
df.plot.bar(
        x="TS", 
        y="Val",
        ax=ax1,
        title="pandas version"    
    )

#now plot the same data using matplotlib
ax2.bar(df.TS, df.Val, width=22)
ax2.tick_params(axis="x", labelrotation=90)
ax2.set_title("matplotlib version")    

plt.tight_layout()
plt.show()

Sample output: [figure not reproduced; two panels titled "pandas version" and "matplotlib version"]
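
The difference between the two versions can also be checked numerically. The following is a minimal sketch, not part of the original answer: it builds a small in-memory DataFrame (the four rows are made-up stand-ins for the sample file) and compares where the bars end up on the x-axis in each version. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the sample file: four unevenly spaced timestamps.
df = pd.DataFrame({
    "TS": pd.to_datetime([
        "2016-03-13 12:20:00",
        "2016-06-17 16:00:00",
        "2018-02-20 16:00:00",
        "2020-02-22 16:00:00",
    ]),
    "Val": [1, 5, 8, 8],
})

fig, (ax1, ax2) = plt.subplots(1, 2)
df.plot.bar(x="TS", y="Val", ax=ax1)   # pandas version
ax2.bar(df.TS, df.Val, width=22)       # matplotlib version

# Center of each bar along the x-axis, in axis (data) coordinates.
pandas_centers = [p.get_x() + p.get_width() / 2 for p in ax1.patches]
mpl_centers = [p.get_x() + p.get_width() / 2 for p in ax2.patches]

# pandas: the bars sit at consecutive integer positions, whatever the dates are.
print("pandas bar centers:    ", pandas_centers)
# matplotlib: the bars sit at date numbers, so the spacing follows the dates.
print("matplotlib bar centers:", mpl_centers)
```

In the pandas panel the bars are equally spaced even though the timestamps are not, which is why a numeric locator on that axis has no date information to work with.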

So, we should plot the bars directly with matplotlib to avoid losing the TimeStamp information. Obviously, we lose some of the comfort provided by pandas; for example, we have to adjust the width of the bars and label the axes ourselves. Now, you could use another convenience function, MaxNLocator, but as you noticed, it has been written to work well in most situations, and in exchange you give up control over the exact positioning of the ticks. Why not write our own locator using FixedLocator?

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FixedLocator
import pandas as pd

def myownMaxNLocator(datacol, n):
    datemin = mdates.date2num(datacol.min())
    datemax = mdates.date2num(datacol.max())
    xticks = np.linspace(datemin, datemax, n)
    return xticks


fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df = pd.read_csv("test.txt", sep=r"\s{2,}", engine="python")
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")
    
df.plot.bar(
        x="TS", 
        y="Val",
        ax=ax1,
        title="pandas version"    
    )

ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
dateticks = myownMaxNLocator(df.TS, 5)
ax2.xaxis.set_major_locator(FixedLocator(dateticks))
ax2.tick_params(axis="x", labelrotation=90)

plt.tight_layout()
plt.show()

Sample output: [figure not reproduced; two panels titled "pandas version" and "matplotlib version"]
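
To see which dates the five ticks correspond to, the float tick values can be converted back with mdates.num2date. This is a small sketch rather than part of the original answer; it hard-codes the first and last timestamps of the sample data instead of reading the file, and repeats the np.linspace computation from myownMaxNLocator.

```python
import datetime as dt

import numpy as np
import matplotlib.dates as mdates

# First and last timestamps of the sample data (hard-coded for this sketch).
first = dt.datetime(2016, 3, 13, 12, 20)
last = dt.datetime(2020, 2, 22, 16, 0)

# Same computation as in myownMaxNLocator: n evenly spaced date numbers.
xticks = np.linspace(mdates.date2num(first), mdates.date2num(last), 5)

# num2date returns timezone-aware datetimes (UTC by default);
# drop the tzinfo so they compare directly with the naive inputs.
tick_dates = [mdates.num2date(x).replace(tzinfo=None) for x in xticks]

for d in tick_dates:
    print(d.strftime("%Y-%m-%d %H:%M"))
```

The ticks are evenly spaced in time, so they generally do not coincide with recorded timestamps other than the first and the last.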

Here, the ticks start at the lowest value and end at the highest value. Alternatively, you could use LinearLocator, which distributes the ticks evenly over the entire view:

from matplotlib.ticker import LinearLocator
...
ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
ax2.xaxis.set_major_locator(LinearLocator(numticks=5))
ax2.tick_params(axis="x", labelrotation=90)
...

Sample output: [figure not reproduced]
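
One consequence of spreading the ticks over the entire view is that the outermost ticks sit at the edges of the visible axis range, not at the first and last date. The sketch below is an illustration under assumptions, not part of the original answer: it uses three made-up data points and the Agg backend, and compares the tick positions with the view limits and the data limits.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
from matplotlib.ticker import LinearLocator

# Hypothetical data: three timestamps with their values.
dates = [
    dt.datetime(2016, 3, 13, 12, 20),
    dt.datetime(2018, 2, 20, 16, 0),
    dt.datetime(2020, 2, 22, 16, 0),
]
vals = [1, 8, 8]

fig, ax = plt.subplots()
ax.bar(dates, vals, width=22)
ax.xaxis.set_major_locator(LinearLocator(numticks=5))

ticks = ax.xaxis.get_majorticklocs()   # tick positions as date numbers
view_min, view_max = ax.get_xlim()     # visible range, including bar width and margins
data_min = mdates.date2num(dates[0])
data_max = mdates.date2num(dates[-1])

print("first tick before first date:", ticks[0] < data_min)
print("last tick after last date:   ", ticks[-1] > data_max)
print("ticks span the view limits:  ",
      np.allclose(ticks, np.linspace(view_min, view_max, 5)))
```

If the ticks should coincide with the first and last date instead, the FixedLocator variant above is the one to use, or the view limits can be set explicitly before applying LinearLocator.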

The sample data were stored in a file with the following structure:

TS   Val
0   2016-03-13 12:20:00  1
1   2016-04-13 13:39:00  3
2   2016-04-03 13:43:00  5
3   2016-06-17 16:00:00  1
4   2016-09-14 13:27:00  2
2088    2017-02-08 16:00:00  7
2089    2017-02-25 16:00:00  2
2090    2018-02-20 16:00:00  8
2091    2019-02-21 16:00:00  9
2092    2020-02-22 16:00:00  8
