How to manually select which x-axis labels (dates) get plotted in pandas

Question

First of all, I am sorry if I am not describing the problem correctly, but the example should make my issue clear.

I have this dataframe and need to plot it sorted by date, but there are a lot of dates (around 60), so pandas automatically chooses which dates to label on the x-axis, and the chosen dates look random. For readability I also want only selected dates on the x-axis, but following a pattern, such as January of every year.

Here is my code:

df = pd.read_csv('dbo.Access_Stat_all.csv',error_bad_lines=False, usecols=['Range_Start','Format','Resource_ID','Number'])
df1 = df[df['Resource_ID'] == 32543]
df1 = df1[['Format','Range_Start','Number']]
df1["Range_Start"] = df1["Range_Start"].str[:7]
df1 = df1.groupby(['Format','Range_Start'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if df1.index.contains('entry'):
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = df1.append(df2)
print(df2)
df2.to_csv('test.csv', sep="\t", float_format='%.f')
if df2.index.contains('entry'):
    df2.T[['entry','sum']].plot(rot = 30)
else:
    df2.T[['sum']].plot(kind = 'bar')
ax1 = plt.axes()
ax1.legend(["Seitenzugriffe", "Dateiabrufe"])
plt.xlabel("")
plt.savefig('image.png')

As you can see, the plot has 2010-08, 2013-09, 2014-07 as the x-axis values. How can I make it something like 2010-01, 2013-01, 2014-01, etc.?

Thank you very much. I know this is not the optimal description, but since English is not my first language, this is the best I could come up with.

Recommended answer

Note: updated to answer the OP's question.

You are mixing pandas plotting with both matplotlib interfaces: the pyplot API (the plt methods) and the object-oriented API (the methods on the axes, ax1 above). The latter two are distinctly different APIs and may not work correctly when mixed. The matplotlib documentation recommends using the object-oriented API:


While it is easy to quickly generate plots with the matplotlib.pyplot module, we recommend using the object-oriented approach for more control and customization of your plots. See the methods in the matplotlib.axes.Axes() class for many of the same plotting functions. For examples of the OO approach to Matplotlib, see the API Examples.
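
As a rough illustration of the object-oriented style the quote recommends, the sketch below creates the figure and axes explicitly and asks pandas to draw onto them through the `ax=` argument. The small dataframe is made-up stand-in data, not the OP's CSV, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for df2.T[['entry', 'sum']] from the question.
data = pd.DataFrame(
    {"entry": [3, 5, 4, 6], "sum": [10, 12, 11, 15]},
    index=["2010-01", "2010-02", "2010-03", "2010-04"],
)

fig, ax = plt.subplots()                 # create the figure and axes explicitly
returned_ax = data.plot(ax=ax, rot=30)   # pandas draws onto the axes we pass in

# Every later tweak goes through the same axes object, not through plt.*
ax.legend(["Seitenzugriffe", "Dateiabrufe"])
ax.set_xlabel("")
```

With this pattern there is no need for a later `plt.axes()` call to recover the axes; depending on the matplotlib version, that call may create a new axes instead of returning the one pandas drew on. The result can be written out with `fig.savefig(...)`.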

Here's how you can control the x-axis tick values and labels using matplotlib's date locators and formatters with the object-oriented API (see the matplotlib example). Also see the link in @ImportanceOfBeingErnest's answer to another question about incompatibilities between pandas' and matplotlib's datetime objects.
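
Before the full listing, a stripped-down sketch on synthetic data may help isolate the tick-control step. It assumes a monthly series that starts mid-year (invented here, not taken from the OP's file) and plots it with a plain `ax.plot` call so that matplotlib's own date handling applies; `tick_dates` is a helper name introduced only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic monthly series starting in August 2010 (assumed data).
idx = pd.date_range("2010-08-01", periods=60, freq="MS")
series = pd.Series(range(60), index=idx)

fig, ax = plt.subplots()
ax.plot(series.index, series.values)  # plain matplotlib call on datetime x-values

# One major tick per year, placed on 1 January, labelled as YYYY-MM.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate(rotation=30)

# Convert the tick positions back to datetimes to inspect where they fall.
tick_dates = mdates.num2date(ax.get_xticks())
```

If you would rather keep `df.plot(ax=ax)`, the pandas visualization guide documents an `x_compat=True` option that suppresses pandas' own tick resolution adjustment; whether that is enough for a given dataset is something to verify. The answer's full code, adapted to the OP's data, follows.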

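# imports used below
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# prepare your data
# (on_bad_lines='skip' replaces error_bad_lines=False in pandas >= 1.3)
df = pd.read_csv('../../../so/dbo.Access_Stat_all.csv', on_bad_lines='skip', usecols=['Range_Start','Format','Resource_ID','Number'])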
df.head()
df1 = df[df['Resource_ID'] == 10021]
df1 = df1[['Format','Range_Start','Number']]
df1["Range_Start"] = df1["Range_Start"].str[:7]
df1 = df1.groupby(['Format','Range_Start'], as_index=True).last()
pd.options.display.float_format = '{:,.0f}'.format
df1 = df1.unstack()
df1.columns = df1.columns.droplevel()
if 'entry' in df1.index:  # Index.contains() was removed in pandas 1.0
    df2 = df1[1:4].sum(axis=0)
else:
    df2 = df1[0:3].sum(axis=0)
df2.name = 'sum'
df2 = pd.concat([df1, df2.to_frame().T])  # DataFrame.append() was removed in pandas 2.0
print(df2)
df2.to_csv('test.csv', sep="\t", float_format='%.f')
if 'entry' in df2.index:  # Index.contains() was removed in pandas 1.0
    # convert your index to use pandas datetime format
    df3 = df2.T[['entry','sum']].copy()
    df3.index = pd.to_datetime(df3.index)
    # for illustration, I changed a couple dates and added some dummy values
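    # (single .loc[row, col] calls; chained indexing may assign to a temporary copy)
    df3.loc[pd.Timestamp('2014-01-01'), 'entry'] = 48
    df3.loc[pd.Timestamp('2014-05-01'), 'entry'] = 28
    df3.loc[pd.Timestamp('2015-05-01'), 'entry'] = 36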
    print(df3)

    # plot your data
    fig, ax = plt.subplots()

    # use matplotlib date formatters
    years = mdates.YearLocator()   # every year
    yearsFmt = mdates.DateFormatter('%Y-%m')

    # format the major ticks
    ax.xaxis.set_major_locator(years)
    ax.xaxis.set_major_formatter(yearsFmt)

    ax.plot(df3)

    # add legend
    ax.legend(["Seitenzugriffe", "Dateiabrufe"])

    fig.savefig('image.png')
else:
    # left as an exercise...
    df2.T[['sum']].plot(kind = 'bar')
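
One possible way to approach the bar-chart branch left as an exercise above: pandas draws bars at integer positions 0..n-1 rather than at date values, so a date locator does not apply directly. A minimal sketch, assuming synthetic monthly totals in place of the 'sum' row, is to pick the positions whose month is January and label only those. The names `totals` and `january_pos` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic monthly totals standing in for the 'sum' row (assumed data).
idx = pd.date_range("2010-08-01", periods=60, freq="MS")
totals = pd.Series(range(60), index=idx, name="sum")

fig, ax = plt.subplots()
totals.plot(kind="bar", ax=ax)

# Bars sit at integer positions, so select the positions of the Januaries.
january_pos = [i for i, d in enumerate(totals.index) if d.month == 1]
ax.set_xticks(january_pos)
ax.set_xticklabels(
    [totals.index[i].strftime("%Y-%m") for i in january_pos],
    rotation=30,
)
```

The same idea works for any other pattern (every quarter, every second year) by changing the condition in the list comprehension.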
