Yearly BoxPlots with Pandas


Problem description

I have a DataFrame (multiple daily time series) with a DatetimeIndex as its index and a MultiIndex as its columns. I would like to select one column and create a box plot in which the data are grouped by year. I thought this would be easy, but I am struggling to get a result.

>>> daily.shape
(11319, 118)

>>> daily.index
DatetimeIndex(['1986-01-01', '1986-01-02', '1986-01-03', '1986-01-04',
               '1986-01-05', '1986-01-06', '1986-01-07', '1986-01-08',
               '1986-01-09', '1986-01-10',
               ...
               '2016-12-22', '2016-12-23', '2016-12-24', '2016-12-25',
               '2016-12-26', '2016-12-27', '2016-12-28', '2016-12-29',
               '2016-12-30', '2016-12-31'],
              dtype='datetime64[ns]', name='timevalue', length=11319, freq=None)
>>> daily.columns
MultiIndex(levels=[['41B001', '41B004', '41B006', '41B008', '41B011', '41MEU1', '41N043', '41R001', '41R002', '41R012', '41WOL1', '41WOL2', '47E013', 'T1M001', 'T1M003'], ['BA-10.0', 'BA-2.5', 'BC', 'CO', 'CO2', 'NO', 'NO2', 'NOx', 'O3', 'PM-10.0', 'PM-2.5', 'RH', 'SO2', 'T', 'UVPM', 'VO-10.0', 'VO-2.5', 'WD', 'WS-s', 'WS-v', 'p']],
           labels=[[0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14], [5, 6, 7, 3, 5, 6, 7, 8, 3, 5, 6, 7, 8, 3, 5, 6, 7, 12, 0, 1, 5, 6, 7, 8, 9, 10, 15, 16, 0, 1, 5, 6, 7, 9, 10, 15, 16, 0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 2, 3, 4, 5, 6, 7, 12, 14, 0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 0, 2, 3, 4, 5, 6, 7, 8, 9, 12, 14, 15, 4, 5, 6, 7, 12, 11, 13, 13, 17, 18, 19, 20, 11, 13, 13, 17, 18, 19, 20]],
           names=['sitekey', 'measurandkey'])

The best I have managed is:

import matplotlib.pyplot as plt
fig, axe = plt.subplots()
daily.loc[:,[('41R001', 'SO2')]].groupby(daily.index.year).boxplot(ax=axe, subplots=False, rot=90)

But it requires additional post-processing to label the axis.

When I call reset_index() so that I can apply a function and use pivot(), I get an indexing error because of the MultiIndex.

d = daily.reset_index()
d['timevalue']

The exception is: cannot handle a non-unique multi-index! I do not understand this, since timevalue does not occur in my MultiIndex. I have also tried .loc[], but I think the problem lies elsewhere.
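One possible reading of the message, offered as a hedged observation rather than a confirmed diagnosis: in the labels printed above, the pair ('T1M001', 'T') appears to occur twice (as does ('T1M003', 'T')), so the column MultiIndex itself may be non-unique. The sketch below uses a small hypothetical stand-in for `daily` (the values and the three columns are invented for illustration) to show how duplicated column keys can be detected and dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of `daily`: the ('T1M001', 'T') key appears twice.
columns = pd.MultiIndex.from_tuples(
    [("41R001", "SO2"), ("T1M001", "T"), ("T1M001", "T")],
    names=["sitekey", "measurandkey"],
)
index = pd.date_range("1986-01-01", periods=4, freq="D", name="timevalue")
daily = pd.DataFrame(np.arange(12.0).reshape(4, 3), index=index, columns=columns)

# False when at least one (sitekey, measurandkey) pair is repeated.
columns_are_unique = daily.columns.is_unique

# List the repeated keys (every occurrence after the first one).
duplicated_keys = daily.columns[daily.columns.duplicated()].tolist()

# Keep only the first occurrence of each column key.
deduplicated = daily.loc[:, ~daily.columns.duplicated()]

print(columns_are_unique)
print(duplicated_keys)
print(deduplicated.columns.is_unique)
```

Whether keeping the first occurrence is the right choice depends on why the duplicate exists in the real data; merging or renaming the columns may be more appropriate.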

So, what I want to achieve is simple:

  • I have daily time series spanning several years, and those time series are multi-indexed;
  • I would like to select one of them (using loc and a composite key, as in the example above) and get a time series box plot in which the data are grouped by year.
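The two steps listed above (select one series by its composite key, then group it by year) can be sketched without any plotting. The frame below is a synthetic stand-in for `daily`: the site and measurand keys are borrowed from the question, while the values, the date range, and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `daily`, with unique MultiIndex columns.
rng = np.random.default_rng(0)
index = pd.date_range("1986-01-01", "1988-12-31", freq="D", name="timevalue")
columns = pd.MultiIndex.from_product(
    [["41R001", "41R002"], ["NO2", "SO2"]],
    names=["sitekey", "measurandkey"],
)
daily = pd.DataFrame(
    rng.uniform(0, 50, size=(len(index), 4)), index=index, columns=columns
)

# A tuple key selects exactly one column and returns a Series.
so2 = daily.loc[:, ("41R001", "SO2")]

# Group by the calendar year taken from the DatetimeIndex.
yearly_groups = so2.groupby(so2.index.year)

# One row of summary statistics per year; these are roughly the
# quantities a box plot draws (quartiles, median, extremes).
summary = yearly_groups.describe()
print(summary[["count", "25%", "50%", "75%"]])
```

Once the data is in this shape, each yearly group can be passed to whichever plotting function is preferred.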

I thought it would be easy, but I cannot use pivot() properly with this DataFrame because of the multi-index error.

Recommended answer

If you don't mind using the seaborn library, you can make this plot pretty easily:

import numpy as np
import pandas as pd
import seaborn as sns

# pd.date_range replaces the DatetimeIndex(start=..., end=...) constructor,
# which recent pandas versions no longer accept.
index = pd.date_range(start='1985-01-01', end='2017-03-08', freq='D')
df = pd.DataFrame(index = index, 
                  data = np.random.uniform(-1,1,size=(index.shape[0],4)), 
                  columns=pd.MultiIndex.from_arrays([['A','A','B','B'],
                                                     ['d','e','d','e']]))
# Add the year of each row as a grouping column.
df['Year'] = df.index.year
#                    A                   B            Year
#                    d         e         d         e      
# 1985-01-01  0.205208 -0.228484  0.296273  0.545031  1985
# 1985-01-02  0.546436 -0.538920  0.173388  0.848590  1985
# 1985-01-03 -0.367593 -0.974911 -0.796331 -0.946239  1985
# 1985-01-04 -0.346102 -0.951542 -0.975172  0.951099  1985
# 1985-01-05  0.973975  0.708254 -0.150454  0.145298  1985

ax = sns.boxplot(data = df, x='Year',y=('A','e'))
for item in ax.get_xticklabels():
    item.set_rotation(90)

Resulting image:

I tried using the pandas.DataFrame.boxplot() method, but couldn't make it work for this case in a short span of time =).
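For readers who would rather stay with pandas alone, the following is a minimal sketch of one way it might be done, not the answerer's method: copy the selected column into a flat two-column frame and let `DataFrame.boxplot` group it through its `by` argument. The shortened date range, the `long_frame` name, and the axis labels are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same layout as the example above, over a shorter date range.
rng = np.random.default_rng(0)
index = pd.date_range("1985-01-01", "1987-12-31", freq="D")
columns = pd.MultiIndex.from_arrays([["A", "A", "B", "B"], ["d", "e", "d", "e"]])
df = pd.DataFrame(
    rng.uniform(-1, 1, size=(len(index), 4)), index=index, columns=columns
)

# Pull out one series and pair it with its year in a flat frame, so the
# MultiIndex on the columns no longer gets in the way of grouping.
selected = df[("A", "e")]
long_frame = pd.DataFrame(
    {"value": selected.to_numpy(), "year": selected.index.year.to_numpy()},
    index=selected.index,
)

fig, ax = plt.subplots()
# `by="year"` draws one box per distinct year on the given axes.
long_frame.boxplot(column="value", by="year", ax=ax, rot=90)
ax.set_xlabel("Year")
ax.set_ylabel("A / e")
ax.set_title("")
fig.suptitle("")  # remove the automatic "grouped by" title
fig.savefig("yearly_boxplot.png")
```

Flattening first sidesteps the axis-labelling post-processing mentioned in the question, at the cost of copying the selected column.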
