Stacked bar plots from list of dataframes with groupby command

Problem description

I would like to create a 2x3 grid of stacked bar chart subplots from the results of a groupby.size command. Let me explain. I have a list of dataframes: list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]. A small sample of one of these dataframes:

...  Create Time          Location         Area Id  Beat  Priority  ...  Closed Time
     2011-01-01 00:00:00  ST&SAN PABLO AV  1.0      06X   1.0       ...  2011-01-01 00:28:17
     2011-01-01 00:01:11  ST&HANNAH ST     1.0      07X   1.0       ...  2011-01-01 01:12:56
     .
     .
     .

(I can only show a few columns, as the layout gets messed up otherwise.) I'm using a groupby.size command to get the event counts I need from these dataframes, see below:

list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]
for i in list_df:
    print(i.groupby(['Beat', 'Priority']).size())
    print(' ')

This produces:

Beat  Priority
01X   1.0          394
      2.0         1816
02X   1.0          644
      2.0         1970
02Y   1.0          661
      2.0         2309
03X   1.0          857
      2.0         2962
.
.
.

I want to identify the top 10 beats by TOTAL count, i.e. summed over both priorities for each value of the Beat column. For example, the totals for the output above are:

Beat  Priority           Total for Beat
01X   1.0       394         
      2.0       1816         2210
02Y   1.0       661          
      2.0       2309         2970
03X   1.0       857
      2.0       2962         3819
.
.
.

So far I have called plot on my groupby.size result, but it doesn't compute the per-beat totals I described above. See below:

list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]
fig, axes = plt.subplots(2, 3)
for d, i in zip(list_df, range(6)):
    ax = axes.ravel()[i]
    d.groupby(['Beat', 'Priority']).size().nlargest(10).plot(ax=ax, kind='bar', figsize=(15, 7), stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {i + 2011}")
plt.tight_layout()

I would like to keep the 2x3 subplot layout, but with stacked bar charts like one I have made previously.

Thanks in advance. This has been harder than I thought it would be!

Recommended answer

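For a stacked bar plot, each data series (here, each Priority value) needs to be its own column, with one row per Beat. Calling nlargest(10) on the grouped Series ranks individual (Beat, Priority) pairs rather than per-beat totals. So you probably want something like: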

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# create fake input data: one dataframe per year, 2011 through 2016
nrows = 300  # number of rows in each fake dataframe
beats = ['{:02d}X'.format(i) for i in range(15)]
list_df = [pd.DataFrame({'Beat': np.random.choice(beats, nrows),
                         'Priority': np.random.choice(['1', '2'], nrows),
                         'othercolumn1': range(nrows),
                         'othercol2': range(nrows),
                         'year': [yr] * nrows}) for yr in range(2011, 2017)]

In [22]: print(list_df[0].head(5))
  Beat Priority  othercolumn1  othercol2  year
0  06X        1             0          0  2011
1  05X        1             1          1  2011
2  04X        1             2          2  2011
3  01X        2             3          3  2011
4  00X        1             4          4  2011

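fig, axes = plt.subplots(2, 3, figsize=(15, 7))

for i, d in enumerate(list_df):
    ax = axes.flatten()[i]
    # count rows per Beat (index) and Priority (columns)
    dplot = d[['Beat', 'Priority']].pivot_table(index='Beat', columns='Priority', aggfunc=len)
    # keep the 10 beats with the largest totals, then drop the helper column
    dplot = (dplot.assign(total=lambda x: x.sum(axis=1))
                  .sort_values('total', ascending=False)
                  .head(10)
                  .drop('total', axis=1))
    dplot.plot.bar(ax=ax, stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {2011 + i}")

fig.tight_layout()
plt.show()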
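
The same reshaping can also be sketched with the groupby.size call from the question, followed by unstack to move Priority into the columns. This is only a minimal sketch and not the answer's original code: the helper name top_beats_by_priority and the tiny demo frame are made up for illustration.

```python
import pandas as pd


def top_beats_by_priority(df, n=10):
    """Return per-priority counts for the n beats with the largest totals.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Count rows per (Beat, Priority), then move Priority into the columns
    counts = df.groupby(['Beat', 'Priority']).size().unstack(fill_value=0)
    # Rank beats by their total across all priorities and keep the top n
    top = counts.sum(axis=1).nlargest(n).index
    return counts.loc[top]


# Tiny made-up frame, for illustration only
demo = pd.DataFrame({
    'Beat': ['01X', '01X', '02X', '02X', '02X', '03X'],
    'Priority': [1.0, 2.0, 1.0, 2.0, 2.0, 2.0],
})
result = top_beats_by_priority(demo, n=2)
print(result)
```

Inside the subplot loop, the returned frame could be used in place of dplot, for example top_beats_by_priority(d).plot.bar(ax=ax, stacked=True).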
