Stacked bar plots from a list of dataframes with the groupby command
Question
I want to create a 2x3 grid of stacked bar chart subplots from the results of a groupby.size command. Let me explain. I have a list of dataframes: list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]. A small example of one of these dataframes:
... Create Time          Location         Area Id  Beat  Priority  ...  Closed Time
    2011-01-01 00:00:00  ST&SAN PABLO AV  1.0      06X   1.0       ...  2011-01-01 00:28:17
    2011-01-01 00:01:11  ST&HANNAH ST     1.0      07X   1.0       ...  2011-01-01 01:12:56
...
(I can only show a few columns, because the layout breaks otherwise.)
I'm using a groupby.size command to get the required event counts for these dataframes:
list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]
for i in list_df:
    print(i.groupby(['Beat', 'Priority']).size())
    print(' ')
This produces:
Beat  Priority
01X   1.0          394
      2.0         1816
02X   1.0          644
      2.0         1970
02Y   1.0          661
      2.0         2309
03X   1.0          857
      2.0         2962
...
I want to identify the top 10 beats by TOTAL count (the sum over all priorities), using the Beat column. For example, the totals for the beats above are:
Beat  Priority         Total for Beat
01X   1.0        394
      2.0       1816   2210
02Y   1.0        661
      2.0       2309   2970
03X   1.0        857
      2.0       2962   3819
...
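The "Total for Beat" column can be derived from the groupby.size() result by summing over the Priority level within each Beat. A minimal sketch, assuming only the four beats listed above (the Series is rebuilt by hand from those counts; the real data has more beats, and the names counts, beat_totals and table are illustrative):

```python
import pandas as pd

# Rebuild the groupby(['Beat', 'Priority']).size() output shown above
# (assumption: only these four beats; the real dataframes contain more)
counts = pd.Series(
    [394, 1816, 644, 1970, 661, 2309, 857, 2962],
    index=pd.MultiIndex.from_product(
        [['01X', '02X', '02Y', '03X'], [1.0, 2.0]], names=['Beat', 'Priority']
    ),
)

# One total per beat: sum the counts over all priorities
beat_totals = counts.groupby(level='Beat').sum()

# The same per-beat total repeated on each (Beat, Priority) row,
# like the "Total for Beat" column in the table above
table = counts.to_frame('count')
table['total_for_beat'] = counts.groupby(level='Beat').transform('sum')
print(table)
print(beat_totals.nlargest(10))
```

groupby(level='Beat').sum() collapses to one row per beat, which is what a top-10 ranking needs; transform('sum') keeps the original rows and broadcasts the total back onto them.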
So far I have called plot on the groupby.size result, but it doesn't compute the per-beat totals described above. Here is my attempt:
import matplotlib.pyplot as plt

list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]
fig, axes = plt.subplots(2, 3)
for i, d in enumerate(list_df):
    ax = axes.ravel()[i]
    # nlargest(10) here picks the 10 largest (Beat, Priority) pairs,
    # not the 10 beats with the largest totals
    d.groupby(['Beat', 'Priority']).size().nlargest(10).plot(ax=ax, kind='bar', figsize=(15, 7), stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {i + 2011}")
plt.tight_layout()
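Because nlargest(10) is applied to the (Beat, Priority) counts, the code above ranks individual pairs rather than beats. A small sketch with hypothetical beats 'A', 'B', 'C' and made-up counts (string priorities '1' and '2') shows the difference, and the wide Beat x Priority table that plot.bar(stacked=True) expects:

```python
import pandas as pd

# Hypothetical counts, shaped like the output of d.groupby(['Beat', 'Priority']).size()
counts = pd.Series(
    [10, 1, 6, 6, 1, 9],
    index=pd.MultiIndex.from_product(
        [['A', 'B', 'C'], ['1', '2']], names=['Beat', 'Priority']
    ),
)

# What the question's code ranks: individual (Beat, Priority) pairs
top_pairs = counts.nlargest(2)
print(top_pairs.index.tolist())   # [('A', '1'), ('C', '2')]

# What is wanted: beats ranked by their total over all priorities
top_beats = counts.groupby(level='Beat').sum().nlargest(2).index
print(top_beats.tolist())         # ['B', 'A']

# Wide table: one row per beat, one column per priority (one stacked segment each)
wide = counts.unstack(fill_value=0).loc[top_beats]
print(wide)
```

Beat 'B' has the largest total (12) but never appears in the pair ranking, because neither of its individual counts is in the top two.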
I want the 2x3 subplot layout, but with stacked bar charts like one I made previously.
Thanks in advance. This has been harder than I thought it would be!
Answer
For a stacked bar plot, each data series (here, each Priority) needs to be a column, with one row per Beat. So you probably want something like this:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# create fake input data: one DataFrame per year, ncols rows each
ncols = 300
list_df = [pd.DataFrame({'Beat': np.random.choice(['{:02d}X'.format(i) for i in range(15)], ncols),
                         'Priority': np.random.choice(['1', '2'], ncols),
                         'othercolumn1': range(ncols),
                         'othercol2': range(ncols),
                         'year': [yr] * ncols}) for yr in range(2011, 2017)]
In [22]: print(list_df[0].head(5))
  Beat Priority  othercolumn1  othercol2  year
0  06X        1             0          0  2011
1  05X        1             1          1  2011
2  04X        1             2          2  2011
3  01X        2             3          3  2011
4  00X        1             4          4  2011
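fig, axes = plt.subplots(2, 3)
for i, d in enumerate(list_df):
    ax = axes.flatten()[i]
    # one row per Beat, one column per Priority, cells = number of events
    dplot = d[['Beat', 'Priority']].pivot_table(index='Beat', columns='Priority', aggfunc=len)
    # keep the 10 beats with the largest total across priorities
    dplot = (dplot.assign(total=lambda x: x.sum(axis=1))
                  .sort_values('total', ascending=False)
                  .head(10)
                  .drop('total', axis=1))
    dplot.plot.bar(ax=ax, figsize=(15, 7), stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {2011 + i}")
plt.tight_layout()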
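A variation that stays closer to the question's groupby.size approach builds the same Beat x Priority table with size().unstack(fill_value=0), which also puts 0 rather than NaN where a beat has no events of some priority. This is a sketch on fake data, not the answer's exact code: the helper top_beats_by_priority is a hypothetical name, the float priorities 1.0/2.0 mimic the question's data, and the Agg backend line is only there so the sketch runs without a display (drop it to get an interactive window).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fake data: one DataFrame per year with Beat and Priority columns
rng = np.random.default_rng(0)
nrows = 300
beats = ['{:02d}X'.format(i) for i in range(15)]
list_df = [pd.DataFrame({'Beat': rng.choice(beats, nrows),
                         'Priority': rng.choice([1.0, 2.0], nrows)})
           for _ in range(2011, 2017)]


def top_beats_by_priority(df, n=10):
    """Beat x Priority event counts for the n beats with the largest totals."""
    counts = df.groupby(['Beat', 'Priority']).size().unstack(fill_value=0)
    order = counts.sum(axis=1).sort_values(ascending=False).index[:n]
    return counts.loc[order]


fig, axes = plt.subplots(2, 3, figsize=(15, 7))
for year, (ax, d) in enumerate(zip(axes.flat, list_df), start=2011):
    # each Priority column becomes one stacked segment per beat
    top_beats_by_priority(d).plot.bar(ax=ax, stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {year}")
plt.tight_layout()
```

With real data, pass the question's list_df instead of the fake one, and call plt.show() or fig.savefig(...) at the end.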