How to annotate barplot with percent by hue/legend group
Problem description
I want to add a percentage on top of each bar according to the hue. That means the red bars should sum to 100% and the blue bars should sum to 100%, respectively.
I can make the blue bars sum to 100%, but not the red bars. Which parts should be modified?
Imports and Sample Data

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    # sample data
    np.random.seed(365)
    rows = 100000
    data = {'Call_ID': np.random.normal(10000, 8000, size=rows).astype(int),
            'with_client_nmbr': np.random.choice([False, True], size=rows, p=[.17, .83]),
            'Type_of_Caller': np.random.choice(['Agency', 'EE', 'ER'], size=rows, p=[.06, .77, .17])}
    all_call = pd.DataFrame(data)

First rows of all_call:

       Call_ID  with_client_nmbr Type_of_Caller
    0    11343              True             EE
    1    14188              True         Agency
    2    16539             False             EE
    3    23630              True             ER
    4    -7175              True             EE

Aggregate and Plot

    df_agg = all_call.groupby(['Type_of_Caller', 'with_client_nmbr'])['Call_ID'].nunique().reset_index()

    ax = sns.barplot(x='Type_of_Caller', y='Call_ID', hue='with_client_nmbr',
                     data=df_agg, palette=['orangered', 'skyblue'])

    hue_order = all_call['with_client_nmbr'].unique()
    df_f = sum(all_call.query("with_client_nmbr==False").groupby('Type_of_Caller')['Call_ID'].nunique())
    df_t = sum(all_call.query("with_client_nmbr==True").groupby('Type_of_Caller')['Call_ID'].nunique())

    for bars in ax.containers:
        if bars.get_label() == hue_order[0]:
            group_total = df_f
        else:
            group_total = df_t

    for p in ax.patches:
        width = p.get_width()
        height = p.get_height()
        x, y = p.get_xy()
        ax.annotate(f'{(height/group_total):.1%}', (x + width/2, y + height*1.02), ha='center')

    plt.show()

print(hue_order) gives ['False', 'True']
Solution
- It's typically not required to use seaborn to plot grouped bars; it's just a matter of shaping the dataframe, usually with .pivot or .pivot_table. See How to create a grouped bar plot for more examples.
  - Using pandas.DataFrame.plot with a wide dataframe will be easier, in this case, than using a long dataframe with seaborn.barplot, because the column / bar order and totals coincide.
  - This reduces the code from 16 to 8 lines.
- See this answer for adding annotations as a percent of the entire population.
- Tested in python 3.8.11, pandas 1.3.1, and matplotlib 3.4.2
Imports and DataFrame Transformation

    import pandas as pd
    import matplotlib.pyplot as plt

    # transform the sample data from the OP with pivot_table
    dfp = all_call.pivot_table(index='Type_of_Caller', columns='with_client_nmbr', values='Call_ID', aggfunc='nunique')

Output of display(dfp):

    with_client_nmbr  False   True
    Type_of_Caller
    Agency              994   4593
    EE                10554  27455
    ER                 2748  11296
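The point about column order and totals coinciding can be checked on a tiny example. The sketch below is illustrative only: the frame `toy` and its columns `group`, `flag`, and `item_id` are hypothetical stand-ins for `all_call`, not part of the original question. It pivots a long frame to wide and shows that the Series returned by `.sum()` is indexed by the same column labels, in the same order, as the wide frame.

```python
import pandas as pd

# hypothetical long-format data, for illustration only
toy = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B', 'A'],
    'flag': [False, True, False, True, True, True],
    'item_id': [1, 2, 3, 4, 5, 6],
})

# long -> wide: one row per group, one column per flag (hue) level
wide = toy.pivot_table(index='group', columns='flag', values='item_id', aggfunc='nunique')

# column totals: a Series indexed by the wide frame's column labels
totals = wide.sum()

print(wide)
print(totals)
```

Because the totals follow the column order, they can be paired with the bar containers of a pandas bar plot by position, which is what the annotation step relies on.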
Use matplotlib.pyplot.bar_label
- Requires matplotlib >= 3.4.0, the release that added bar_label (tested here with 3.4.2)
- Each column is plotted in order, and the pandas.Series created by dfp.sum() has the same order as the dataframe columns. Therefore, zip totals to the plot containers and use the value, tot, in labels to calculate the percentage by hue group.
- Add custom annotations based on percent by hue group, by using the labels parameter.
  - (v.get_height()/tot)*100 in the list comprehension calculates the percentage.
- See this answer for other options using .bar_label

    # get the total value for the column
    totals = dfp.sum()

    # plot
    p1 = dfp.plot(kind='bar', figsize=(8, 4), rot=0, color=['orangered', 'skyblue'],
                  ylabel='Value of Bar', title="The value and percentage (by hue group)")

    # add annotations
    for tot, p in zip(totals, p1.containers):
        labels = [f'{(v.get_height()/tot)*100:0.2f}%' for v in p]
        p1.bar_label(p, labels=labels, label_type='edge', fontsize=8, rotation=0, padding=2)

    p1.margins(y=0.2)
    plt.show()
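As a sanity check on the labels, the same percentages can be computed as a table before plotting. The following is a minimal sketch, not the answer's own code: it rebuilds `dfp` by hand from the counts in the pivot table output shown earlier (so it runs without `all_call`), selects the non-interactive `Agg` backend as an assumption for headless runs, and uses `DataFrame.div` in place of the per-bar `get_height()` calculation. The name `pct` is introduced here for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# counts copied from the pivot table output shown earlier
dfp = pd.DataFrame(
    {False: [994, 10554, 2748], True: [4593, 27455, 11296]},
    index=pd.Index(['Agency', 'EE', 'ER'], name='Type_of_Caller'),
)

# total per column (hue group)
totals = dfp.sum()

# share of each bar within its own column, in percent
pct = dfp.div(totals, axis=1) * 100

ax = dfp.plot(kind='bar', rot=0, color=['orangered', 'skyblue'])

# columns and containers are in the same order, so pair them by position
for (col, series), container in zip(pct.items(), ax.containers):
    labels = [f'{v:0.2f}%' for v in series]
    ax.bar_label(container, labels=labels, label_type='edge', fontsize=8, padding=2)

ax.margins(y=0.2)

# each hue group should add up to 100 percent
col_sums = pct.sum().round(6).tolist()
print(col_sums)
```

Computing `pct` up front keeps the arithmetic separate from the drawing code, which makes it easy to confirm that each hue group sums to 100% before any labels are placed.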