How to annotate barplot with percent by hue/legend group


Question


    I want to add a percentage on top of each bar, calculated by hue. That means the red bars should sum to 100%, and the blue bars should sum to 100%.

    I can make the blue bars sum to 100%, but not the red bars. Which parts should be modified?

    Imports and Sample Data

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # sample data
    np.random.seed(365)
    rows = 100000
    data = {'Call_ID': np.random.normal(10000, 8000, size=rows).astype(int),
            'with_client_nmbr': np.random.choice([False, True], size=rows, p=[.17, .83]),
            'Type_of_Caller': np.random.choice(['Agency', 'EE', 'ER'], size=rows, p=[.06, .77, .17])}
    all_call = pd.DataFrame(data)
    
       Call_ID  with_client_nmbr Type_of_Caller
    0    11343              True             EE
    1    14188              True         Agency
    2    16539             False             EE
    3    23630              True             ER
    4    -7175              True             EE
    

    Aggregate and Plot

    df_agg= all_call.groupby(['Type_of_Caller','with_client_nmbr'])['Call_ID'].nunique().reset_index()
    
    ax = sns.barplot(x='Type_of_Caller', y='Call_ID', hue='with_client_nmbr',
                     data=df_agg,palette=['orangered', 'skyblue'])
    
    hue_order = all_call['with_client_nmbr'].unique()
    df_f = sum(all_call.query("with_client_nmbr==False").groupby('Type_of_Caller')['Call_ID'].nunique())
    df_t = sum(all_call.query("with_client_nmbr==True").groupby('Type_of_Caller')['Call_ID'].nunique())
    
    for bars in ax.containers:
        if bars.get_label() == hue_order[0]:
            group_total = df_f
        else:
            group_total = df_t
        for p in ax.patches:
            width = p.get_width()
            height = p.get_height()
            x, y = p.get_xy()
            ax.annotate(f'{(height/group_total):.1%}', (x + width/2, y + height*1.02), ha='center')
    plt.show()
    

    • print(hue_order) outputs ['False', 'True']

    Solution

    • It's typically not required to use seaborn to plot grouped bars; it's just a matter of shaping the dataframe, usually with .pivot or .pivot_table. See How to create a grouped bar plot for more examples.
      • Using pandas.DataFrame.plot with a wide dataframe will be easier, in this case, than using a long dataframe with seaborn.barplot, because the column / bar order and totals coincide.
      • This reduces the code from 16 to 8 lines.
    • See this answer for adding annotations as a percent of the entire population.
    • Tested in python 3.8.11, pandas 1.3.1, and matplotlib 3.4.2

    Imports and DataFrame Transformation

    import pandas as pd
    import matplotlib.pyplot as plt
    
    # transform the sample data from the OP with pivot_table
    dfp = all_call.pivot_table(index='Type_of_Caller', columns='with_client_nmbr', values='Call_ID', aggfunc='nunique')
    
    # display(dfp)
    with_client_nmbr  False   True
    Type_of_Caller                
    Agency              994   4593
    EE                10554  27455
    ER                 2748  11296
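
As a quick check before plotting, the percentages by hue group can be computed directly from the pivoted table by dividing each column by its own total. The sketch below rebuilds `dfp` from the values displayed above so that it runs on its own; `pct` is an illustrative name and is not part of the original answer.

```python
import pandas as pd

# rebuild the pivoted table from the values displayed above
dfp = pd.DataFrame(
    {False: [994, 10554, 2748], True: [4593, 27455, 11296]},
    index=pd.Index(['Agency', 'EE', 'ER'], name='Type_of_Caller'),
)
dfp.columns.name = 'with_client_nmbr'

# divide every column by its own total, so each hue group sums to 100
pct = dfp.div(dfp.sum(), axis='columns').mul(100)

print(pct.round(2))
```

Each column of `pct` should add up to 100, which is the property the bar annotations are meant to show.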
    

    Use matplotlib.pyplot.bar_label

    • Requires matplotlib >= 3.4 (bar_label was added in 3.4.0; this answer was tested with 3.4.2)
    • Each column is plotted in order, and the pandas.Series created by df.sum() has the same order as the dataframe columns. Therefore, zip totals to the plot containers and use the value, tot, in labels to calculate the percentage by hue group.
    • Add custom annotations based on percent by hue group, by using the labels parameter.
      • (v.get_height()/tot)*100 in the list comprehension calculates the percentage for each bar.
    • See this answer for other options using .bar_label

    # get the total value for the column
    totals = dfp.sum()
    
    # plot
    p1 = dfp.plot(kind='bar', figsize=(8, 4), rot=0, color=['orangered', 'skyblue'], ylabel='Value of Bar', title="The value and percentage (by hue group)")
    
    # add annotations
    for tot, p in zip(totals, p1.containers):
        
        labels = [f'{(v.get_height()/tot)*100:0.2f}%' for v in p]
        
        p1.bar_label(p, labels=labels, label_type='edge', fontsize=8, rotation=0, padding=2)
    
    p1.margins(y=0.2)
    plt.show()
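
For matplotlib versions older than 3.4, where `bar_label` is not available, the same per-group percentages can be added with `ax.annotate`, much like the loop in the question. The key difference from the question's code is that the inner loop iterates over the bars of the current container rather than over `ax.patches`, so each bar is divided by the total of its own hue group. This is a minimal sketch and not part of the original answer: it rebuilds `dfp` from the values shown above, and the `Agg` backend and the name `ax` are assumptions made so that it runs on its own without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# rebuild the pivoted table from the values displayed above
dfp = pd.DataFrame(
    {False: [994, 10554, 2748], True: [4593, 27455, 11296]},
    index=pd.Index(['Agency', 'EE', 'ER'], name='Type_of_Caller'),
)
dfp.columns.name = 'with_client_nmbr'

# one total per column, in the same order as the plotted containers
totals = dfp.sum()

ax = dfp.plot(kind='bar', figsize=(8, 4), rot=0, color=['orangered', 'skyblue'])

# pair each hue group's container with that group's total
for tot, bars in zip(totals, ax.containers):
    # iterate over the bars of this container only, not over ax.patches
    for p in bars:
        height = p.get_height()
        x = p.get_x() + p.get_width() / 2
        ax.annotate(f'{height / tot:.1%}', (x, height), ha='center', va='bottom',
                    fontsize=8, xytext=(0, 2), textcoords='offset points')

ax.margins(y=0.2)
# call plt.show() with an interactive backend, or save the figure, to view the result
```

Zipping `totals` with `ax.containers` also avoids comparing container labels against `hue_order`, which is fragile because the labels are strings.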
    
