How to normalize a seaborn countplot with multiple categorical variables


Question

I have created a seaborn countplot for each of several categorical variables in a dataframe, but I want the bars to show percentages instead of counts.

What is the best option to use? Barplots? Can I use a loop like the one below to get all the barplots at once?

for i, col in enumerate(df_categorical.columns):
   plt.figure(i)
   sns.countplot(x=col,hue='Response',data=df_categorical) 

This loop gives me the countplots for all variables at once.

Thanks!

The data looks like this:

    State           Response     Coverage   Education   Effective To Date   EmploymentStatus       Gender   Location Code   Marital Status  Policy Type Policy    Renew Offer Type  Sales Channel   Vehicle Class   Vehicle Size    
0   Washington  No  Basic   Bachelor    2/24/11 Employed    F   Suburban    Married Corporate Auto  Corporate L3    Offer1  Agent   Two-Door Car    Medsize  
1   Arizona     No  Extended    Bachelor    1/31/11 Unemployed  F   Suburban    Single  Personal Auto   Personal L3 Offer3  Agent   Four-Door Car   Medsize
2   Nevada      No  Premium Bachelor    2/19/11 Employed    F   Suburban    Married Personal Auto   Personal L3 Offer1  Agent   Two-Door Car    Medsize
3   California  No  Basic   Bachelor    1/20/11 Unemployed  M   Suburban    Married Corporate Auto  Corporate L2    Offer1  Call Center SUV Medsize
4   Washington  No  Basic   Bachelor    2/3/11  Employed    M   Rural   Single  Personal Auto   Personal L1 Offer1  Agent   Four-Door Car   Medsize

Recommended answer

Consider using groupby.transform to calculate a percentage column, then call barplot with x set to the original value column and y set to the percentage column.
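To make the transform step concrete, here is a minimal sketch on a hypothetical five-row frame (the `toy` data and the `Gender_pct` column name are illustrative assumptions, not the poster's dataset). It computes, for every row, the share of all rows that fall into the same (Response, Gender) group.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the poster's dataset)
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Gender":   ["F",  "F",  "F",   "M",  "M"],
})

# Size of each (Response, Gender) group, broadcast back onto every row
group_size = toy.groupby(["Response", "Gender"])["Gender"].transform("count")

# Percentage of all rows that belong to the row's group
toy["Gender_pct"] = 100 * group_size / len(toy)
```

Because transform returns a result aligned with the original index, the new column can be passed straight to barplot as y. Every row of a group carries the same value, so each bar's height is simply that group's percentage.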

Data (the originally posted data, with only two "No" responses changed to "Yes")

from io import StringIO
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

txt = '''
    State           Response     Coverage   Education   "Effective To Date"   EmploymentStatus       Gender   "Location Code"   "Marital Status"  "Policy Type" Policy    "Renew Offer Type"  "Sales Channel"   "Vehicle Class"   "Vehicle Size" 
0   Washington  No  Basic   Bachelor    "2/24/11" Employed    F   Suburban    Married "Corporate Auto"  "Corporate L3"    Offer1  Agent   "Two-Door Car"    Medsize  
1   Arizona     No  Extended    Bachelor  "1/31/11"   Unemployed  F   Suburban    Single  "Personal Auto"   "Personal L3" Offer3  Agent   "Four-Door Car"   Medsize
2   Nevada      Yes  Premium Bachelor    "2/19/11" Employed    F   Suburban    Married "Personal Auto"   "Personal L3" Offer1  Agent   "Two-Door Car"    Medsize
3   California  No  Basic   Bachelor    "1/20/11" Unemployed  M   Suburban    Married "Corporate Auto"  "Corporate L2"    Offer1  "Call Center" SUV Medsize
4   Washington  Yes  Basic   Bachelor    "2/3/11"  Employed    M   Rural   Single  "Personal Auto"   "Personal L1" Offer1  Agent   "Four-Door Car"   Medsize'''

# Raw string avoids an invalid-escape warning for the regex separator
df_categorical = pd.read_table(StringIO(txt), sep=r"\s+")

Plot (a single figure with multiple subplots in two columns)

fig = plt.figure(figsize=(10, 30))

# Plot every column except 'Response', which serves as the hue
plot_cols = [c for c in df_categorical.columns if c != 'Response']

for i, col in enumerate(plot_cols):
    # PERCENT COLUMN CALCULATION: each (Response, col) group's share of all rows
    df_categorical[col + '_pct'] = (
        df_categorical.groupby(['Response', col])[col].transform('count')
        * 100 / len(df_categorical)
    )

    plt.subplot(8, 2, i + 1)
    sns.barplot(x=col, y=col + '_pct', hue='Response', data=df_categorical)\
       .set(xlabel=col, ylabel='Percent')

plt.tight_layout()
plt.show()
plt.clf()

plt.close('all')
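The answer above normalizes by the total number of rows. If the goal is instead for the bars of each Response value to sum to 100%, one possible variation (an assumption about what "normalize" should mean here, not part of the original answer) is to build a small summary table with value_counts(normalize=True). The `toy` frame and the `Percent` column name are again hypothetical.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the poster's dataset)
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Gender":   ["F",  "F",  "F",   "M",  "M"],
})

# Within each Response group, the percentage of rows per Gender value
pct = (
    toy.groupby("Response")["Gender"]
       .value_counts(normalize=True)   # fractions that sum to 1 per Response
       .mul(100)                       # convert fractions to percentages
       .rename("Percent")              # explicit name, so reset_index is unambiguous
       .reset_index()
)
```

The resulting table has one row per (Response, Gender) pair and could be plotted with sns.barplot(x="Gender", y="Percent", hue="Response", data=pct). Which denominator is appropriate depends on the comparison being made: the total row count shows each group's share of the whole dataset, while per-Response normalization makes the two responses comparable even when one is much rarer.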
