How to group-by twice, preserve original columns, and plot


Problem description

I have the following dataset (only a sample is shown):

I want to find the most impactful exercise per area and then plot it with a Seaborn bar plot. I use the following code to do so.

import matplotlib.pyplot as plt
import seaborn as sns

# Create a dataset using only the area, exercise and impact level categories
# (Data is the full DataFrame, not shown here)
CA_data = Data[['area', 'exercise', 'impact level']]

# Compute mean impact level per exercise per area
mean_il_CA = CA_data.groupby(['area', 'exercise'])['impact level'].mean().reset_index()

# Take the highest mean per area (this step drops the 'exercise' column)
mean_il_CA_hello = mean_il_CA.groupby('area')['impact level'].max().reset_index()

# Plot
cx = sns.barplot(x="impact level", y="area", data=mean_il_CA_hello)
plt.title('Most Impactful Exercises Considering Area')
plt.show()

The resulting dataset is:

This means that when I plot, only the area label appears on the y axis, not 'area label' + 'exercise label' as I would like.

How do I reinsert the 'exercise' column into my final dataset? How do I get both the name of the area and the exercise on the y axis of the plot?

Solution

The 'exercise' values are lost because the second groupby takes the maximum per 'area' and returns only the grouping column and the aggregated column. This can be solved by keeping the MultiIndex produced by the first groupby (i.e. not using reset_index) and using .transform to create a boolean mask that selects the full rows of mean_il_CA containing the maximum 'impact level' value per 'area'. This solution is based on the code provided in this answer by unutbu. The full labels for the bar chart can then be created by concatenating the labels of 'area' and 'exercise'.
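As a minimal sketch of the approach just described, using the column names from the question: the sample values below are hypothetical, since the real dataset is not shown, and the names `mean_il`, `mask_max` and `top` are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real dataset from the question is not shown
Data = pd.DataFrame({
    'area': ['legs', 'legs', 'legs', 'arms', 'arms', 'arms'],
    'exercise': ['squat', 'squat', 'lunge', 'curl', 'dip', 'dip'],
    'impact level': [4.0, 5.0, 3.0, 2.0, 4.0, 3.0],
})

# Mean impact level per (area, exercise); keep the MultiIndex (no reset_index yet)
mean_il = Data.groupby(['area', 'exercise'])['impact level'].mean()

# Boolean mask: True where a row holds the largest mean within its area
mask_max = mean_il.groupby(level='area').transform(lambda x: x == x.max())

# Keep only the winning rows, then turn the index levels back into columns
top = mean_il[mask_max].reset_index()

# Combined label to use on the y axis of the bar plot
top['area, exercise'] = top['area'] + ', ' + top['exercise']
print(top)
```

The resulting frame keeps 'exercise' next to 'area', so the combined 'area, exercise' column can be passed as y to sns.barplot in place of 'area'.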

Here is an example using the titanic dataset from the seaborn package. The variables 'class', 'embark_town', and 'fare' are used in place of 'area', 'exercise', and 'impact level'. Each categorical variable contains three unique values: 'First', 'Second', 'Third' for 'class', and 'Cherbourg', 'Queenstown', 'Southampton' for 'embark_town'.

import pandas as pd    # v 1.2.5
import seaborn as sns  # v 0.11.1

df = sns.load_dataset('titanic')
data = df[['class', 'embark_town', 'fare']]
data.head()

data_mean = data.groupby(['class', 'embark_town'])['fare'].mean()
data_mean

# Select max values in each class and create concatenated labels
mask_max = data_mean.groupby(level=0).transform(lambda x: x == x.max())
data_mean_max = data_mean[mask_max].reset_index()
data_mean_max['class, embark_town'] = data_mean_max['class'].astype(str) + ', ' \
                                      + data_mean_max['embark_town']
data_mean_max

# Draw seaborn bar chart, referring to the columns of data_mean_max by name
sns.barplot(data=data_mean_max, x='fare', y='class, embark_town')
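A different route to the same rows, not part of the answer above, is to look up the position of each group's maximum with idxmax and select those rows with .loc. The sketch below assumes the question's column names and uses hypothetical sample values; `best_rows` and `top` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's dataset
Data = pd.DataFrame({
    'area': ['legs', 'legs', 'legs', 'arms', 'arms', 'arms'],
    'exercise': ['squat', 'squat', 'lunge', 'curl', 'dip', 'dip'],
    'impact level': [4.0, 5.0, 3.0, 2.0, 4.0, 3.0],
})

# First groupby as in the question: one row per (area, exercise)
mean_il_CA = Data.groupby(['area', 'exercise'])['impact level'].mean().reset_index()

# idxmax gives the row label of the largest mean within each area
best_rows = mean_il_CA.groupby('area')['impact level'].idxmax()

# Select those rows in full, so the 'exercise' column is preserved
top = mean_il_CA.loc[best_rows].reset_index(drop=True)
top['area, exercise'] = top['area'] + ', ' + top['exercise']
print(top)
```

One difference in behavior: if two exercises tie for the maximum in an area, idxmax keeps only the first of them, whereas the boolean mask built with .transform keeps every tied row.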
