Pandas Grouping - Values as Percent of Grouped Totals Based on Another Column

Problem description

This question is an extension of a question I asked yesterday, but I will rephrase it.

Using a data frame and pandas, I am trying to figure out the tip percentage for each category in a groupby.

So, using the tips dataset, I want to see, for each sex/smoker combination, what the tip percentage is: female smoker / all female and female non-smoker / all female (and the same thing for men).

When I do this,

import pandas as pd
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()

I get the following:

               total_bill     tip
sex    smoker
Female No          977.68  149.77
       Yes         593.27   96.74
Male   No         1919.75  302.00
       Yes        1337.07  183.07

But I am looking for something more like this

                   Tip Pct
Female No      0.153189183
       Yes     0.163062349
Male   No      0.15731215
       Yes     0.136918785

where Tip Pct = sum(tip) / sum(total_bill) for each group.

What am I doing wrong and how do I fix this? Thank you!

I understand that this would give me tip as a percentage of total tips:

(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x.sum()))

Is there a way to modify it to look at another column, i.e.

(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x['total_bill'].sum()))

Thanks!

Solution

You can use apply with axis = 1 to loop through the rows of the aggregated data frame; for each row you can access the summed tip and total_bill and divide them to get the percentage:

(df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()
   .apply(lambda r: r.tip/r.total_bill, axis = 1))

#sex     smoker
#Female  No        0.153189
#        Yes       0.163062
#Male    No        0.157312
#        Yes       0.136919
#dtype: float64
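
Because the two summed columns share the same (sex, smoker) index, the same ratio can probably also be obtained without a row-wise apply, by dividing one column by the other directly. The sketch below is not part of the original answer. To stay self-contained it uses a tiny hand-made data frame in place of tips.csv (the values are invented for illustration), and the names `sums` and `tip_pct` are likewise made up for this example.

```python
import pandas as pd

# Hypothetical miniature stand-in for tips.csv; the numbers are invented.
df = pd.DataFrame({
    "sex":        ["Female", "Female", "Female", "Male", "Male", "Male"],
    "smoker":     ["No", "No", "Yes", "No", "Yes", "Yes"],
    "total_bill": [10.0, 30.0, 20.0, 50.0, 25.0, 15.0],
    "tip":        [2.0, 4.0, 3.0, 5.0, 4.0, 2.0],
})

# Sum both columns per (sex, smoker) group.
sums = df.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()

# Column-wise division aligns on the shared MultiIndex,
# giving sum(tip) / sum(total_bill) for each group.
tip_pct = (sums["tip"] / sums["total_bill"]).rename("tip_pct")
print(tip_pct)
```

With the real data, replacing the hand-made frame with the `read_csv` call from the question should give the same numbers as the apply version, since both compute sum(tip) / sum(total_bill) per group.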
