Pandas Grouping - Values as Percent of Grouped Totals Based on Another Column
This question is an extension of a question I asked yesterday, but I will rephrase it here.
Using a data frame and pandas, I am trying to figure out what the tip percentage is for each category in a group by.
So, using the tips dataset, I want to see, for each sex/smoker combination, what the tip percentage is for female smokers / all females and for female non-smokers / all females (and the same thing for males).
When I do this,
import pandas as pd
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()
I get the following:
               total_bill     tip
sex    smoker
Female No          977.68  149.77
       Yes         593.27   96.74
Male   No         1919.75  302.00
       Yes        1337.07  183.07
But I am looking for something more like this
               Tip Pct
Female No      0.153189183
       Yes     0.163062349
Male   No      0.15731215
       Yes     0.136918785
Where Tip Pct = sum(tip)/sum(total_bill) for each group
What am I doing wrong and how do I fix this? Thank you!
I understand that this would give me tip as a percentage of total tips:
(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x.sum()))
Is there a way to modify it to look at another column, i.e.
(df.groupby(['sex', 'smoker'])['tip'].sum().groupby(level = 0).transform(lambda x: x/x['total_bill'].sum()))
Thanks!
You can use apply to loop through the rows of the data frame (with axis=1). For each row you can access tip and total_bill and divide them to get the percentage:
(df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum()
   .apply(lambda r: r.tip/r.total_bill, axis = 1))
#sex     smoker
#Female  No        0.153189
#        Yes       0.163062
#Male    No        0.157312
#        Yes       0.136919
#dtype: float64
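As a possible alternative to the row-wise apply above, the two summed columns can be divided directly, which lets pandas do the arithmetic on whole columns at once. The sketch below is only an illustration: the small data frame is a made-up stand-in for tips.csv (its values are not from the real dataset), and the names totals and tip_pct are chosen here, not taken from the original answer.

```python
import pandas as pd

# Hypothetical stand-in for tips.csv so the example runs offline;
# these six rows are invented and are NOT the real dataset.
df = pd.DataFrame({
    "sex":        ["Female", "Female", "Female", "Male", "Male", "Male"],
    "smoker":     ["No",     "No",     "Yes",    "No",   "Yes",  "Yes"],
    "total_bill": [10.0,     30.0,     20.0,     50.0,   25.0,   15.0],
    "tip":        [2.0,      4.0,      3.0,      5.0,    4.0,    4.0],
})

# Sum both columns per (sex, smoker) group, as in the question.
totals = df.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()

# Divide the summed tip by the summed total_bill, column against column.
tip_pct = (totals["tip"] / totals["total_bill"]).rename("tip_pct")

print(tip_pct)
```

This should give the same kind of result as the apply version, indexed by sex and smoker. It also suggests why the attempt in the question fails: once ['tip'] has been selected, the grouped Series no longer contains total_bill, so x['total_bill'] inside the transform has nothing to look up. Keeping both columns through the sum and dividing afterwards avoids that.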