Pandas Grouping - Values as Percent of Grouped Totals Not Working
Problem description
Using a pandas DataFrame, I am trying to express each value as a percentage of the total for its "group by" category.
Using the tips dataset, I want to see, for each sex/smoker combination, what share of that sex's total bill it accounts for: female smokers / all females and female non-smokers / all females (and the same for males).
For example, if the complete data set is:

    Sex,Smoker,Day,Time,Size,Total Bill
    Female,No,Sun,Dinner,2,20
    Female,No,Mon,Dinner,2,40
    Female,No,Wed,Dinner,1,10
    Female,Yes,Wed,Dinner,1,15

The value for the first group (Female, No) would be (20+40+10)/(20+40+10+15), since those are the three bills for non-smoking females.
So the output should look like:

    Female  No   0.823529412
    Female  Yes  0.176470588
However, I seem to be having some trouble
When I do this,

    import pandas as pd

    df = pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
    df.groupby(['sex', 'smoker'])[['total_bill']].apply(lambda x: x / x.sum()).head()

I get the following:

       total_bill
    0    0.017378
    1    0.005386
    2    0.010944
    3    0.012335
    4    0.025151

It seems to be ignoring the group by and just calculating it for each line item.
I am looking for something more like

    df.groupby(['sex', 'smoker'])[['total_bill']].sum()

which will return

                   total_bill
    sex    smoker
    Female No          977.68
           Yes         593.27
    Male   No         1919.75
           Yes        1337.07

But I want each sex/smoker sum expressed as a fraction of the total for its sex, that is:

    Female  No   977.68/(977.68+593.27)
    Female  Yes  593.27/(977.68+593.27)
    Male    No   1919.75/(1919.75+1337.07)
    Male    Yes  1337.07/(1919.75+1337.07)

Ideally, I would like to do the same with the "tip" column at the same time.
What am I doing wrong and how do I fix this? Thank you!
Solution
You can add another groupby step after you get the sum table to calculate the percentage:

    (df.groupby(['sex', 'smoker'])['total_bill'].sum()
       .groupby(level=0).transform(lambda x: x / x.sum()))  # group by sex and calculate percentage

    # sex     smoker
    # Female  No        0.622350
    #         Yes       0.377650
    # Male    No        0.589455
    #         Yes       0.410545
    # dtype: float64
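The question also asks for the same calculation on the "tip" column at the same time. Here is a minimal sketch of one way to do that, using the four sample rows from the question so it runs without downloading anything. The tip values are made up for illustration, and dividing by a level-wise `transform("sum")` is just one possible variation on the approach above, not necessarily what the answerer had in mind.

```python
import pandas as pd

# Toy data: the four rows from the question.
# The "tip" values are hypothetical, added only to show the two-column case.
df = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Female"],
    "smoker": ["No", "No", "No", "Yes"],
    "total_bill": [20.0, 40.0, 10.0, 15.0],
    "tip": [3.0, 5.0, 2.0, 2.5],
})

# Step 1: sum both columns for each (sex, smoker) group.
sums = df.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()

# Step 2: divide each group's sum by the total for its sex.
# transform("sum") keeps the original index, so the division aligns row by row.
shares = sums / sums.groupby(level="sex").transform("sum")

print(shares)
```

With the real tips data, the same two steps should apply unchanged once `df` is loaded with `pd.read_csv`; within each sex, the shares in each column sum to 1.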