Pandas Grouping - Values as Percent of Grouped Totals Not Working

Problem description

Using a DataFrame and pandas, I am trying to figure out what each value is as a percentage of the grand total for its "group by" category.

So, using the tips dataset, I want to see, for each sex/smoker combination, what proportion of the total bill comes from female smokers / all females and from female non-smokers / all females (and the same for men).

For example,

If the complete data set is:

Sex,Smoker,Day,Time,Size,Total Bill
Female,No,Sun,Dinner,2,20
Female,No,Mon,Dinner,2,40
Female,No,Wed,Dinner,1,10
Female,Yes,Wed,Dinner,1,15

The value for the first output line would be (20+40+10)/(20+40+10+15) = 70/85, because 20, 40 and 10 are the bills for non-smoking females and 85 is the total for all females.

So the output should look like:

Female No 0.823529412
Female Yes 0.176470588
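The expected output above can be reproduced with a short sketch on the question's four-row example. It assumes the column names match tips.csv (`sex`, `smoker`, `total_bill`) and keeps only those columns. It first sums per (sex, smoker) and then divides each sum by its sex's total, which is the same idea as the regrouping step in the answer, written here with `transform('sum')` instead of a lambda.

```python
import pandas as pd

# The question's four-row example (only the columns needed here).
df = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Female"],
    "smoker": ["No", "No", "No", "Yes"],
    "total_bill": [20, 40, 10, 15],
})

# Step 1: total bill per (sex, smoker) -> Female/No = 70, Female/Yes = 15.
sums = df.groupby(["sex", "smoker"])["total_bill"].sum()

# Step 2: divide each combination by the total for its sex (70 + 15 = 85).
share = sums / sums.groupby(level="sex").transform("sum")
print(share)
```

Female/No comes out as 70/85 (about 0.823529) and Female/Yes as 15/85 (about 0.176471), matching the numbers above.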

However, I am having some trouble.

When I do this:

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
df.groupby(['sex', 'smoker'])[['total_bill']].apply(lambda x: x / x.sum()).head()

I get the following:

    total_bill
0   0.017378
1   0.005386
2   0.010944
3   0.012335
4   0.025151

It seems to be ignoring the groupby and just calculating the value for each line item.
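The grouping is in fact applied: inside `apply`, `x / x.sum()` divides each row by the sum of its own (sex, smoker) group, so you get one value per row and the denominator is the (sex, smoker) total rather than the per-sex total. A minimal sketch on the question's four-row example makes this visible. It uses `transform('sum')`, a swapped-in formulation that should produce the same per-row values as the question's `apply` while keeping the original row index.

```python
import pandas as pd

# The question's four-row example (only the columns needed here).
df = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Female"],
    "smoker": ["No", "No", "No", "Yes"],
    "total_bill": [20, 40, 10, 15],
})

# Each row divided by the total of its own (sex, smoker) group,
# i.e. the same per-row values as apply(lambda x: x / x.sum()).
group_total = df.groupby(["sex", "smoker"])["total_bill"].transform("sum")
row_share = df["total_bill"] / group_total
print(row_share)  # 20/70, 40/70, 10/70, 15/15

# Within each (sex, smoker) group the row shares add up to 1: the grouping
# was used, but the rows were never aggregated.
print(row_share.groupby([df["sex"], df["smoker"]]).sum())
```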

I am looking for something more like:

df.groupby(['sex', 'smoker'])[['total_bill']].sum()

which returns:

               total_bill
sex    smoker
Female No          977.68
       Yes         593.27
Male   No         1919.75
       Yes        1337.07

But I want each of these sums expressed as a fraction of the total for its sex, i.e.:

Female No  977.68/(977.68+593.27)
Female Yes  593.27/(977.68+593.27)
Male No  1919.75/(1919.75+1337.07)
Male Yes  1337.07/(1919.75+1337.07)

Ideally, I would like to do the same with the "tip" column at the same time.
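Both columns can be handled at once by selecting them together before summing. Dividing the summed DataFrame by its per-sex totals then normalizes `total_bill` and `tip` in one step. This is a minimal sketch on the question's four-row example; the `tip` values below are hypothetical, made up only so the snippet runs, and are not taken from tips.csv.

```python
import pandas as pd

# The question's four-row example; the "tip" values are hypothetical.
df = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Female"],
    "smoker": ["No", "No", "No", "Yes"],
    "total_bill": [20, 40, 10, 15],
    "tip": [3.0, 6.0, 1.0, 2.0],  # illustrative values only
})

# Sum both columns per (sex, smoker) combination.
sums = df.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()

# Divide every column by the per-sex totals (same shape as `sums`).
shares = sums / sums.groupby(level="sex").transform("sum")
print(shares)
```

On the real data, the same two lines apply unchanged to the `df` loaded from tips.csv in the question.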

What am I doing wrong and how do I fix this? Thank you!

Solution

You can add another groupby step after computing the sum table to calculate the percentages:

(df.groupby(['sex', 'smoker'])['total_bill'].sum()
   .groupby(level=0).transform(lambda x: x / x.sum()))   # group by sex and calculate percentage

#sex     smoker
#Female  No        0.622350
#        Yes       0.377650
#Male    No        0.589455
#        Yes       0.410545
#dtype: float64
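To check these numbers without downloading tips.csv, the sketch below rebuilds the four sums printed in the question by hand and runs the answer's expression on them. It also shows `Series.div(..., level=0)`, a swapped-in alternative that divides by the per-sex totals while broadcasting across the first index level. Both should reproduce the 0.622350 / 0.377650 / 0.589455 / 0.410545 figures above.

```python
import pandas as pd

# The per-(sex, smoker) sums printed in the question, rebuilt by hand.
index = pd.MultiIndex.from_tuples(
    [("Female", "No"), ("Female", "Yes"), ("Male", "No"), ("Male", "Yes")],
    names=["sex", "smoker"],
)
sums = pd.Series([977.68, 593.27, 1919.75, 1337.07], index=index, name="total_bill")

# The answer's approach: regroup by level 0 (sex) and normalize within it.
pct = sums.groupby(level=0).transform(lambda x: x / x.sum())

# Alternative: divide by the per-sex totals, broadcasting across level 0.
pct_alt = sums.div(sums.groupby(level=0).sum(), level=0)

print(pct.round(6))
```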
