Pandas Grouping - Values as Percent of Grouped Totals Not Working

Views: 163 | Published: 2017/3/26 2:24:12 | Tags: python pandas dataframe aggregate aggregation

Problem description

Using a data frame and pandas, I am trying to figure out what each value is as a percentage of the total for its "group by" category.

So, using the tips data set, I want to see, for each sex/smoker combination, what proportion of the total bill is female smoker / all female and female non-smoker / all female (and the same for men).

For example,

If the complete data set is:

Sex, Smoker, Day, Time, Size, Total Bill
Female,No,Sun,Dinner,2, 20
Female,No,Mon,Dinner,2, 40
Female,No,Wed,Dinner,1, 10
Female,Yes,Wed,Dinner,1, 15

The value for the first line would be (20+40+10)/(20+40+10+15), since 20, 40, and 10 are the three values for non-smoking females.

So the output should look like

Female No 0.823529412
Female Yes 0.176470588

However, I seem to be having some trouble

When I do this,

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
df.groupby(['sex', 'smoker'])[['total_bill']].apply(lambda x: x / x.sum()).head()

I get the following:

    total_bill
0   0.017378
1   0.005386
2   0.010944
3   0.012335
4   0.025151

It seems to be ignoring the group by and just calculating the value for each line item.

I am looking for something more like

df.groupby(['sex', 'smoker'])[['total_bill']].sum()

Which will return

               total_bill
sex    smoker
Female No          977.68
       Yes         593.27
Male   No         1919.75
       Yes        1337.07

But I want each sex/smoker sum expressed as a fraction of the total for that sex, that is:

Female No  977.68/(977.68+593.27)
Female Yes  593.27/(977.68+593.27)
Male No  1919.75/(1919.75+1337.07)
Male Yes  1337.07/(1919.75+1337.07)

Ideally, I would like to do the same with the "tip" column at the same time.

What am I doing wrong and how do I fix this? Thank you!

Solution

You can add a second groupby step after you get the sum table to calculate the percentages:

(df.groupby(['sex', 'smoker'])['total_bill'].sum()
   .groupby(level = 0).transform(lambda x: x/x.sum()))   # group by sex and calculate percentage

#sex     smoker
#Female  No        0.622350
#        Yes       0.377650
#Male    No        0.589455
#        Yes       0.410545
#dtype: float64
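
The question also asks about handling the "tip" column at the same time, which the answer above does not cover. The following is a minimal sketch of one way to do that, not the answerer's own code. It uses the four-row toy data from the question rather than the full tips file, and the tip values are made up for illustration. As far as I can tell, the attempt in the question divides each row by the sum of its own (sex, smoker) group, which is why it returns one small value per row; summing first and then normalising within each sex avoids that.

```python
import pandas as pd

# Toy data based on the four-row example in the question.
# The "tip" values are hypothetical, added only for illustration.
df = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Female"],
    "smoker": ["No", "No", "No", "Yes"],
    "total_bill": [20.0, 40.0, 10.0, 15.0],
    "tip": [3.0, 5.0, 2.0, 2.5],
})

# Step 1: one row per (sex, smoker) pair, summing both columns.
sums = df.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()

# Step 2: total per sex, broadcast back to every (sex, smoker) row.
sex_totals = sums.groupby(level="sex").transform("sum")

# Step 3: each (sex, smoker) sum as a fraction of the total for that sex.
shares = sums / sex_totals

print(shares)
```

For total_bill this gives 70/85 for Female/No and 15/85 for Female/Yes, matching the expected output in the question. Passing a list of columns in step 1 is what lets both columns be normalised in one pass.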
