Groupby with weight

Problem description

Given the following DataFrame:

import pandas as pd
d=pd.DataFrame({'Age':[18,20,20,56,56],'Race':['A','A','A','B','B'],'Response':[3,2,5,6,2],'Weight':[0.5,0.5,0.5,1.2,1.2]})
d
    Age     Race    Response    Weight
0   18      A       3           0.5
1   20      A       2           0.5
2   20      A       5           0.5
3   56      B       6           1.2
4   56      B       2           1.2

I know that I can apply a group-by to get the count by age and race like this:

d.groupby(['Age','Race'])['Response'].count()
Age  Race
18   A       1
20   A       2
56   B       2
Name: Response, dtype: int64

But I'd like to use the "Weight" column to weight the cases such that the first 3 rows will count as 0.5 instead of 1 each and the last two will count as 1.2. So, if grouping by age and race, I should have the following:

Age  Race
18   A       0.5
20   A       1.0
56   B       2.4
Name: Response, dtype: float64

This is similar to using the "Weight Cases" option in SPSS. I know it's possible in R and I've seen a promising library in Python (though the current build is failing) here.

And PySal (not sure if it's applicable here)

...but I'm wondering if it can just be done somehow in the group-by.

Recommended answer

If I understand correctly, you're just looking for .sum() with the weights.

d.groupby(['Age', 'Race']).Weight.sum()

## Age  Race
## 18   A       0.5
## 20   A       1.0
## 56   B       2.4
## Name: Weight, dtype: float64
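
To compare the weighted and unweighted counts directly, the two aggregations can be placed side by side. The following is a minimal sketch that rebuilds the DataFrame from the question and uses pandas named aggregation; the output column names n and weighted_n are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame({
    'Age': [18, 20, 20, 56, 56],
    'Race': ['A', 'A', 'A', 'B', 'B'],
    'Response': [3, 2, 5, 6, 2],
    'Weight': [0.5, 0.5, 0.5, 1.2, 1.2],
})

# Named aggregation: the plain row count next to the sum of weights.
# "n" and "weighted_n" are hypothetical column names chosen for this sketch.
counts = d.groupby(['Age', 'Race']).agg(
    n=('Response', 'count'),
    weighted_n=('Weight', 'sum'),
)
print(counts)
```

Note that count() skips missing values in Response while summing Weight does not. If Response could contain NaN, one way to keep the two in step would be to mask the weights first, for example with d['Weight'].where(d['Response'].notna()), before grouping.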
