Groupby with weight
Question
Given the following data frame:
import pandas as pd
d=pd.DataFrame({'Age':[18,20,20,56,56],'Race':['A','A','A','B','B'],'Response':[3,2,5,6,2],'Weight':[0.5,0.5,0.5,1.2,1.2]})
d
   Age Race  Response  Weight
0   18    A         3     0.5
1   20    A         2     0.5
2   20    A         5     0.5
3   56    B         6     1.2
4   56    B         2     1.2
I know that I can apply a group-by to get the count by age and race like this:
d.groupby(['Age','Race'])['Response'].count()
Age Race
18 A 1
20 A 2
56 B 2
Name: Response, dtype: int64
But I'd like to use the "Weight" column to weight the cases, so that each of the first three rows counts as 0.5 instead of 1 and each of the last two counts as 1.2. So, when grouping by age and race, I should get the following:
Age Race
18 A 0.5
20 A 1
56 B 2.4
Name: Response, dtype: int64
This is similar to using the "Weight Cases" option in SPSS. I know it's possible in R, and I've seen a promising library in Python (though the current build is failing) here.
And PySal (not sure if it's applicable here).
...but I'm wondering if it can just be done somehow in the group-by.
Recommended answer
If I understand correctly, you're just looking for .sum() with the weights.
d.groupby(['Age', 'Race']).Weight.sum()
## Age Race
## 18 A 0.5
## 20 A 1.0
## 56 B 2.4
## Name: Weight, dtype: float64
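To make the relationship between the two counts explicit, here is a minimal sketch that computes the plain count and the weighted count side by side. It assumes pandas 0.25 or later for the named-aggregation syntax; the output column names `n` and `weighted_n` are made up for illustration and do not come from the question.

```python
import pandas as pd

d = pd.DataFrame({
    'Age': [18, 20, 20, 56, 56],
    'Race': ['A', 'A', 'A', 'B', 'B'],
    'Response': [3, 2, 5, 6, 2],
    'Weight': [0.5, 0.5, 0.5, 1.2, 1.2],
})

# Unweighted count: every row contributes 1.
# Weighted count: every row contributes its Weight, so the "count" is a sum.
# The names n and weighted_n are illustrative only.
counts = d.groupby(['Age', 'Race']).agg(
    n=('Response', 'count'),
    weighted_n=('Weight', 'sum'),
)
print(counts)
```

One thing to watch: count() skips rows where Response is missing, while summing Weight does not. If the data can contain missing responses, it may be worth dropping those rows first (for example with d.dropna(subset=['Response'])) so the two columns describe the same cases.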