How to use groupby in pandas to calculate a percentage / proportion total based on a criteria in another column
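Question
I am trying to work out how to use the groupby function in pandas to calculate the proportion of values per year for a given Yes/No criterion.
For example, I have a dataframe called names: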
    Name  Number  Year   Sex Criteria
0  name1     789  1998  Male        N
1  name1     688  1999  Male        N
2  name1     639  2000  Male        N
3  name2     551  1998  Male        Y
4  name2     499  1999  Male        Y
I can use
namesgrouped = names.groupby(["Sex", "Year", "Criteria"]).sum()
to get:
                    Number
Sex  Year Criteria
Male 1998 N          14507
          Y           2308
     1999 N          14119
          Y           2331
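and so on. I would like the 'Number Criteria' column to show the % of the total for each gender and year, so instead of N = 14507 and Y = 2308 for 1998 above I'd have N = 86.27% and Y = 13.73%.
Can anyone advise how to do this?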
Answer
This question is a direct extension of the suggested duplicate. Borrowing from the accepted answer, this will work (output shown for the five sample rows above):
In [46]: namesgrouped.groupby(level=[0, 1]).apply(lambda g: g / g.sum())
Out[46]:
                      Number
Sex  Year Criteria
Male 1998 N         0.588806
          Y         0.411194
     1999 N         0.579612
          Y         0.420388
     2000 N         1.000000
Edit: a transform operation might be faster than apply:
namesgrouped / namesgrouped.groupby(level=[0, 1]).transform('sum')
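For anyone who wants to try this end to end, here is a minimal self-contained sketch of the transform approach, built only from the five sample rows in the question. The names totals and percent are illustrative and not part of the original answer, and selecting the Number column explicitly before summing is an assumption added here so that the non-numeric Name column stays out of the result.

```python
import pandas as pd

# The five sample rows from the question.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Sum Number per (Sex, Year, Criteria). Selecting the column explicitly
# (an assumption of this sketch) keeps "Name" out of the aggregation.
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# transform("sum") returns the (Sex, Year) total aligned to every row,
# so a plain division gives each Criteria's share of its group.
totals = namesgrouped.groupby(level=["Sex", "Year"]).transform("sum")
percent = (namesgrouped / totals * 100).round(2)

print(percent)
```

Multiplying by 100 and rounding only changes the presentation. Dropping those two steps leaves the same fractions as in the answer above (for example 0.588806 for Male, 1998, N).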