How to add a calculated % to a pandas pivot table
Question
I have a pivot table similar to the one in this question, which doesn't seem to have an answer. My pivot table is called grouped and is built like this:
grouped = age_gender_bkts.pivot_table('population_in_thousands',index='gender',
columns='country_destination', aggfunc='sum').unstack()
This is taken from the pandas DataFrame age_gender_bkts:
import pandas as pd
age_gender_bkts = pd.read_csv('airbnb/age_gender_bkts.csv')
age_gender_bkts[:10]
age_bucket country_destination gender population_in_thousands year
0 100+ AU male 1 2015
1 95-99 AU male 9 2015
2 90-94 AU male 47 2015
3 85-89 AU male 118 2015
4 80-84 AU male 199 2015
5 75-79 AU male 298 2015
6 70-74 AU male 415 2015
7 65-69 AU male 574 2015
8 60-64 AU male 636 2015
9 55-59 AU male 714 2015
I am looking to get, for each country, the ratio between male and female population_in_thousands as a % for each gender, e.g. 12024 / (11899 + 12024) for AU.
I am very new to pandas and numpy, and I am looking for a generic way to calculate columns based on a pivot_table. Also, if the reply shows a way to create these groups by gender and country without using pivot_table, e.g. with groupby (I couldn't figure it out), that would really help my learning.
Recommended answer
You can use groupby, transform and sum. Finally, you can merge the result back into the original DataFrame:
print(age_gender_bkts)
age_bucket country_destination gender population_in_thousands year
0 100+ AU male 1 2015
1 95-99 AU male 9 2015
2 90-94 CA male 47 2015
3 85-89 CA male 118 2015
4 80-84 AU male 199 2015
5 75-79 NL male 298 2015
6 70-74 NL male 415 2015
7 65-69 AU male 574 2015
8 60-64 AU male 636 2015
9 55-59 AU male 714 2015
grouped = age_gender_bkts.pivot_table('population_in_thousands', index='gender', columns='country_destination', aggfunc='sum').unstack()
# Divide each (country, gender) total by that country's total across genders
df = (grouped / grouped.groupby(level=0).transform('sum')).reset_index().rename(columns={0: 'prop'})
print(df)
country_destination gender prop
0 AU male 1
1 CA male 1
2 NL male 1
Every prop is 1 here only because this 10-row sample contains male rows alone. With both genders present, the values within each country add up to 1.
print(pd.merge(age_gender_bkts, df, on=['country_destination', 'gender']))
age_bucket country_destination gender population_in_thousands year prop
0 100+ AU male 1 2015 1
1 95-99 AU male 9 2015 1
2 80-84 AU male 199 2015 1
3 65-69 AU male 574 2015 1
4 60-64 AU male 636 2015 1
5 55-59 AU male 714 2015 1
6 90-94 CA male 47 2015 1
7 85-89 CA male 118 2015 1
8 75-79 NL male 298 2015 1
9 70-74 NL male 415 2015 1
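The question also asks for a way to build the groups without pivot_table. A minimal sketch of that, assuming the same column names as above, could look like the following. The sample rows here are made up for illustration (they are not from the original CSV) so that both genders appear and the proportions are not all 1. The last two lines show what is probably the shortest pivot_table-based variant for comparison.

```python
import pandas as pd

# Hypothetical sample data (NOT the original CSV), containing both genders
df = pd.DataFrame({
    "country_destination": ["AU", "AU", "AU", "AU", "CA", "CA"],
    "gender": ["male", "male", "female", "female", "male", "female"],
    "population_in_thousands": [100, 200, 150, 50, 300, 100],
})

# Total population per (country, gender), using groupby instead of pivot_table
totals = (
    df.groupby(["country_destination", "gender"])["population_in_thousands"]
      .sum()
)

# Divide each total by the sum over all genders within the same country
prop = totals / totals.groupby(level="country_destination").transform("sum")
result = prop.rename("prop").reset_index()
print(result)

# pivot_table variant: dividing by the column sums aligns on country
pivot = df.pivot_table("population_in_thousands", index="gender",
                       columns="country_destination", aggfunc="sum")
pivot_prop = pivot / pivot.sum()
print(pivot_prop)
```

With this sample, AU male comes out as 300 / (200 + 300) = 0.6. The result frame can be merged back onto the original rows with pd.merge on country_destination and gender, exactly as in the answer above; multiply prop by 100 if a percentage is preferred.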