How to add a calculated % to a pandas pivot table

Question

I have a pivot table similar to the one in this question, which doesn't seem to have an answer. My pivot table is called grouped and is built like this:

grouped = age_gender_bkts.pivot_table('population_in_thousands',index='gender',
columns='country_destination', aggfunc='sum').unstack()

It is built from the pandas DataFrame age_gender_bkts:

import pandas as pd

age_gender_bkts = pd.read_csv('airbnb/age_gender_bkts.csv')
age_gender_bkts[:10]

  age_bucket country_destination gender  population_in_thousands  year
0       100+                  AU   male                        1  2015
1      95-99                  AU   male                        9  2015
2      90-94                  AU   male                       47  2015
3      85-89                  AU   male                      118  2015
4      80-84                  AU   male                      199  2015
5      75-79                  AU   male                      298  2015
6      70-74                  AU   male                      415  2015
7      65-69                  AU   male                      574  2015
8      60-64                  AU   male                      636  2015
9      55-59                  AU   male                      714  2015

For each country, I want each gender's share of population_in_thousands as a percentage of that country's total, e.g. 12024 / (11899 + 12024) for AU.

I am very new to pandas and numpy, and I am looking for a generic way to calculate columns based on a pivot_table. Also, if the reply shows how I could have created these groups by gender and country without pivot_table, e.g. with groupby (I couldn't figure it out), that would really help my learning.

Recommended answer

You can use groupby, transform and sum. Finally, you can merge the result back into the original DataFrame. In the sample data used here every row is male, so each prop comes out as 1:

print(age_gender_bkts)
  age_bucket country_destination gender  population_in_thousands  year
0       100+                  AU   male                        1  2015
1      95-99                  AU   male                        9  2015
2      90-94                  CA   male                       47  2015
3      85-89                  CA   male                      118  2015
4      80-84                  AU   male                      199  2015
5      75-79                  NL   male                      298  2015
6      70-74                  NL   male                      415  2015
7      65-69                  AU   male                      574  2015
8      60-64                  AU   male                      636  2015
9      55-59                  AU   male                      714  2015

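# Sum population per gender and country; unstack() gives a Series indexed by (country_destination, gender).
grouped = age_gender_bkts.pivot_table('population_in_thousands', index='gender',
                                      columns='country_destination', aggfunc='sum').unstack()
# Divide each value by its country's total (index level 0 is country_destination).
df = (grouped / grouped.groupby(level=0).transform('sum')).reset_index().rename(columns={0: 'prop'})
print(df)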
  country_destination gender  prop
0                  AU   male     1
1                  CA   male     1
2                  NL   male     1

print(pd.merge(age_gender_bkts, df, on=['country_destination', 'gender']))
  age_bucket country_destination gender  population_in_thousands  year  prop
0       100+                  AU   male                        1  2015     1
1      95-99                  AU   male                        9  2015     1
2      80-84                  AU   male                      199  2015     1
3      65-69                  AU   male                      574  2015     1
4      60-64                  AU   male                      636  2015     1
5      55-59                  AU   male                      714  2015     1
6      90-94                  CA   male                       47  2015     1
7      85-89                  CA   male                      118  2015     1
8      75-79                  NL   male                      298  2015     1
9      70-74                  NL   male                      415  2015     1
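
The question also asked how to build these groups without pivot_table. The following is a minimal sketch of one way to do that with groupby alone, plus a variant that keeps the result in pivot-table shape. The sample DataFrame and its numbers are made up for illustration (they are not from the Airbnb file), and the names sample, totals, share, pct and pivot_pct are hypothetical.

```python
import pandas as pd

# Hypothetical sample with both genders; the numbers are invented for illustration.
sample = pd.DataFrame({
    "country_destination": ["AU", "AU", "AU", "AU", "CA", "CA"],
    "gender": ["male", "male", "female", "female", "male", "female"],
    "population_in_thousands": [10, 30, 20, 40, 25, 75],
})

# Total per (country, gender) using groupby only, no pivot_table.
totals = sample.groupby(["country_destination", "gender"])["population_in_thousands"].sum()

# Divide each total by its country's total. transform("sum") returns a Series
# aligned with the original index, so the division is element-wise.
share = totals / totals.groupby(level="country_destination").transform("sum")
pct = (share * 100).rename("pct").reset_index()
print(pct)

# Variant that stays in pivot-table shape: one row per gender, one column per country.
pivot = sample.pivot_table("population_in_thousands", index="gender",
                           columns="country_destination", aggfunc="sum")
# pivot.sum(axis=0) holds the per-country totals; div(..., axis=1) matches them to the columns.
pivot_pct = pivot.div(pivot.sum(axis=0), axis=1) * 100
print(pivot_pct)
```

With this made-up data, AU works out to 40% male and 60% female, and CA to 25% male and 75% female. The pct frame can be merged back onto the original rows with pd.merge on country_destination and gender, just as in the answer above.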
