Sum duplicated rows on a multi-index pandas dataframe


Question

Hello, I'm having trouble with pandas. I'm trying to sum duplicated rows on a multi-index DataFrame. I tried df.groupby(level=[0,1]).sum(), also df.stack().reset_index().groupby(['year', 'product']).sum(), and a few other things, but I can't get it to work. I'd also like to add every unique product for each given year, with a value of 0 if it wasn't listed.

Example: a DataFrame with a multi-index and 3 different products (A, B, C):

                  volume1    volume2
year   product
2010   A          10         12
       A          7          3
       B          7          7
2011   A          10         10
       B          7          6
       C          5          5
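
For anyone who wants to reproduce this, the sample frame can be built roughly as follows. This is a minimal sketch: the construction code is not from the original post, only the values and the index names (year, product) are taken from the table above.

```python
import pandas as pd

# (year, product) pairs copied from the example table; (2010, "A") appears twice
index = pd.MultiIndex.from_tuples(
    [(2010, "A"), (2010, "A"), (2010, "B"),
     (2011, "A"), (2011, "B"), (2011, "C")],
    names=["year", "product"],
)

df = pd.DataFrame(
    {"volume1": [10, 7, 7, 10, 7, 5],
     "volume2": [12, 3, 7, 10, 6, 5]},
    index=index,
)

# The index is not unique, which is why the duplicated rows need to be aggregated
print(df.index.is_unique)
print(df.index.duplicated(keep=False))
```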

Expected output: if a product is duplicated within a given year, the rows are summed. If a product isn't listed for a year, a new row full of 0s is created.

                  volume1     volume2
year   product
2010   A          17          15
       B          7           7
       C          0           0
2011   A          10          10
       B          7           6
       C          5           5

Any ideas? Thanks.

Recommended answer

Sum per index level, then unstack and stack:

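# Sum rows that share the same (year, product), then fill missing combinations with 0
df = df.groupby(level=[0,1]).sum().unstack(fill_value=0).stack()
# On pandas < 2.0 the same sum could be written as df.sum(level=[0,1]);
# the level argument of sum was removed in pandas 2.0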
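
To see why this chain produces the missing rows, it can be unpacked into its three steps. The sketch below rebuilds the sample data from the question (the construction code and the intermediate names summed, wide and result are only illustrative) and uses the groupby form of the sum.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(2010, "A"), (2010, "A"), (2010, "B"),
     (2011, "A"), (2011, "B"), (2011, "C")],
    names=["year", "product"],
)
df = pd.DataFrame(
    {"volume1": [10, 7, 7, 10, 7, 5],
     "volume2": [12, 3, 7, 10, 6, 5]},
    index=index,
)

# Step 1: collapse duplicated (year, product) rows by summing them
summed = df.groupby(level=["year", "product"]).sum()

# Step 2: move "product" into the columns; every year now has a cell for
# A, B and C, and cells with no data are filled with 0
wide = summed.unstack(fill_value=0)

# Step 3: move "product" back into the index, keeping the filled cells as rows
result = wide.stack()

print(wide)
print(result)
```

Because the gaps are filled with 0 during unstack, no NaN values are introduced and the integer dtype of the columns is preserved.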

Or reindex:

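# Sum rows that share the same (year, product)
# (df.sum(level=[0,1]) did the same on pandas < 2.0)
df = df.groupby(level=[0,1]).sum()
# Build every (year, product) combination and fill the missing rows with 0
mux = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
df = df.reindex(mux, fill_value=0)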
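
The key to the reindex approach is that MultiIndex.from_product builds the full Cartesian product of the index levels, so any (year, product) pair missing from the data shows up in the new index. A small sketch on the question's sample data (the construction code and the names summed and result are illustrative, not from the original answer):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(2010, "A"), (2010, "A"), (2010, "B"),
     (2011, "A"), (2011, "B"), (2011, "C")],
    names=["year", "product"],
)
df = pd.DataFrame(
    {"volume1": [10, 7, 7, 10, 7, 5],
     "volume2": [12, 3, 7, 10, 6, 5]},
    index=index,
)

summed = df.groupby(level=[0, 1]).sum()

# levels holds the distinct values of each index level:
# the years [2010, 2011] and the products ["A", "B", "C"]
mux = pd.MultiIndex.from_product(summed.index.levels, names=summed.index.names)
print(list(mux))

# (2010, "C") exists in mux but not in the summed data,
# so reindex creates that row and fills it with 0
result = summed.reindex(mux, fill_value=0)
print(result)
```

One thing to keep in mind: index.levels can still contain values that no longer occur in the data if the frame was filtered beforehand. In that case, calling remove_unused_levels() on the index before building the product should avoid creating rows for them.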

Alternative 1, thanks @Wen:

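# stack(dropna=False) keeps the missing rows as NaN, so fill them with 0 afterwards
# (the dropna argument belongs to the older stack implementation; newer pandas versions may reject it)
df = df.groupby(level=[0,1]).sum().unstack().stack(dropna=False).fillna(0).astype(int)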


print (df)
              volume1  volume2
year product                  
2010 A             17       15
     B              7        7
     C              0        0
2011 A             10       10
     B              7        6
     C              5        5
