Sum duplicated rows on a multi-index pandas DataFrame
Question
Hello, I'm having trouble with pandas. I'm trying to sum duplicated rows on a multi-index DataFrame.
I tried df.groupby(level=[0,1]).sum(), and also df.stack().reset_index().groupby(['year', 'product']).sum() and a few other variations, but I cannot get it to work.
I'd also like every unique product to appear under each year, with a value of 0 if it wasn't listed for that year.
Example: a DataFrame with a multi-index and 3 different products (A, B, C):
              volume1  volume2
year product
2010 A             10       12
     A              7        3
     B              7        7
2011 A             10       10
     B              7        6
     C              5        5
Expected output: if a product is duplicated within a given year, its rows are summed. If a product isn't listed for a year, a new row full of 0 is created for it.
              volume1  volume2
year product
2010 A             17       15
     B              7        7
     C              0        0
2011 A             10       10
     B              7        6
     C              5        5
Any ideas? Thanks.
Recommended answer
Use sum with unstack and stack:
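# Sum the duplicated (year, product) rows, then round-trip through unstack/stack
# so that missing combinations are created and filled with 0.
df = df.groupby(level=[0,1]).sum().unstack(fill_value=0).stack()
# On pandas older than 2.0 the shorthand below gave the same result;
# DataFrame.sum(level=...) was deprecated in 1.3 and removed in 2.0:
#df = df.sum(level=[0,1]).unstack(fill_value=0).stack()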
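To make the unstack/stack round trip easier to follow, here is a minimal sketch that rebuilds the sample frame from the question and runs each step separately. The names index, summed, wide and result are only illustrative, and the frame is reconstructed by hand from the printed example, so the dtypes are an assumption.

```python
import pandas as pd

# Rebuild the sample frame from the question, including the duplicated (2010, "A") row.
index = pd.MultiIndex.from_tuples(
    [(2010, "A"), (2010, "A"), (2010, "B"), (2011, "A"), (2011, "B"), (2011, "C")],
    names=["year", "product"],
)
df = pd.DataFrame(
    {"volume1": [10, 7, 7, 10, 7, 5], "volume2": [12, 3, 7, 10, 6, 5]},
    index=index,
)

# Step 1: collapse duplicated (year, product) pairs by summing them.
summed = df.groupby(level=["year", "product"]).sum()

# Step 2: move "product" into the columns; pairs that never occurred
# (here 2010/C) become cells filled with 0 instead of NaN.
wide = summed.unstack(fill_value=0)

# Step 3: move "product" back into the row index, so every year now
# carries a row for every product.
result = wide.stack()

print(result)
```

Recent pandas 2.x releases may emit a FutureWarning about the stack implementation here; the values are not affected by it.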
Or use reindex:
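import pandas as pd

# Sum the duplicated (year, product) rows first.
# (df.sum(level=[0,1]) was the pre-2.0 shorthand; it was removed in pandas 2.0.)
df = df.groupby(level=[0,1]).sum()
# Build every (year, product) combination and fill the missing ones with 0.
mux = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
df = df.reindex(mux, fill_value=0)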
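The reindex variant can be sketched end to end in the same way. Again the sample frame is rebuilt by hand from the question, and summed, full_index and result are illustrative names rather than anything from the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the question, including the duplicated (2010, "A") row.
index = pd.MultiIndex.from_tuples(
    [(2010, "A"), (2010, "A"), (2010, "B"), (2011, "A"), (2011, "B"), (2011, "C")],
    names=["year", "product"],
)
df = pd.DataFrame(
    {"volume1": [10, 7, 7, 10, 7, 5], "volume2": [12, 3, 7, 10, 6, 5]},
    index=index,
)

# Collapse duplicated (year, product) pairs by summing them.
summed = df.groupby(level=["year", "product"]).sum()

# Cartesian product of the distinct years and the distinct products:
# this is the complete index the result should have.
full_index = pd.MultiIndex.from_product(
    summed.index.levels, names=summed.index.names
)

# Align to the complete index; rows that did not exist get 0 in every column.
result = summed.reindex(full_index, fill_value=0)

print(result)
```

One thing to watch: index.levels can still list values that no longer occur in a frame that was filtered earlier. If that is a concern, calling remove_unused_levels() on the index before from_product should keep the product limited to values that are actually present.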
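Alternative 1, thanks to @Wen:
# unstack() without fill_value leaves NaN for the missing combinations (and makes
# the columns float); stack(dropna=False) keeps those rows. Chain .fillna(0) for zeros.
df = df.groupby(level=[0,1]).sum().unstack().stack(dropna=False)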
print(df)
              volume1  volume2
year product
2010 A             17       15
     B              7        7
     C              0        0
2011 A             10       10
     B              7        6
     C              5        5