pandas concat/merge and sum one column
Problem description
I have two pandas.DataFrame objects with MultiIndex indices. Some of the index values are shared between the two dataframes, but not all. I would like to merge these two dataframes and take the sum of one of the columns wherever the row (index value) exists in both. Otherwise, keep the row and its column values as they are.
I tried to create an example:
import numpy as np
import pandas as pd

def mklbl(prefix, n):
    # n is either an int (labels prefix0 .. prefix{n-1}) or an iterable of suffixes
    try:
        return ["%s%s" % (prefix, i) for i in range(n)]
    except TypeError:
        return ["%s%s" % (prefix, i) for i in n]

mi1 = pd.MultiIndex.from_product([mklbl('A', 4), mklbl('C', 2)])
mi2 = pd.MultiIndex.from_product([mklbl('A', [2, 3, 4]), mklbl('C', 2)])
df2 = pd.DataFrame({'b': np.arange(len(mi2)), 'c': np.arange(len(mi2))[::-1]},
                   index=mi2).sort_index().sort_index(axis=1)
df1 = pd.DataFrame({'a': np.arange(len(mi1)), 'b': np.arange(len(mi1))[::-1]},
                   index=mi1).sort_index().sort_index(axis=1)
The individual DataFrame objects look like this:
In [117]: df1
Out[117]:
       a  b
A0 C0  0  7
   C1  1  6
A1 C0  2  5
   C1  3  4
A2 C0  4  3
   C1  5  2
A3 C0  6  1
   C1  7  0
and
In [118]: df2
Out[118]:
       b  c
A2 C0  0  5
   C1  1  4
A3 C0  2  3
   C1  3  2
A4 C0  4  1
   C1  5  0
What I want to do is merge these two and sum the 'b' column, while keeping every row, whether it exists in only one of the dataframes or in both:
In [117]: df_merged_bsummed
Out[117]:
         a  b    c
A0 C0    0  7  NaN
   C1    1  6  NaN
A1 C0    2  5  NaN
   C1    3  4  NaN
A2 C0    4  3    5
   C1    5  3    4
A3 C0    6  3    3
   C1    7  3    2
A4 C0  NaN  4    1
   C1  NaN  5    0
Recommended answer
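In this particular case, I think you could just add them with fill_value=0, relying on the default alignment behaviour. Both frames are aligned on the union of their indexes and columns; 0 is substituted wherever a value is missing on only one side, while cells that are missing on both sides (such as 'a' for A4 or 'c' for A0) stay NaN: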
>>> df1.add(df2, fill_value=0)
         a    b    c
A0 C0  0.0  7.0  NaN
   C1  1.0  6.0  NaN
A1 C0  2.0  5.0  NaN
   C1  3.0  4.0  NaN
A2 C0  4.0  3.0  5.0
   C1  5.0  3.0  4.0
A3 C0  6.0  3.0  3.0
   C1  7.0  3.0  2.0
A4 C0  NaN  4.0  1.0
   C1  NaN  5.0  0.0
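To see what fill_value does on its own, here is a minimal sketch on two tiny made-up frames (left and right are illustrative names, not from the question). Plain + leaves NaN wherever either side is missing, while add(..., fill_value=0) only leaves NaN where both sides are missing.

```python
import numpy as np
import pandas as pd

# Two small illustrative frames with partially overlapping rows and columns
left = pd.DataFrame({"a": [1, 2], "b": [10, 20]}, index=["x", "y"])
right = pd.DataFrame({"b": [100, 200], "c": [5, 6]}, index=["y", "z"])

# Plain addition: any cell missing on either side becomes NaN
plain = left + right

# fill_value=0: a cell missing on only one side is treated as 0;
# cells missing on both sides (e.g. row z, column a) stay NaN
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

Only plain.loc["y", "b"] (120) survives the plain addition, whereas filled keeps 10, 120 and 200 in column 'b'.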
Since 'b' is the only column the two frames have in common, it is the only one that gets summed. If you want to make that explicit, you could instead do something like:
>>> m = pd.concat([df1, df2], axis=1)
>>> m["b"] = m.pop("b").sum(axis=1)
>>> m
         a    c    b
A0 C0  0.0  NaN  7.0
   C1  1.0  NaN  6.0
A1 C0  2.0  NaN  5.0
   C1  3.0  NaN  4.0
A2 C0  4.0  5.0  3.0
   C1  5.0  4.0  3.0
A3 C0  6.0  3.0  3.0
   C1  7.0  2.0  3.0
A4 C0  NaN  1.0  4.0
   C1  NaN  0.0  5.0
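If several shared columns need summing, or you would rather use merge as in the title, one possible generalisation is a small helper. This is a sketch under stated assumptions: merge_and_sum is a hypothetical name, not a pandas function, and it assumes every column listed in sum_cols exists in both frames.

```python
import numpy as np
import pandas as pd


def merge_and_sum(left, right, sum_cols):
    # Hypothetical helper (not part of pandas): outer-merge two frames on
    # their (Multi)Index, then replace each suffixed pair of columns listed
    # in sum_cols by their sum. Assumes every name in sum_cols is in both frames.
    merged = pd.merge(left, right, left_index=True, right_index=True,
                      how="outer", suffixes=("_left", "_right"))
    for col in sum_cols:
        # fill_value=0 keeps a value that is present on only one side
        merged[col] = merged.pop(col + "_left").add(
            merged.pop(col + "_right"), fill_value=0)
    # Sort so rows come out in index order
    return merged.sort_index()


# Same data as in the question, built without the mklbl helper
mi1 = pd.MultiIndex.from_product([["A0", "A1", "A2", "A3"], ["C0", "C1"]])
mi2 = pd.MultiIndex.from_product([["A2", "A3", "A4"], ["C0", "C1"]])
df1 = pd.DataFrame({"a": np.arange(8), "b": np.arange(8)[::-1]}, index=mi1)
df2 = pd.DataFrame({"b": np.arange(6), "c": np.arange(6)[::-1]}, index=mi2)

result = merge_and_sum(df1, df2, ["b"])
print(result)
```

Columns present in both frames but not listed in sum_cols would keep their _left/_right suffixes, so you would still have to decide how to combine those (for example with combine_first).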