pandas concat/merge and sum one column


Problem description

I have two pandas.DataFrame objects with MultiIndex indices. Some index values are shared between the two dataframes, but not all. I would like to merge the two and, where a row (index value) exists in both, take the sum of one of the columns. Otherwise, the row and its column values should be kept as they are.

A related question is close, but does not use a MultiIndex.

I've tried to create an example:

import numpy as np
import pandas as pd

def mklbl(prefix, n):
    # n is either an int (labels 0..n-1) or an iterable of explicit suffixes
    try:
        return ["%s%s" % (prefix, i) for i in range(n)]
    except TypeError:
        return ["%s%s" % (prefix, i) for i in n]

mi1 = pd.MultiIndex.from_product([mklbl('A', 4), mklbl('C', 2)])

mi2 = pd.MultiIndex.from_product([mklbl('A', [2, 3, 4]), mklbl('C', 2)])

df2 = pd.DataFrame({'b': np.arange(len(mi2)), 'c': np.arange(len(mi2))[::-1]},
                   index=mi2).sort_index().sort_index(axis=1)

df1 = pd.DataFrame({'a': np.arange(len(mi1)), 'b': np.arange(len(mi1))[::-1]},
                   index=mi1).sort_index().sort_index(axis=1)

The individual DataFrame objects look like:

In [117]: df1
Out[117]: 
       a  b
A0 C0  0  7
   C1  1  6
A1 C0  2  5
   C1  3  4
A2 C0  4  3
   C1  5  2
A3 C0  6  1
   C1  7  0

In [118]: df2
Out[118]: 
       b  c
A2 C0  0  5
   C1  1  4
A3 C0  2  3
   C1  3  2
A4 C0  4  1
   C1  5  0

What I want to do is merge these two and sum the 'b' column, keeping all rows whether they exist in one dataframe or the other:

In [117]: df_merged_bsummed
Out[117]: 
         a  b    c
A0 C0    0  7  NaN
   C1    1  6  NaN
A1 C0    2  5  NaN
   C1    3  4  NaN
A2 C0    4  3    5
   C1    5  3    4
A3 C0    6  3    3
   C1    7  3    2
A4 C0  NaN  4    1
   C1  NaN  5    0

Recommended answer

In this particular case, I think you could just add them with fill_value=0, relying on the default alignment behaviour:

>>> df1.add(df2,fill_value=0)
        a  b   c
A0 C0   0  7 NaN
   C1   1  6 NaN
A1 C0   2  5 NaN
   C1   3  4 NaN
A2 C0   4  3   5
   C1   5  3   4
A3 C0   6  3   3
   C1   7  3   2
A4 C0 NaN  4   1
   C1 NaN  5   0
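
As a self-contained check of the add approach, the sketch below rebuilds the two frames from the question and inspects a few cells of the result. The label lists are written out by hand rather than generated with mklbl, and the name `summed` is only illustrative. It relies on the documented behaviour of fill_value: a value missing on one side is treated as 0, while a cell missing on both sides stays NaN.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames; labels are spelled out instead of using mklbl.
mi1 = pd.MultiIndex.from_product([["A0", "A1", "A2", "A3"], ["C0", "C1"]])
mi2 = pd.MultiIndex.from_product([["A2", "A3", "A4"], ["C0", "C1"]])

df1 = pd.DataFrame({"a": np.arange(len(mi1)), "b": np.arange(len(mi1))[::-1]},
                   index=mi1)
df2 = pd.DataFrame({"b": np.arange(len(mi2)), "c": np.arange(len(mi2))[::-1]},
                   index=mi2)

# Align on the union of rows and columns, then add element-wise.
# A cell present in only one frame is added to 0; a cell present in
# neither frame (e.g. column 'a' for the A4 rows) remains NaN.
summed = df1.add(df2, fill_value=0)

print(summed)
```

Note that this sums every column the two frames share, so it only matches the requested result because 'b' is the single common column here.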

Since there is only one column in common, only that column is summed. If you wanted to make that explicit, you could instead do something like:

>>> m = pd.concat([df1, df2],axis=1)
>>> m["b"] = m.pop("b").sum(axis=1)
>>> m
        a   c  b
A0 C0   0 NaN  7
   C1   1 NaN  6
A1 C0   2 NaN  5
   C1   3 NaN  4
A2 C0   4   5  3
   C1   5   4  3
A3 C0   6   3  3
   C1   7   2  3
A4 C0 NaN   1  4
   C1 NaN   0  5
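
If more than one column might be shared, the explicit concat idea can be wrapped in a small helper that sums only the columns you name. This is a minimal sketch, not part of pandas: `merge_and_sum` and its `sum_cols` parameter are hypothetical names, and it assumes both frames have unique index values. It uses drop and label selection instead of pop so that it does not depend on how pop treats duplicated column labels. Shared columns that are not listed in `sum_cols` would remain duplicated in the result.

```python
import numpy as np
import pandas as pd

def merge_and_sum(left, right, sum_cols):
    """Outer-align two frames on their index and sum only the columns in sum_cols.

    Hypothetical helper for illustration; assumes unique index values.
    """
    # Side-by-side outer alignment on the (Multi)Index; a column present in
    # both frames appears twice under the same label.
    m = pd.concat([left, right], axis=1)
    # Remove every copy of the columns to be summed, keeping the others as-is.
    out = m.drop(columns=sum_cols)
    for col in sum_cols:
        # m[[col]] is always a DataFrame holding all copies of that label.
        # min_count=1 leaves NaN where neither frame supplied a value.
        out[col] = m[[col]].sum(axis=1, min_count=1)
    return out

# Same data as in the question, with labels written out by hand.
mi1 = pd.MultiIndex.from_product([["A0", "A1", "A2", "A3"], ["C0", "C1"]])
mi2 = pd.MultiIndex.from_product([["A2", "A3", "A4"], ["C0", "C1"]])
df1 = pd.DataFrame({"a": np.arange(len(mi1)), "b": np.arange(len(mi1))[::-1]},
                   index=mi1)
df2 = pd.DataFrame({"b": np.arange(len(mi2)), "c": np.arange(len(mi2))[::-1]},
                   index=mi2)

result = merge_and_sum(df1, df2, ["b"])
print(result)
```

As in the answer's version, the summed column ends up last because it is re-attached after the other columns.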
