Merge 2 dataframes in Pandas: join on some columns, sum up others

Question

I want to merge two dataframes on specific columns (key1, key2) and sum up the values for another column (value).

>>> import pandas as pd
>>> df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
>>> df1
   key1  key2  value
0     0     0      0
1     1     1      1
2     2     2      2
3     3     3      3

>>> df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6), 'noise': range(2, 6), 'value': range(10, 14)})
>>> df2
   key1  key2  noise  value
0     2     2      2     10
1     3     3      3     11
2     4     4      4     12
3     5     5      5     13

I want this result:

   key1  key2  value
0     0     0      0
1     1     1      1
2     2     2     12
3     3     3     14
4     4     4     12
5     5     5     13

In SQL terms, I want:

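SELECT COALESCE(df1.key1, df2.key1) AS key1,
       COALESCE(df1.key2, df2.key2) AS key2,
       COALESCE(df1.value, 0) + COALESCE(df2.value, 0) AS value
FROM df1 FULL OUTER JOIN df2
  ON df1.key1 = df2.key1 AND df1.key2 = df2.key2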

I tried two approaches:

Approach 1

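import pandas as pd

# Stack both frames, then sum every non-key column per (key1, key2) group
concatenated = pd.concat([df1, df2])
grouped = concatenated.groupby(['key1', 'key2'], as_index=False)
summed = grouped.agg('sum')  # 'sum' rather than np.sum avoids a FutureWarning in recent pandas
result = summed[['key1', 'key2', 'value']]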
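A variation on Approach 1, offered as a sketch rather than the asker's code: pandas named aggregation (available since pandas 0.25) lets the groupby sum only `value`, so no column selection is needed afterwards. The frames are rebuilt from the question so the block runs on its own; `keys` and `stacked` are illustrative names.

```python
import pandas as pd

# Same frames as in the question
df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6),
                    'noise': range(2, 6), 'value': range(10, 14)})

keys = ['key1', 'key2']
# Stack the frames; 'noise' is NaN for the rows that came from df1
stacked = pd.concat([df1, df2], ignore_index=True)
# Named aggregation: output_name=(input_column, aggregation), so only
# 'value' is summed and 'noise' never appears in the result
result = stacked.groupby(keys).agg(value=('value', 'sum')).reset_index()
print(result)
```

The same pattern scales to several columns with different aggregations, e.g. one `sum` and one `max`, in a single `agg` call.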

Approach 2

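import pandas as pd

# Outer join on the keys; the two 'value' columns become value_1 and value_2
joined = pd.merge(df1, df2, how='outer', on=['key1', 'key2'], suffixes=('_1', '_2'))
# Keys present in only one frame leave NaN on the other side; treat those as 0
joined = joined.fillna(0.0)
joined['value'] = joined['value_1'] + joined['value_2']
result = joined[['key1', 'key2', 'value']]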
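Another option, not from the original thread: index each `value` column by the keys and let `Series.add` with `fill_value=0` perform the outer alignment that Approach 2 does with `merge` plus `fillna`. This is a minimal sketch, assuming each (key1, key2) pair appears at most once per frame; `s1` and `s2` are illustrative names.

```python
import pandas as pd

# Same frames as in the question
df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6),
                    'noise': range(2, 6), 'value': range(10, 14)})

keys = ['key1', 'key2']
# Index both 'value' columns by the join keys so pandas aligns rows by key
s1 = df1.set_index(keys)['value']
s2 = df2.set_index(keys)['value']
# add() aligns on the union of both indexes (an outer join); fill_value=0
# stands in for a key that exists on only one side
summed = s1.add(s2, fill_value=0).sort_index()
# Alignment makes the dtype float; every key is filled, so cast back to int
result = summed.astype('int64').reset_index()
print(result)
```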

Both approaches give the result I want, but I wonder if there is a simpler way.

Answer

I don't know about simpler, but you can get a little more concise:

>>> pd.concat([df1, df2]).groupby(["key1", "key2"], as_index=False)["value"].sum()
   key1  key2  value
0     0     0      0
1     1     1      1
2     2     2     12
3     3     3     14
4     4     4     12
5     5     5     13

Depending on your tolerance for chained operations, you might still want to break this onto multiple lines (four tends to be close to my upper limit; here the chain is concat-groupby-select-sum).
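If you do split it up, one way is a small helper with a named variable for each step of the chain. `sum_on_keys` is a hypothetical name, not a pandas function; the sketch assumes every frame contains the key columns and the value column, and it accepts any number of frames.

```python
import pandas as pd

def sum_on_keys(frames, keys, value_col):
    """Hypothetical helper: stack `frames` and sum `value_col` per key combination."""
    # Keep only the join keys and the column being summed
    trimmed = [f[keys + [value_col]] for f in frames]
    # concat: stack all rows into one frame
    stacked = pd.concat(trimmed, ignore_index=True)
    # groupby + select + sum: one row per distinct key combination
    grouped = stacked.groupby(keys, as_index=False)
    return grouped[value_col].sum()

# Same frames as in the question
df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6),
                    'noise': range(2, 6), 'value': range(10, 14)})

result = sum_on_keys([df1, df2], ['key1', 'key2'], 'value')
print(result)
```

Because the helper takes a list, a third frame can be folded in without changing the shape of the call.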