Is it possible to split a Pandas dataframe using groupby and merge each group with separate dataframes


Question

I have a Pandas dataframe that contains a grouping variable. I would like to merge each group with other dataframes based on the contents of one of the columns. So, for example, I have a dataframe, dfA, which can be defined as:

import pandas as pd

dfA = pd.DataFrame({'a':[1,2,3,4,5,6],
                    'b':[0,1,0,0,1,1],
                    'c':['a','b','c','d','e','f']})

   a  b  c
0  1  0  a
1  2  1  b
2  3  0  c
3  4  0  d
4  5  1  e
5  6  1  f

Two other dataframes, dfB and dfC, contain a common column ('a') and an extra column ('d') and can be defined as:

dfB = pd.DataFrame({'a':[1,2,3],
                    'd':[11,12,13]})

   a   d
0  1  11
1  2  12
2  3  13


dfC = pd.DataFrame({'a':[4,5,6],
                    'd':[21,22,23]})

   a   d
0  4  21
1  5  22
2  6  23

I would like to be able to split dfA based on column 'b' and merge one of the groups with dfB and the other group with dfC to produce an output that looks like:

   a  b  c   d
0  1  0  a  11
1  2  1  b  12
2  3  0  c  13
3  4  0  d  21
4  5  1  e  22
5  6  1  f  23

In this simplified version, I could concatenate dfB and dfC and merge with dfA without splitting into groups as shown below:

dfX = pd.concat([dfB,dfC])
# Assign to a new name so that dfA stays unchanged for the code further down
merged = dfA.merge(dfX,on='a',how='left')
print(merged)

   a  b  c   d
0  1  0  a  11
1  2  1  b  12
2  3  0  c  13
3  4  0  d  21
4  5  1  e  22
5  6  1  f  23

However, in the real-world situation, the smaller dataframes will be generated from multiple different complex sources; generating the dataframes and combining into a single dataframe beforehand may not be feasible because there may be overlapping data on the column that will be used for merging the dataframes (but this will be avoided if the dataframe can be split based on the grouping variable). Is it possible to use Pandas groupby() method to do this instead? I was thinking of something like the following (which doesn't work, perhaps because I'm not combining the groups into a new dataframe correctly):

grouped = dfA.groupby('b')
for name, group in grouped:
    if name == 0:
        group = group.merge(dfB,on='a',how='left')
    elif name == 1:
        group = group.merge(dfC,on='a',how='left')

Any thoughts would be appreciated.

Recommended answer

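This will fix your code. The loop in the question only rebinds the name group on each iteration, so the merged frames are thrown away. Appending each merged group to a list and concatenating the list keeps them: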

parts = []  # collects the merged frame for each group
grouped = dfA.groupby('b')
for name, group in grouped:
    if name == 0:
        group = group.merge(dfB,on='a',how='left')
    elif name == 1:
        group = group.merge(dfC,on='a',how='left')
    parts.append(group)
print(pd.concat(parts))

   a  b  c     d
0  1  0  a  11.0
1  3  0  c  13.0
2  4  0  d   NaN
0  2  1  b   NaN
1  5  1  e  22.0
2  6  1  f  23.0
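
If there are more than two groups, the if/elif chain could be replaced by a dictionary lookup. The following is only a sketch under assumptions, not part of the original answer: the names lookups, parts and result are made up for illustration, and it assumes every value of 'b' has an entry in the dictionary and that dfA has no column called 'index'. It also carries dfA's row labels through the merge so that the original row order can be restored afterwards.

```python
import pandas as pd

dfA = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                    'b': [0, 1, 0, 0, 1, 1],
                    'c': ['a', 'b', 'c', 'd', 'e', 'f']})
dfB = pd.DataFrame({'a': [1, 2, 3], 'd': [11, 12, 13]})
dfC = pd.DataFrame({'a': [4, 5, 6], 'd': [21, 22, 23]})

# Hypothetical mapping from each value of 'b' to the frame to merge with.
lookups = {0: dfB, 1: dfC}

parts = []
for key, group in dfA.groupby('b'):
    # merge() discards the left frame's index, so move dfA's row labels
    # into an 'index' column first and put them back after the merge.
    merged = group.reset_index().merge(lookups[key], on='a', how='left')
    parts.append(merged.set_index('index'))

# Reassemble the groups and restore dfA's original row order.
result = pd.concat(parts).sort_index().rename_axis(None)
print(result)
```

With the sample data, the rows with a=2 and a=4 still end up with NaN in 'd', as in the output above: a=2 belongs to group 1 but only appears in dfB, and a=4 belongs to group 0 but only appears in dfC. Each group is merged only with its own frame, which is the point of splitting by 'b' when the merge keys of the smaller frames may overlap.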
