Merge items on dataframes with duplicate values

Question

So I have a dataframe (or series) where each value in column 'A' always occurs exactly 4 times, like this:

import pandas as pd

df = pd.DataFrame([['foo'],
                   ['foo'],
                   ['foo'],
                   ['foo'],
                   ['bar'],
                   ['bar'],
                   ['bar'],
                   ['bar']],
                  columns=['A'])
       A
0    foo
1    foo
2    foo
3    foo
4    bar
5    bar
6    bar
7    bar

I also have another dataframe whose column 'A' holds the same kind of values, but each value does not always occur 4 times. It also has more columns, like this:

df_key = pd.DataFrame([['foo', 1, 2],
                       ['foo', 3, 4],
                       ['bar', 5, 9],
                       ['bar', 2, 4],
                       ['bar', 1, 9]],
                      columns=['A', 'B', 'C'])

       A    B    C
0    foo    1    2
1    foo    3    4
2    bar    5    9
3    bar    2    4
4    bar    1    9

I want to merge them so that they end up like this, using something like:

df.merge(df_key, how='left', on='A', copy=False)

       A    B    C
0    foo    1    2
1    foo    3    4
2    foo  NaN  NaN
3    foo  NaN  NaN
4    bar    5    9
5    bar    2    4
6    bar    1    9
7    bar  NaN  NaN

But instead I end up with something like this. Any advice?

      A    B        C
 0  foo    1        2
 1  foo    3        4
 2  foo    1        2
 3  foo    3        4
 4  foo    1        2
 5  foo    3        4
 6  foo    1        2
 7  foo    3        4
 8  bar    5        9
 9  bar    2        4
 10 bar    1        9
 11 bar    5        9
 12 bar    2        4
 13 bar    1        9
 14 bar    5        9
 15 bar    2        4
 16 bar    1        9
 17 bar    5        9
 18 bar    2        4
 19 bar    1        9
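The extra rows appear because 'A' is not unique in either frame, so merge performs a many-to-many join: each of the 4 'foo' rows in df is paired with both 'foo' rows of df_key (4 × 2 = 8 rows), and each of the 4 'bar' rows with all 3 'bar' rows (4 × 3 = 12 rows), 20 rows in total. A minimal sketch that checks this count, and shows how merge's `validate` argument can make such a join fail loudly instead of silently multiplying rows (the `validate` check is an addition, not part of the question):

```python
import pandas as pd
from pandas.errors import MergeError

df = pd.DataFrame({'A': ['foo'] * 4 + ['bar'] * 4})
df_key = pd.DataFrame([['foo', 1, 2],
                       ['foo', 3, 4],
                       ['bar', 5, 9],
                       ['bar', 2, 4],
                       ['bar', 1, 9]],
                      columns=['A', 'B', 'C'])

# Many-to-many join size: for each key, (left matches) * (right matches)
left_counts = df['A'].value_counts()
right_counts = df_key['A'].value_counts()
expected_rows = int((left_counts * right_counts).sum())

naive = df.merge(df_key, how='left', on='A')
print(len(naive), expected_rows)  # 20 20

# validate='one_to_one' raises MergeError because 'A' repeats on both sides
try:
    df.merge(df_key, how='left', on='A', validate='one_to_one')
except MergeError as exc:
    print('MergeError:', exc)
```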

Answer

You'll need to create a surrogate column with groupby + cumcount, which numbers the occurrences of each value of 'A' (0, 1, 2, ...) so that every ('A', occurrence) pair is unique, then include that column in the merge keys:

a = df.assign(D=df.groupby('A').cumcount())
b = df_key.assign(D=df_key.groupby('A').cumcount())

a.merge(b, on=['A', 'D'], how='left').drop(columns='D')

     A    B    C
0  foo  1.0  2.0
1  foo  3.0  4.0
2  foo  NaN  NaN
3  foo  NaN  NaN
4  bar  5.0  9.0
5  bar  2.0  4.0
6  bar  1.0  9.0
7  bar  NaN  NaN
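To see why this works, it helps to look at the surrogate key itself. `groupby('A').cumcount()` numbers the rows within each value of 'A' starting from 0, so the pair ('A', 'D') is unique on both sides and the merge becomes one-to-one; rows in df whose occurrence number has no counterpart in df_key get NaN. A self-contained sketch of the answer's approach (the helper column name `D` follows the answer; passing `validate='one_to_one'` is an optional safety check added here):

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo'] * 4 + ['bar'] * 4})
df_key = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar', 'bar'],
                       'B': [1, 3, 5, 2, 1],
                       'C': [2, 4, 9, 4, 9]})

# Number each occurrence of a value of 'A' within its group: 0, 1, 2, ...
a = df.assign(D=df.groupby('A').cumcount())
b = df_key.assign(D=df_key.groupby('A').cumcount())
print(a['D'].tolist())  # [0, 1, 2, 3, 0, 1, 2, 3]
print(b['D'].tolist())  # [0, 1, 0, 1, 2]

# ('A', 'D') is unique on both sides, so the left join is one-to-one
result = (a.merge(b, on=['A', 'D'], how='left', validate='one_to_one')
           .drop(columns='D'))
print(result)
```

With how='left', any occurrences in df_key beyond the number present in df are dropped. B and C come back as float64 because NaN cannot be stored in an int64 column; `result[['B', 'C']].astype('Int64')` is one way to get nullable integers back.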
