Pandas left merge keeping data in right dataframe on duplicate columns

Question

I would like to merge two dataframes. df2 might have more columns and will always be 1 row. I would like the data from the df2 row to overwrite the matching row in df. NOTE: the ser and no columns together make a row unique.

import pandas as pd

df = pd.DataFrame({'ser': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2}, 'no': {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0, 7: 1, 8: 2}, 'c': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}})

df2 = pd.DataFrame({'ser': {0: 1}, 'no': {0: 2}, 'c': {0: 88}, 'd': {0: 90}})


>>> df
   ser  no  c
0    0   0  1
1    0   1  1
2    0   2  1
3    1   0  1
4    1   1  1
5    1   2  1
6    2   0  1
7    2   1  1
8    2   2  1

>>> df2
   ser  no   c   d
0    1   2  88  90

I have tried a left merge, but this creates two c columns (c_x and c_y):

>>> pd.merge(df,df2,how='left',on=['ser','no'])
   ser  no  c_x   c_y     d
0    0   0    1   NaN   NaN
1    0   1    1   NaN   NaN
2    0   2    1   NaN   NaN
3    1   0    1   NaN   NaN
4    1   1    1   NaN   NaN
5    1   2    1  88.0  90.0
6    2   0    1   NaN   NaN
7    2   1    1   NaN   NaN
8    2   2    1   NaN   NaN
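The c_x/c_y names come from merge's default suffixes=('_x', '_y'), which pandas applies to overlapping non-key columns. As a minimal alternative sketch (not part of the answer below), one could suffix only the left copy, keep the right copy's name, and coalesce each pair with Series.combine_first, which takes df2's value where present and falls back to df's. The '_old' suffix is an arbitrary name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ser': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'no':  [0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'c':   [1] * 9})
df2 = pd.DataFrame({'ser': [1], 'no': [2], 'c': [88], 'd': [90]})

# Suffix only the left copy of overlapping columns; the right copy keeps its name.
merged = df.merge(df2, how='left', on=['ser', 'no'], suffixes=('_old', ''))

# For each overlapping non-key column, prefer df2's value, else fall back to df's.
overlap = [col for col in df.columns if col in df2.columns and col not in ('ser', 'no')]
for col in overlap:
    merged[col] = merged[col].combine_first(merged[col + '_old'])
    merged = merged.drop(columns=col + '_old')

print(merged)
```

The loop generalizes to any number of shared columns; c ends up as float because the unmatched rows were NaN before coalescing.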

Desired output:

   ser  no   c    d
0    0   0   1  NaN
1    0   1   1  NaN
2    0   2   1  NaN
3    1   0   1  NaN
4    1   1   1  NaN
5    1   2  88   90
6    2   0   1  NaN
7    2   1   1  NaN
8    2   2   1  NaN

Answer

Frankenstein answer

df[['ser', 'no']].merge(df2, 'left').set_axis(df.index).fillna(df)

   ser  no     c     d
0    0   0   1.0   NaN
1    0   1   1.0   NaN
2    0   2   1.0   NaN
3    1   0   1.0   NaN
4    1   1   1.0   NaN
5    1   2  88.0  90.0
6    2   0   1.0   NaN
7    2   1   1.0   NaN
8    2   2   1.0   NaN
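For reference, here is the same one-liner as a self-contained script, with the example frames rebuilt in list form and each step of the chain commented. It passes how='left' by keyword, which is equivalent to the positional 'left' used above.

```python
import pandas as pd

df = pd.DataFrame({'ser': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'no':  [0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'c':   [1] * 9})
df2 = pd.DataFrame({'ser': [1], 'no': [2], 'c': [88], 'd': [90]})

result = (
    df[['ser', 'no']]          # keep only the key columns, so no c_x/c_y appear
    .merge(df2, how='left')    # joins on the shared columns: ser and no
    .set_axis(df.index)        # reuse df's row labels for the next step
    .fillna(df)                # fill c from df where df2 had no match; d has no counterpart in df
)
print(result)
```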


Explanation

  1. I'm going to merge on the columns ['ser', 'no'] and don't want to specify them in the merge call. Also, I don't want goofy duplicate column names like 'c_x' and 'c_y', so I slice out only the columns I want in common, then merge:

 df[['ser', 'no']].merge(df2, 'left')

  • When I merge, I want only rows from the left dataframe. However, a merge can produce a number of rows very different from either original dataframe, and it gives the result a new index. NOTE: this assumes the right dataframe (df2) has NO DUPLICATES with respect to ['ser', 'no']; under that assumption, a 'left' merge produces exactly the same number of rows as the left dataframe (df). But the result won't necessarily have the same index. It happens to in this example, but I don't want to take chances, so I use set_axis:

      set_axis(df.index)
    

  • Finally, since the resulting dataframe has the same index as df and the same columns (plus the extra d from df2), I can fill in the missing bits with:

    fillna(df)
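To see why the set_axis step and the no-duplicates assumption matter, here is a small sketch using a hypothetical df whose index is not the default 0..n-1. It also passes merge's validate='many_to_one' option (not used in the original answer), which makes pandas raise MergeError if df2 repeats a (ser, no) pair instead of silently adding rows.

```python
import pandas as pd

# Hypothetical variant of df with non-default row labels 10..13.
df = pd.DataFrame({'ser': [0, 0, 1, 1],
                   'no':  [0, 1, 0, 1],
                   'c':   [1, 1, 1, 1]},
                  index=[10, 11, 12, 13])
df2 = pd.DataFrame({'ser': [1], 'no': [1], 'c': [88], 'd': [90]})

# The merge result gets a fresh default index (0..3), not df's labels.
merged = df[['ser', 'no']].merge(df2, how='left', validate='many_to_one')
print(merged.index.tolist())

# Without set_axis, fillna(df) aligns labels 0..3 against 10..13 and fills nothing.
unaligned = merged.fillna(df)
aligned = merged.set_axis(df.index).fillna(df)
print(unaligned, aligned, sep='\n\n')

# validate='many_to_one' rejects a df2 that repeats a (ser, no) pair.
raised = False
try:
    df[['ser', 'no']].merge(pd.concat([df2, df2]), how='left', validate='many_to_one')
except pd.errors.MergeError as exc:
    raised = True
    print('duplicate keys in df2:', exc)
```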
    
