Pandas merge without duplicating columns

Question

I need to merge two dataframes without creating duplicate columns. The first dataframe (dfa) has missing values. The second dataframe (dfb) has unique values. This would be the same as a VLOOKUP in Excel.

dfa looks like this:

postcode  lat  lon ...plus 32 more columns
M20       2.3  0.2
LS1       NaN  NaN
LS1       NaN  NaN
LS2       NaN  NaN
M21       2.4  0.3

dfb only contains unique postcodes, with values for the rows where lat and lon were NaN in dfa. It looks like this:

postcode  lat  lon 
LS1       1.4  0.1
LS2       1.5  0.2

My desired output is:

postcode  lat  lon ...plus 32 more columns
M20       2.3  0.2
LS1       1.4  0.1
LS1       1.4  0.1
LS2       1.5  0.2
M21       2.4  0.3

I've tried using pd.merge like so:

outputdf = pd.merge(dfa, dfb, on='Postcode', how='left')

This results in duplicate columns being created:

postcode  lat_x  lon_x  lat_y  lon_y ...plus 32 more columns
M20       2.3    0.2    NaN    NaN
LS1       NaN    NaN    1.4    0.1
LS1       NaN    NaN    1.4    0.1
LS2       NaN    NaN    1.5    0.2
M21       2.4    0.3    NaN    NaN

From this answer I tried using:

output = dfa
for df in [dfa, dfb]:
    output.update(df.set_index('Postcode'))

But I received the error "ValueError: cannot reindex from a duplicate axis".

Also from the above answer, this does not work:

output.merge(pd.concat([dfa, dfb]), how='left')

There are no duplicate columns, but the values in 'Lat' and 'Lon' are still blank.

Is there a way to merge on 'Postcode' without duplicate columns being created, effectively performing a VLOOKUP using pandas?

Recommended answer

Use DataFrame.combine_first after setting postcode as the index in both DataFrames. Then, if necessary, add DataFrame.reindex to restore the column order of the original df1:

print (df1)
  postcode  lat  lon  plus  32  more  columns
0      M20  2.3  0.2   NaN NaN   NaN      NaN
1      LS1  NaN  NaN   NaN NaN   NaN      NaN
2      LS1  NaN  NaN   NaN NaN   NaN      NaN
3      LS2  NaN  NaN   NaN NaN   NaN      NaN
4      M21  2.4  0.3   NaN NaN   NaN      NaN

df1 = df1.set_index('postcode')
df2 = df2.set_index('postcode')

df3 = df1.combine_first(df2).reindex(df1.columns, axis=1)
print (df3)
          lat  lon  plus  32  more  columns
postcode                                   
LS1       1.4  0.1   NaN NaN   NaN      NaN
LS1       1.4  0.1   NaN NaN   NaN      NaN
LS2       1.5  0.2   NaN NaN   NaN      NaN
M20       2.3  0.2   NaN NaN   NaN      NaN
M21       2.4  0.3   NaN NaN   NaN      NaN
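
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample frames from the question under the names dfa and dfb, and it stands in for the 32 additional columns with a single hypothetical column named extra (an assumption for illustration only). It first applies the combine_first approach from the answer, then shows one possible alternative, not part of the original answer, that keeps dfa's row order by looking each postcode up with Series.map and filling only the missing cells. The alternative assumes the postcodes in dfb are unique, as the question states.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question. The "extra" column is a
# hypothetical stand-in for the 32 additional columns.
dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
    "extra": ["a", "b", "c", "d", "e"],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# Approach from the answer: index both frames by postcode, fill NaNs in dfa
# from dfb, restore dfa's column order, then turn postcode back into a column.
# As in the answer's output, the rows may come back ordered by postcode.
left = dfa.set_index("postcode")
combined = (
    left.combine_first(dfb.set_index("postcode"))
        .reindex(left.columns, axis=1)
        .reset_index()
)

# Alternative sketch that keeps dfa's original row order: map each postcode
# to its value in dfb (postcodes in dfb must be unique) and fill only the
# cells that are missing in dfa.
lookup = dfb.set_index("postcode")
filled = dfa.copy()
for col in ["lat", "lon"]:
    filled[col] = filled[col].fillna(filled["postcode"].map(lookup[col]))

print(combined)
print(filled)
```

The map-based variant is closest to a VLOOKUP: existing values in dfa are never overwritten, and postcodes missing from dfb simply stay NaN.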
