Pandas merge without duplicating columns
Problem description
I need to merge two dataframes without creating duplicate columns. The first dataframe (dfa) has missing values. The second dataframe (dfb) has unique values. This would be the same as a VLOOKUP in Excel.
dfa looks like this:
postcode lat lon ...plus 32 more columns
M20 2.3 0.2
LS1 NaN NaN
LS1 NaN NaN
LS2 NaN NaN
M21 2.4 0.3
dfb contains only the unique postcodes whose lat and lon are NaN in dfa, together with their values. It looks like this:
postcode lat lon
LS1 1.4 0.1
LS2 1.5 0.2
My desired output is:
postcode lat lon ...plus 32 more columns
M20 2.3 0.2
LS1 1.4 0.1
LS1 1.4 0.1
LS2 1.5 0.2
M21 2.4 0.3
I've tried using pd.merge like so:
outputdf = pd.merge(dfa, dfb, on='Postcode', how='left')
This results in duplicate columns being created:
postcode lat_x lon_x lat_y lon_y ...plus 32 more columns
M20 2.3 0.2 NaN NaN
LS1 NaN NaN 1.4 0.1
LS1 NaN NaN 1.4 0.1
LS2 NaN NaN 1.5 0.2
M21 2.4 0.3 NaN NaN
From this answer I tried using:
output = dfa
for df in [dfa, dfb]:
    output.update(df.set_index('Postcode'))
But I received "ValueError: cannot reindex from a duplicate axis".
Also from the above answer, this does not work:
output.merge(pd.concat([dfa, dfb]), how='left')
There are no duplicate columns, but the values in 'Lat' and 'Lon' are still blank.
Is there a way to merge on 'Postcode' without creating duplicate columns, effectively performing a VLOOKUP using pandas?
Recommended answer
Use DataFrame.combine_first with postcode set as the index in both DataFrames, and then, if necessary, add DataFrame.reindex to restore the column order of the original df1 (df1 and df2 here correspond to dfa and dfb in the question):
print (df1)
postcode lat lon plus 32 more columns
0 M20 2.3 0.2 NaN NaN NaN NaN
1 LS1 NaN NaN NaN NaN NaN NaN
2 LS1 NaN NaN NaN NaN NaN NaN
3 LS2 NaN NaN NaN NaN NaN NaN
4 M21 2.4 0.3 NaN NaN NaN NaN
df1 = df1.set_index('postcode')
df2 = df2.set_index('postcode')
df3 = df1.combine_first(df2).reindex(df1.columns, axis=1)
print (df3)
lat lon plus 32 more columns
postcode
LS1 1.4 0.1 NaN NaN NaN NaN
LS1 1.4 0.1 NaN NaN NaN NaN
LS2 1.5 0.2 NaN NaN NaN NaN
M20 2.3 0.2 NaN NaN NaN NaN
M21 2.4 0.3 NaN NaN NaN NaN
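Note that the combine_first result above is ordered by postcode rather than by the original row order of dfa, and postcode has become the index. If the original order and the postcode column need to be kept, one possible alternative (not the method from the answer above) is to treat dfb as a lookup table and fill only the missing values with Series.map and Series.fillna. The following is a minimal sketch under the assumption that dfb has exactly one row per postcode and that only lat and lon need filling; the sample frames are rebuilt from the question's data with the 32 extra columns omitted, and the names lookup and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question (the 32 extra columns are omitted).
dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# Assumption: dfb has one row per postcode, so it can act as a lookup table.
lookup = dfb.set_index("postcode")

result = dfa.copy()
for col in ["lat", "lon"]:
    # Look up each row's postcode in dfb, then use that value only where
    # dfa has NaN; values already present in dfa are left untouched.
    result[col] = result[col].fillna(result["postcode"].map(lookup[col]))

print(result)
```

Because the lookup is done column by column, no lat_x/lat_y style columns are created, and duplicate postcodes in dfa are not a problem as long as the postcodes in dfb are unique.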