Merge 'left', but override 'right' values where possible


Problem description

I've reviewed the pandas documentation on merge, but I have a question about overriding values efficiently in a 'left' merge. I can do this simply for one pair of values (as seen here), but it becomes cluttered when trying to do multiple pairs.

If I take the following dataframes:

import pandas as pd

a = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'val': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
})

b = pd.DataFrame({
    'id': [0, 2, 7],
    'val': [500, 500, 500]
})

I can merge them:

df = a.merge(b, on=['id'], how='left', suffixes=('','_y'))

to get:

   id  val  val_y
0   0  100  500.0
1   1  100    NaN
2   2  100  500.0
3   3  100    NaN
4   4  100    NaN
5   5  100    NaN
6   6  100    NaN
7   7  100  500.0
8   8  100    NaN
9   9  100    NaN

I want to keep left values where no right value exists, but where possible overwrite with the right values.

My desired outcome is:

   id    val
0   0  500.0
1   1  100.0
2   2  500.0
3   3  100.0
4   4  100.0
5   5  100.0
6   6  100.0
7   7  500.0
8   8  100.0
9   9  100.0

My Attempt

I know I can accomplish this with a few lines of code:

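# Copy the right-hand value wherever the merge found one
df.loc[df.val_y.notnull(), 'val'] = df[df.val_y.notnull()].val_y
# Drop the helper column created by the merge
df = df.drop(['val_y'], axis=1)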

But this becomes cluttered when there are multiple column pairings where I want to apply this logic.

For example, with the following a and b:

a = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'val': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
    'val_2': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
})
b = pd.DataFrame({
    'id': [0, 2, 7],
    'val': [500, 500, 500],
    'val_2': [500, 500, 500]
})
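For comparison, the two-line attempt can be generalized to any number of column pairs by looping over the non-key columns the two frames share. This is a minimal sketch, assuming every overlapping column should be overridden; the helper name merge_override is hypothetical, not a pandas API. It relies on Series.fillna accepting another Series, so the left value survives only where the merged right-hand value is NaN.

```python
import pandas as pd


def merge_override(left, right, on, suffix="_y"):
    """Left-merge `right` into `left`, preferring `right`'s values.

    Hypothetical helper (not part of pandas): for every non-key column the
    two frames share, the right-hand value wins wherever it is not NaN.
    """
    keys = [on] if isinstance(on, str) else list(on)
    merged = left.merge(right, on=keys, how="left", suffixes=("", suffix))
    shared = [c for c in right.columns if c not in keys and c in left.columns]
    for col in shared:
        # Fill the gaps (NaN) in the right-hand column with the left-hand values.
        merged[col] = merged[col + suffix].fillna(merged[col])
    # Drop the suffixed helper columns created by the merge.
    return merged.drop(columns=[col + suffix for col in shared])


a = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "val": [100] * 10,
    "val_2": [200] * 10,
})
b = pd.DataFrame({
    "id": [0, 2, 7],
    "val": [500, 500, 500],
    "val_2": [500, 500, 500],
})

df = merge_override(a, b, on="id")
print(df)
```

Even generalized this way, the approach still builds and discards a suffixed copy of every shared column, which is the clutter the question is trying to avoid.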

Is there a quicker, cleaner way to get my desired outcome?

Recommended answer

I would use set_index and update:

u = a.set_index('id')
u.update(b.set_index('id'))  # Update a's values with b's values

u.reset_index()

   id    val
0   0  500.0
1   1  100.0
2   2  500.0
3   3  100.0
4   4  100.0
5   5  100.0
6   6  100.0
7   7  500.0
8   8  100.0
9   9  100.0

update aligns rows on the index, so I set "id" as the index of both DataFrames before performing the update step.
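To illustrate why the index matters, here is a small sketch using the single-column a and b from the question. If update is called without set_index, both frames keep their default 0..n-1 index, so b's three rows land on a's first three rows by position and even the id column is overwritten. The variable names wrong and aligned are only for illustration.

```python
import pandas as pd

a = pd.DataFrame({"id": list(range(10)), "val": [100] * 10})
b = pd.DataFrame({"id": [0, 2, 7], "val": [500, 500, 500]})

# Without set_index, update aligns on the default RangeIndex (0, 1, 2, ...),
# so b's rows 0..2 overwrite a's rows 0..2 regardless of their "id".
wrong = a.copy()
wrong.update(b)

# With "id" as the index on both sides, rows are matched by id instead.
aligned = a.set_index("id")
aligned.update(b.set_index("id"))
aligned = aligned.reset_index()

print(wrong)
print(aligned)
```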

Note that the "id" column must be unique.
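Because update aligns on column labels as well as on the index, the same two calls handle every shared column at once, which covers the multi-column example from the question. The sketch below also checks the uniqueness requirement explicitly before updating; raising a ValueError with a custom message is an illustrative choice, not something pandas requires. update only copies non-NA values from the other frame, so a NaN in b never clobbers a value in a.

```python
import pandas as pd

a = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "val": [100] * 10,
    "val_2": [200] * 10,
})
b = pd.DataFrame({
    "id": [0, 2, 7],
    "val": [500, 500, 500],
    "val_2": [500, 500, 500],
})

u = a.set_index("id")
other = b.set_index("id")

# update() matches rows by index label, so duplicate ids would be ambiguous.
if not (u.index.is_unique and other.index.is_unique):
    raise ValueError("'id' must be unique in both frames")

# Overwrites "val" and "val_2" in place wherever b has a non-NA value.
u.update(other)
result = u.reset_index()
print(result)
```

Depending on the pandas version, the updated columns may come back as float (as in the output above) or keep their integer dtype; the values are the same either way.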

Another option is to use concat and drop_duplicates:

pd.concat([b, a]).drop_duplicates('id').sort_values('id')

   id  val
0   0  500
1   1  100
1   2  500
3   3  100
4   4  100
5   5  100
6   6  100
2   7  500
8   8  100
9   9  100

Since b overrides a, b must come first in the concat step.
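Applied to the two-column frames, the concat approach might look like the sketch below. Two details differ from the one-liner above and are additions, not part of the original answer. First, unlike a left merge, concat keeps rows of b whose id does not appear in a, so the sketch filters b with isin beforehand; the extra id 42 row in b is a hypothetical value added only to show this. Second, reset_index(drop=True) is optional and simply replaces the mixed index labels visible in the output above.

```python
import pandas as pd

a = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "val": [100] * 10,
    "val_2": [200] * 10,
})
b = pd.DataFrame({
    "id": [0, 2, 7, 42],  # hypothetical id 42 does not exist in a
    "val": [500, 500, 500, 500],
    "val_2": [500, 500, 500, 500],
})

# Keep only the b rows whose id exists in a (left-merge semantics).
b_left = b[b["id"].isin(a["id"])]

# b goes first, so keep="first" retains b's row for every duplicated id.
result = (
    pd.concat([b_left, a])
    .drop_duplicates(subset="id", keep="first")
    .sort_values("id")
    .reset_index(drop=True)
)
print(result)
```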
