Joining two pandas dataframes based on multiple conditions


Question

df_a and df_b are two dataframes that look like the following:

df_a
A   B       C      D     E
x1  Apple   0.3   0.9    0.6
x1  Orange  0.1   0.5    0.2
x2  Apple   0.2   0.2    0.1
x2  Orange  0.3   0.4    0.9
x2  Mango   0.1   0.2    0.3
x3  Orange  0.3   0.1    0.2


df_b
A   B_new   F    
x1  Apple   0.3  
x1  Mango   0.2  
x1  Orange  0.1   
x2  Apple   0.2   
x2  Orange  0.3     
x2  Mango   0.1  
x3  Orange  0.3  
x3  Mango   0.2  
x3  Apple   0.1  

I want final_df to contain all the rows of df_a, matched on the unique combination of df_a['A'] == df_b['A'] and df_a['B'] == df_b['B_new'].

I tried an outer join followed by dropping duplicates with respect to columns A and B in final_df, but the value of B_new is not retained.

This is how I want result_df to look:

result_df

result_df

A   B       C     D     E     B_new   F
x1  Apple   0.3   0.9   0.6   Apple   0.3
x1  Orange  0.1   0.5   0.2   Orange  0.1
x2  Apple   0.2   0.2   0.1   Apple   0.2
x2  Orange  0.3   0.4   0.9   Orange  0.3
x2  Mango   0.1   0.2   0.3   Mango   0.1
x3  Orange  0.3   0.1   0.2   Orange  0.3

I also tried a left outer join:

final_df = pd.merge(df_a, df_b, how="left", on=['A'])

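The resulting dataframe is much larger than df_a: because only A is used as the key, each row of df_a is paired with every row of df_b that has the same A value. That is not what I want.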
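A minimal sketch of why merging on A alone multiplies rows. It rebuilds the question's two frames by hand (the construction code is an assumption; the values are copied from the tables above). With a single key, each A group produces (rows of df_a in that group) × (rows of df_b in that group) output rows, so the total is the sum of those products.

```python
import pandas as pd

# Rebuild the two example frames from the question.
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange",
              "Mango", "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Joining on A alone: every df_a row is paired with every df_b row
# that shares its A value (a many-to-many match within each group).
final_df = pd.merge(df_a, df_b, how="left", on=["A"])

# Predicted size: per-group row counts multiplied, then summed
# (the two value_counts Series are aligned on their A index).
expected = (df_a["A"].value_counts() * df_b["A"].value_counts()).sum()

print(len(df_a), len(final_df), expected)  # 6 18 18
```

Here x1 contributes 2 × 3 rows, x2 contributes 3 × 3 and x3 contributes 1 × 3, giving 18 rows instead of the 6 that df_a has.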

Any suggestions are appreciated.

Answer

You need an inner merge, specifying both merge columns on each side:

res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])

print(res)

    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3
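Since the question asks for all rows of df_a, a left merge on the same key pairs may be a slightly safer variant. This is a minimal sketch, with the frames rebuilt by hand from the question's tables. On this data it gives the same six rows as the inner merge, but a df_a row whose (A, B) pair had no match in df_b would be kept with NaN in B_new and F instead of being dropped. The `validate="one_to_one"` argument of `DataFrame.merge` makes pandas raise a `MergeError` if either side has duplicate key pairs, which guards against the row multiplication seen with the single-key join.

```python
import pandas as pd

# Rebuild the two example frames from the question.
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange",
              "Mango", "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Left merge on both key pairs: all df_a rows are kept; unmatched ones
# would get NaN in the df_b columns. validate="one_to_one" raises
# pandas.errors.MergeError if (A, B) or (A, B_new) is not unique.
res = df_a.merge(
    df_b,
    how="left",
    left_on=["A", "B"],
    right_on=["A", "B_new"],
    validate="one_to_one",
)

print(res)
```

Whether to prefer `how="inner"` or `how="left"` depends on what should happen to df_a rows that have no counterpart in df_b: inner drops them, left keeps them.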
