Compare dataframe columns with conditions

Problem description

I have two dataframes, as shown below:

df1:

ID   col1   col2    
1     A1     B1    
2     A2     B2     
3     A3     B3   
4     A4     B4   
5     A5     B5    
6     A6     B6    

df2:

col1   col2   
 A1     B1     
 A2     O5   
 H3     B3     
 A4     B4    
 A5     66     
 A6     C6     

Expected result: I would like to generate a result df in which each value in col1 and col2 of df1 is checked for existence among the col1 and col2 values of df2, respectively.

Expected result df:

ID   col1   col2     Error
1     A1     B1      No mismatch with df2
2     A2     B2      col2 mismatch with df2
3     A3     B3      col1 mismatch with df2
4     A4     B4      No mismatch with df2
5     A5     B5      col2 mismatch with df2
6     A6     B6      col2 mismatch with df2

Recommended answer

Create a helper DataFrame with a dictionary comprehension, comparing each column with isin:

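import numpy as np
import pandas as pd

# True where a df1 value does not occur anywhere in the same-named df2 column
m = pd.DataFrame({c: ~df1[c].isin(df2[c]) for c in ['col1','col2']})
print(m)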
    col1   col2
0  False  False
1  False   True
2   True  False
3  False  False
4  False   True
5  False   True
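
Note that isin tests membership, not row-by-row equality: a df1 value counts as matched if it occurs in any row of the df2 column. A minimal sketch of that behaviour, using two made-up Series (left and right are illustrative names, not part of the question's data):

```python
import pandas as pd

# Hypothetical sample values, chosen so that matches sit at different positions
left = pd.Series(["A1", "A2", "A3"])
right = pd.Series(["A3", "X9", "A1"])

# isin ignores position: it asks whether each value occurs anywhere in right
present = left.isin(right)

# ~ inverts the mask, so True now marks values that are missing from right
missing = ~present

print(present.tolist())  # [True, False, True]
print(missing.tolist())  # [False, True, False]
```

If a position-wise comparison is what you need instead, comparing the aligned columns directly (for example with ne) would be the thing to look at; that is a different check from the one in this answer.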

Then use numpy.where, with a mask from any that tests for at least one True per row, and dot (matrix multiplication) to build the mismatched column names:

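# The boolean-by-string dot concatenates the names of the True columns in each row;
# rstrip removes the trailing separator before the message suffix is appended.
df1['Error'] = np.where(m.any(axis=1),
                        m.dot(m.columns + ', ').str.rstrip(', ') + ' mismatch with df2',
                        'No mismatch with df2')
print(df1)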
   ID col1 col2                   Error
0   1   A1   B1    No mismatch with df2
1   2   A2   B2  col2 mismatch with df2
2   3   A3   B3  col1 mismatch with df2
3   4   A4   B4    No mismatch with df2
4   5   A5   B5  col2 mismatch with df2
5   6   A6   B6  col2 mismatch with df2
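
For anyone who wants to reproduce this end to end, here is a self-contained sketch that rebuilds the sample frames and derives the same Error column. Two things in it are assumptions rather than part of the answer: the value 66 in df2 is stored as the string "66", and the column names are joined with a plain Python loop instead of the dot trick, which may be easier to read but is likely slower on large frames.

```python
import pandas as pd

# Sample data from the question; "66" is assumed to be a string
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "col1": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "col2": ["B1", "B2", "B3", "B4", "B5", "B6"],
})
df2 = pd.DataFrame({
    "col1": ["A1", "A2", "H3", "A4", "A5", "A6"],
    "col2": ["B1", "O5", "B3", "B4", "66", "C6"],
})

cols = ["col1", "col2"]

# Same helper mask as in the answer: True marks a df1 value absent from df2
m = pd.DataFrame({c: ~df1[c].isin(df2[c]) for c in cols})

# For each row, join the names of the flagged columns (empty string if none)
labels = [
    ", ".join(c for c, bad in zip(cols, row) if bad)
    for row in m.to_numpy()
]

# Build the message per row; an empty label means nothing was flagged
df1["Error"] = [
    f"{label} mismatch with df2" if label else "No mismatch with df2"
    for label in labels
]

print(df1)
```

When several columns are flagged in one row, the label lists all of them, for example "col1, col2 mismatch with df2".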
