How to find the complement of two dataframes

Question

Given two large dataframes, is there any concise and efficient code (avoiding an explicit for loop) that lets me obtain the complement of the two dataframes?

The most straightforward way, to me, is to compute union minus intersection, as shown in the naive example below, but I do not know how to implement this elegantly in pandas or NumPy.

import pandas as pd

df1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                    'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                    'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
intersection = pd.merge(df1, df2, how='inner', on=['key1', 'key2'])
union = pd.merge(df1, df2, how='outer', on=['key1', 'key2'])

# Pseudocode for the desired result; "-" is not a set difference on DataFrames
# complement = union - intersection

Thanks for any comments and answers.

Recommended answer

Starting from this:

import pandas as pd

df1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                    'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                    'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
intersection = pd.merge(df1, df2, how='inner', on=['key1', 'key2'])
union = pd.merge(df1, df2, how='outer', on=['key1', 'key2'])

print(union)

     A    B key1 key2    C    D
0   A0   B0   K0   K0   C0   D0
1   A1   B1   K0   K1  NaN  NaN
2   A2   B2   K1   K0   C1   D1
3   A2   B2   K1   K0   C2   D2
4   A3   B3   K2   K1  NaN  NaN
5  NaN  NaN   K2   K0   C3   D3

print(intersection)

    A   B key1 key2   C   D
0  A0  B0   K0   K0  C0  D0
1  A2  B2   K1   K0  C1  D1
2  A2  B2   K1   K0  C2  D2

To get the union minus the intersection, try:

union[union.isnull().any(axis=1)]

     A    B key1 key2    C    D
1   A1   B1   K0   K1  NaN  NaN
4   A3   B3   K2   K1  NaN  NaN
5  NaN  NaN   K2   K0   C3   D3
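
The `isnull()` filter above relies on unmatched rows being the only ones that contain NaN, so it could misclassify rows if the original data already has missing values. As a hedged alternative (not part of the original answer), the sketch below uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column recording whether each row came from the left frame only, the right frame only, or both. The variable name `merged` is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                    'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                    'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

# Outer merge with an extra "_merge" column whose values are
# 'left_only', 'right_only', or 'both'.
merged = pd.merge(df1, df2, how='outer', on=['key1', 'key2'], indicator=True)

# Keep the rows whose keys appear in only one of the two frames,
# then drop the helper column.
complement = merged[merged['_merge'] != 'both'].drop(columns='_merge')
print(complement)
```

For the sample data this selects the same three key pairs as the `isnull()` approach: (K0, K1), (K2, K1) and (K2, K0). Row and column order may differ between pandas versions.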
