Pandas: Find rows which don't exist in another DataFrame by multiple columns


Question

This is the same as "python pandas: how to find rows in one dataframe but not in another?", but with multiple columns.

Here is the setup:

import pandas as pd

df = pd.DataFrame(dict(
    col1=[0,1,1,2],
    col2=['a','b','c','b'],
    extra_col=['this','is','just','something']
))

other = pd.DataFrame(dict(
    col1=[1,2],
    col2=['b','c']
))

Now I want to select the rows from df that don't exist in other, matching on col1 and col2.

In SQL I would do:

select * from df 
where not exists (
    select * from other o 
    where df.col1 = o.col1 and 
    df.col2 = o.col2
)

In Pandas I can do something like this, but it feels very ugly. Part of the ugliness could be avoided if df had an id column, but one is not always available.

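key_col = ['col1', 'col2']
# Expose the row labels as an 'index' column so matched rows can be identified
df_with_idx = df.reset_index()
# Labels of the df rows whose key columns also appear in other
common = pd.merge(df_with_idx, other, on=key_col)['index']
mask = df_with_idx['index'].isin(common)

# Keep the unmatched rows and drop the helper column
desired_result = df_with_idx[~mask].drop('index', axis=1)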

So maybe there is a more elegant way?

Recommended answer

Since pandas 0.17.0, merge accepts an indicator parameter. It adds a _merge column that tells you whether each row is present only in the left frame, only in the right frame, or in both. With no on argument, merge joins on the columns the two frames share, here col1 and col2:

In [5]:
merged = df.merge(other, how='left', indicator=True)
merged

Out[5]:
   col1 col2  extra_col     _merge
0     0    a       this  left_only
1     1    b         is       both
2     1    c       just  left_only
3     2    b  something  left_only

In [6]:    
merged[merged['_merge']=='left_only']

Out[6]:
   col1 col2  extra_col     _merge
0     0    a       this  left_only
2     1    c       just  left_only
3     2    b  something  left_only

So you can now filter the merged df by selecting only the 'left_only' rows.
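To reuse this with explicit key columns, the approach can be wrapped in a small helper. The following is only a sketch: the function name anti_join and the variable right_keys are made up for illustration and are not part of pandas. It assumes a pandas version that supports drop(columns=...), and it reduces other to its de-duplicated key columns first so the merge cannot pull in unrelated columns or multiply rows.

```python
import pandas as pd


def anti_join(left, right, on):
    """Return the rows of `left` whose `on` values have no match in `right`.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Keep only the key columns of `right`, without duplicate keys.
    right_keys = right[on].drop_duplicates()
    # indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
    merged = left.merge(right_keys, on=on, how='left', indicator=True)
    # Rows tagged 'left_only' found no partner in `right`.
    return merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')


df = pd.DataFrame(dict(
    col1=[0, 1, 1, 2],
    col2=['a', 'b', 'c', 'b'],
    extra_col=['this', 'is', 'just', 'something']
))

other = pd.DataFrame(dict(
    col1=[1, 2],
    col2=['b', 'c']
))

result = anti_join(df, other, ['col1', 'col2'])
print(result)
```

Note that merging on columns ignores the frames' indexes and returns a fresh integer index, so any custom index on df is not carried over to the result.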
