Pandas: Find rows which don't exist in another DataFrame by multiple columns
Question
This is the same as "python pandas: how to find rows in one dataframe but not in another?", but with multiple columns.
This is the setup:
import pandas as pd
df = pd.DataFrame(dict(
col1=[0,1,1,2],
col2=['a','b','c','b'],
extra_col=['this','is','just','something']
))
other = pd.DataFrame(dict(
col1=[1,2],
col2=['b','c']
))
Now, I want to select the rows from df which don't exist in other. I want to do the selection by col1 and col2.
In SQL I would do:
select * from df
where not exists (
select * from other o
where df.col1 = o.col1 and
df.col2 = o.col2
)
In Pandas I can do something like this, but it feels very ugly. Part of the ugliness could be avoided if df had an id column, but one is not always available.
key_col = ['col1','col2']
df_with_idx = df.reset_index()
common = pd.merge(df_with_idx,other,on=key_col)['index']
mask = df_with_idx['index'].isin(common)
desired_result = df_with_idx[~mask].drop('index',axis=1)
So maybe there is a more elegant way?
Recommended answer
Since pandas 0.17.0, merge accepts an indicator parameter that tells you whether each row is present only in the left frame, only in the right frame, or in both:
In [5]:
merged = df.merge(other, how='left', indicator=True)
merged
Out[5]:
   col1 col2  extra_col     _merge
0     0    a       this  left_only
1     1    b         is       both
2     1    c       just  left_only
3     2    b  something  left_only
In [6]:
merged[merged['_merge']=='left_only']
Out[6]:
   col1 col2  extra_col     _merge
0     0    a       this  left_only
2     1    c       just  left_only
3     2    b  something  left_only
So you can now filter the merged df by selecting only the 'left_only' rows.
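The filtering step above could be wrapped in a small helper. This is only a sketch: the name anti_join and its signature are hypothetical (not part of pandas), and it assumes a pandas version recent enough to support drop(columns=...). It restricts the right frame to the key columns so that no extra columns leak into the result, then drops the _merge indicator column again.

```python
import pandas as pd

def anti_join(left, right, on):
    """Hypothetical helper: rows of `left` whose `on` values have no match in `right`."""
    # Keep only the (deduplicated) key columns of `right`, so the merge
    # adds nothing to the result except the indicator column.
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, on=on, how='left', indicator=True)
    # 'left_only' marks rows that found no partner in `right`.
    return merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Same setup as in the question.
df = pd.DataFrame(dict(
    col1=[0, 1, 1, 2],
    col2=['a', 'b', 'c', 'b'],
    extra_col=['this', 'is', 'just', 'something'],
))
other = pd.DataFrame(dict(
    col1=[1, 2],
    col2=['b', 'c'],
))

result = anti_join(df, other, on=['col1', 'col2'])
print(result)
```

One caveat: merge builds a fresh integer index for its result, so if df carries a meaningful index it would need to be preserved separately (for example via reset_index before the merge).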