Pandas analogue to SQL MINUS / EXCEPT operator, using multiple columns

Question

I'm looking for a Pandas analogue to the SQL MINUS (AKA EXCEPT) operator.

Here is what I mean - given two Pandas DataFrames as follows:

In [77]: d1
Out[77]:
   a  b  c
0  0  0  1
1  0  1  2
2  1  0  3
3  1  1  4
4  0  0  5
5  1  1  6
6  2  2  7

In [78]: d2
Out[78]:
   a  b   c
0  1  1  10
1  0  0  11
2  1  1  12

如何仅考虑列"a""b"来查找d1 MINUS d2的结果,以获得以下结果:

How can I compute d1 MINUS d2, taking into account only columns "a" and "b", in order to get the following result:

In [62]: res
Out[62]:
   a  b  c
1  0  1  2
2  1  0  3
6  2  2  7

MVCE:

import pandas as pd

d1 = pd.DataFrame({
    'a': [0, 0, 1, 1, 0, 1, 2], 
    'b': [0, 1, 0, 1, 0, 1, 2], 
    'c': [1, 2, 3, 4, 5, 6, 7]
})

d2 = pd.DataFrame({
    'a': [1, 0, 1], 
    'b': [1, 0, 1], 
    'c': [10, 11, 12]
})

What have I tried:

In [65]: tmp1 = d1.reset_index().set_index(["a", "b"])

In [66]: idx = tmp1.index.difference(d2.set_index(["a","b"]).index)

In [67]: res = d1.loc[tmp1.loc[idx, "index"]]

In [68]: res
Out[68]:
   a  b  c
1  0  1  2
2  1  0  3
6  2  2  7

It gives me the correct result, but I have a feeling that there must be a more idiomatic and nicer / cleaner way to achieve this.
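For reference, here is the attempt above rewritten as a standalone script with each step commented. It is only a sketch that assumes pandas and the MVCE frames; nothing beyond the OP's own calls is used.

```python
import pandas as pd

d1 = pd.DataFrame({
    'a': [0, 0, 1, 1, 0, 1, 2],
    'b': [0, 1, 0, 1, 0, 1, 2],
    'c': [1, 2, 3, 4, 5, 6, 7],
})
d2 = pd.DataFrame({
    'a': [1, 0, 1],
    'b': [1, 0, 1],
    'c': [10, 11, 12],
})

# Keep d1's original row labels in an "index" column, then key the rows by (a, b).
tmp1 = d1.reset_index().set_index(["a", "b"])

# (a, b) pairs that occur in d1 but not in d2; Index.difference returns them unique and sorted.
idx = tmp1.index.difference(d2.set_index(["a", "b"]).index)

# Look up the original labels for those pairs and select the matching rows of d1.
res = d1.loc[tmp1.loc[idx, "index"]]
print(res)
```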

P.S. The DataFrame.isin() method won't help in this case, as it produces a wrong result set.
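One plausible way to see the problem with isin(): passed a dict of column values (an assumed usage, chosen for illustration), it tests each column on its own. The row (0, 1) then counts as "present" because 0 occurs in d2['a'] and 1 occurs in d2['b'], even though d2 has no (0, 1) row. Matching whole (a, b) pairs needs a tuple comparison, which MultiIndex.isin provides. The names naive_mask, keys1 and keys2 are illustrative only.

```python
import pandas as pd

d1 = pd.DataFrame({'a': [0, 0, 1, 1, 0, 1, 2],
                   'b': [0, 1, 0, 1, 0, 1, 2],
                   'c': [1, 2, 3, 4, 5, 6, 7]})
d2 = pd.DataFrame({'a': [1, 0, 1],
                   'b': [1, 0, 1],
                   'c': [10, 11, 12]})
cols = ['a', 'b']

# Column-wise membership: each column is checked against d2's values independently.
naive_mask = d1[cols].isin(d2[cols].to_dict('list')).all(axis=1)
print(d1[~naive_mask])  # only row 6 survives; rows 1 and 2 are wrongly removed

# Pair-wise membership: turn the key columns into tuples and compare whole pairs.
keys1 = pd.MultiIndex.from_frame(d1[cols])
keys2 = pd.MultiIndex.from_frame(d2[cols])
res = d1[~keys1.isin(keys2)]
print(res)  # rows 1, 2 and 6
```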

Answer

We can use pandas.concat with drop_duplicates here, passing keep=False so that every row whose (a, b) pair occurs more than once is dropped:

pd.concat([d1, d2]).drop_duplicates(['a', 'b'], keep=False)

   a  b  c
1  0  1  2
2  1  0  3
6  2  2  7
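A self-contained version of this approach, plus a check of its main weakness: a row whose (a, b) pair exists only in d2 is unique in the concatenation, so keep=False leaves it in the result. That is the issue addressed by the edit after the OP's comment. The extra (9, 9) row and the name d2_extra are hypothetical, added only to trigger that case.

```python
import pandas as pd

d1 = pd.DataFrame({'a': [0, 0, 1, 1, 0, 1, 2],
                   'b': [0, 1, 0, 1, 0, 1, 2],
                   'c': [1, 2, 3, 4, 5, 6, 7]})
d2 = pd.DataFrame({'a': [1, 0, 1],
                   'b': [1, 0, 1],
                   'c': [10, 11, 12]})

# Stack both frames; any (a, b) pair shared by d1 and d2 now occurs at least twice,
# and keep=False removes every occurrence of a duplicated pair.
res = pd.concat([d1, d2]).drop_duplicates(['a', 'b'], keep=False)
print(res)

# Hypothetical d2 with a pair (9, 9) that d1 does not have: it is unique in the
# concatenation, so it survives and leaks into the "d1 MINUS d2" result.
d2_extra = pd.concat([d2, pd.DataFrame({'a': [9], 'b': [9], 'c': [99]})],
                     ignore_index=True)
leaky = pd.concat([d1, d2_extra]).drop_duplicates(['a', 'b'], keep=False)
print(leaky)
```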

Edit after OP's comment

If you want to make sure that rows unique to d2 aren't included in the result, we can duplicate d2 so that each of its (a, b) pairs occurs at least twice:

pd.concat([d1, pd.concat([d2]*2)]).drop_duplicates(['a', 'b'], keep=False)

   a  b  c
1  0  1  2
2  1  0  3
6  2  2  7
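Doubling d2 removes that leak, but keep=False still drops every row whose (a, b) pair repeats within d1 itself, even when d2 never contains it. The sketch below shows this with a hypothetical extra (2, 2) row in d1_dup, and then swaps in a different technique (not part of the answer above): a left merge with indicator=True that keeps only the 'left_only' rows. That behaves like a SQL NOT EXISTS anti-join and keeps both (2, 2) rows.

```python
import pandas as pd

d1 = pd.DataFrame({'a': [0, 0, 1, 1, 0, 1, 2],
                   'b': [0, 1, 0, 1, 0, 1, 2],
                   'c': [1, 2, 3, 4, 5, 6, 7]})
d2 = pd.DataFrame({'a': [1, 0, 1],
                   'b': [1, 0, 1],
                   'c': [10, 11, 12]})
cols = ['a', 'b']

# Hypothetical d1 with a second (2, 2) row; d2 has no (2, 2) pair.
d1_dup = pd.concat([d1, pd.DataFrame({'a': [2], 'b': [2], 'c': [8]})],
                   ignore_index=True)

# The doubled-d2 trick drops both (2, 2) rows because they duplicate each other.
concat_res = pd.concat([d1_dup, pd.concat([d2] * 2)]).drop_duplicates(cols, keep=False)
print(concat_res)  # only the (0, 1) and (1, 0) rows remain

# Anti-join: left-merge on the key columns and keep rows found only in the left frame.
# Deduplicating d2's keys is optional here but keeps the intermediate frame small.
merged = d1_dup.merge(d2[cols].drop_duplicates(), on=cols, how='left', indicator=True)
anti = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(anti)  # (0, 1), (1, 0) and both (2, 2) rows
```

Note that merge() builds a fresh default index. In this example the labels happen to line up with d1_dup's, but in general you would need to carry the original index along (for example with reset_index() before merging) if you need it.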