Pandas analogue to SQL MINUS / EXCEPT operator, using multiple columns

Views: 97 | Published: 2020/5/23 21:43:46 | Tags: python, sql, pandas, dataframe

Question

Here is what I mean. Given two pandas DataFrames as follows:

In [77]: d1
Out[77]:
   a  b  c
0  0  0  1
1  0  1  2
2  1  0  3
3  1  1  4
4  0  0  5
5  1  1  6
6  2  2  7

In [78]: d2
Out[78]:
   a  b   c
0  1  1  10
1  0  0  11
2  1  1  12

How can I compute d1 MINUS d2, taking only columns "a" and "b" into account, to get the following result:

In [62]: res
Out[62]:
   a  b  c
1  0  1  2
2  1  0  3
6  2  2  7

MVCE:

import pandas as pd

d1 = pd.DataFrame({
    'a': [0, 0, 1, 1, 0, 1, 2],
    'b': [0, 1, 0, 1, 0, 1, 2],
    'c': [1, 2, 3, 4, 5, 6, 7]
})

d2 = pd.DataFrame({
    'a': [1, 0, 1],
    'b': [1, 0, 1],
    'c': [10, 11, 12]
})

What I have tried:

In [65]: tmp1 = d1.reset_index().set_index(["a", "b"])

In [66]: idx = tmp1.index.difference(d2.set_index(["a","b"]).index)

In [67]: res = d1.loc[tmp1.loc[idx, "index"]]

In [68]: res
Out[68]:
   a  b  c
1  0  1  2
2  1  0  3
6  2  2  7

It gives me the correct result, but I have a feeling that there must be a more idiomatic and cleaner way to achieve this.

P.S. The DataFrame.isin() method won't help in this case, as it produces the wrong result set.
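
To illustrate the P.S., here is a minimal sketch of one way DataFrame.isin() might be applied to this problem. The exact call is an assumption, since the question does not show it. Passing a dict tests each column independently, so a row is flagged whenever its "a" value occurs anywhere in d2's "a" column and its "b" value occurs anywhere in d2's "b" column, even if that pair never occurs together in d2.

```python
import pandas as pd

d1 = pd.DataFrame({
    'a': [0, 0, 1, 1, 0, 1, 2],
    'b': [0, 1, 0, 1, 0, 1, 2],
    'c': [1, 2, 3, 4, 5, 6, 7],
})
d2 = pd.DataFrame({
    'a': [1, 0, 1],
    'b': [1, 0, 1],
    'c': [10, 11, 12],
})

# Column-wise membership test: 'a' is checked against d2's 'a' values and
# 'b' against d2's 'b' values, with no notion of the (a, b) pair as a unit.
per_column = d1[['a', 'b']].isin({
    'a': d2['a'].tolist(),
    'b': d2['b'].tolist(),
})

# Dropping every row where both columns matched removes too much.
wrong = d1[~per_column.all(axis=1)]
print(wrong)
```

Only row 6 survives. Rows 1 (0, 1) and 2 (1, 0) are removed as well, because 0 and 1 each occur somewhere in both of d2's key columns, although neither pair exists in d2.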

Recommended answer

We can use pandas.concat with drop_duplicates here, passing keep=False so that every row whose ("a", "b") pair occurs more than once is dropped:

pd.concat([d1, d2]).drop_duplicates(['a', 'b'], keep=False)

   a  b  c
1  0  1  2
2  1  0  3
6  2  2  7
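
A caveat worth illustrating: with keep=False, a key pair that appears once in d2 and never in d1 is not a duplicate, so that d2 row ends up in the result. The sketch below uses a hypothetical d2_extra (d2 plus one made-up row, not from the original post) to show the leak, and how concatenating d2_extra twice avoids it.

```python
import pandas as pd

d1 = pd.DataFrame({
    'a': [0, 0, 1, 1, 0, 1, 2],
    'b': [0, 1, 0, 1, 0, 1, 2],
    'c': [1, 2, 3, 4, 5, 6, 7],
})

# Hypothetical variant of d2: the last row's key (9, 9) does not occur in d1.
d2_extra = pd.DataFrame({
    'a': [1, 0, 1, 9],
    'b': [1, 0, 1, 9],
    'c': [10, 11, 12, 13],
})

# Single concat: (9, 9) occurs only once overall, so the d2 row survives.
leaky = pd.concat([d1, d2_extra]).drop_duplicates(subset=['a', 'b'], keep=False)

# Concatenating d2_extra twice makes each of its keys occur at least twice,
# so keep=False drops all of them.
fixed = pd.concat([d1, d2_extra, d2_extra]).drop_duplicates(subset=['a', 'b'], keep=False)

print(leaky)
print(fixed)
```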

Edit after OP's comment

If you want to make sure that rows unique to d2 are not carried into the result, duplicate d2 before concatenating. Every key pair from d2 then occurs at least twice, so keep=False removes all of them:

pd.concat([d1, pd.concat([d2]*2)]).drop_duplicates(['a', 'b'], keep=False)

   a  b  c
1  0  1  2
2  1  0  3
6  2  2  7
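
As a point of comparison, the same MINUS can be written as an anti-join using DataFrame.merge with indicator=True. This is an alternative sketch, not part of the original answer; the names keys, right, merged and mask are illustrative. It relies on a left merge against de-duplicated keys producing exactly one output row per d1 row, in d1's order.

```python
import pandas as pd

d1 = pd.DataFrame({
    'a': [0, 0, 1, 1, 0, 1, 2],
    'b': [0, 1, 0, 1, 0, 1, 2],
    'c': [1, 2, 3, 4, 5, 6, 7],
})
d2 = pd.DataFrame({
    'a': [1, 0, 1],
    'b': [1, 0, 1],
    'c': [10, 11, 12],
})

keys = ['a', 'b']

# Keep only the key columns of d2, de-duplicated so the left merge
# cannot multiply rows of d1.
right = d2[keys].drop_duplicates()

# indicator=True adds a '_merge' column: 'left_only' marks d1 rows whose
# key pair has no match in d2. validate guards the uniqueness assumption.
merged = d1.merge(right, on=keys, how='left', indicator=True,
                  validate='many_to_one')

# The merge resets the index, so apply the mask positionally to d1
# in order to keep d1's original index labels.
mask = (merged['_merge'] == 'left_only').to_numpy()
res = d1[mask]
print(res)
```

Unlike the concat approach, this keeps rows of d1 whose key pair repeats only within d1, and rows unique to d2 never enter the result.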
