How to obtain the symmetric difference between two DataFrames?

Question

In the Spark SQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for symmetric difference. Obviously, a combination of union and except can be used to produce the symmetric difference:

df1.except(df2).unionAll(df2.except(df1))

But this seems a bit awkward. In my experience, if something seems awkward, there's a better way to do it, especially in Scala.

Recommended Answer

You can always rewrite it as:

df1.unionAll(df2).except(df1.intersect(df2))
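To check that this rewrite and the question's original expression agree, here is a minimal plain-Scala sketch that models DataFrame rows as immutable `Set`s. It assumes that Spark's `except` and `intersect` return distinct rows, so set semantics are a reasonable stand-in; it does not use Spark itself, and the names `SymmetricDifferenceDemo`, `viaExcept`, and `viaIntersect` are hypothetical.

```scala
// Plain-Scala model of the two DataFrame formulations, using immutable Sets.
// Assumption: Spark's except/intersect return distinct rows, so Set semantics
// approximate DataFrame behaviour here. No Spark dependency is used.
object SymmetricDifferenceDemo {
  // Question's formulation: (A except B) union (B except A)
  def viaExcept[A](a: Set[A], b: Set[A]): Set[A] =
    (a diff b) union (b diff a)

  // Answer's formulation: (A union B) except (A intersect B)
  def viaIntersect[A](a: Set[A], b: Set[A]): Set[A] =
    (a union b) diff (a intersect b)

  def main(args: Array[String]): Unit = {
    // Hypothetical rows standing in for df1 and df2
    val df1Rows = Set(("a", 1), ("b", 2), ("c", 3))
    val df2Rows = Set(("b", 2), ("c", 3), ("d", 4))

    val r1 = viaExcept(df1Rows, df2Rows)
    val r2 = viaIntersect(df1Rows, df2Rows)

    println(r1 == r2)                       // both formulations agree
    println(r1.toSeq.sorted.mkString(", ")) // rows present in exactly one input
  }
}
```

Running `main` prints `true` followed by `(a,1), (d,4)`: only the rows that appear in exactly one of the two inputs survive, whichever formulation is used.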

Seriously though, UNION, INTERSECT, and EXCEPT / MINUS are pretty much the standard set of SQL combining operators. I am not aware of any system that provides an XOR-like operation out of the box, most likely because it is trivial to implement using the other three and there is not much to optimize.