How to obtain the symmetric difference between two DataFrames?

Problem description

In the Spark SQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for symmetric difference. Obviously, a combination of union and except can be used to generate it:

df1.except(df2).unionAll(df2.except(df1))
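Because the result of this expression depends only on which rows appear on each side, it can be modeled with plain Scala Sets, without Spark. This is a minimal sketch, assuming neither DataFrame contains duplicate rows; the (id, name) tuples and the object name are hypothetical stand-ins for real rows:

```scala
object ExceptUnionSketch {
  // Hypothetical rows standing in for df1 and df2; each tuple is one row.
  val df1Rows: Set[(Int, String)] = Set((1, "a"), (2, "b"), (3, "c"))
  val df2Rows: Set[(Int, String)] = Set((2, "b"), (3, "c"), (4, "d"))

  // Rows only in df1 plus rows only in df2, mirroring the except/union form.
  val symDiff: Set[(Int, String)] = (df1Rows diff df2Rows) union (df2Rows diff df1Rows)

  def main(args: Array[String]): Unit =
    println(symDiff.toList.sorted) // List((1,a), (4,d))
}
```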

But this seems a bit awkward. In my experience, if something seems awkward, there's a better way to do it, especially in Scala.

Recommended answer

You can always rewrite it as:

df1.unionAll(df2).except(df1.intersect(df2))
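The rewrite relies on the set identity (A ∪ B) ∖ (A ∩ B) = (A ∖ B) ∪ (B ∖ A). The following sketch checks that identity on a plain Scala Set model, assuming the inputs contain no duplicate rows; the sample rows and object name are made up for illustration and are not Spark API:

```scala
object SymmetricDifferenceCheck {
  // Hypothetical sample rows; duplicates are assumed absent, as in a Set.
  val a: Set[(Int, String)] = Set((1, "a"), (2, "b"), (3, "c"))
  val b: Set[(Int, String)] = Set((2, "b"), (3, "c"), (4, "d"))

  // Question's form: (A except B) union (B except A)
  val viaExcept: Set[(Int, String)] = (a diff b) union (b diff a)

  // Answer's form: (A union B) except (A intersect B)
  val viaIntersect: Set[(Int, String)] = (a union b) diff (a intersect b)

  def main(args: Array[String]): Unit = {
    println(viaExcept == viaIntersect)  // true
    println(viaIntersect.toList.sorted) // List((1,a), (4,d))
  }
}
```

Both expressions keep exactly the rows that appear in one input but not the other, so either can serve as the missing XOR-style operation.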

Seriously though, UNION, INTERSECT and EXCEPT / MINUS are pretty much the standard set of SQL set operators. I am not aware of any system that provides an XOR-like (symmetric difference) operation out of the box, most likely because it is trivial to implement using the other three and there is not much to optimize there.
