Apache Spark difference between two RDDs


Problem Description

Say I have this example job (in Groovy w/ the Java API):

  def set1 = []
  def set2 = []
  0.upto(10) { set1 << it }
  8.upto(20) { set2 << it }
  def rdd1 = context.parallelize(set1)
  def rdd2 = context.parallelize(set2)

  // What next?

How do I get a set that is the delta between the two? I know that union can create an RDD that has all of the data in those RDDs, but how do I do the opposite of that?

Solution

If you just want a set subtraction, subtract would be the answer. If you want the "outer" collection (elements that appear in exactly one of the two RDDs, i.e. the symmetric difference), try:

  rdd1.subtract(rdd2).union(rdd2.subtract(rdd1))
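As a minimal sketch of what that expression computes, here is the same pattern with plain Python lists standing in for RDDs (an analogy only, so no Spark installation is needed; the local subtract helper mimics the semantics of RDD.subtract):

```python
# Plain-Python analogy of the Spark answer above.
set1 = list(range(0, 11))   # 0..10, like 0.upto(10) { set1 << it }
set2 = list(range(8, 21))   # 8..20, like 8.upto(20) { set2 << it }

def subtract(a, b):
    """Mimic RDD.subtract: keep elements of a that do not appear in b."""
    b_lookup = set(b)
    return [x for x in a if x not in b_lookup]

# rdd1.subtract(rdd2).union(rdd2.subtract(rdd1))
delta = subtract(set1, set2) + subtract(set2, set1)
print(sorted(delta))  # elements in exactly one input: 0..7 and 11..20
```

The overlap 8..10 drops out of both subtractions, so only the elements unique to each side survive, which is exactly the delta the question asks for.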


