Ordering of rows in JavaRDDs after union

Question

I am trying to find any information about the ordering of rows in an RDD. Here is what I am trying to do:

// rdd1 and rdd2 are two existing JavaRDD<String> instances (element type assumed)
JavaRDD<String> rdd3 = rdd1.union(rdd2);

In rdd3, is there any guarantee that rdd1's records will appear first and rdd2's afterwards? In my tests I observed this behavior after the union, but I wasn't able to find it documented anywhere.

Just FYI, I don't care about the ordering within each RDD (i.e., the order of rdd1's or rdd2's own data doesn't matter), but the requirement is that after the union, rdd1's records come first.

Answer

In Spark, the elements within a particular partition are unordered; however, the partitions themselves are ordered: http://spark.apache.org/docs/latest/programming-guide.html#background

If you check your rdd3, you should find that it consists of all the partitions of rdd1 followed by all the partitions of rdd2, so in this case the results happen to be ordered the way you want. Simply concatenating the partitions of the two RDDs is Spark's standard behaviour, as explained in the question "In Apache Spark, why does RDD.union not preserve the partitioner?"
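
To make the partition-concatenation layout concrete, here is a minimal plain-Java model, not Spark code. An "RDD" is represented only as an ordered list of partitions; `union` appends the second input's partitions after the first's (the plain union case described above), and `collect` reads partitions back in index order. The class and method names (`UnionModel`, `union`, `collect`) are hypothetical; in Spark itself this layout is produced internally by the `UnionRDD` class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: an "RDD" is just its ordered list of partitions.
public class UnionModel {

    // Models the plain union layout: every partition of `first`, then every
    // partition of `second`, with no data moved between partitions.
    static <T> List<List<T>> union(List<List<T>> first, List<List<T>> second) {
        List<List<T>> result = new ArrayList<>(first);
        result.addAll(second);
        return result;
    }

    // Models collect(): partitions are read back in partition-index order.
    static <T> List<T> collect(List<List<T>> partitions) {
        List<T> out = new ArrayList<>();
        for (List<T> partition : partitions) {
            out.addAll(partition);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> rdd1 = List.of(List.of("a1", "a2"), List.of("a3"));
        List<List<String>> rdd2 = List.of(List.of("b1"), List.of("b2", "b3"));
        List<List<String>> rdd3 = union(rdd1, rdd2);
        System.out.println(rdd3.size());   // 4
        System.out.println(collect(rdd3)); // [a1, a2, a3, b1, b2, b3]
    }
}
```

In this model rdd1's elements come first only because its partitions receive the lower indices, which is exactly the property the API does not promise.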

So in this case, it appears that union will give you what you want. However, this behaviour is an implementation detail of union rather than part of its interface definition, so you cannot rely on it not being reimplemented with different behaviour in the future.
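
If the requirement must hold regardless of how union is implemented, one option is to make the ordering explicit: tag every record with the index of its source (0 for rdd1, 1 for rdd2) and order by that tag after the union. The concern is not purely theoretical: in at least some Spark versions, union checks whether both inputs share the same partitioner and, if so, builds a `PartitionerAwareUnionRDD`, which combines partitions with the same index instead of appending one RDD's partitions after the other's. The sketch below is plain Java rather than Spark; the `Tagged` class and `unionSourceFirst` method are hypothetical, and a seeded shuffle stands in for a union whose output order is not guaranteed. In Spark's Java API the same idea would roughly correspond to `mapToPair` on each input, `union`, and then `sortByKey()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of "tag by source, then sort by tag".
public class TaggedUnion {

    // A record paired with the index of the input it came from (0 = first, 1 = second).
    static final class Tagged<T> {
        final int source;
        final T value;

        Tagged(int source, T value) {
            this.source = source;
            this.value = value;
        }
    }

    static <T> List<T> unionSourceFirst(List<T> first, List<T> second, long seed) {
        List<Tagged<T>> combined = new ArrayList<>();
        for (T v : first) combined.add(new Tagged<>(0, v));
        for (T v : second) combined.add(new Tagged<>(1, v));

        // Stand-in for a union with no ordering guarantee: scramble the layout.
        Collections.shuffle(combined, new Random(seed));

        // Sorting on the tag alone restores "all of first, then all of second";
        // order within each input stays unspecified, as the question allows.
        combined.sort(Comparator.comparingInt((Tagged<T> t) -> t.source));

        List<T> out = new ArrayList<>();
        for (Tagged<T> t : combined) out.add(t.value);
        return out;
    }

    public static void main(String[] args) {
        List<String> rdd1 = List.of("a1", "a2", "a3");
        List<String> rdd2 = List.of("b1", "b2");
        System.out.println(unionSourceFirst(rdd1, rdd2, 42L));
    }
}
```

The trade-off in Spark is that `sortByKey` triggers a shuffle, which the plain concatenating union avoids; the tag only buys a guarantee that no longer depends on union's internal layout.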
