Does a join of co-partitioned RDDs cause a shuffle in Apache Spark?
Question
Will rdd1.join(rdd2) cause a shuffle to happen if rdd1 and rdd2 have the same partitioner?
Answer
No. If two RDDs have the same partitioner, the join will not cause a shuffle. You can see this in CoGroupedRDD.scala (https://github.com/apache/spark/blob/v1.2.0/core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala):
override def getDependencies: Seq[Dependency[_]] = {
  rdds.map { rdd: RDD[_ <: Product2[K, _]] =>
    if (rdd.partitioner == Some(part)) {
      logDebug("Adding one-to-one dependency with " + rdd)
      new OneToOneDependency(rdd)
    } else {
      logDebug("Adding shuffle dependency with " + rdd)
      new ShuffleDependency[K, Any, CoGroupCombiner](rdd, part, serializer)
    }
  }
}
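The whole decision hinges on the equality check rdd.partitioner == Some(part). Spark's HashPartitioner considers two instances equal when they have the same number of partitions, which is why two independently constructed RDDs can still count as co-partitioned. A minimal stand-alone sketch of that logic (SimpleHashPartitioner is a hypothetical stand-in, not Spark's real class, so this runs without a Spark installation):

```scala
// Sketch of the equality semantics Spark's HashPartitioner provides.
// A case class gives us structural equality: two partitioners with the
// same partition count compare equal, so the == Some(part) check passes.
case class SimpleHashPartitioner(numPartitions: Int) {
  // Non-negative modulo placement, mirroring how hash partitioning
  // assigns a key to a partition.
  def getPartition(key: Any): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }
}

val p1 = SimpleHashPartitioner(8)
val p2 = SimpleHashPartitioner(8)

// Equal partition counts => equal partitioners => the OneToOneDependency
// branch above is taken (no shuffle).
println(Some(p1) == Some(p2)) // true

// Different partition counts => not equal => ShuffleDependency branch.
println(Some(p1) == Some(SimpleHashPartitioner(16))) // false

// Under equal partitioners every key lands in the same partition in
// both RDDs, which is what makes a local, shuffle-free join possible.
println((0 until 100).forall(k => p1.getPartition(k) == p2.getPartition(k))) // true
```

In other words, co-partitioning is purely a property of the partitioner objects comparing equal; it says nothing about where those partitions physically live, which is the caveat discussed next.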
Note, however, that the lack of a shuffle does not mean that no data will have to be moved between nodes. It's possible for two RDDs to have the same partitioner (be co-partitioned) yet have the corresponding partitions located on different nodes (not be co-located).
This situation is still better than doing a shuffle, but it's something to keep in mind. Co-location can improve performance, but is hard to guarantee.